Apache config and .htaccess - logging


These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.


Formats

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

See the mod_log_config documentation for the meaning of the percent-fields used below.


The "Common Log Format" used by apache (which seems to call it common), and imitated by others, is:
%h %l %u %t \"%r\" %>s %b

...which looks like:

127.0.0.1 - - [17/Apr/2008:12:23:32 +0200] "GET /foo.txt HTTP/1.1" 200 117


The CLF-with-VirtualHost consists of the vhost name followed by the CLF fields:

%v %h %l %u %t \"%r\" %>s %b

...which looks like:

www.example.com 127.0.0.1 - - [17/Apr/2008:12:23:32 +0200] "GET /foo.txt HTTP/1.1" 200 117



The (NCSA) 'extended/combined log format' (apache seems to call it combined) is basic CLF plus two extra header values at the end, referer and user-agent:
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"

...which looks like (line-broken for readability)

127.0.0.1 - - [17/Apr/2008:12:23:32 +0200] "GET /foo.txt HTTP/1.1" 200 117 
     "http://www.example.com/start.html" "Opera/9.20 (Windows NT 6.0; U; en)"


You also see that in virtualhost form (apache seems to call it vhost):
%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" 

Which looks like

www.example.com 127.0.0.1 - - [17/Apr/2008:12:23:32 +0200] "GET /foo.txt HTTP/1.1" 200 117 
     "http://www.example.com/start.html" "Opera/9.20 (Windows NT 6.0; U; en)"


Other variations exist, usually as extra fields appended to the end of one of the above.
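
To make the "extra fields at the end" idea concrete, here is a minimal sketch of how such formats can be declared as named formats with LogFormat. The nicknames common and combined are the ones mentioned above; %D (time taken to serve the request, in microseconds) is a standard mod_log_config field, but the nickname combined_timed is made up for this example.

```apache
# Named formats for the two basic variants described above
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

# combined plus one extra field appended at the end.
# %D is the time taken to serve the request, in microseconds.
# The nickname combined_timed is an assumption made for this example.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %D" combined_timed
```

Appending rather than inserting means that tools which only understand the common or combined prefix have a decent chance of still parsing the start of each line.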



Other notes:

  • I've seen some odd variations, e.g. logging the vhost name instead of the client host name/IP - probably a contortion to make some analysis program happy (one that only understood CLF)
  • When you use virtual hosts and log into a single log file, you don't want to use CLF (or combined) because you can't tell the vhosts apart
    • You can log each vhost into a distinctly named log file, or make logging prepend the vhost name
      • (you could even use basic scripting to convert between many CLF logs and one big vhost log, if you have the vhost names handy)
    • What you choose is mostly about convenience. Would you split the logs anyway? Do you keep them for automated analysis?
    • There are two constraints you may wish to consider (particularly when running many vhosts, or just a busy site):
      • Apache doesn't seem to like logs >2GB (this seems to apply mainly to older builds without large-file support). You can use logrotate or something similar to avoid that, but for extremely busy sites, a little splitting up can be good.
      • the host system's (configured) limit of file descriptors per process
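
As a rough sketch of the "prepend the vhost name" option: a single global log that every vhost writes to, which also keeps the number of open log files (and so file descriptors) down. The nickname vhost_combined and the file name are assumptions; any unused nickname and path would do.

```apache
# One shared access log for all vhosts, with the vhost name (%v) prepended
# so that lines can be told apart (or split per vhost) later.
# The nickname vhost_combined is an assumption made for this example.
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" vhost_combined
CustomLog logs/access_log vhost_combined
```

This only applies to vhosts that do not configure their own access logging.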



Separated logs/statistics, optional filtering

Typical logging is done via mod_log_config


Most people use:

  • CustomLog, which is usually easier to use than LogFormat+TransferLog
    • specifies where to log (file or pipe)
    • specifies the format to use (a nickname from LogFormat, or a literal format string)

(You can get the same functionality via LogFormat plus TransferLog, but that's more verbose.)
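
A minimal side-by-side sketch of the two styles (the log path is just an example). You would pick one of them; using both would log each request twice.

```apache
# Option 1: CustomLog names the target and the format in a single directive
CustomLog logs/access_log "%h %l %u %t \"%r\" %>s %b"

# Option 2: a LogFormat without a nickname sets the default format,
# which a later TransferLog then uses
LogFormat "%h %l %u %t \"%r\" %>s %b"
TransferLog logs/access_log
```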


On multiple logs

You could log different things in different places, for example:

# main log in common log format  
CustomLog logs/access_log common
 
# e.g. for an easy pie chart of user agents
CustomLog logs/agent_log "%{User-agent}i"
#(not so useful while debugging, though, since you don't know what visits these were)

On multiple logs and vhosts

Specifying any logging within a vhost replaces the global setting.

Relying on the global setting alone is only useful when all vhosts should log in exactly the same way; otherwise it mostly serves as a fallback for vhosts that define no logging of their own.

In most other cases you'll want to specify the complete logging behaviour in each vhost. (For ease of management, file includes can be pretty useful.)
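
A sketch of what that can look like. The host names, log file names, and the include file conf/log-filters.conf are all hypothetical, and the combined nickname is assumed to be defined elsewhere in the configuration.

```apache
# Global fallback: only used by vhosts that define no logging of their own
CustomLog logs/access_log combined

<VirtualHost *:80>
    ServerName www.example.com
    # Replaces the global CustomLog for requests to this vhost
    CustomLog logs/www.example.com-access_log combined
</VirtualHost>

<VirtualHost *:80>
    ServerName static.example.com
    # Shared rules (e.g. SetEnvIf lines) kept in one hypothetical include file,
    # so that each vhost only has to add its own CustomLog line
    Include conf/log-filters.conf
    CustomLog logs/static.example.com-access_log combined
</VirtualHost>
```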

On (multiple logs and) filtering

You can selectively filter things to log.

For example, you can use blacklist-style logic to avoid logging local use:

SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog
CustomLog logs/access_log common env=!dontlog

More interesting are uses like:

SetEnvIf Request_URI \.gif$ gif-image
CustomLog gif.log common env=gif-image
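
Both patterns can be extended a little. Several SetEnvIf lines may set the same variable (any one matching is enough), and pairing env= with env=! on the same variable splits traffic over two logs. A sketch, in which the file names and the robots.txt rule are illustrative assumptions:

```apache
# Keep local requests and robots.txt fetches out of the main log;
# either rule matching is enough to set dontlog
SetEnvIf Remote_Addr "^127\.0\.0\.1$" dontlog
SetEnvIf Request_URI "^/robots\.txt$" dontlog
CustomLog logs/access_log common env=!dontlog

# Split image requests from everything else
# (this split is independent of the dontlog rules above)
SetEnvIf Request_URI "\.gif$" gif-image
CustomLog logs/gif_log common env=gif-image
CustomLog logs/nongif_log common env=!gif-image
```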

On pipes

Apache can pipe into a program instead of writing to a file.

Its main use is to log to another target, e.g. a database, or centralized/unified logging server, or some custom tool.


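The program you specify

  • is spawned by the parent apache process when the server starts (once per piped-log directive, not once per apache child)
    • so the children share that one pipe; you only get multiple instances if you repeat the directive (e.g. in each vhost), in which case instances writing to the same target need to avoid races
    • is respawned if it stops/crashes (for reliability)
  • inherits the userid of that parent process
    • which is significant in that this may be root
  • will run via a shell (/bin/sh -c) if you use a single pipe, without it when specifying a double pipe (that is the apache 2.2 behaviour; in 2.4 a single pipe is shell-less as well, and |$ asks for a shell)
    • shell-less may be a little more predictable/cleaner around restarts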


Example uses:

  • apache does not rotate logs by default. It does come with a utility to do this:
CustomLog "|/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common
# or (basically equivalent)
CustomLog "||/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common

See also