Apache config and .htaccess - logging
These are primarily notes. They won't be complete in any sense; they exist to contain fragments of useful information.
Formats
This article/section is a stub: probably a pile of half-sorted notes, probably a first version, and not well-checked, so it may have incorrect bits. (Feel free to ignore, or tell me)
See the mod_log_config documentation for the meaning of the percent-fields used below.
The "Common Log Format" used by apache (which seems to call it common), and imitated by others, is:
%h %l %u %t \"%r\" %>s %b
...which looks like:
127.0.0.1 - - [17/Apr/2008:12:23:32 +0200] "GET /foo.txt HTTP/1.1" 200 117
The CLF-with-VirtualHost consists of the vhost name followed by the CLF fields:
%v %h %l %u %t \"%r\" %>s %b
...which looks like:
www.example.com 127.0.0.1 - - [17/Apr/2008:12:23:32 +0200] "GET /foo.txt HTTP/1.1" 200 117
The (NCSA) 'extended/combined log format' (apache seems to call it combined) is basic CLF plus two extra header values at the end, referer and user-agent:
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"
...which looks like (line-broken for readability)
127.0.0.1 - - [17/Apr/2008:12:23:32 +0200] "GET /foo.txt HTTP/1.1" 200 117 "http://www.example.com/start.html" "Opera/9.20 (Windows NT 6.0; U; en)"
You also see that in virtualhost form - apache seems to call it vhost:
%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"
Which looks like
www.example.com 127.0.0.1 - - [17/Apr/2008:12:23:32 +0200] "GET /foo.txt HTTP/1.1" 200 117 "http://www.example.com/start.html" "Opera/9.20 (Windows NT 6.0; U; en)"
There are other additions, very usually at the end of one of the above.
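In configuration, these format strings are usually given a nickname via LogFormat, so that CustomLog can refer to them by name. A minimal sketch: common and combined are the nicknames mentioned above, while vhost_common and vhost_combined are just labels assumed here for illustration (your distribution's config may use other names or slightly different fields):

```apache
# LogFormat "<format string>" <nickname>
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%v %h %l %u %t \"%r\" %>s %b" vhost_common
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" vhost_combined
```

Defining a nickname does not log anything by itself; it only becomes active once a CustomLog refers to it.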
Other notes:
- I've seen some odd variations, e.g. logging the vhost name instead of the client host name/IP - probably a contortion to make some analysis program happy (one that only understood CLF)
- When you use virtual hosts and log into a single log file, you don't want to use CLF (or combined) because you can't tell the vhosts apart
- You can log each vhost into a distinctly named log file, or make logging prepend the vhost name
- (you could even use basic scripting to convert between many CLF logs and one big vhost log, if you have the vhost names handy)
- What you choose is mostly about convenience. Would you split the logs anyway? Do you keep them for automated analysis?
- There are two constraints you may wish to consider (particularly when running many vhosts, or just a busy site):
- Apache doesn't seem to like logs >2GB. You can use logrotate or something similar to avoid that, but for extremely busy sites, a little splitting up can be good.
- the host system's (configured) limit of file descriptors per process
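The two options from the notes above (a distinctly named file per vhost, or one shared file with the vhost name prepended) could look roughly like the following. This is a sketch with made-up server names and paths, and it assumes the combined nickname has been defined with LogFormat, as stock configs tend to do. Pick one of the two, not both:

```apache
# Option 1: a distinctly named log file per vhost
<VirtualHost *:80>
    ServerName www.example.com
    CustomLog logs/www.example.com-access_log combined
</VirtualHost>

# Option 2: a single shared log in the main server config,
# with the vhost name (%v) prepended to the combined fields
CustomLog logs/access_log "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
```

Option 1 costs one open file descriptor per vhost log, which is where the descriptor limit mentioned above comes in. Option 2 keeps it to one file but means splitting later if you want per-vhost statistics.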
Separated logs/statistics, optional filtering
Typical logging is done via mod_log_config
Most people use CustomLog, which
- specifies where to log (file or pipe)
- specifies the format to use (a nickname from LogFormat, or a literal format string)
You can get the same functionality via LogFormat plus TransferLog, but that is more verbose, so CustomLog is usually easier to use.
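To make the comparison concrete, here is a sketch of the same CLF access log written both ways (the log path is only an example). Use one or the other; with both active, every request would be written to the file twice:

```apache
# one directive: target plus format
CustomLog logs/access_log "%h %l %u %t \"%r\" %>s %b"

# two directives: a LogFormat without a nickname sets the default format
# that subsequent TransferLog directives use
LogFormat "%h %l %u %t \"%r\" %>s %b"
TransferLog logs/access_log
```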
On multiple logs
You could log different things in different places, for example:
# main log in common log format
CustomLog logs/access_log common
# e.g. for an easy pie chart of user agents
CustomLog logs/agent_log "%{User-agent}i"
#(not so useful while debugging, though, since you don't know what visits these were)
On multiple logs and vhosts
Specifying any logging (CustomLog or TransferLog) within a vhost replaces the global setting for that vhost.
Relying on just the global setting is only useful when all vhosts should log in exactly the same way, or as a fallback for vhosts that specify nothing themselves.
Otherwise you'll want to specify the complete logging behaviour in each vhost. (For ease of management, file includes can be pretty useful.)
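A sketch of how that plays out, with hypothetical server names, paths, and include file, and assuming the combined nickname is defined:

```apache
# global setting: only applies to vhosts with no CustomLog/TransferLog of their own
CustomLog logs/access_log combined

<VirtualHost *:80>
    ServerName www.example.com
    # shared logging-related directives kept in one (hypothetical) file
    Include conf/logging-common.conf
    # because of this line, requests for this vhost no longer reach the global access_log
    CustomLog logs/www.example.com-access_log combined
</VirtualHost>

<VirtualHost *:80>
    ServerName static.example.com
    # no logging directives here, so the global CustomLog is used as the fallback
</VirtualHost>
```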
On (multiple logs and) filtering
You can selectively filter things to log.
For example, you can use blacklist-style logic to avoid logging local use:
SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog
CustomLog logs/access_log common env=!dontlog
More interesting are uses like:
SetEnvIf Request_URI \.gif$ gif-image
CustomLog gif.log common env=gif-image
Don't log specific requests
You can use the same sort of filtering (see previous section): mark specific requests on the apache side, override the default logging with custom logging (if you haven't already), and add the mark as a condition.
For example:
SetEnvIf Request_URI "^/api/count$" dontlog
SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog
And
CustomLog logs/access_log common env=!dontlog
See also:
- https://httpd.apache.org/docs/2.4/logs.html (look for Conditional Logs)
Don't log at all
On pipes
Apache can pipe into a program instead of writing to a file.
Its main use is to log to another target, e.g. a database, or centralized/unified logging server, or some custom tool. I've used it to count traffic per vhost towards munin.
The program you specify
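- is spawned by the parent apache process when the server starts, once per piped log directive (not once per child)
- so all children write into that same pipe, and the program sees the combined stream
- is respawned if it stops/crashes (for reliability)
- inherits the userid of that parent process
- which is significant in that this is usually root
- in apache 2.2, runs via a shell (/bin/sh -c) if you use a single pipe, and without one when specifying a double pipe. In apache 2.4, both forms run without a shell, and you use "|$" to ask for one
- shell-less may be a little more predictable/cleaner around restarts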
Example uses:
- the apache_vhosts munin plugin uses this to summarize use per vhost
- apache does not rotate logs by default. It does come with a utility to do this:
CustomLog "|/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common
# or (basically equivalent)
CustomLog "||/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common
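For the custom-tool case (such as counting traffic per vhost), you can log just the fields the tool needs into a pipe. A sketch only: the program path is hypothetical, and it is assumed to read one line per request from stdin:

```apache
# hypothetical consumer; it receives lines like "www.example.com 117" on stdin
# %v is the vhost name, %B the response size in bytes (0 rather than - when nothing was sent)
CustomLog "|/usr/local/bin/vhost-traffic-counter" "%v %B"
```

Because you can have several CustomLog directives side by side, this can sit next to the regular access log rather than replace it.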