Logging

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Logging serves various goals.

It can be handy while debugging, to see state at various stages, particularly in situations where attaching a debugger is impractical, or where even a simple print won't end up in an obvious place, or anywhere at all (e.g. web services, background daemons).

You can argue whether logging statements help or interfere with code readability, mostly because they can be used both in clarifying and in rather opaque ways.


Apps that do neat logging are also liked by sysadmins who watch their systems -- though most will use some way of reducing information overload anyway. Often this means configurable and redirectable filtering, summaries, and actions taken only on specific errors - usually via some additional log analysis application.


Log levels/severity

As a random reference and collection.

The below is a union of the choices various logging systems make; most use a subset of these, usually half a dozen at most (using all of the below would probably be too complex to apply consistently or to read well). A quick logger example follows the list:

  • TRACE - often used to report small steps in possibly excruciating detail, more so than DEBUG. Usually behind a config/conditional flag, and often only turned on when a bug is suspected or known.
  • DEBUG - often used to see whether a particular state or point in code is reached, what value a variable had at some point, and such. Not likely to be emitted by a production version, and likely to be dropped by syslog configuration by default.
  • INFO - pretty neutral; often used for "doesn't call for action, but may be interesting to someone reading the logs for details on how the app is doing its thing".
  • NOTICE - something that is probably an expectable part of day-to-day operations, but may merit some attention if you can spare it. If not present, INFO takes this role fairly well.
  • WARNING - "this may signal a problem, or something that may lead to a problem (and perhaps it is avoidable or recoverable)".
  • ERROR - often used for "a real problem happened", which may be temporary but probably meant a user didn't get what they expected, and quite possibly saw an error instead. Often used in a "shit happened, maybe you can avoid it next time if you see this information" way.
  • ALERT (e.g. in syslog) - something that strongly suggests attention/action, probably by the sysadmin.
  • CRITICAL/FATAL - often signals that a process or (part of or the whole) application failed and stopped. Typically demands attention.
  • EMERGENCY - if present, it often signals that the system has become unstable, or soon will if the issue behind it is ignored.
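
For a concrete feel of how these map onto syslog severities, the logger command (covered further below) lets you pick one explicitly; a quick sketch (the messages are just placeholders):

logger -p user.debug "retry_count is 3, trying again"
logger -p user.warning "disk is 90% full"
logger -p user.err "could not open config file"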

syslogging

Sysloggers are daemons that you can expect to be installed and running on all *nix systems.


All use a local socket (e.g. /dev/log); most (but not all) can also listen on network sockets.

Network-listening sysloggers (UDP port 514) can be useful to log from embedded systems, or to centralize logging for many servers. The protocol is often the BSD syslog protocol (RFC 3164), or the newer IETF syslog protocol (RFC 5424).
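
For a quick test of such a setup, recent util-linux versions of logger can send directly to a remote syslog daemon (the hostname here is a placeholder):

logger --udp --server loghost.example.com --port 514 -p local0.info "remote syslog test"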

Note that sockets make it easy to avoid race conditions when many threads or many processes want to log to the same target - logging libraries may not be set up to be safe this way. Various logging systems are network-based largely for this reason.


Syslog is also useful for quick and dirty debugging, when you need output in any old watchable place and installing/learning a more serious logging system would be more work.

At most you have to tweak its configuration to make it easier to watch what you want to watch.


Note that sysloggers usually buffer log messages to be nice in terms of IO (and the resulting latency), which is usually what you want. It does mean that a system crashing fast enough may not manage to log why. Various sysloggers can be told to buffer less, or not at all, but you ought to do that only when you understand why it's a bad idea for general use.
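
Related: in classic syslogd/sysklogd config syntax (and in rsyslog only if you enable $ActionFileEnableSync), a leading - on a file path means "don't sync to disk after every message"; you'll see it in many stock configs, roughly like:

# sync after every message (survives crashes better, costs IO)
kern.*          /var/log/kern.log
# don't sync (the usual trade-off)
mail.*          -/var/log/mail.log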


Facility

Syslog defines a bunch of facilities, which you can think of as one of the properties a syslogger can use to split messages into separate files:

  • LOG_KERN - kernel messages
  • specific subsystems
    • LOG_CRON
    • LOG_MAIL
    • LOG_LPR (printing)
    • LOG_NEWS
    • LOG_UUCP
  • LOG_DAEMON - system daemons
  • LOG_AUTH - authentication/authorization related
  • LOG_USER - applications or processes (often the default for syslog calls)
  • LOG_LOCAL0
  • LOG_LOCAL1
  • LOG_LOCAL2
  • LOG_LOCAL3
  • LOG_LOCAL4
  • LOG_LOCAL5
  • LOG_LOCAL6
  • LOG_LOCAL7


Modern sysloggers often allow you to split off messages according to facility (a little more efficient than doing it based on content patterns).

On servers where you host a few major apps, you could consider using one of LOCAL0..LOCAL7 for them. This mostly makes sense when the facility is configurable rather than hardcoded into the app.


logger is a command line interface to syslog, which can be handy to test such a configuration, for example:

logger -p local0.notice Command line test to LOCAL0

In applications, you often need to specify a constant to do so. For example, in Python (using the standard syslog module):

syslog.syslog(syslog.LOG_LOCAL0 | syslog.LOG_NOTICE, 'Python test to LOCAL0')



various files in /var/log

account/pacct

lastlog, tallylog, faillog

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


utmp

utmp, wtmp, btmp are binary files keeping track (mainly) of logins.


utmp holds current state: system boot time (used by uptime), which users are logged in at which terminals, logouts, system events, etc.

wtmp holds the historical (non-current) utmp entries

btmp records failed login attempts

If you don't have something like fail2ban/denyhosts, btmp in particular may grow large. You can basically just truncate that file, e.g. sudo truncate --size 0 /var/log/btmp


On Linux they're at /var/run/utmp, /var/log/wtmp, and /var/log/btmp


These are used by programs like

  • who, w, users (current logins, so utmp)
  • last (non-current logins, so wtmp)
  • lastb (failed logins, so btmp)
  • utmpdump - dumps utmp/wtmp to a readable format (and can convert back from it)
  • lslogins (user overview, drawing on wtmp and btmp)


Note that you can hand a file to last, like last -f /var/log/btmp, which is useful for rotated/backed up files.
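
Similarly, utmpdump makes the binary format readable, e.g. to eyeball recent entries:

utmpdump /var/log/wtmp | tail -n 20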




Software

Sysloggers

syslog-ng

Allows splitting into various files based on facility, program, regexp (faster substring too?(verify)), and more.
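
A minimal config sketch - this assumes a source named s_src is already defined (distro default configs usually define one), and "myapp" is just a placeholder program name:

filter f_local0      { facility(local0); };
filter f_myapp       { program("myapp"); };
destination d_local0 { file("/var/log/local0.log"); };
destination d_myapp  { file("/var/log/myapp.log"); };

log { source(s_src); filter(f_local0); destination(d_local0); };
log { source(s_src); filter(f_myapp);  destination(d_myapp); };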


  • http://wiki.egee-see.org/index.php/SyslogNG
  • http://www.softpanorama.org/Logs/Syslog_ng/filter_functions_with_examples.shtml
  • http://www.softpanorama.org/Logs/Syslog_ng/configuration_examples.shtml


rsyslog

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


Filter styles

"traditional" severity and facility based selectors, e.g.

auth,authpriv.*                 /var/log/auth.log
local0.*                        /var/log/local0.log
mail.*                          -/var/log/mail.log
# all facilities, all severities:
*.*                             -/var/log/syslog
# filter out specific things:
*.*;mail.none;authpriv.none;cron.none -/var/log/messages

Traditional is often faster, and rules are brief.


However, you can't match on program name, message contents, and such. For those you'll want one of the two more flexible (but often slower) styles:


property-based

:programname,startswith,"postgres"   -/var/log/postgres.log
:msg, contains, "problem"            -/var/log/problems.log

RainerScript-based

if $programname == 'popa3d' and $syslogseverity <= '6' then /var/log/popa3d.log
if $programname == 'popa3d' and $syslogseverity <= '6' then stop


Other hints

stop is useful in cases like "I want matches for this to be in this log and absolutely nowhere else".

It used to be ~, which is now deprecated
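
For example, to keep one (hypothetical) chatty program's messages in its own file and out of everything else:

:programname, isequal, "noisyapp"    /var/log/noisyapp.log
& stop

The leading & means "apply this action to the messages matched by the previous filter".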



Problems

not logging to specific files

  • often a permission issue - frequently the file is owned by root instead of the syslog user (e.g. because a previous logging system did things differently)
  • if you use logrotate, check its config (and e.g. the parent directory's ownership and permissions)

Could not open output pipe '/dev/xconsole': No such file or directory

  • You probably have a distro without xconsole.
  • You can remove that config line, or change it to go to /dev/console.
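
On Debian-derived configs the offending block looks roughly like the following; commenting it out (as shown) makes the error go away:

#daemon.*;mail.*;\
#        news.err;\
#        *.=debug;*.=info;\
#        *.=notice;*.=warn      |/dev/xconsole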


action suspended

Generally it means that rsyslog can't deliver messages to wherever an action wants to send them.


There are many possible reasons for that.

Often enough it will be a log file that is not writable for rsyslog, e.g. because logrotate changed its permissions.
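
A quick check, assuming the Debian/Ubuntu convention of rsyslog running as syslog:adm (the filename is a placeholder - use the one the error mentions):

# see who owns the file rsyslog complains about
ls -l /var/log/example.log
# and, if needed, hand it back to the syslog user
chown syslog:adm /var/log/example.log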

To guess less: the module name the message mentions gives some indication of where the problem is, e.g.

  • omfile is the file writer
    • one cause is the xconsole issue mentioned above
    • but also check for Permission denied on actual log files (possibly due to logrotate)
  • omfwd is syslog forwarding over the network
  • there is a more complete list of output modules in the rsyslog documentation


Older versions of rsyslog wouldn't tell you much more than the module. Newer ones should tell you more.



metalog

No network listener.

log4j and family

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


logcheck

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Simple tool to read, filter, and then act on specific log entries.

Sysadmins use it to get briefer summaries, automatic notifications, and such.



logrotate

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Anything that isn't rotated by the logging program itself can be rotated later with tools like logrotate.

It is typically run from cron.daily, which in most cases is frequent enough.


Conditions of rotation

Whether it will consider a file for rotation is based on

  • time since logrotate last rotated that file
    • the minimum interval is specified in the config - daily / weekly / monthly / yearly
    • the knowledge of when it was last rotated comes from logrotate's own state file (/var/lib/logrotate/status)
  • size (note you can use k, M, and G suffixes)


Which combination you want depends on whether you want regular intervals (can be useful when looking back a specific amount of time), and/or to avoid huge log files (makes grepping for recent errors easier/faster), and/or to avoid a lot of tiny files.
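
A sketch for a hypothetical log file (maxsize needs a reasonably recent logrotate):

/var/log/example.log {
    # consider it weekly, but rotate sooner if it grows past 100M
    weekly
    maxsize 100M
    rotate 8
    missingok
    notifempty
}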


What to do with the log file

There are two details to address here:

  • open files are tied to the file handle, not the name - so after a rename, most programs will happily keep logging to what is now the old file.
  • who creates the new file, and ensures the permissions and ownership are right?


A minimal logrotate setup just moves (renames) the log file, and does not create a new one.

This is because, whenever possible, you want the daemon in question to truncate / reopen its own logs - leaving it responsible for owner and permissions. But that's only possible if you can tell it to.

  • signalling the daemon is often done via a prerotate and/or postrotate section - these introduce a bit of scripting, ended by an endscript line
  • for rsyslog, that command will be something like invoke-rc.d rsyslog rotate or /usr/lib/rsyslog/rsyslog-rotate, depending on distro and version


If instead you want logrotate to create a new file with the old name after rotation (note: some distros make this the default logrotate behaviour, others do not), you'll want create mode owner group. This can make it a little more work to get the ownership and permissions right.
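
A sketch of that style - the daemon name is a placeholder, and it assumes the daemon reopens its log on SIGHUP (many do, but check yours) and that you're on systemd:

/var/log/mydaemon.log {
    daily
    rotate 14
    create 0640 mydaemon adm
    postrotate
        systemctl kill -s HUP mydaemon.service
    endscript
}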


Also, if you can't tell it to reopen logs, you may need to completely restart the program to do so. This is sometimes acceptable, and sometimes not at all.

So another option is to copy the logfile's contents, then truncate the source file (see copytruncate). While slower (copying the contents can involve a lot of data), the file is never gone, and the daemon (or e.g. tail -F) can stay entirely naive of what logrotate is doing. Note that messages written between the copy and the truncate can be lost. When reopening logs is impossible and restarting the program unacceptable, this tends to be your only option.
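
A sketch of that style (the path is a placeholder):

/var/log/legacyapp.log {
    daily
    rotate 7
    copytruncate
    compress
    delaycompress
    missingok
}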

What to do with the old logs

TODO: figure out whether this only applies when we are actually rotating (e.g. whether it won't delete a huge old backlog according to new settings until it's time for the next rotation)


What to do with the rotated log, and with older log files:

  • olddir dirname - place rotated logs in a directory with this name. Taken as relative to the directory the log is in, unless you give an absolute path. It won't be created for you, so it has to exist. (An example combining olddir and dateext follows after this list.)
  • dateext - instead of naming old versions .1, .2 and such, use a -YYYYMMDD suffix
    • you can control the format with dateformat (strftime-style)
    • note that it must stay lexically sortable by time (hence yyyymmdd), since this is also used to determine which files are oldest and can be deleted
    • this style can block rotation for a bit, in that logrotate won't overwrite existing rotations - so when using yyyymmdd (and no time part) it won't rotate more than once per day (verify)


  • nocompress - don't compress old log files, e.g. for easier greppability.
  • compress - compress old log files.
  • delaycompress - don't compress the most recent backup (e.g. log, log.1, log.2.gz, log.3.gz). Makes it easier/faster to grep recent logs while still saving a bunch of disk space in the long term.
  • You can also control which compression program is used (compresscmd).


  • notifempty - don't rotate log files that are empty (0 bytes). If you do want to know that a certain period saw no log messages (e.g. for some automatic statistics system), you may want the opposite, ifempty
  • rotate n - keep no more than n rotated files (for example, daily plus rotate 7 means seven days of backlog)
    • Note that size-based rotation in particular can mean less effective backlog, because it may rotate more often (and still keep only those 7 files)
    • Not specifying rotate means old logs are not kept (meaning logrotate only really truncates logs?)(verify)
    • for eternal logs, choose a very large value. Seems to be a 32-bit signed int, so rotate 2147483647 is the largest value it'll take (that many rotations is, well, more than you'll ever need)
  • maxage n - remove rotated log files once they are older than n days. Arguably a little redundant next to rotate.
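
A sketch combining a few of these (the path is a placeholder; /var/log/archive has to exist already):

/var/log/archived.log {
    monthly
    rotate 24
    olddir /var/log/archive
    dateext
    compress
    delaycompress
}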


Mailing

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Logs can be mailed

  • when they are rotated out of existence (because of the rotate count)
  • when they are removed because they are older than maxage
  • or, with mailfirst, right when they are rotated

  • mail address - the address to mail logs to
  • maillast (default behaviour) - send logs when they are about to be removed (because of rotate / maxage). Can be useful to archive old logs elsewhere (...though mail is not the most reliable way of doing so).
  • mailfirst - send the just-rotated file instead. Useful to read a recent (e.g. daily) log.
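
A sketch (the path and address are placeholders; this assumes logrotate knows how to send mail on your system, usually via /usr/bin/mail):

/var/log/weeklyreport.log {
    weekly
    rotate 4
    mail admin@example.com
    mailfirst
}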


Example

Note: while debugging, particularly if it doesn't seem to ever rotate anything, you can use logrotate -dv conffile. -v means verbose output; -d is debug mode, which won't actually change anything, it just says what it would do (and also won't run prerotate/postrotate scripts).
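
For example (the config path is a placeholder); -f forces a rotation right away, e.g. to test a new config:

# dry run: show what would be done, without doing it
logrotate -dv /etc/logrotate.d/example
# force a real rotation, verbosely
logrotate -vf /etc/logrotate.d/example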



python logging

See Python notes - logging


Considerations

The arguments for logging frameworks

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


print() as logging is simple, and effective for debugging while writing the code. In-the-field debugging of the "can you send me the log file so I can figure out what's going on?" sort? Not so much. You want logging instead because with print you can't control the output without editing code, you can't control the amount of detail, and it won't be saved unless you explicitly do that. In some cases you may even hold up the app because stdout is buffered.


A well designed logging system can help answer most of the following:

  • what happened?
  • where did it happen?
  • when did it happen?
  • how important is it?
  • who wants to know?
  • what do you want to do about it?

The last two are often not very well addressed in custom/makeshift solutions, and using a logging system can mean you have a framework to do most of these, now or later, based on the configuration you hand it.


There are a few other things you might want to consider:

  • most logging systems have a separate log configuration, meaning that admins can change the logging setup without having to recompile the program.
  • A log is not the same thing as a file; it is a data sink that you may eventually want routed and filtered in various ways.


That said, even good logging frameworks tend not to scale down to simpler cases very well. That is, they are often unnecessarily complex to set up when all you want is "send stuff to a single file." This can be a barrier to adoption.

Still, frameworks tend to look more complex than they are, because they allow for various things that you'll probably never use all at once, or only in unusually complex setups.



Logging to a central log server, database, etc