Linux admin notes - health and statistics



These are primarily notes. They won't be complete in any sense; they exist to contain fragments of useful information.


Reading (linux) system use and health

Some of these utilities are fairly standard on most unices, some report information specific to recent linux kernels, and some OSes have better utilities than these.

Reading top

top gives a basic overview of CPU, memory, swap, and process statistics. It's a little verbose, and not everything it shows is equally important.

Example output:

top - 21:18:31 up 18 days,  1:16, 13 users,  load average: 2.61, 2.47, 2.01
Tasks: 124 total,   4 running, 120 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  3.0%sy, 97.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    773500k total,   760236k used,    13264k free,    36976k buffers
Swap:  5992224k total,   689324k used,  5302900k free,   265284k cached

  PID USER      PR  NI  VIRT  RES S %CPU %MEM    TIME+  COMMAND
11192 root      30   5 59136  55m R 43.6  5.5   0:03.30 cc1plus 
11199 root      26   5 12896 9244 R 24.8  0.9   0:00.25 cc1plus 
11193 root      22   5  4972 3332 S  1.0  0.3   0:00.02 i686-pc-linux-gnu-g++
11197 root      15   0  2100 1140 R  1.0  0.1   0:00.07 top
11198 root      24   5  2144  884 S  1.0  0.1   0:00.01 i686-pc-linux-gnu-g++
    1 root      15   0  1496  432 S  0.0  0.0   0:07.40 init [3]
    2 root      34  19     0    0 R  0.0  0.0   0:06.24 [ksoftirqd/0]
...and so on

Process lines

The process lines (those after the PID USER PR... header) show per-process information. A few notes:

  • columns are configurable. I usually simplify things.
  • you can kill and renice from within top (keys: k, r)


CPU:

  • %CPU is calculated by top: the percentage of time the process spent running during the last interval
    • note that top samples /proc at intervals, so short-lived processes may not be counted at all (atop can be better about this)
    • it also doesn't tell you whether a process is doing useful work, versus stalling or spinlocking
  • PR and NI:
    • Higher PRiority numbers mean lower actual priority
    • the value is the basic PRiority (often 20) plus the process's NIceness, which lets users say "I'm in no hurry, other processes can get more CPU time if they want it"
  • Time+ is cumulative CPU time spent.
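
For example, to start something at low priority, or lower the priority of something that is already running (the PID here is hypothetical):

nice -n 10 make         # start a command at niceness 10
renice 15 -p 12345      # change the niceness of a running process
                        # (regular users may only increase niceness)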


  • Process state is usually one of:
    • S (interruptible sleep)
    • R (runnable)
    • D (uninterruptible sleep)
  ...and others are:
    • Z (zombie: exited, but its parent has not cleaned it up yet)
    • T (stopped, as in paused, by job control (SIGSTOP/SIGCONT signals) or by something like ptrace)
  Much of S/R/D is about resources other than CPU:
    • runnable (R) means the resources it needs are there and it is scheduled to run - so it is either running right now, or will be soon
    • uninterruptible sleep (D) means it is waiting for a resource, often device IO within the kernel itself
    • interruptible sleep (S) can be entered voluntarily or by the kernel, and often means waiting on an event
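
A quick tally of processes per state:

ps -eo stat= | cut -c1 | sort | uniq -c     # count processes by their main state letter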

Memory:

  • VIRT refers to mapped memory: all memory the process could address without error. This includes shared memory, libraries, mmaps, and memory that was reserved but never actually used (promised by allocation, but never backed because it was never touched). If there's a lot of the last, VIRT says very little about real memory use.
  • RES is how much of a process is RESident in RAM. This is a good indication of how much memory it actually uses, except when your system is thrashing: swapped-out memory does not count towards this.
  • SWAP (not there by default; press 'f', and 'p' in that screen): Amount of memory swapped out. VIRT is often roughly RES+SWAP because most other things are relatively small(verify)
  • %MEM is the percentage of physical memory the task uses (not sure what exactly counts towards it(verify))
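
On reasonably recent kernels you can also read per-process swap use directly from /proc (the VmSwap field), e.g. to list the biggest swap users:

grep VmSwap /proc/[0-9]*/status 2>/dev/null | sort -t: -k3 -rn | head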


Header

That header can be seen as roughly three parts: 'CPU and IO status', 'memory status' and 'swap status':

Overall CPU and IO

top - 21:18:31 up 18 days,  1:16, 13 users,  load average: 2.61, 2.47, 2.01
Tasks: 124 total,   4 running, 120 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  3.0%sy, 97.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st


The CPU usage division is probably most interesting:

  • user cpu time is programs spending CPU time in the normally scheduled way (with a niceness ≤0)
  • system time is the kernel doing things that programs asked it to. This has some priority.
  • nice cpu time is programs spending CPU time with a niceness >0 (those that will back off to let processes with lower niceness use more CPU time)
  • wait time typically means the system is waiting for IO. If this is high, the system is often either doing hard IO work, or swapping like crazy. Some wait time is implicit in IO work; a lot of it is bad in that it indicates an IO bottleneck. If it's caused by heavy swapping, it is often avoidable.
  • hi and si: 'hard interrupts' and 'soft interrupts'. They represent driver time, networking, and a few other things, and are rarely higher than a few percent.

Zombie processes are not important unless there are many. They are processes that have finished and are not using resources anymore, but have not yet been cleaned up by the process that started them.

Overall memory

Mem:    773500k total,   760236k used,    13264k free,    36976k buffers
Swap:  5992224k total,   689324k used,  5302900k free,   265284k cached

Total is the user-usable RAM: the amount of physical memory minus a bit of kernel memory.

Used is physical RAM in use - by processes' resident memory, but also by cache and buffers.

Free means "going to waste": not being used by anything. It will generally only be high right after bootup, or right after a memory-hogging process stopped and the OS cache hasn't yet found anything to use that space for.


Buffers is primarily the disk 'write cache' (yay, confusing terminology). Write caching is more implication and necessity than read caching, and this figure is usually low, so you can completely ignore it unless it isn't low.

Cached is a few things, but typically primarily the OS disk cache: things the system thinks you may need again soon, often mostly filesystem metadata and data. It's helpful because it's faster than disk and avoids IO. Memory used for this cache counts as used in the 'free memory' figure, which is why 'free' is not very informative - cached data will move out of the way for allocations very quickly, so almost all memory in cache is effectively usable memory.

mmap()ped files seem to count towards cached. This can sometimes be both a lot and fairly constant, such as with large mmapped logs.


What I want instead of 'Free' is Free+Cached, which is basically the 'usable RAM' figure. You can eyeball it from top's figures, or let free tell you basically this figure. The most interesting number in its output is the free-after-buffers/cache one, 315524 in the example below:
             total       used       free     shared    buffers     cached
Mem:        773500     760236      13264          0      36976     265284
-/+ buffers/cache:     457976     315524
Swap:      5992224     689324    5302900
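
You can get roughly the same figure from /proc/meminfo (newer kernels also expose MemAvailable there, which is a better estimate of the same idea):

awk '/^MemFree:|^Cached:/ {sum+=$2} END {print sum" kB roughly usable"}' /proc/meminfo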



Swap

Swap:  5992224k total,   689324k used,  5302900k free,   265284k cached

Swap total reflects the collective size of your enabled swap partitions (and files). Used is just that, and there is almost always some use, usually things that have not been touched in a long while, such as allocated memory that was never accessed, or parts of large executables that were never used (it makes sense to swap these out, as it frees memory for active processes). (See also swappiness.)

When the used swap is high, you have an active memory hog, too little memory - or sometimes a program that likes to allocate a lot of memory without using it (linux counts this as swapped because it wants to guarantee it can back that allocation, but doesn't want to spend RAM on it).

You can approximate the difference by seeing whether the swap figures change all the time. A better indicator is something like vmstat: if it reports si and so ('swapped in', 'swapped out') as 0 most of the time, the size you see in swap is probably not actively used.
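
For example:

vmstat 1     # watch the si and so columns; consistently nonzero means active swapping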

Continuous swapping will make your computer sluggish. If this isn't a rare occurrence, it may be wise to invest in a little more memory, or look at whether you have memory hogs that can be configured to color within the lines a bit better.



topgrep

Yes, there's a perl script. I think this one's just as simple:

#!/bin/bash
# will implicitly use the current user's top settings (particularly the columns)
[ -z "$1" ] && exit  #quit if there were no arguments
top -d 1 -b -n 1 | sed -n -e '/PID/p' -e "/$1/p" | grep -v "\/$1\/" | grep -v topgrep | grep -v 'grep -v'

ps

ps can be considered a non-interactive equivalent to top.


ps is convenient for some quick checks like:

ps faux | grep smbd    # is service running?
ps faux | grep ssh     # show connected sessions


and also for scripting, as you can control the output format. For example, consider:

#!/bin/bash
# script to renice all of a user's processes. You'll often need root rights.
USER=${1:?Missing username}
NICENESS=${2:-10}   # ${2:-10} is bash's default-if-missing syntax, so this defaults to 10
renice $NICENESS -p `ps --no-headers -U $USER -o pid`
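
Usage, assuming you saved this as reniceuser:

sudo reniceuser someuser 15     # renice all of someuser's processes to niceness 15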


#!/bin/bash
# continuously list processes in uninterruptible sleep (see what hits the disk)
# (note: D+ means foreground)
while true
do
  ps -eo stat,user,comm,pid | egrep '^D'
  sleep 0.1
done


#!/bin/bash
# script to summarize users that use CPU and memory
ps --no-headers -eo user,%cpu,%mem | \
  awk '{usercpu[$1]+=$2; usermem[$1]+=$3} 
       END { for (u in usercpu) { if (usercpu[u]>5 || usermem[u]>5) 
             printf("%15s using  %4d%% CPU  and %4d%% resident memory\n", 
                        u, usercpu[u], usermem[u]) }  }'
Example output:
      postgres using     0% CPU  and    9% resident memory
      www-data using     5% CPU  and   30% resident memory
          root using     9% CPU  and    3% resident memory

Load average

tl;dr:

  • decent indication of sustained load on a node
  • not useful for short-term changes
  • not averages
  • (and not load exactly how you were thinking either)


"Load" is an estimate of how many processes are actively doing stuff, specifically:

  • using CPU, or
  • waiting to be scheduled on a CPU, or
  • in uninterruptible sleep (often disk, sometimes network or other)

"average" - not actually. It's an expontentially dampened thing - think lowpass. This is relevant in that the 1, 5, 15 figures are not at all "average over that many minutes" but actually how fast these figures adapt to the real load. Which is useful, just not an average.


When you see something like
load average: 1.69, 1.70, 1.73
you can guess that there are probably two sustained processes actively using the CPU (and likely sharing its speed), but since it's under 2.0, one or both are probably not active all the time.


If the number is high, you can usually assume there are many things fighting for CPU or disk. Keep in mind that on a many-core processor, more processes can run alongside each other perfectly fine.

...though you can't tell whether they're happily working alongside each other, or throttling your disk system.
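
A quick sanity check is comparing the load figures against the number of logical cores:

echo "cores: $(nproc)"; uptime     # sustained load near the core count means the CPUs are kept busy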


When swapping, and particularly when thrashing, the load figure may spike simply because many things are waiting while the kernel spends a lot of IO time swapping things in and out.

CPU use types

Wait time

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

The state of uninterruptible sleep (shown as state 'D' in top and often in ps) usually more specifically means IO wait time, a.k.a. 'wait time', 'iowait', or just 'wait'.


Iowait is the time a process spends sitting around waiting for the system to fulfill a blocking IO request. (More specifically, TASK_UNINTERRUPTIBLE is a deeper sleep than interruptible sleep (the latter still allows signaling). Uninterruptible sleep is generally used only when userspace signals should not be handled, such as while a process is waiting on device IO.)


The most common cause is platter-disk operations, because their seek time is relatively large. This wait for IO is as (un)avoidable as the disk access itself is.

When there are multiple programs using a disk, the average seek time can only get bigger. Random access also makes for longer waiting.


It may be longer than strictly necessary, in particular when actively swapping (or even thrashing), which can be improved by making programs less hoggish, and/or by sticking in more memory (the latter can be a double improvement: less or no swapping, and memory to spare means more commonly-requested disk data stays cached in memory, meaning less actual disk IO for that data).

There are other sources of wait time, such as high-volume networking, and any other device that handles a lot of data or does a lot of small interactions.


IOWait is a decent measure of disk slowness - but in itself it is purely a CPU-side metric. In particular around RAID, it does not directly translate to 'disk access is being slow'.



Past observing that you have IO wait time, you may want to find out which process and which device is so busy.

To see whether it's disk IO, and which disk, you can use something like the following (linux kernels 2.6 and later):

iostat -x 2

This is also a better metric of disk utilisation than iowait - the reads and writes per second are basically the IOPS figure, which you can compare against the expectations for your disk (or array).

(If you do not have the iostat utility, it'll probably be in a package called sysstat.)

If there is one continuously busy drive, that's easy to read off from the amount of data moved.

Some things aren't much data but still a lot of work, such as doing thousands of fstat()s (e.g. scanning a large directory tree). This is perhaps most visible in the await column, which shows the average time (in milliseconds) that disk calls stay queued (waiting, seeking, reading/writing, and everything around it). Roughly, if it's at or below the drive's seek time, the drive is probably doing one thing at a time and keeping up with the requests. If it's regularly more than, say, twice the drive's seek time, then things are probably regularly waiting on a head that is seeking back and forth. Platter drives are often around ~7ms, so a dozen is okay, a hundred is starting to be a lot.
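
For example (the device name is hypothetical; iostat without device arguments shows all of them):

iostat -x 2 sda     # r/s + w/s is roughly the IOPS figure; await is queue-plus-service time in ms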


Finding the specific process is a little harder, because the wait time may not show in the process that causes it. For example, on linux you'll often see the pdflush kernel process (which buffers and flushes data to disk) showing wait time in its process state, while the process that flushed a lot of data to it may or may not be doing so at that moment (offloading iowait is sort of the point of pdflush).

For this reason and others you'll probably want to observe the system over some short time. One example:

while [ 1 ]; do (sleep .3; ps -lyfe | egrep '^D'); done

(you probably want to put that into a script since it's not trivial to type) (This is similar to using watch except the screen won't be cleared)


You can try to inspect whether a process is doing a lot of IO, and/or see what it's doing, with something like strace -p pid - or, often more usefully, its -c option for a count-and-summarize, to see whether the calls a process mostly makes are indeed potentially IO-significant ones (such as open, read, write, recv, poll, stat, send, and others).
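
For example (the PID is hypothetical):

strace -c -p 12345     # attach and count syscalls; press Ctrl-C to get the summary table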


Other system / kernel details

vmstat will give you an overall view of what the kernel is up to, including memory use, disk block use, swap activity, context switches, and more. For example, for figures averaged over a 3-second interval:

vmstat 3

There are some other reports it can give, see its man page.

A related, slightly nicer looking third party app you may want to look at is dstat.



I personally like to type less than that. I imitate the psgrep perl script mentioned here and there with something much more basic, fragile, and close to what I generally want:

#!/bin/bash
[ -z "$1" ] && exit  #quit when there are no arguments
ps ax | sed -n -e '1 p' -e "/$1/p" | grep -v "\/$1\/" | grep -v psgrep | grep -v 'grep -v'

This will look for the text fragment you give it (only the first argument; the rest is ignored), and include ps's header line for reference.

The grepping is a hacky, hacky attempt at removing the matching grep and sed processes that are actually this psgrep script itself. The '1 p' unconditionally lets through ps's header.


You can do something similar for top (can be handy e.g. for memory statistics), using its batch mode.

#!/bin/bash
[ -z "$1" ] && exit  #quit if there were no arguments
top -b -n 1 | sed -n -e '/COMMAND/p' -e "/$1/p" | grep -v "\/$1\/" | grep -v topgrep | grep -v 'grep -v'

This too is quite hacky, and note the format top uses depends on its .toprc file. The header line is included by grepping for COMMAND, which I figured is the field most likely to be present.


Selecting and stopping: pidof, kill, killall

When you don't have a GUI or shell-job way of stopping a program, you'll have to use slightly harsher means. The old-fashioned way is to get its process id with pidof and then use kill or, failing that, kill -9.

The difference is that kill defaults to the TERM signal (15), which the process can catch and use to shut down nicely - in fact, signals are used for more than termination (HUP (hangup, 1) in particular has been used as a 'reload configuration files' signal). Signal 9, KILL, is untrappable and instructs the kernel to remove the process somewhat harshly. It is the surest way to kill, but means no cleanup in terms of child processes, IO, and such ((verify) which), so should only be used when the default TERM didn't work.


Kill takes a process id, which you can get from top, ps, and others, or more directly via pidof. killall lets you use a process name instead of a PID. Summarizing:

# pidof firefox-bin
8222 8209 8208 8204
# kill 8222 8209 8208 8204
# kill `pidof firefox-bin`
# killall firefox-bin

With killall, you do have to match the actual name. For example, if firefox is a script that runs an actual executable called firefox-bin, then killall firefox won't do you much good.


As a regular user you can only kill your own processes, as root you can kill anything but some system processes.


Processes will not die while they are in uninterruptible sleep (D). This usually doesn't matter, unless a process is blocked on a single call for a very long time. You'll have to take away the thing it is blocked on before it will die, which may not be a simple task.

File/filesystem usage

(keep in mind you may need to run these as superuser to be particularly informative)

fuser lists PIDs that have a particular resource open (reading, writing, executing, mmapped; these and more are shown as a letter appended to the PID), e.g.

  • "figure out which filesystem this path is on, then list all processes which have files open on that filesystem". Useful to see what prevents a umount
fuser -m -v /mnt/data4
You can ask fuser to kill the implied processes.
e.g.
fuser -mik /mnt/data4
the -i, for interactive, is used to avoid accidentally killing way too much by accidentally implying the root filesystem.
...or you could use kill/killall manually.


  • "check for a whole bunch of specific files", e.g. stuff that processes have open in /tmp
find /tmp | xargs fuser -v    # or lazier:
fuser -v /tmp/* /tmp/*/*


  • "Which has this directory open", e.g. your homedir
fuser -v ~


  • "who has TCP port 80, and in a human-readable form, please" (keep in mind you probably get no results without a sudo)
fuser -v -n tcp 80



lsof lists open files. Because of unix's "everything is a file" philosophy, this includes sockets, directories, FIFOs, memory-mapped files, and more. It can be used to inspect program (mis)behaviour and such:

  • lsof /data4
    rather similar to fuser -m
  • lsof -u samba
    lists open files for user samba (something fuser cannot do)
  • lsof -c bash
    lists everything related to running bash processes
  • lsof -p 18817
    lists all things opened by a certain process
  • lsof -i
    "Alright, what's networking up to?" Netstat is probably more interesting for this, but looking by port (
    lsof -i :22
    ) and host (
    lsof -i@192.168.5.5
    ) is easy enough (see the man page for more details).
  • watch -n 0.1 "lsof -n -- /data /data2 | grep smbd | egrep -i '\b(DIR|REG)\b'"
    : "keep tracking the files and directories that samba keeps open under the /data and /data2 directories"
  • or just a summary of which programs use how many handles:
    lsof | cut -f 1 -d ' ' | sort | uniq -c | sort -n

(Note that lsof's options differ somewhat between *nix-style systems)


vmstat gives a summary of processes, memory, swapping, block IO, interrupts, context switches, CPU, and more. Good for inspecting how a taxed system is being taxed.

For example, vmstat 2 shows averages every two seconds.

It can also show certain fine grained statistics, given kernel support (see the man page).



Things like iotop, and features within atop and htop, can be used to show the IO speed of processes, and/or totals.

On iotop:

  • needs to be run as root (or have NET_ADMIN capability), which can be impractical
  • iotop -o
    shows just processes with non-zero IO
  • iotop -a
    show cumulative amounts
  • OSError: Netlink error: Invalid argument (22) basically means your kernel doesn't have the support(verify). If you're on centos or rhel, this means 5.6 or later.(verify)
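
Without root you can read similar per-process counters straight from /proc (for your own processes, at least):

cat /proc/12345/io     # hypothetical PID; shows rchar/wchar and read_bytes/write_bytes counters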

Networking

  • ifconfig to see (or configure) the network interfaces
  • netstat will list various things about networking and can show e.g.
    • open connections (no parameter)
    • listens and open connections (-a)
    • udp and/or tcp (-u, -t) since you often don't care about all the unix sockets
    • routing table (-r) (see also route)
    • interface summary (-i)
    • statistics (-s)

I use -pnaut (program name, no resolving, listens+connections, udp, tcp).

  • ss is similar to netstat
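ss takes largely the same flags for this kind of use:
ss -pnaut     # process names, numeric, all (listening + established), udp, tcp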


  • arp (arp -n to avoid resolves) to see the ARP table
  • route (route -n to avoid resolves) to see the routing table
  • iptables to change the IP filtering/nat/mangling tables (see also iptables). Possibly interesting to you are:
    • iptables-save, which produces file-saveable text (and is also handy to see all of the iptables state), and
    • iptables-restore, which reinstates a file saved through iptables-save.


  • iwconfig to see (or configure) the wireless network interfaces
    • (Other general wireless tools: iwevent, iwspy, iwlist, iwpriv)
    • (Other specific wireless tools: wlanconfig, etc.)

Kernel, drivers

  • lsmod lists currently loaded kernel modules (see also modprobe, insmod, rmmod)
  • lspci lists PCI devices. Using -v is a little more informative. (see also setpci)
  • lsusb lists USB busses and devices on them

Drives and space

df tells you what storage you can get at, and how much space is left on each.


In contrast:

  • /etc/mtab lists things that are mounted, more completely than df does, because df reports only things meant for storage, which excludes things like proc, udev/devfs, usbfs, and whatnot.
  • To see an exhaustive list of things that the system knows could be mounted, see /etc/fstab (see also fstab).
  • To see swap partition/file use, cat /proc/swaps will do, which is basically what swapon -s does.


df notes:

  • The -h option is useful to see human-readable sizes.
  • df -B MiB (or MB) makes df report everything in mebibytes (or megabytes), which can be useful when you're watching for differences on the order of megabytes per second (e.g. watch -d df -B MiB)
  • On ext2, ext3, and ext4, the figures may not add up exactly, because 5% of the space is reserved by default (short story: this is a good thing for general use, though in WORM situations it can make sense to set it to 0%).
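
If you do want to change that reserve, tune2fs can do it (the device name here is hypothetical):

tune2fs -m 1 /dev/sdb1                      # set the reserve to 1% (0 for none)
tune2fs -l /dev/sdb1 | grep -i reserved     # check the current setting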


To see where the big stuff is in the directory tree, use du, detailed elsewhere. (There are also fancier graphical programs for this, such as baobab, that give a better visual overview.)

RAM health

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

EDAC (Error Detection And Correction) is the kernel subsystem that detects and reports memory (particularly ECC) errors.
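
A minimal sketch of checking its counters, assuming the EDAC driver for your memory controller is loaded (there is also an edac-util tool that summarizes the same):

grep . /sys/devices/system/edac/mc/mc*/ce_count 2>/dev/null    # corrected (ECC-fixed) errors
grep . /sys/devices/system/edac/mc/mc*/ue_count 2>/dev/null    # uncorrected errors - bad news if nonzero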

Recorded statistics

See Network tools. Somewhat wider tools like cacti also store system statistics.

Tools

Monit

monit is a service that periodically checks a configurable set of properties of a system or process on it.

It makes it fairly easy to check

  • whether common services like websites or mail are up - at protocol level
  • practical things like "does the SSL certificate expire within X days"
  • resource use (check for high/unusual load on cpu, network, disk)
  • disk space
  • network interface link, link speed
  • "if bad the last X checks" to avoid false-positive spam
  • file changes (uid/gid/permission/checksum)

It can also take actions beyond alerting, e.g.

  • is the process still running? If not, restart it (e.g. used with docker)
  • execute commands, e.g. 'if the log is big, run logrotate', or 'if the IP was lost, restart the interface'
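
A minimal sketch of its config syntax (the service names and paths are illustrative - adjust them to your system):

set daemon 60                                    # poll every 60 seconds

check process sshd with pidfile /var/run/sshd.pid
  start program = "/usr/sbin/service ssh start"
  stop program = "/usr/sbin/service ssh stop"
  if failed port 22 protocol ssh then restart
  if 5 restarts within 5 cycles then alert

check filesystem rootfs with path /
  if space usage > 90% then alert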

See also https://mmonit.com/monit/ for a detailed overview


It is primarily set up to send alerts and show overall status, not so much to present pretty graphs.

There's a status export in XML (though not JSON in bare monit).


Monit is free open source software.

M/Monit is a paid-for wrapper that gives prettier graphs, and an overview of many hosts. See also https://mmonit.com/ The pricing makes sense for datacenters and such, though not for home users.

See also

http://tldp.org/LDP/sag/html/index.html