
dotfiles

semaphores

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

SysV semaphores and POSIX semaphores are kernel-managed objects that are useful to threaded programs, for synchronization, some resource management, IPC, and such.

The below is about SysV semaphores, not about POSIX (named) semaphores (verify)

Trouble

SysV semaphores aren't really owned by a process, which is why ipcs -p doesn't report them, and why it is not hard for programs to leak semaphores. Leaking in that they will stick around after the originating program closes or crashes, which can lead to the kernel eventually running out and denying other programs semaphores.

Usually, problems related to semaphores/mutexes are related to the semget() call failing.

You may see

"No space left on device", which is the standard error string for ENOSPC, which SysV IPC seems to reuse for "can't give out any more"
"Invalid argument"
"Identifier removed" (an apache message?(verify))
"couldn't grab the accept mutex" (an apache message?(verify))

Inspecting and removing

For SysV semaphores (which are considered old style, relative to the new POSIX semaphores)

Probably most interesting for inspection is

ipcs -s -t

The -t isn't necessary, but seeing user and time of latest use is often useful. If you know for sure you can delete them, you can remove semaphores using:

ipcrm -s semid

Some interesting commands (note: the report format/details can vary between systems):

ipcs -s - semaphore use (list)
ipcs -s -u - semaphore use (just a count)

ipcs -s - inspect current use of semaphores
- ipcs -s -c - mention creators (user/group)
- ipcs -s -t - mention time of last use
- ipcs -s -i semid - More details on a particular semaphore set

ipcs -s -l - report limits
- the same values as in /proc/sys/kernel/sem
- and fetchable via sysctl kernel.sem (or listed among everything else by /sbin/sysctl -a), settable via sysctl -w

Because we had a structural problem with semaphore leaks, we made a script that removes all of the current user's semaphores:

# removes all semaphores belonging to the current user
# or, if root, all of them -- which you probably do NOT want
for id in $(ipcs -s | cut -d ' ' -f 2 | grep "^[0-9]"); do
   echo "Deleting semid: $id"
   ipcrm -s "$id"
done

You do NOT want to run this as root, because that will remove all semaphores, which will probably break a few services.

In our case, switching to the offending users (using su) and cleaning all of theirs was easiest.

I later made a script that parsed the output of ipcs -s -t and threw away only old semaphores and skipped any owned by non-users (like apache).
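A simpler variant of that filtering idea, as a sketch: select semaphore sets by owner instead of by age, and only print what would be removed. The function name semids_owned_by and the user alice are made up for illustration, and it assumes the common Linux ipcs -s column order (key, semid, owner, perms, nsems), which can differ between systems.

```shell
# Print the semids owned by one user, reading `ipcs -s` style output on stdin.
# Assumption: columns are key, semid, owner, perms, nsems (common on Linux).
semids_owned_by() {
    awk -v owner="$1" '$2 ~ /^[0-9]+$/ && $3 == owner { print $2 }'
}

# Dry run for a hypothetical user "alice" (prints, removes nothing):
# ipcs -s | semids_owned_by alice | while read -r id; do
#     echo "would run: ipcrm -s $id"
# done
```

Printing first and removing later makes it easier to check that nothing owned by a service account is in the list.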

Semaphore limit details

Very short version: If you're running out of semaphore sets, increase the last number. Unless you've got programs that use a lot of semaphores (most just use a few), the other values are fine.

I decided, fairly arbitrarily, on

250 32000 64 8192

Setting

You can read/set using

sysctl kernel.sem
sysctl -w kernel.sem="250 32000 64 8192"

For a persistent change, you'll want to add/edit /etc/sysctl.conf to mention something like:

kernel.sem = 250 32000 64 8192

Meaning of the values

When tweaking these, it helps to know some details.

Short story: If you need more semaphores, you usually want to increase the number of sets (SEMMNI), and probably scale the system-wide maximum (SEMMNS) along to avoid the case where it could hand out more sets but would reject allocations based on the system max.

The four values are, respectively:

SEMMSL – maximum number of semaphores per set or array
- Each program tends to want more than one semaphore, so the kernel hands out a bunch of them at the same time (called a set or array). Most programs don't use many, so you rarely need to increase this.
- The max is 65535 (verify)

SEMMNS – maximum number of semaphores system–wide.
- You could say that making this SEMMSL*SEMMNI is most sensible, but since very few processes will use anywhere near the maximum per set, you can often set this to a lowish percentage of that number. In practice, actively using more than 10K is a huge number and typically points to some program seriously misbehaving.

SEMOPM – maximum number of operations allowed for one semop call
- It can make sense to make this the same order of magnitude as SEMMSL, but it only really matters for programs heavily using semaphores (verify)

SEMMNI – maximum number of semaphore sets to hand out
- Probably the most interesting value to increase. A value like 128 can prove a bit conservative, a few hundred more is usually enough, a few thousand should fit most less-usual needs.
- The max is 65535 (verify)
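To keep the order of the four values straight, a small sketch that labels them and prints the SEMMSL*SEMMNI product mentioned above. The function name describe_sem_limits is made up; it just reads one line in the /proc/sys/kernel/sem format.

```shell
# Label the four kernel.sem values, read as one line on stdin,
# e.g. "250 32000 32 128" (the order used by /proc/sys/kernel/sem).
describe_sem_limits() {
    read -r semmsl semmns semopm semmni || return 1
    echo "SEMMSL (max semaphores per set): $semmsl"
    echo "SEMMNS (max semaphores system-wide): $semmns"
    echo "SEMOPM (max operations per semop call): $semopm"
    echo "SEMMNI (max number of sets): $semmni"
    echo "SEMMSL*SEMMNI (theoretical ceiling): $((semmsl * semmni))"
}

# On a live system:
# describe_sem_limits < /proc/sys/kernel/sem
```

If SEMMNS is far below that product, the system-wide maximum is what you would hit first when many sets are nearly full.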

An old linux default seems to have been

250     32000   32      128

My home system's defaults seem to have been:

250     32000   32     4096

Other notes

What is the cost?

Generally nothing to worry about.(verify)

Files that are part of boot

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

init is the root of all processes, started by the kernel (often from /sbin/init). (This structure/convention is taken from SysV. It is not the only one around.)

Init delegates most of its work to other scripts, most of them via runlevel changes (there are other things it reacts to, such as Ctrl-Alt-Del, any UPS interaction, and more).

Init behaves according to /etc/inittab

It seems that inittab typically ties runlevel switches to /etc/init.d/rc, with the actual runlevel as its argument. Any further conventions tend to be part of that script.

Such further conventions include:

looking for a directory like /etc/rc3.d (whatever the runlevel argument is)
- look for K* files for services to kill in this runlevel (sorted by the next two digits in the filename), then...
- look for S* files for services to start in this runlevel

rc[2345].d scripts (so multi-user runlevels) may run /etc/rc.local at the end
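The K-then-S convention amounts to something like the following dry-run sketch. This is not any distribution's actual rc script; the function name rc_plan is made up, and it only prints what would be called, relying on the shell's sorted glob expansion for the two-digit ordering.

```shell
# Dry-run sketch: print what an rc script would do for one runlevel directory.
# Kill scripts (K*) come first, then start scripts (S*), each in sorted order.
rc_plan() {
    dir=$1
    for script in "$dir"/K*; do
        [ -e "$script" ] && echo "$script stop"
    done
    for script in "$dir"/S*; do
        [ -e "$script" ] && echo "$script start"
    done
    return 0
}

# Example: rc_plan /etc/rc3.d
```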

Also seen on some systems:

/etc/rc.sysinit
/etc/rc.single (stuff for runlevel 1, a.k.a. S)
/etc/rc.multi (stuff for regular runlevels, usually 2 through 5)
/etc/rc.shutdown, for halts (0) and reboots (6)

rc scripts are good for some early system setup, though keep in mind that modern subsystems like udev are the proper place for most device configuration, particularly if they are hot-pluggable by nature.

Reading in passwords

Reading in passwords is usually done by disabling stdin echo, waiting a bit and/or testing whether that worked, then reading, and re-enabling echo.

In scripts, you can use:

read -s -p "Password: " VARIABLE

Where -s turns off echoing of what is typed, and -p "Password: " is slightly shorter than also having an echo -n "Password: "
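Since read -s is a bash feature, here is a sketch of the disable-echo, read, re-enable sequence done by hand with stty, for plain sh. The function name read_secret is made up. It prompts on stderr and prints the secret on stdout, so it is meant to be called through command substitution (the exit in the trap would otherwise end your shell).

```shell
# Read one line without echoing it; prompt goes to stderr, secret to stdout.
# Intended use: pw=$(read_secret "Password: ")
read_secret() {
    printf '%s' "${1:-Password: }" >&2
    if [ -t 0 ]; then
        saved_tty=$(stty -g)                          # remember terminal settings
        trap 'stty "$saved_tty"; exit 130' INT TERM   # re-enable echo if interrupted
        stty -echo
    fi
    IFS= read -r secret
    if [ -t 0 ]; then
        stty "$saved_tty"                             # re-enable echo
        trap - INT TERM
        printf '\n' >&2                               # the typed newline was not echoed
    fi
    printf '%s\n' "$secret"
}
```

The terminal checks mean it also works when input is piped in, in which case there is no echo to disable.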

Modules

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

changing hostname

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Kernel panic diagnosis

SysRq

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

SysRq is a key combination that the linux kernel intercepts at any time - even after a panic.

It's probably most useful when the system appears frozen, but is not panicked.

Depending on the type of lockup, you can sometimes recover.

More often, you might still be able to tell a filesystem to sync so that it's left in a cleaner state before you take off the power. I've also had it be useful where a RAID controller panicked and system commands wouldn't even listen to a reboot anymore.

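It needs to be enabled to work, and can be disabled. The current setting can be read from /proc/sys/kernel/sysrq, where 0 means disabled, 1 means fully enabled, and larger values are a bitmask of allowed functions.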
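A sketch for checking that setting. The function name describe_sysrq is made up; the bit values used (16 for sync, 32 for remount read-only, 128 for reboot/poweroff) are the ones from the kernel's sysrq documentation, and only a few of the bits are shown.

```shell
# Describe a kernel.sysrq value; without an argument, read the live setting.
describe_sysrq() {
    value=${1:-$(cat /proc/sys/kernel/sysrq)}
    case $value in
        0) echo "SysRq disabled" ;;
        1) echo "SysRq fully enabled" ;;
        *) echo "SysRq partly enabled (bitmask $value)"
           [ $((value & 16))  -ne 0 ] && echo "  sync allowed"
           [ $((value & 32))  -ne 0 ] && echo "  remount read-only allowed"
           [ $((value & 128)) -ne 0 ] && echo "  reboot/poweroff allowed"
           ;;
    esac
    return 0
}

# describe_sysrq        # live value
# describe_sysrq 176    # a value some distributions ship as default
```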

SysRq is usually on the same physical button as PrintScreen.

The basic key combo (note that sysrq is used as a modifier key):

in text mode: Alt+SysRq command
in graphical mode: Ctrl+Alt+SysRq command (to avoid it causing a print-screen)

Read this if you don't have a SysRq key

There is also a way to do it without keyboard access, e.g. from ssh:

echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger

Command is a single letter. It is the key's position/keycode that matters, not the letter printed on the physical key, so the letters below assume a QWERTY layout. Some of them:

r – keyboard to xlate mode (in case it was raw, as e.g. X uses) Why?

If you think you can recover it (e.g. if it's thrashing but not frozen)

f -- force oom_kill
0 through 9 - set log level
t -- print tasks
n -- reset niceness of high-priority tasks

If you do think it's really frozen:

s -- sync (write contents of disc cache to disk)
e – sends SIGTERM to all processes except init.
i – sends SIGKILL to all processes except init
u – remounts all the filesystems readonly (basically a measure to help you reboot safely)

b -- reboot
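When doing this over ssh rather than from the keyboard, the same letters can be written to /proc/sysrq-trigger one at a time. A sketch with a made-up helper sysrq_send, which only prints the commands unless the (equally made-up) variable SYSRQ_DRY_RUN is set to 0; the two-second pause between commands is an arbitrary choice.

```shell
# Send a sequence of SysRq command letters via /proc/sysrq-trigger (needs root).
# Dry run by default: set SYSRQ_DRY_RUN=0 to really send them.
sysrq_send() {
    for key in "$@"; do
        if [ "${SYSRQ_DRY_RUN:-1}" = 1 ]; then
            echo "echo $key > /proc/sysrq-trigger"
        else
            echo "$key" > /proc/sysrq-trigger
            sleep 2     # give e.g. a sync some time before the next step
        fi
    done
}

# sysrq_send s u b     # dry run of: sync, remount read-only, reboot
```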

login messages

message of the day

Usually read from /etc/motd

May be re-written automatically at boot, possibly more often, and during some updates.

May have some more management. For example, ubuntu has /etc/update-motd.d/ which it uses to generate /etc/motd. You can tweak it to your own needs, or disable it.
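The pieces in /etc/update-motd.d/ are ordinary executable scripts whose output ends up in the message. A sketch of what one might contain, written as a function so it can be tried without installing anything. The function name and any file name you would give it (say 99-sysinfo) are made up.

```shell
# What a small update-motd.d style fragment could print.
motd_fragment() {
    echo "Host:   $(uname -n)"
    echo "Kernel: $(uname -r)"
}

# As a script in /etc/update-motd.d/ this would be just the two echo lines,
# preceded by a #!/bin/sh line, and made executable.
```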

Last login

...comes from SSH itself. It's somewhat useful for security, but if you want it gone, you can configure sshd with:

PrintLastLog no

Random data

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In concept, the entropy pool is a bunch of bits that are impossible to predict by an attacker, because they are collected from the computer's environment. This makes it useful for a number of cryptographic purposes.

Such an entropy pool, if available at all, is managed by the kernel, and updated when it finds data (e.g based on keyboard hits, mouse use, IRQ timings) that passes proper-randomness tests.

Since an entropy pool values quality over quantity, and there are few good sources that are available on all computers, you should never count on the entropy pool being fast - or, if your purpose relies on it, on it currently having random data at all.

The entropy pool is typically at least a few hundred bits large, up to a few thousand when there are good sources of randomness.

On headless servers, there may be none at all, and the entropy pool may replenish much more slowly than on computers you sit at - so slowly that e.g. keypair generation (which needs randomness) may be very slow, or fail with a timeout, if you do it on a server.

Related devices

It depends on OS, and on a point in time.

For example, on linux

Historically it may have been true that /dev/random was the entropy pool and /dev/urandom a PRNG based on some entropy (but not necessarily very secure), and that the former was very slow at best (bits per second) and the latter faster (maybe 10MByte/s),

and that frandom[1] and erandom were non-standard things written to be faster variants of urandom.

These days, /dev/random and /dev/urandom are probably backed by the same CSPRNG (internally still backed by an entropy pool) that can generate a few hundred MByte/s

If you're doing cryptographic things, it's worth doing a little more reading, but actually that CSPRNG is good enough for most things.(verify)

There seems to be no way to get at the entropy pool directly,(verify) but there never was much point. High quality entropy is great to seed a PRNG, but that is provided, and you don't need a lot of bits. Also, people doing dd if=/dev/random of=/dev/hda to wipe their drive just had the wrong expectations.
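For the common case of just wanting some random bytes in a script, a sketch that reads them from /dev/urandom and prints them as hex. The function name random_hex is made up.

```shell
# Print N random bytes from /dev/urandom as 2*N lowercase hex digits.
random_hex() {
    od -An -N "$1" -tx1 /dev/urandom | tr -d ' \n'
    echo
}

# random_hex 16                                  # e.g. a 128-bit token
# cat /proc/sys/kernel/random/entropy_avail      # the kernel's entropy estimate (Linux)
```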

Signal handling

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Keep in mind:

Signal names are more meaningful than their numbers. Not everything uses the same enumeration (e.g. Solaris)

Kernel/OSes may also differ in
- default handlers
- which signals can be caught or are unconditional

Unless specified otherwise, any signal may
- be ignored, i.e. discarded
- be blocked, telling the system to keep it and deliver it later (in most cases, you would only block for a short while, e.g. while doing signal-related housekeeping)
- have its handler replaced
- have a handler in addition to the default one

process groups affect where signals go
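In shell scripts, replacing a handler and ignoring a signal are both done with trap. A sketch of each, run in a child sh so that the signal goes to that child and not to your own shell; the function names are made up.

```shell
# Replace the handler: the child catches its own TERM and exits cleanly.
demo_handle() {
    sh -c 'trap "echo caught TERM; exit 0" TERM; kill -TERM $$; echo "not reached"'
}

# Ignore the signal: an empty trap action discards TERM, so the child carries on.
demo_ignore() {
    sh -c 'trap "" TERM; kill -TERM $$; echo "survived TERM"'
}

demo_handle
demo_ignore
```

The same syntax works for most signals, but not for KILL and STOP, which cannot be caught or ignored.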

Terminal, job management

SIGHUP, 'Hangup'
- sent automatically if a hangup/disconnect is detected on the controlling terminal, or a controlling process dies
- default handler is to close. Ignoring the signal is what nohup does

SIGCHLD
- sent by a process to its parent when it terminates, is stopped, or resumed
- default is to ignore it

SIGINT, interrupt
- sent when user sends interrupt - usually meaning the Ctrl-C key combo

SIGQUIT
- sent when user sends quit signal, usually the Ctrl-\ key combo. (Ctrl-D is not a signal; it makes the terminal report end-of-file to whatever is reading from it.)

SIGKILL
- quit without clean-up operations
- a process in uninterruptible sleep will not act on it until it leaves that state
- a zombie does not react to KILL; it must be reaped
- cannot be blocked, handled or ignored

SIGTERM, 'terminate',
- asks program to quit cleanly. That is, programs can register this handler to do exactly that
- default signal sent by kill, killall
- rebooting will often move from SIGTERM to SIGKILL if TERM doesn't work quickly enough

SIGABRT
- typically sent to itself (possibly via abort()) as emergency termination.
- Mostly just makes it easier to jump to cleanup/kill code from anywhere in your program

SIGALRM, alarm clock signals
- used for timers; you can ask for this signal to be delivered in the future, using alarm() or setitimer()
- see also SIGVTALRM, under deprecated

SIGTSTP, 'terminal stop' (Optional in POSIX)
- default handler saves process state so that a SIGCONT can continue it, but it gets no CPU until then
- May also be caught by default SIGTTIN and SIGTTOU handlers
- Keep in mind that once suspended, a process will only handle SIGCONT or SIGKILL (verify)

SIGSTOP (Optional in POSIX)
- Like SIGTSTP, but handler cannot be changed (verify)
- Keep in mind that once suspended, a process will only handle SIGCONT or SIGKILL (verify)

SIGTTIN (Optional in POSIX)
- sent when a program tries to read from stdin but is not part of the foreground group
- default handler sends SIGTSTP to self? (or just has that same effect?) (verify)

SIGTTOU (Optional in POSIX)
- sent when a program tries to write to terminal but is not part of the foreground group
- default handler sends SIGTSTP to self (or just has that same effect?) (verify)

SIGCONT (Optional in POSIX)
- sent to continue a process suspended via SIGTSTP, SIGSTOP, SIGTTIN, SIGTTOU

Note on foreground groups: One terminal can serve multiple process groups, but only one process group is in the foreground
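Stopping and continuing can be watched from a script. A Linux-specific sketch that stops a background sleep, looks at its state letter in /proc, and continues it. The function names are made up, and it assumes /proc is mounted and that sleep accepts fractional seconds.

```shell
# Print the one-letter state from /proc/PID/stat (the field after the "(name)").
proc_state() {
    sed 's/^.*) //' "/proc/$1/stat" | cut -d ' ' -f 1
}

demo_stop_cont() {
    sleep 30 &
    pid=$!
    kill -STOP "$pid"; sleep 0.2        # short pause so the state has changed
    echo "after STOP: $(proc_state "$pid")"
    kill -CONT "$pid"; sleep 0.2
    echo "after CONT: $(proc_state "$pid")"
    kill -TERM "$pid"                   # clean up the test process
    wait "$pid" 2>/dev/null
    return 0
}

demo_stop_cont
```

A stopped process shows state T; after the continue it is typically back to S (sleeping).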

Memory:

SIGSEGV
- sent by the kernel when it notices a memory access that was not part of the process' mapped space. Typically a bug in the offending process's code, related to memory allocation or pointer abuse.

SIGBUS
- sent by the kernel when an address is mapped but does not translate to a valid part of memory hardware (or mapped IO or such)
- similar to SIGSEGV, but lower-level. May e.g. happen when the disk underlying swap has failed.(verify)

IO:

SIGPIPE
- A process is sent this when it writes into a pipe that is closed. In other words, if two processes are connected via a pipe and the consumer process dies, the producer process is sent this.
- Source of the "Broken pipe" message

See also SIGIO and SIGURG in the non-POSIX list

Others

Linux reserves SIGRTMIN through SIGRTMAX for real-time (actual value of both seems to vary, particularly SIGRTMIN)

SIGUSR1 and SIGUSR2 - usable by users

SIGFPE, 'floating point error'
- usually caught and handled internally

SIGILL - illegal instruction. Not cleared when caught(verify)

SIGTRAP - trace trap, mostly used in debuggers. (Optional in POSIX)

SIGSYS - bad arguments to system call. (Optional in POSIX)

SIGEMT, 'emulate'. (Optional in POSIX)

SIGCLD - Child status change. (Optional in POSIX)

Not POSIX, or deprecated in POSIX:

SIGWINCH
- used to signal window size change. Apparently not used much, but resizable terminals may.

SIGIO, a.k.a. SIGPOLL(verify)
- Only sent if O_ASYNC is used on a file descriptor
- notification about a file descriptor: that it is ready to receive, that it has new data, or that there is an error

SIGURG
- sent when urgent data arrives on a file descriptor. Mostly used for out-of-band data
- default handler ignores(verify)

SIGPWR
- used to signal that we're on short-term emergency power (e.g. triggered by UPS software)
- Useful to signal daemons to clean up (and possibly shut down)

SIGVTALRM, 'virtual timer'
- like SIGALRM, but sent some amount of CPU time in the future (instead of wall-clock time), and excluding system code(verify)
- see also SIGALRM, above

SIGPROF - profiling timer expired.
- like SIGVTALRM, but counts CPU time, including system code (verify)

SIGXCPU - exceeded CPU limit (resource limiting)

SIGXFSZ - exceeded file size limit (resource limiting)

SIGCANCEL - Seems to be used internally in pthreads, to help cancel threads.

SIGLOST - signals a resource (e.g. record-lock) is lost (meaning?) (verify)

SIGLWP - used by threading (verify)

SIGFREEZE - Possibly solaris-specific? (verify)

SIGTHAW - seems meant as "we have just resumed the system, this is your chance to do housekeeping before resuming regular operations" Possibly solaris-specific? (verify)

SIGWAITING - Possibly solaris-specific?

Unsorted

SIGTHR - thread interrupt (verify)

SIGINFO

http://www.lindevdoc.org/wiki/Category:Signals http://www.tutorialspoint.com/unix/unix-signals-traps.htm

Loop devices

A loop device uses a file and presents a block device.

As the OS is used to thinking only about block devices as containing filesystems, you need loop devices when you want to mount a filesystem that is stored in a file, be it

an encrypted filesystem-in-a-file,

images of a hard drive, CD, DVD, floppy, or such

Loop devices on various systems:

In linux, the devices are /dev/loop0 and so on. Management is done via losetup (util-linux package)

In BSD and many derivatives, this is called a virtual node device and often at /dev/vnd0, /dev/rvnd0 or /dev/svnd0. Management is done via vnconfig.
- except for FreeBSD, which merged the functionality into the memory disk driver (md). Management is done via mdconfig (verify)

Solaris calls it lofi, and places the devices at /dev/lofi/1 and so on. Management via lofiadm

OSX internalizes the functionality. You don't have to manage it.

Windows doesn't natively support this - though there are many available programs for the case of CD/DVD images

Linux loop devices

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Mounting an image, or partition from an image, requires use of a /dev/loop* device to expose a file as block device.

The least-bother way to allocate a loop device is to let mount figure out a free one.

There are roughly two variants of this, because

USB sticks usually do not have partitions, and directly contain a filesystem

hard drives often contain a partition table and multiple partitions, as do CDs, in a different way

...there is an extra step in figuring that out.

Partitionless

To make mount figure out a loop device and mount the image:

mount -o loop -t iso9660 image.iso /mnt/myimage

In older versions you would do this explicitly:

mount -o loop=/dev/loop3 -t iso9660 image.iso /mnt/myimage

What it's doing (and what you could also do more manually) amounts to:

figure out unused loop device from output of

losetup

Associate loop3 with a specific image

losetup /dev/loop3 /images/image.iso

Now /dev/loop3 acts like a block device, and is backed by that iso, so you can mount it:

mount /dev/loop3 /mnt/isoimage

Once you're done, you'll probably want to detach the file from the loop device. It seems newer versions of umount (since 2.6.25(verify)) can do this for you:

umount -d /dev/loop3

In older versions you would do it explicitly:

umount /mnt/isoimage
losetup -d /dev/loop3
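The manual steps can be wrapped up so that the loop device is not left attached when the mount fails. A sketch, with a made-up function name mount_image; it needs root, and uses losetup -f --show to pick and print a free device rather than hardcoding loop3.

```shell
# Attach an image to the first free loop device and mount it.
# Prints the loop device on success; detaches it again if the mount fails.
mount_image() {
    image=$1
    mountpoint=$2
    loopdev=$(losetup -f --show "$image") || return 1
    if ! mount "$loopdev" "$mountpoint"; then
        losetup -d "$loopdev"
        return 1
    fi
    echo "$loopdev"
}

# dev=$(mount_image /images/image.iso /mnt/isoimage)
# ... use it ...
# umount /mnt/isoimage && losetup -d "$dev"
```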

Partitions in image file

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

If your image is a whole-disk image coming from a partitioned disk, you have a few options:

The more automatic option is to make losetup effectively do a --partscan (-P), e.g.

losetup -f --show -P sdcard_image.img

This makes it add devices for each partition it finds, e.g. prints /dev/loop3 and makes /dev/loop3p1 through /dev/loop3p6 (whatever's in the partition table)

The same, slightly more manually (and with you choosing a free loop device):

losetup /dev/loop3 /images/sdcard_image.img
partprobe /dev/loop3   # makes the kernel aware of the partitions,
                       # sets up partition-specific block devices (e.g. /dev/loop3p5)
fdisk -l /dev/loop3    # lists the partitions, to figure out the partition device name you want
mount /dev/loop3p5 /mnt/sd_part/

And, when you're done:

umount /mnt/sd_part/
losetup -d /dev/loop3    # -d detaches this one device; -D would detach all loop devices

Another option is using kpartx, which is focused on that scanning part

it may need to be installed, though

it takes -a to add the partition mappings for an image, and -d to remove them

You could even figure out the offset of your filesystem, e.g. from reading the partition table. This is more annoying to do, but you can do more without involving the kernel or specific tools.
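The offset is just the partition's start sector times the sector size. A sketch of that arithmetic, with a made-up function name, assuming 512-byte sectors unless told otherwise (check what fdisk -l reports for your image).

```shell
# Byte offset of a partition: start sector times sector size (default 512).
partition_offset() {
    start_sector=$1
    sector_size=${2:-512}
    echo $((start_sector * sector_size))
}

# For a partition that fdisk -l shows starting at sector 2048:
# mount -o loop,ro,offset="$(partition_offset 2048)" sdcard_image.img /mnt/sd_part/
```

Here mount's offset= option tells the loop setup to skip that many bytes into the file, so no partition devices are needed.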

Encrypted loopback

Wacom tablet notes

Linux admin notes - unsorted and muck
