Shell and process nitty gritty
Process stuff
Creating processes
fork()
A fork of a process makes a nearly-identical copy of the calling process, including (verify)
- copy of process credentials
- copy of nice value
- copy of environment
- copy of current working directory
- copy of stack
- copy of memory pages (now often implemented as copy-on-write to save memory(verify))
- copy of open file descriptors (including e.g. sockets)
- generally you want one of the two processes to close its copy of each descriptor it does not need
- copy of resource limits
...but...
- distinct process ID (...since it's a new process)
- distinct parent process ID (the child's parent is the process that called fork())
- distinct process statistics like times, resource utilizations
- distinct pending signals
- distinct locks: memory locks (mlock) and process-associated record locks (fcntl) are not inherited by the child (verify)
- distinct returned value from the fork() call (0 in the child; child PID in the parent or -1 on failure)
- ...which is the easiest way for your code to tell the difference.
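A minimal sketch of the points above, written in Python on the assumption that its os module is an acceptable stand-in for the C calls (os.fork() raises OSError on failure rather than returning -1). The function name fork_and_report is made up for this illustration. It shows the return value telling the two processes apart, and each side closing the inherited pipe end it does not use.

```python
import os

def fork_and_report():
    """Fork once; the child reports its own PID and PPID through a pipe."""
    read_fd, write_fd = os.pipe()      # both ends get inherited by the child
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0.
        os.close(read_fd)              # drop the end this process will not use
        os.write(write_fd, f"{os.getpid()} {os.getppid()}".encode())
        os._exit(0)                    # exit without running the parent's cleanup code
    # Parent: fork() returned the child's PID.
    os.close(write_fd)                 # the parent only reads
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)                 # reap the child
    child_pid, child_ppid = (int(x) for x in data.split())
    return pid, child_pid, child_ppid
```

The value fork() returned in the parent matches what the child sees as its own PID, and the child's PPID is the calling process.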
You also probably care about the implications on process groups and sessions - see process relations
vfork() and clone()
To understand the comparisons, first note the nature of fork and exec:
- fork - creates a duplicate process
- in the old days meaning its memory was fully copied (memory duplication)
- these days (in linux) it's shared, copy-on-write - still some extra memory management, but little to no memory duplication.
- ...but the below first discusses the old situation.
- exec - replaces process image, creating its own separate address space
The typical way for one process to launch a child process is to fork() to get a new process, then exec() only in the child to replace that copied process with what you actually want to run.
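As a sketch of that fork-then-exec pattern (in Python, assuming os.fork() and os.execvp() as stand-ins for the C calls; run_child is a name invented here, and 127 is only the shell convention for "could not run the command"):

```python
import os
import sys

def run_child(argv):
    """Fork, exec argv in the child only, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)   # on success this never returns
        except OSError:
            os._exit(127)              # exec failed; do not fall back into parent code
    _, status = os.waitpid(pid, 0)     # parent: wait for the child and reap it
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)        # child was killed by a signal
```

For example, run_child([sys.executable, "-c", "raise SystemExit(3)"]) returns 3.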
Is fork-and-exec the only way to create processes?
Historically, basically yes.
It's largely a "this is just how we made it work at the time" thing, seemingly because both fork() and exec() were useful things on their own, and the combination works out as "create distinct child process" with basically zero extra code. [1]
Also, fork() is sometimes useful without exec, to quickly set up a nearly-identical process,
which is part of the reason for the copy of the memory space.
Still, the fork-and-exec combination is much more common than either fork() or exec() alone(verify).
In a fork-and-exec, the fork classically implied allocating the same amount of memory (which counts towards your commit limit) and copying lots of data (which takes some time).
When you know the very next call will be exec (which implies you will immediately forget these memory contents even exist), both allocation and copy are unnecessary, so they are wasted resources.
It's particularly wasteful when a large, long-running process occasionally needs small helper processes.
It turns out that software will sometimes get around huge copies by setting up a separate program whose only job it is to be small and do the fork-and-exec on behalf of the larger one.
This is sort of convoluted, extra management in general, and more so if you care about process relations and security.
We can do better.
vfork() was an idea from BSD, later standardized by POSIX (which has since marked it obsolescent and removed it in POSIX.1-2008), to address the above:
- It explicitly defines that the child shares memory space (yet the child should never touch it), and
- that the parent is paused until the child exits or calls exec().
This is an improvement over classic fork, in that it skips the allocation and copy. (There is still a new process for the OS to keep track of, but, on Linux at least, the address space is shared outright, so not even the page tables need to be copied.)
Given that (particularly in a CoW system) there's not a lot of practical difference between fork() and vfork(), it then matters that vfork() is more of an implied contract, something you could mess up, so it ends up being a little harder to use correctly[2] than fork() is.
And the thing is that since then, a lot of OSes have adopted copy-on-write (CoW) memory, which basically means "I'll point to the exact same memory as long as you only read it, but when you try to write to it, only then will the kernel make a copy of just that one small page", a very "only allocate what you actually need" solution.
A copy-on-write system roughly gives you what vfork does, but for free because it's doing CoW always anyway. (plus a fork-without-exec gets to write to what amounts to its own data)
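The copy-on-write mechanism itself is not visible from user code, but its effect is: a write after fork() lands in the writer's own copy. A small sketch of that, in Python with a made-up function name, assuming a bytearray is a fair stand-in for "some memory the process owns":

```python
import os

def child_write_is_private():
    """Show that a write in the forked child does not change the parent's data."""
    data = bytearray(b"parent")
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        data[:] = b"child!"            # write to the child's copy of the buffer
        os.write(write_fd, bytes(data))
        os._exit(0)
    os.close(write_fd)
    seen_by_child = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    # The parent's buffer is untouched by the child's write.
    return seen_by_child, bytes(data)
```

Calling it returns (b"child!", b"parent"): the child saw its own modification, the parent still has the original contents.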
There is also posix_spawn, but like vfork it is harder to use correctly(verify)
So there seems to be an argument for sticking with fork, roughly "if we keep it to one case, the behaviour, responsibilities, and exceptions stay easier to understand" (verify)
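For comparison, a sketch of the posix_spawn route mentioned above, using Python's os.posix_spawn() wrapper (available since Python 3.8) as an assumed stand-in for the C function. The name spawn_and_wait is invented here. Note that posix_spawn() takes a full path and does no PATH search.

```python
import os
import sys

def spawn_and_wait(argv):
    """Start argv[0] via posix_spawn() and return its exit code."""
    # One call creates the child and runs the program; there is no
    # window in which our own code runs inside the child.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)        # child was killed by a signal
```

For example, spawn_and_wait([sys.executable, "-c", "raise SystemExit(5)"]) returns 5. Anything you would normally do between fork() and exec() (closing descriptors, changing signal masks) has to be expressed through posix_spawn's file-action and attribute arguments instead, which is part of why it can be fiddlier to use.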
In low-level practice, a bunch of responsibilities and code are shared between clone(), fork(), vfork(), and posix_spawn(). For example, the syscall underneath clone() is used in Linux's implementation of fork(), vfork(), and pthread_create(), and is used by OS containers such as Docker(verify), just because it is the common code and the rest is varied (kernel-space) bookkeeping.
clone()
- linux-specific (not POSIX like fork, vfork, posix_spawn)
- is like vfork in that it can share its memory space with the caller (and, depending on the flags you pass, file descriptors, signal handlers, and others)
- unlike (v)fork in that it jumps to a given function
You could see it as an alternative to getting two processes to share memory, but it's rarely used for that(verify) because without e.g. threading logic you're going to make a mess fast.
https://stackoverflow.com/questions/4856255/the-difference-between-fork-vfork-exec-and-clone
https://stackoverflow.com/questions/4259629/what-is-the-difference-between-fork-and-vfork
Process relations
Particularly around shell logins, there is frequently a relation among processes.
Let's throw around some terms, then figure out how they work:
- Every process has a parent.
- Sessions and groups:
- A session contains process groups,
- a process group contains processes
- Also, a process may contain threads - but how tasks are modeled is more OS-specific and another topic.
A process has various extra fields (this is mostly POSIX stuff)
- PID - process ID
- PPID - parent process ID
- PGID - process group ID
- SID - session ID
Which you can inspect in top, ps, and such, e.g.
ps ax -o pid,ppid,pgid,sid,comm
Some platforms(verify) let a process fetch the foreground process group ID of its session's controlling terminal (TPGID), which some such tools will also show.
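A process can also ask for these values about itself. A small sketch in Python (process_ids is a name made up here; reading the terminal's foreground group via /dev/tty and os.tcgetpgrp() is an assumption that only works when the process has a controlling terminal):

```python
import os

def process_ids():
    """Return this process's PID, PPID, PGID, SID and, if available, TPGID."""
    ids = {
        "pid": os.getpid(),
        "ppid": os.getppid(),
        "pgid": os.getpgid(0),     # 0 means "the calling process"
        "sid": os.getsid(0),
    }
    try:
        tty_fd = os.open("/dev/tty", os.O_RDONLY)   # our controlling terminal
    except OSError:
        ids["tpgid"] = None        # no controlling terminal (e.g. a daemon or cron job)
    else:
        try:
            ids["tpgid"] = os.tcgetpgrp(tty_fd)     # foreground process group of that terminal
        finally:
            os.close(tty_fd)
    return ids
```

When run in the foreground of an interactive shell, tpgid will usually equal pgid; when run in the background it will not.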
Parent and child processes
Creating a new process makes you its parent.
Every process has a parent process ID (PPID) set, so there is basically one main tree of processes.
There are roughly two situations in this tree that can need attention:
Orphans
Orphan processes are those still running after their parent has stopped.
This can be done intentionally, as it often is for daemons, largely because it lets you avoid some implied behaviour from process group logic.
That same behaviour can be useful to avoid unintentional orphans.
It is typically init's job to adopt these processes as its own children.
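The adoption can be observed from inside the orphan, since its PPID changes once its original parent exits. A sketch in Python with invented names. The new parent is often PID 1 but may instead be a "subreaper" process on Linux, so the code only reports what it sees.

```python
import os
import time

def orphan_new_parent(timeout=5.0):
    """Create an orphan and return (pid of its original parent, its PPID afterwards)."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        middle_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until the middle process has gone away.
            deadline = time.monotonic() + timeout
            while os.getppid() == middle_pid and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(write_fd, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)                    # middle process exits, orphaning the grandchild
    os.close(write_fd)
    os.waitpid(middle, 0)              # reap the middle process
    new_parent = int(os.read(read_fd, 32))
    os.close(read_fd)
    return middle, new_parent
```

The second value is whichever process adopted the orphan, and differs from the first.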
Zombies
A zombie process, a.k.a. defunct process, is one which has exit()ed (so it won't respond to KILL, it's already dead) but has not yet been wait()ed for by its parent process.
Calling wait(), also known as reaping, does little more than fetch the exit code and reason for termination - but needs to happen.
Zombies take no process resources, but are still an entry in the kernel's process table, so each still occupies a PID. If zombies are never reaped, they will eventually exhaust PIDs and make you unable to create new processes. (This is hard to do accidentally, though easy to do intentionally.)
It's normal to see zombies for a moment, or for a handful of seconds on a thrashing system because the relevant parent processes are not getting scheduled quickly. In general, though, persistent zombies usually mean the parent has forgotten to reap them, and that's usually a bug.
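A minimal sketch of reaping, in Python (reap_child is a made-up name): the child exits immediately, stays a zombie until the parent calls waitpid(), and after that there is nothing left to wait for.

```python
import os

def reap_child(exit_code=7):
    """Fork a child that exits at once, reap it, and show it can only be reaped once."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)            # child is now dead, and a zombie until reaped
    _, status = os.waitpid(pid, 0)     # reaping: fetch exit status, free the table entry
    reaped_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    try:
        os.waitpid(pid, 0)             # a second wait has nothing to collect
        still_waitable = True
    except ChildProcessError:          # ECHILD: no such child any more
        still_waitable = False
    return reaped_code, still_waitable
```

reap_child(7) returns (7, False).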
Sessions and process groups
Practical ways this matters
shell execution, job control
Forking a daemon
For context, skim creating processes above
What forking a daemon adds to that is that a daemon should be a process that dissociates itself from the terminal that started it.
A double-fork is preferable for thoroughness, for some fairly OS-internal reasons:
After the first fork, and a setsid() to intentionally start your own session, you are what is called a session leader, and you have no controlling terminal.
That is mostly what we want, but not entirely: if a session leader happens to open() a file descriptor which is a terminal, the OS will make it the new controlling terminal, and we didn't want one, particularly not accidentally (in part because that can be a new source of signals like SIGINT from a Ctrl-C).
The second fork gets rid of that pesky session leader status, and this implied behaviour.
Also, forking twice helps reparent it to init sooner - a single fork will implicitly get it re-parented to init, yes, but only once the process that started it exits.
When you double-fork that exit is part of the procedure.
That said, in a lot of practical cases this is more style points, or good habit, than strictly necessary(verify).
When you want a process to fork off as a daemon, you often want to consider most of:
- fork() so the parent can exit and you get control of the shell again. This also ensures the new process is not a process group leader.
- setsid() to become the leader of a new session and of a new process group (setsid() fails if you are already a process group leader, which is why the fork comes first).
- This will remove yourself from the initially controlling terminal, so that you are now fully disconnected from the terminal you were started from, its job management, and its user.
- fork() again, so that your parent (the session leader) can exit.
- The new process is not a session leader, which means it cannot accidentally acquire a controlling terminal later (which it would if it opened a file that happened to be a terminal).
And somewhat more optionally, to be nice/thorough:
- chdir("/") to avoid keeping the current working directory (whatever that was) occupied, which could potentially block things like umounts or rmdirs. You could instead change to a directory that should not go away while the daemon is running.
- umask(077) or whatever value is useful to you. Mostly to know what it is, as we don't know what umask we inherited.
- you may close() the inherited stdin (fd 0), stdout (fd 1), and stderr (fd 2) streams; they are generally not so useful since you often don't know where these are redirected to, and being a daemon, you probably want to rely purely on logfiles for output. (verify) Some daemons point them at /dev/null instead, so that stray writes do not end up in whatever file next gets those descriptor numbers.
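The steps above can be sketched roughly as follows, in Python (daemonize is a name invented here, not a library function). Two choices are assumptions of this sketch rather than requirements: it points fds 0, 1 and 2 at /dev/null instead of plainly closing them, and the exiting processes use os._exit(), which skips any buffered output and cleanup handlers.

```python
import os

def daemonize():
    """Double-fork into the background; returns only in the final daemon process."""
    if os.fork() > 0:
        os._exit(0)        # original process exits, so the shell gets its prompt back
    os.setsid()            # new session and process group, no controlling terminal
    if os.fork() > 0:
        os._exit(0)        # session leader exits; the survivor is not a session leader
    os.chdir("/")          # do not keep the old working directory busy
    os.umask(0o077)        # pick a known umask instead of the inherited one
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)   # stdin, stdout, stderr now go nowhere
    if devnull > 2:
        os.close(devnull)
```

A program would call daemonize() once, early, and then carry on with its real work (opening logfiles, writing a PID file, and so on) in the process that returns from it.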
Signals
How processes relate to each other (session, process group, controlling terminal) is part of what decides where signals go.
For example:
- Ctrl-C goes to the foreground process group (all processes in it)
- a hangup signal (SIGHUP, originally from a modem hanging up, these days mostly from a terminal closing) goes to the session leader (which is a single process)
- this is roughly why SIGHUP is frequently used to tell the controller of a pool of workers to do something
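Delivery to a whole process group can be tried out without a terminal. The sketch below (Python, with invented function names) puts two children in one process group and signals the group once with os.killpg(). This stands in for what the terminal driver does with SIGINT on Ctrl-C. SIGTERM is used here only as an example signal.

```python
import os
import signal

def _spawn_member(join_pgid):
    """Fork a child that joins process group join_pgid (0 = start its own), then sleeps."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            signal.signal(signal.SIGTERM, signal.SIG_DFL)  # make sure SIGTERM kills us
            os.setpgid(0, join_pgid)
            os.write(write_fd, b"x")   # tell the parent we are in place
            while True:
                signal.pause()         # sleep until a signal arrives
        finally:
            os._exit(1)                # never fall back into the parent's code
    os.close(write_fd)
    os.read(read_fd, 1)                # returns once the child is ready (or has died)
    os.close(read_fd)
    return pid

def signal_whole_group():
    """Signal a two-process group once; return the signal that ended each member."""
    leader = _spawn_member(0)          # its PGID equals its own PID
    worker = _spawn_member(leader)     # joins the leader's group
    os.killpg(leader, signal.SIGTERM)  # one call, delivered to every member
    ended_by = []
    for pid in (leader, worker):
        _, status = os.waitpid(pid, 0)
        ended_by.append(os.WTERMSIG(status) if os.WIFSIGNALED(status) else None)
    return ended_by
```

Both entries of the returned list are signal.SIGTERM, showing that a single killpg() call reached both processes.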