Virtual memory
Latest revision as of 18:06, 22 April 2024
oom_kill
oom_kill is Linux kernel code that starts killing processes when memory is so scarce that allocations cannot be satisfied within a reasonable time (which usually means the system is already thrashing).
Killing processes sounds like a poor solution.
But consider that an OS can deal with completely running out of memory in roughly three ways:
- deny all memory allocations until the scarcity stops.
- This isn't very useful, because
- it will affect every program until the scarcity stops
- if the cause is one flaky program - and it usually is just one - then the scarcity may never stop
- programs that do not actually check every memory allocation will probably crash
- programs that do check allocations properly may have no option but to stop completely (or at best pause)
- So in the best case, random applications stop doing useful things and probably crash; in the worst case, your system crashes.
- delay memory allocations until they can be satisfied
- This isn't very useful because
- this pauses all programs that need memory (they cannot be scheduled until we can give them the memory they asked for) until the scarcity stops
- again, there is often no reason for the scarcity to stop
- so this typically means a large-scale system freeze (indistinguishable from a system crash in the practical sense of "it doesn't actually do anything")
- killing the misbehaving application to end the memory scarcity.
- This makes a few assumptions that all have to hold - but it lets the system recover
- assumes there is a single misbehaving process (not always true; e.g. two programs each allocating most of RAM would be fine individually, and need an admin to configure them better)
- ...usually the process with the most allocated memory, though oom_kill logic tries to be smarter than that
- assumes the system has had enough memory for normal operation up to now, and that there is probably one haywire process (misbehaving or misconfigured, e.g. (pre-)allocating more memory than you have)
- this could misfire on badly configured systems (e.g. multiple daemons all configured to use all RAM, or having no swap, leaving nothing to catch incidental variation)
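The kernel's victim choice can be peeked at: each process exposes /proc/&lt;pid&gt;/oom_score (a badness score; higher means more likely to be killed) and a tunable /proc/&lt;pid&gt;/oom_score_adj. A minimal sketch, assuming a Linux system with /proc mounted (sshd is just an example target):

```shell
# Show the five processes the kernel currently considers the most
# likely oom_kill victims (highest badness score first).
for pid in /proc/[0-9]*; do
  score=$(cat "$pid/oom_score" 2>/dev/null) || continue
  printf '%s\t%s\t%s\n' "$score" "${pid#/proc/}" "$(cat "$pid/comm" 2>/dev/null)"
done | sort -rn | head -n 5

# Exempt a critical process from oom_kill (-1000 means "never kill";
# this write needs root, so it is shown commented out):
#   echo -1000 > "/proc/$(pidof -s sshd)/oom_score_adj"
```

Raising oom_score_adj (up to +1000) works the other way around: it volunteers a process as the preferred victim.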
Keep in mind that
- oom_kill is sort of a worst-case fallback
- generally
- if you feel the need to rely on the OOM killer, don't.
- if you feel the urge to overcommit, don't.
- oom_kill is meant to deal with pathological cases of misbehaviour
- but even then it might pick some random daemon rather than the real offender, because in some cases the real offender is hard to define
- note that you can isolate likely offenders via cgroups now (also meaning that swapping happens per cgroup)
- and apparently oom_kill is now cgroups-aware
- oom_kill does not always save you.
- It seems that if your system is already thrashing heavily, it may not be able to act fast enough.
- (and it may go overboard once things do catch up)
- You may wish to disable oom_kill when you are developing
- ...or at least treat an oom_kill in your logs as a fatal bug in the software that caused it.
- If you don't have oom_kill, you may still be able to get a reboot instead, by setting the following sysctls:
vm.panic_on_oom=1
and a nonzero kernel.panic (seconds to show the message before rebooting)
kernel.panic=10
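As a sketch, the settings above can be applied at runtime with sysctl, or persisted in a sysctl.d config file (the filename below is an arbitrary example):

```shell
# Make an out-of-memory condition panic the kernel instead of invoking
# oom_kill, and reboot 10 seconds after showing the panic message.
# Both writes need root, so they are shown commented out:
#   sysctl -w vm.panic_on_oom=1
#   sysctl -w kernel.panic=10
#
# Persistent equivalent, e.g. in /etc/sysctl.d/90-oom-reboot.conf:
#   vm.panic_on_oom = 1
#   kernel.panic = 10

# The current values can be read back without root:
cat /proc/sys/vm/panic_on_oom /proc/sys/kernel/panic
```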
See also
Page faults