-->
===Copy on write===
<!--
Copy on write allows multiple ''logical''ly distinct chunks to be backed by the same ''physical'' chunk.

It's something you can do ''transparently'' when both allocation and access involve a layer of indirection that knows about this.
: ...because that's the only way it won't be easily subverted. In particular, it means that a write won't accidentally alter data underlying multiple copies.
: when a write happens, it will be un-shared, i.e. get its own physical copy (hence the name copy-on-write), in a just-in-time way
Copy-on-write is often banking on the idea that writes may be rare for a lot of data,
meaning you can transparently allocate a lot less storage,
potentially
: saving storage cost,
: saving some up-front work (e.g. a linux fork() doesn't have to copy all allocated memory)
: saving time when making the original copy
: having more fast storage (e.g. RAM) to use in general
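The fork() case can be seen directly: after a fork, parent and child each logically own the whole address space, yet writes stay private to whoever made them. A minimal Linux-only sketch (using Python's os.fork purely for illustration):

```python
import os

# Allocated before the fork; afterwards both processes logically own it,
# while the kernel can keep backing it with the same physical pages.
data = bytearray(b"shared")

pid = os.fork()
if pid == 0:
    # Child: this write triggers the copy-on-write - the child gets its own
    # physical page, and the parent's view is unaffected.
    data[0:6] = b"child!"
    os._exit(0)

os.waitpid(pid, 0)
assert data == bytearray(b"shared")  # parent still sees the original contents
```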
In linux there's even a daemon (ksmd[https://www.kernel.org/doc/Documentation/vm/ksm.txt][https://www.kernel.org/doc/html/v4.19/admin-guide/mm/ksm.html], initially made for VMs)
that goes looking through program memory (where the program has set MADV_MERGEABLE)
for identical pages, and merges them copy-on-write style.
This
* applies to RAM, because processes only ever see [[virtual memory]] addresses anyway
:: see e.g. linux [[fork()]] making a copy-on-write memory space
* applies to databases, because they do their own storage management anyway
* applies to filesystems, because they do their own storage management anyway
:: see e.g. LVM snapshots, ZFS snapshots
In these examples there is barely any way to subvert it, even if you wanted to.
Software examples can get more interesting:
* Qt uses copy-on-write, which affects the need for locking
* C++98's string class was designed to allow implementations to back it with copy-on-write, but this was messy enough that C++11 dropped it
'''Other meanings'''

ZFS's writes have been described as copy-on-write, but this has a different meaning.
There it means 'when writing to a block, we always first allocate and write a new block containing a copy of the data plus the requested alterations',
and ensure the new block is written and becomes current ''before'' retiring the old one.

While ''technically'' this is the same "share backing data for a while, until you can't",
that time is intentionally very short,
and the entire thing is ''only'' really there to avoid [[write hole]] problems (because yes, this is slower than direct alteration).
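That write pattern can be sketched with a toy in-memory 'block store' (hypothetical names and structure, not ZFS's actual implementation):

```python
# Toy sketch of a ZFS-style copy-on-write update: never modify a live block
# in place; write a new block, then switch the pointer, then free the old
# one - so a crash mid-write leaves the old block intact, avoiding
# write-hole style corruption.

blocks = {}    # block id -> contents
current = {}   # logical name -> id of its current block
next_id = 0

def alloc(contents):
    global next_id
    next_id += 1
    blocks[next_id] = contents
    return next_id

def cow_write(name, new_contents):
    old = current.get(name)
    new = alloc(new_contents)   # 1. write the full new version elsewhere
    current[name] = new         # 2. atomically make it current
    if old is not None:
        del blocks[old]         # 3. only then retire the old block

current["file"] = alloc(b"version 1")
cow_write("file", b"version 2")
assert blocks[current["file"]] == b"version 2"
assert len(blocks) == 1   # old block was retired after the switch
```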
The concept appears in programming sometimes.
For example, copy-on-write ''strings'' exist, the idea being that you can share backing and save some space.
It turns out they are often barely worth it,
and they make correctness and concurrency a lot harder (consider e.g. the effect on ongoing iterators).
C++, for example, effectively outlawed them after a while.
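A copy-on-write string can be sketched like this (a hypothetical toy class, just to show where the mechanism - and the complexity - lives):

```python
class CowStr:
    """Toy copy-on-write mutable string: copies share one backing list
    until the first write. A real implementation would also need
    reference counting and, under threads, locking - part of why C++11
    effectively disallowed COW std::string."""

    def __init__(self, text):
        self._backing = list(text)
        self._shared = False

    def copy(self):
        other = CowStr.__new__(CowStr)
        other._backing = self._backing        # share, don't copy
        other._shared = self._shared = True
        return other

    def __setitem__(self, i, ch):
        if self._shared:                      # un-share just in time
            self._backing = list(self._backing)
            self._shared = False
        self._backing[i] = ch

    def __str__(self):
        return "".join(self._backing)

a = CowStr("hello")
b = a.copy()   # cheap: no character data copied yet
b[0] = "j"     # first write gives b its own private backing
assert str(a) == "hello" and str(b) == "jello"
```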
-->
===Glossary===
Revision as of 13:26, 14 July 2023
'Virtual memory' ended up doing a number of different things,
which for the most part can be explained separately.
===On memory scarcity===

====oom_kill====
oom_kill is linux kernel code that starts killing processes when there is enough memory scarcity that memory allocations cannot happen within reasonable time - a good indication that it's gotten to the point that we are thrashing.
Killing processes sounds like a poor solution.
But consider that an OS can deal with completely running out of memory in roughly three ways:

* deny all memory allocations until the scarcity stops.
:: This isn't very useful, because
::* it will affect every program until scarcity stops
::* if the cause is one flaky program - and it usually is just one - then the scarcity may not stop
::* programs that do not actually check every memory allocation will probably crash
::* programs that do such checks well may have no option but to stop completely (or maybe pause)
:: So in the best case, random applications stop doing useful things - probably crash - and in the worst case your system crashes.

* delay memory allocations until they can be satisfied.
:: This isn't very useful, because
::* it pauses all programs that need memory (they cannot be scheduled until we can give them the memory they ask for) until scarcity stops
::* again, there is often no reason for this scarcity to stop
::* so it typically means a large-scale system freeze (indistinguishable from a system crash in the practical sense of "it doesn't actually do anything")

* kill the misbehaving application to end the memory scarcity.
:: This makes a bunch of assumptions that have to be true - but it lets the system recover
::* assumes there is a single misbehaving process (not always true - e.g. two programs that would individually be fine may together allocate most of RAM, which needs an admin to configure them better)
::* ...usually the process with the most allocated memory, though oom_kill logic tries to be smarter than that
::* assumes that the system has had enough memory for normal operation up to now, and that there is probably one haywire process (misbehaving or misconfigured, e.g. (pre-)allocating more memory than you have)
::* this can misfire on badly configured systems (e.g. multiple daemons all configured to use all RAM, or having no swap, leaving nothing to catch incidental variation)
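The "kill the biggest consumer, but try to be smarter than that" idea can be sketched as a score function. This is a toy analogue of the kernel's badness score; the field names and weighting are made up for illustration, not the kernel's actual formula:

```python
# Toy analogue of oom_kill victim selection: score processes mostly by
# memory use, with an admin-tunable adjustment (in the spirit of
# oom_score_adj, which ranges -1000..1000).

def badness(proc):
    score = proc["rss_pages"]
    score += score * proc.get("adj", 0) // 1000  # adj=-1000 fully protects
    return max(score, 0)

def pick_victim(procs):
    return max(procs, key=badness)["pid"]

procs = [
    {"pid": 1, "rss_pages": 200},
    {"pid": 2, "rss_pages": 900_000},                # likely haywire process
    {"pid": 3, "rss_pages": 500_000, "adj": -1000},  # protected daemon
]
assert pick_victim(procs) == 2
```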
Keep in mind that
* oom_kill is sort of a worst-case fallback
* generally,
:: if you feel the need to rely on the OOM killer, don't.
:: if you feel the wish to overcommit, don't.
* oom_kill is meant to deal with pathological cases of misbehaviour
:: ...but even then it might pick some random daemon rather than the real offender, because in some cases the real offender is hard to define.
:: Tweak likely offenders, tweak your system.
* note that you can isolate likely offenders via cgroups now
:: and apparently oom_kill is now cgroups-aware
* oom_kill does not always save you.
:: It seems that if your system is thrashing heavily already, it may not be able to act fast enough
:: (and it may possibly go overboard once things do catch up)
* you may wish to disable oom_kill while you are developing
:: ...or at least treat an oom_kill in your logs as a fatal bug in the software that caused it.
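Per-process tuning is done via /proc/&lt;pid&gt;/oom_score_adj (range -1000..1000). A Linux-only sketch of volunteering the current process as a preferred victim:

```python
# Linux-only: raise our own oom_score_adj so the OOM killer prefers us.
# Raising the value needs no special privileges; lowering it below its
# current value requires CAP_SYS_RESOURCE.
with open("/proc/self/oom_score_adj", "w") as f:
    f.write("500")

with open("/proc/self/oom_score_adj") as f:
    assert f.read().strip() == "500"
```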
If you don't have oom_kill, you may still be able to get a reboot instead, by setting the following sysctls:
 vm.panic_on_oom=1
and a nonzero kernel.panic (seconds to show the message before rebooting):
 kernel.panic=10
===See also===