Virtual memory
===Intro===
<!--
We now typically have a '''virtual memory system''',
where running code never deals ''directly'' with physical addresses.
{{comment|(Note: this is a broad-strokes introduction that simplifies and ignores a lot of historical evolution of how we got where we are and ''why'' - a bunch of which I know I don't know)}}

It just means there's something in between - mostly there to be a little cleverer for you.

So now, each task gets its own address space,
and ''something'' is doing translation, via a lookup, between the addresses that the program sees and the physical addresses and memory that they actually go to.

The low-level implementation (like the fact that hardware is actually assisting this) is ''interesting'', but usually less relevant.
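
A toy sketch of that translation step, in C. Everything here (the table layout, the sizes, the translate() helper) is made up for illustration - real page tables are multi-level structures walked mostly by hardware - but the shape of the lookup is the same: split the virtual address into page number and offset, look the page up in a per-task table, and glue the offset back onto whatever physical frame it maps to.

<pre>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u   /* a common page size, but purely illustrative here */
#define NUM_PAGES 16u     /* a tiny toy address space */

/* toy page table: virtual page number -> physical frame number, -1 = not mapped */
static int page_table[NUM_PAGES];

static int translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;
    if (vpn >= NUM_PAGES || page_table[vpn] < 0)
        return -1;                                  /* "not yours, denied" */
    *paddr = (uint32_t)page_table[vpn] * PAGE_SIZE + offset;
    return 0;
}

int main(void) {
    for (unsigned i = 0; i < NUM_PAGES; i++)
        page_table[i] = -1;
    page_table[1] = 7;        /* say virtual page 1 lives in physical frame 7 */

    uint32_t paddr;
    if (translate(0x1234, &paddr) == 0)             /* page 1, offset 0x234 */
        printf("virtual 0x1234 -> physical 0x%x\n", paddr);
    if (translate(0x0010, &paddr) != 0)             /* page 0 is not mapped */
        printf("virtual 0x0010 -> fault\n");
    return 0;
}
</pre>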
 
 
 
No matter the addresses used within each task, they can't clash in physical memory (or rather, ''won't'' overlap until you ask for it and the OS specifically allows it - see [[shared memory]]).

There are a handful of reasons this can be useful.

The larger among these ideas is '''protected memory''': that lookup can easily say "that is not allocated to you, ''denied''", meaning a task can never accidentally access memory it doesn't own.


This is useful for stability, in that a user task can't bring down a system task accidentally. Misbehaving tasks will fail in isolation.


It's also great for security, in that tasks can't reach into what others are doing ''intentionally'' either - you can't read what anyone else is doing.
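
To make that "denied" a bit more concrete: the sketch below uses real POSIX calls (mmap(), mprotect()), but the scenario itself is just an illustration. The process asks the OS for a page, uses it, then gives up its own access to it, and the very next access is refused - on Linux that refusal arrives as SIGSEGV.

<pre>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    long pagesize = sysconf(_SC_PAGESIZE);

    /* ask for one fresh page, readable and writable */
    char *p = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(p, "hello");                       /* fine: this page is ours */
    printf("before: %s\n", p);

    mprotect(p, (size_t)pagesize, PROT_NONE); /* give up all access to it */

    printf("after: %c\n", p[0]);              /* denied: this read gets SIGSEGV */
    return 0;
}
</pre>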


{{comment|(Note that you can have protection ''without'' virtual addresses, if you keep track of what belongs to a task.
You can also have virtual addresses without protection.  A few embedded systems opt for protection alone because it can be a little simpler (and a little faster) without that extra step of work or indirection. Yet on general-purpose computers you want, and get, both.)}}




Another reason is that processes (and most programmers) don't have to think about other tasks, the OS, or their management.  Say, in the DOS days, you all used the same memory space, so memory management was a more cooperative thing -- which is a pain, and one of various reasons you would run one thing at a time (with few exceptions).


There are other details, like
* having this be a separate system means the OS can unify underlying changes over time, and abstract out the changing ways hardware implements it (with extensions and variations even in the same CPU architecture/family).


* it can make fragments of RAM look contiguous to a process, which makes life much easier for programmers, and has only a small effect on speed (because of the RA in RAM).  
: generally the VMM does try to minimise fragmentation where possible, because too much can thrash the fixed-size TLB


* on many systems, the first page in a virtual address space is marked unreadable, which is how null pointer dereferences can be caught
:: and why that happens to be easier/more efficient on systems ''with'' an MMU/MPU (see the sketch after this list).


* In practice it matters that physical/virtual mapping is something a cache system can understand. There are other solutions that are messier.
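
And the null-pointer case from the list above, as a small sketch (assuming a typical MMU-based system where the first page is left unmapped): following a NULL pointer lands in that unmapped page, the hardware faults, and the kernel turns it into SIGSEGV - much easier to notice than silently reading whatever happens to live near address 0.

<pre>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static void on_segv(int sig) {
    (void)sig;
    static const char msg[] = "caught SIGSEGV: the null page is not mapped\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);  /* write() is async-signal-safe */
    _exit(1);
}

int main(void) {
    signal(SIGSEGV, on_segv);   /* catch the fault, just to show it happened */
    volatile int *p = NULL;
    return *p;                  /* dereferencing NULL touches the unmapped first page */
}
</pre>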


-->


===Swapping / paging; thrashing===




This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


'Virtual memory' ended up doing a number of different things. For the most part, you can explain those things separately.



===Swappiness===



====Practical notes====

====Linux====

===="How large should my page/swap space be?"====

====On memory scarcity====

====oom_kill====

oom_kill is Linux kernel code that starts killing processes when there is enough memory scarcity that memory allocations cannot happen within a reasonable time - as this is a good indication that things have gotten to the point where we are thrashing.


Killing processes sounds like a poor solution.

But consider that an OS can deal with completely running out of memory in roughly three ways:

* deny all memory allocations until the scarcity stops.
: This isn't very useful because
:* it will affect every program until the scarcity stops
:* if the cause is one flaky program - and it usually is just one - then the scarcity may not stop
:* programs that do not actually check every memory allocation will probably crash
:* programs that do such checks well may have no option but to stop completely (or maybe pause)
: So in the best case, random applications will stop doing useful things - probably crash - and in the worst case your system will crash.
* delay memory allocations until they can be satisfied.
: This isn't very useful because
:* it pauses all programs that need memory (they cannot be scheduled until we can give them the memory they ask for) until the scarcity stops
:* again, there is often no reason for this scarcity to stop
: ...so this typically means a large-scale system freeze (indistinguishable from a system crash in the practical sense of "it doesn't actually do anything")
* kill the misbehaving application to end the memory scarcity.
: This makes a bunch of assumptions that have to be true -- but it lets the system recover (the sketch after this list shows roughly what this looks like from the process's side)
:* it assumes there is a single misbehaving process (not always true - e.g. two programs each allocating most of RAM would be fine individually, and need an admin to configure them better)
:* the victim is usually the process with the most allocated memory, though oom_kill logic tries to be smarter than that
:* it assumes that the system has had enough memory for normal operation up to now, and that there is probably one haywire process (misbehaving or misconfigured, e.g. (pre-)allocating more memory than you have)
:* this could misfire on badly configured systems (e.g. multiple daemons all configured to use all RAM, or having no swap, leaving nothing to catch incidental variation)
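
To make the trade-off concrete, a minimal C sketch (illustrative only, the 64 MiB step size is arbitrary, and it is genuinely unpleasant to run on a machine you care about): with Linux's default overcommit behaviour, malloc() tends to keep succeeding and the memory only becomes real once it is touched, so a runaway allocator like this one usually ends with oom_kill sending it SIGKILL rather than with malloc() politely returning NULL.

<pre>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const size_t chunk = 64u * 1024 * 1024;   /* 64 MiB per step (arbitrary) */
    size_t total = 0;
    for (;;) {
        char *p = malloc(chunk);
        if (p == NULL) {                      /* the polite failure mode... */
            fprintf(stderr, "malloc failed after %zu MiB\n", total >> 20);
            return 1;
        }
        memset(p, 0xAA, chunk);               /* ...but touching the pages is what
                                                 makes them real, and what eventually
                                                 invites oom_kill */
        total += chunk;
        fprintf(stderr, "allocated and touched %zu MiB\n", total >> 20);
    }
}
</pre>

Whether you ever see the polite NULL depends on things like vm.overcommit_memory and how much swap there is; with strict overcommit accounting you get the "deny" behaviour described above instead.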


Keep in mind that

* oom_kill is sort of a worst-case fallback
: generally,
:* if you feel the need to rely on the OOM killer, don't
:* if you feel the wish to overcommit, don't
:* oom_kill is meant to deal with pathological cases of misbehaviour
:* but even then it might pick some random daemon rather than the real offender, because in some cases the real offender is hard to define
:* note that you can isolate likely offenders via cgroups now (also meaning that swapping happens per cgroup), and you can nudge the choice of victim per process (see the sketch below)
:: and apparently oom_kill is now cgroups-aware
* oom_kill does not always save you.
: It seems that if your system is thrashing heavily already, it may not be able to act fast enough
: (and it may possibly go overboard once things do catch up)
* You may wish to disable oom_kill when you are developing
: ...or at least treat an oom_kill in your logs as a fatal bug in the software that caused it.
* If you don't have oom_kill, you may still be able to get a reboot instead, by setting the following sysctls (e.g. with sysctl -w, or persistently in /etc/sysctl.conf):
 vm.panic_on_oom=1
: and a nonzero kernel.panic (the number of seconds to show the panic message before rebooting):
 kernel.panic=10
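
If you do have to live with oom_kill, you can at least influence who it picks. The sketch below writes to /proc/self/oom_score_adj (a real Linux interface; the value 500 is just an example), which biases the victim choice for this process: higher values mean "kill me first", -1000 means "never pick me", and lowering it below zero needs privilege. A batch job that is cheap to restart might volunteer itself like this:

<pre>
#include <stdio.h>

int main(void) {
    /* /proc/self/oom_score_adj takes a value in -1000..1000 */
    FILE *f = fopen("/proc/self/oom_score_adj", "w");
    if (f == NULL) {
        perror("oom_score_adj");
        return 1;
    }
    fprintf(f, "%d\n", 500);   /* volunteer this process as a preferred victim */
    fclose(f);

    /* ... the actual work would go here ... */
    return 0;
}
</pre>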


===See also===



===Copy on write===

===Glossary===