Virtual memory
===Intro===
{{comment|(Note: this is a broad-strokes introduction that simplifies and ignores a lot of the historical evolution of how we got where we are and ''why'' - plus a bunch of which ''I know I don't know yet'')}}
'Virtual memory' describes an abstraction that we ended up using for a number of different things.
For the ''most'' part, you can explain those reasons separately,
though they got entangled over time (in ways that ''mostly'' operating system programmers need to worry about).

At low level, memory access is "set an address, do a request, get back the result".

In olden times,
this described hardware that did nothing more than that {{comment|(in some cases you even needed to do that yourself: set a value on address pins, flip the pin that meant a request, and read out data on some other pins)}},
and the point is that there was ''nothing'' that kept you from doing any request you wanted.

Because you all used the same memory space,
memory management was a... cooperative thing, where everything needed to play nice.
But that was hard, and beyond conventions about which parts were the operating system's and you wouldn't touch,
there were no standards for multiple processes running concurrently, unless they actively knew about each other.

Which was fine, because multitasking wasn't a buzzword yet.
We ran one thing at a time, and the exceptions to that were clever about how they did that.

To skip a ''lot'' of history {{comment|(the variants along the way are a mess to actually get into)}},
what we have now is a '''virtual memory system''',
where
* each task gets its own address space
* there is something managing these assignments of parts of memory to tasks
* our running code ''never'' deals ''directly'' with physical addresses
* and when a request is made, ''something'' does the translation between the addresses that the program sees, and the physical addresses and memory it actually goes to

{{comment|(The low level implementation is also interesting, in that there is hardware assisting this setup - things would be terribly slow if there weren't. At the same time, these details are also largely irrelevant, in that it's always there, and fully transparent even to programmers.)}}
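
To make "each task gets its own address space" concrete, here is a minimal C sketch (not from this page, just an illustration): after fork(), parent and child hold the ''same'' pointer value yet read different contents, because each process's virtual addresses translate to different physical memory.

<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int *value = malloc(sizeof *value);
    *value = 1;

    if (fork() == 0) {                   /* child */
        *value = 2;                      /* changes only the child's copy */
        printf("child:  %p holds %d\n", (void *)value, *value);
        _exit(0);
    }
    wait(NULL);                          /* let the child print first */
    /* Same pointer value, but still 1 here - the address is virtual. */
    printf("parent: %p holds %d\n", (void *)value, *value);
    return 0;
}
</syntaxhighlight>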
There are a handful of reasons this addresses-per-task idea is useful. | |||
One of them is just convenience. | |||
If the OS tells you where to go, | |||
you avoid overwriting other tasks accidentally. | |||

Arguably the more important one is '''protected memory''':
that lookup can easily say "that was never allocated to you, ''denied''",
meaning a task can never accidentally ''or'' intentionally access memory it doesn't own.
{{comment|(There is no overlap in ownership unless this is intentional: you specifically ask for it, and the OS specifically allows it - a.k.a. [[shared memory]].)}}
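
As a sketch of that "specifically ask for it" case, the following uses POSIX shared memory (the name "/vm_demo" is arbitrary; on older glibc you may need to link with -lrt). Both processes explicitly request the same region, and only then does the OS map it into both address spaces:

<syntaxhighlight lang="c">
#include <fcntl.h>      /* O_CREAT, O_RDWR */
#include <stdio.h>
#include <sys/mman.h>   /* shm_open, mmap */
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd = shm_open("/vm_demo", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(int));
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);

    if (fork() == 0) {
        *shared = 42;                    /* child writes... */
        _exit(0);
    }
    wait(NULL);
    printf("parent sees %d\n", *shared); /* ...and the parent sees it */
    shm_unlink("/vm_demo");
    return 0;
}
</syntaxhighlight>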

This is useful for stability,
in that a user task can't bring down a system task accidentally,
as was easy in the "everyone can trample over everyone" days.
Misbehaving tasks will ''probably'' fail in isolation.

It's also great for security,
in that tasks can't ''intentionally'' access what any other task is doing.
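
What that denial looks like from inside a program: touch an address that was never mapped for you, and the translation fails before any memory is reached - on Linux the kernel delivers SIGSEGV. A minimal sketch (the dereference is deliberately invalid, so formally undefined behaviour; it is only meant to show the mechanism):

<syntaxhighlight lang="c">
#include <signal.h>
#include <unistd.h>

static void on_segv(int sig) {
    (void)sig;
    const char msg[] = "denied: SIGSEGV\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);  /* async-signal-safe */
    _exit(1);
}

int main(void) {
    signal(SIGSEGV, on_segv);
    int *not_ours = (int *)0x10;   /* an address never mapped for us */
    return *not_ours;              /* lookup fails; the kernel says no */
}
</syntaxhighlight>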
===Swapping / paging; thrashing===

===Overcommitting RAM with disk===

===On memory scarcity===

===="How large should my page/swap space be?"====

===Linux===

====Swappiness====

====oom_kill====
'''oom_kill''' is Linux kernel code that starts killing processes when memory is scarce enough that allocations cannot happen within a reasonable time - which is a good indication that things have degraded to the point of thrashing.

Killing processes sounds like a poor solution.
But consider that an OS can deal with completely running out of memory in roughly three ways:
* deny all memory allocations until the scarcity stops
** This isn't very useful, because
*** it will affect every program until the scarcity stops
*** if the cause is one flaky program - and it usually is just one - then the scarcity may not stop
*** programs that do not actually check every memory allocation will probably crash (see the sketch after this list)
*** programs that do such checks well may have no option but to stop completely (or maybe pause)
** So in the best case, random applications stop doing useful things - probably crash - and in the worst case your system will crash.

* delay memory allocations until they can be satisfied
** This isn't very useful, because
*** it pauses all programs that need memory (they cannot be scheduled until we can give them the memory they ask for) until the scarcity stops
*** again, there is often no reason for the scarcity to stop
*** so this typically means a large-scale system freeze (indistinguishable from a system crash in the practical sense of "it doesn't actually do anything")

* kill the misbehaving application to end the memory scarcity
** This makes a bunch of assumptions that have to be true - but it lets the system recover
*** it assumes there is a single misbehaving process (not always true, e.g. two programs each allocating most of RAM would be fine individually, and need an admin to configure them better)
*** ...usually the process with the most allocated memory, though oom_kill logic tries to be smarter than that
*** it assumes that the system has had enough memory for normal operation up to now, and that there is probably one haywire process (misbehaving or misconfigured, e.g. (pre-)allocating more memory than you have)
*** this could misfire on badly configured systems (e.g. multiple daemons all configured to use all RAM, or having no swap, leaving nothing to catch incidental variation)
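
The sketch below shows the first option from a program's point of view (plain C, illustration only; '''it deliberately eats memory, so don't run it on a machine you care about'''). With strict accounting (vm.overcommit_memory=2), malloc() actually returns NULL and the check matters; with overcommit on (the Linux default), malloc() rarely fails, and the failure arrives later as an oom_kill instead:

<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const size_t chunk = 256 * 1024 * 1024;   /* 256 MB per step */
    size_t total = 0;

    for (;;) {
        char *p = malloc(chunk);
        if (p == NULL) {                      /* the "denied" case */
            fprintf(stderr, "denied after %zu MB\n", total >> 20);
            return 1;
        }
        memset(p, 1, chunk);                  /* actually touch the pages */
        total += chunk;
        printf("%zu MB\n", total >> 20);
    }
}
</syntaxhighlight>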

Keep in mind that
* oom_kill is sort of a worst-case fallback
** generally,
*** if you feel the need to rely on the OOM killer, don't
*** if you feel the wish to overcommit, don't
* oom_kill is meant to deal with pathological cases of misbehaviour
** but even then it might pick some random daemon rather than the real offender, because in some cases the real offender is hard to define
** note that you can isolate likely offenders via cgroups now (also meaning that swapping happens per cgroup)
*** and apparently oom_kill is now cgroup-aware
* oom_kill does not always save you
** It seems that if your system is already thrashing heavily, it may not be able to act fast enough
** (and it may possibly go overboard once things do catch up)
* You may wish to disable oom_kill when you are developing - see the sketch after this list for doing that per process
** ...or at least treat an oom_kill in your logs as a fatal bug in the software that caused it
* If you don't have oom_kill, you may still be able to get a reboot instead, by setting the following sysctls:
 vm.panic_on_oom=1
: and a nonzero kernel.panic (the number of seconds to show the message before rebooting):
 kernel.panic=10
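
On the per-process side, the usual knob is /proc/<pid>/oom_score_adj, which biases the badness score from -1000 ("never kill this") to +1000 ("kill this first"). A minimal sketch of a process exempting itself - e.g. the thing you're developing against (lowering the score needs root / CAP_SYS_RESOURCE):

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/self/oom_score_adj", "w");
    if (f == NULL) {
        perror("oom_score_adj");
        return 1;
    }
    fprintf(f, "-1000\n");   /* exempt this process from oom_kill */
    fclose(f);
    /* ...long-running work, now safe(r) from the OOM killer... */
    return 0;
}
</syntaxhighlight>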

====See also====

===Page faults===

====See also====