Virtual memory



====Linux====
<!--
It seems that *nix swapping logic is smart enough to do basic RAID-like spreading among swap devices of equal priority, meaning that a swap partition on every disk that isn't actively used (e.g. by something important like a database) is probably useful.
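A minimal sketch of what that can look like in practice (device names and the priority value are just examples) - giving two swap areas equal priority in /etc/fstab makes the kernel spread swap writes over both:
 /dev/sda2   none   swap   sw,pri=10   0 0
 /dev/sdb2   none   swap   sw,pri=10   0 0
 # or, at runtime:
 swapon -p 10 /dev/sda2 /dev/sdb2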
Swap used for hibernation generally has to come from a swap partition, not a swap file {{comment|(largely because that depends too much on whether the underlying filesystem is mounted)}}.
Linux allows overcommit, but how much memory userspace can actually commit varies in the real world.
With strict accounting (vm.overcommit_memory=2), the commit limit depends on three things:
* swap space
* RAM size
* overcommit_ratio (defaults to 50%)
roughly: CommitLimit = swap + RAM * (overcommit_ratio / 100).
When swap space is much smaller than RAM - which on servers/workstations with at least dozens of GBs of RAM it usually is - this easily means overcommit_ratio should be 80-90 for userspace to be able to commit most/all of RAM.
If the commit limit is lower than RAM, the rest goes (mostly) to caches and buffers.
Which, note, is often useful, and sometimes it can even be preferable to effectively have a little dedicated cache.
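To see where a particular box stands, the current accounting and the relevant sysctls can be read directly (a quick sketch; these are standard files and sysctls):
 grep -i commit /proc/meminfo                       # CommitLimit, and Committed_AS (what is currently committed)
 sysctl vm.overcommit_memory vm.overcommit_ratio    # the overcommit mode and ratio mentioned above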
-->
=====Swappiness=====
{{stub}}
<!--
There is an aggressiveness with which an OS will swap out allocated-but-inactive pages to disk.
This is often controllable.
Linux calls this ''swappiness''.
Higher swappiness means the general tendency to swap out is higher - though other, more volatile information is used too, including the system's currently mapped ratio, and a measure of how much trouble the kernel has recently had freeing up memory.
Swapping out is always done with cost/benefit considerations.
The cost is mainly the time spent,
the benefit is largely giving more RAM to programs and caches (and also doing some of the swapping now rather than later).
(note that Linux swaps less aggressively than Windows to start with - at least with default settings)
There are always pages that are inactive simply because programs very rarely use them ([[80/20]]-like access patterns).
But if you have plenty of free RAM it might not even swap ''those'' out, because benefit is estimated to be low.
: I had 48GB and 256GB workstations at work and people rarely got them to swap ''anything''.
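If you want to see whether a machine like that is actually swapping at all (a quick sketch using standard tools):
 swapon --show      # which swap devices/files exist, and how much of each is in use
 vmstat 1 5         # the si/so columns show swap-in/swap-out activity per second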
It's a sliding scale. To illustrate this point, consider the difference between:
* using more RAM than we have - we will probably swap in response to every allocation
: or worse, in the case of thrashing: we are swapping purely to avoid crashing the system
: Under high memory strain, cost of ''everything'' is high, because we're not swapping to free RAM for easier future use, we're swapping to not crash the system.
* Swapping at any other time is mainly about proactively freeing up RAM for near-future use.
: this is IO we would otherwise have to concentrate into the next large allocation request
:: which argues for ''higher'' swappiness, because it effectively spreads that work over time
These are entirely different cases.
* The former clobbers caches, the latter builds it up
* the former ''is'' memory strain, the latter ''may'' lessen it in the future
: (if the peak use is still sensible, and won't thrash itself along with everything else)
Arguments for '''lower''' swappiness:
* Delays putting things on slower disk until RAM is necessary for something else
** ...avoiding IO (also lets drives spin down, which can matter to laptop users)
** (on the flipside, when you want to allocate memory ''and'' the system needs to swap out things first to provide that memory, it means more work, IO, and sluggishness concentrated at that time)
* apps are more likely to stay in memory (particularly larger ones). Over-aggressive swapout (e.g. inactivity because you went for coffee) is less likely, meaning it is slightly less likely that you have to wait for a few seconds of churning swap-in when you continue working
: not swapping out GUI programs makes them ''feel'' faster even if they don't actually run any faster
* When your computer has more memory than you actively use, there will be less IO caused by swapping inactive pages out and in again (but there are other factors that ''also'' make swapping less likely in such cases)
Arguments for '''higher''' swappiness seem to include{{verify}}:
* When there is low memory pressure, caches are what make (repetitive) disk access faster.
* keeps memory free
** spreads swap activity over time, useful when it is predictably useful later
** free memory is usable by the OS page cache
* swapping out rarely used pages means new applications and new allocations are served faster by RAM
: because it's less likely we have to swap other things out at allocation time
* allocation-greedy apps will not cause swapping so quickly, and are served more quickly themselves
'''On caches'''
Swappiness applies mostly to processes' memory, and not to kernel constructs like the OS [[page cache]] (and [[dentry cache]], and [[inode cache]]).
That means that swapping things out increases the amount of OS page cache we have.
From a perspective of data caching, you can see swappiness as one knob that (somewhat indirectly) controls how likely data will sit in a process, OS cache, or swapped out.
Consider for example the case of large databases (often following some 80/20-ish locality patterns).
If you can make the database cache data in its own process memory, you may want lower swappiness, since that makes it more likely that needed data is still in memory.
If you ''disable'' that in-process caching of tables, you might get almost the same effect, because the space freed is instead left to the OS page cache, which may then store all the file data you read most - which can be entirely the same data (if you have no other major programs on the host).
{{comment|(In some cases - often mainly 'when nothing else clobbers it' - the OS page cache is a simple and great solution. Consider how a file server will automatically focus on the most common files, transparently hand them to multiple processes, etc.
Sure, for some cases you design something smarter, e.g. an LRU memcache.
And of course this cache is bad to count on when other things on the server start vying for the same cache (and clobbering it as far as you're concerned).
This also starts to matter when you fit a lot of different programs onto the same server, so that they start vying for limited memory.)}}
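If you want to see how that plays out on a given machine - how much RAM is currently page cache versus in use by processes - a quick sketch using standard tools:
 free -h                                          # the buff/cache column is largely the OS page cache
 grep -E '^(Buffers|Cached|SwapCached)' /proc/meminfo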
'''Server versus workstation'''
There is some difference between server and workstation use.
Or rather, between systems that are more or less likely to touch the same data repeatedly,
and hence value caches more. A file server typically will, other servers frequently will.
Desktop tends to see relatively random disk access so cache doesn't matter much.
Instead, you may care to avoid GUI programs being swapped out much,
by having ''nothing'' swap out even when approaching memory pressure.
This seems like micromanaging for a very specific case (you're just as badly off under actual memory pressure, and no better off when you have a lot of free RAM), but it might sometimes apply.
'''Actual tweaking'''
There is also:
* vm.swappiness - the kernel's tendency to swap out process (anonymous) memory rather than reclaim page cache
* vm.vfs_cache_pressure - the kernel's tendency to reclaim the dentry and inode caches (defaults to 100; higher means reclaiming them more readily)
In Linux you can use /proc or sysctl to check and set swappiness:
 cat /proc/sys/vm/swappiness
 sysctl vm.swappiness
...shows you the current swappiness (a number between 0 and 100), and you can set it with something like:
 echo 60 > /proc/sys/vm/swappiness
 sysctl -w vm.swappiness=60
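To make the value survive reboots, most current distributions read sysctl settings from /etc/sysctl.d at boot (a sketch; the file name here is arbitrary):
 echo 'vm.swappiness = 60' > /etc/sysctl.d/99-swappiness.conf
 sysctl --system    # reload all sysctl configuration files now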
This is '''not''' a percentage, as some people think. It's a fudgy value, and hasn't meant the same thing for all iterations of the code behind it.
Some kernels do little swapping for values in the range 0-60 (or 0-80, but 60 seems the more common tipping point).
It seems gentler tweaking is in the 20-60 range.
A value of 100 or something near it tends to make for very aggressive swapping.
* 0 doesn't disable swapping, but makes it pretty rare until there is memory pressure (which probably makes oom_kill likelier to trigger)
* 1 means enabled but very light; values up to roughly 10 are still quite conservative
* values at or near 100 are very aggressive
Note that the meaning of the value was never very settled, and has changed with kernel versions {{comment|(for example, (particularly later) 2.6 kernels swap out more easily under the same values than 2.4 did)}}.
* If you swap to SSD, you might lower swappiness to make it live longer
: but memory use peaks will affect it more than swappiness
People report that
* interaction with a garbage collector (e.g. JVM's) might lead to regular swapping
: which argues for lower swappiness
* servers:
: 10 ''may'' make sense e.g. on database servers, to focus on caches
: on a dedicated machine, if what you would keep in app memory can instead live in the OS cache, it may matter little
* desktops:
: values around or below 10 start introducing choppiness and pauses (probably because swapping IO gets concentrated into allocation requests)
* VMs make things more interesting
* containers too make things more interesting
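For containers, note that the cgroup v1 memory controller exposes a per-cgroup knob (a sketch; 'mygroup' is a made-up group name, and cgroup v2 no longer has this particular file):
 cat /sys/fs/cgroup/memory/mygroup/memory.swappiness     # per-cgroup swappiness for this group
 echo 10 > /sys/fs/cgroup/memory/mygroup/memory.swappiness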
See also:
* http://lwn.net/Articles/83588/
* https://lwn.net/Articles/690079/
* https://askubuntu.com/questions/184217/why-most-people-recommend-to-reduce-swappiness-to-10-20


-->



This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Intro

Swapping / paging; thrashing

Overcommitting RAM with disk

"How large should my page/swap space be?"

On memory scarcity

oom_kill

oom_kill is Linux kernel code that starts killing processes when there is enough memory scarcity that memory allocations cannot happen within a reasonable time - as this is a good indication that it's gotten to the point where we are thrashing.


Killing processes sounds like a poor solution.

But consider that an OS can deal with completely running out of memory in roughly three ways:

  • deny all memory allocations until the scarcity stops.
    This isn't very useful because:
    - it will affect every program until the scarcity stops
    - if the cause is one flaky program - and it usually is just one - then the scarcity may not stop
    - programs that do not actually check every memory allocation will probably crash
    - programs that do such checks well may have no option but to stop completely (or at best pause)
    So in the best case random applications stop doing useful things - probably crash - and in the worst case your system will crash.
  • delay memory allocations until they can be satisfied.
    This isn't very useful because:
    - it pauses all programs that need memory (they cannot be scheduled until we can give them the memory they ask for) until the scarcity stops
    - again, there is often no reason for this scarcity to stop
    - so this typically means a large-scale system freeze (indistinguishable from a system crash in the practical sense of "it doesn't actually do anything")
  • kill the misbehaving application to end the memory scarcity.
    This makes a bunch of assumptions that have to be true - but it lets the system recover.
    - it assumes there is a single misbehaving process (not always true; e.g. two programs each allocating half of RAM would be fine individually, and that case needs an admin to configure them better)
      ...it usually picks the process with the most allocated memory, though oom_kill logic tries to be smarter than that (see the example after this list)
    - it assumes the system has had enough memory for normal operation up to now, and that there is probably one haywire process (misbehaving or misconfigured, e.g. (pre-)allocating more memory than you have)
    - this could misfire on badly configured systems (e.g. multiple daemons all configured to use all RAM, or having no swap, leaving nothing to catch incidental variation)
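To see what that logic currently thinks of a given process, and to nudge it, there are per-process files under /proc (a quick sketch; 1234 is a made-up PID):
 cat /proc/1234/oom_score        # the current 'badness' score used to pick a victim
 cat /proc/1234/oom_score_adj    # manual adjustment, -1000..1000; -1000 exempts the process entirely
 echo -500 > /proc/1234/oom_score_adj   # make this process a much less likely victim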


Keep in mind that

  • oom_kill is sort of a worst-case fallback.
    Generally:
    - if you feel the need to rely on the OOM killer, don't
    - if you feel the wish to overcommit, don't
    oom_kill is meant to deal with pathological cases of misbehaviour,
    but even then it might pick some random daemon rather than the real offender, because in some cases the real offender is hard to define.
    Note that you can isolate likely offenders via cgroups now (which also means that swapping happens per cgroup),
    and oom_kill is apparently now cgroups-aware.
  • oom_kill does not always save you.
    It seems that if your system is already thrashing heavily, it may not be able to act fast enough
    (and it may possibly go overboard once things do catch up).
  • You may wish to disable oom_kill while you are developing,
    ...or at least treat an oom_kill in your logs as a fatal bug in the software that caused it.
  • If you don't have oom_kill, you may still be able to get a reboot instead, by setting the following sysctls:
vm.panic_on_oom=1

and a nonzero kernel.panic (seconds to show the message before rebooting)

kernel.panic=10
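
For example, applying these at runtime and persisting them might look like (a sketch; the file name under /etc/sysctl.d is arbitrary):
 sysctl -w vm.panic_on_oom=1 kernel.panic=10
 printf 'vm.panic_on_oom = 1\nkernel.panic = 10\n' > /etc/sysctl.d/90-oom-panic.conf
 sysctl --system    # reload persistent sysctl settings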


See also



Page faults

See also

Copy on write

Glossary