CPU cache notes
This article/section is a stub: probably a pile of half-sorted notes, probably a first version, and not well-checked, so it may have incorrect bits. (Feel free to ignore, or tell me)
CPU caches put a small amount of faster-but-costlier SRAM (or similar) between the CPU (whose registers are faster still) and main RAM (slowish, often DRAM).
CPU caches will mirror fragments of main RAM. Whenever accesses towards main RAM can be served from cache, they are served faster.
Today that means access times on the order of 1 to 10ns instead of on the order of 100ns (see Computer / Speed notes), but the idea has been worth implementing in CPUs since they ran at a dozen MHz or so (verify).
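To get a feel for how much the hit rate matters, here is a minimal sketch of the usual back-of-envelope estimate of average access time. It assumes a miss costs the cache lookup plus one full RAM access. The function name and the 5ns / 100ns defaults are illustrative assumptions taken from the orders of magnitude above, not measurements of any real CPU.

```python
def average_access_ns(hit_rate, cache_ns=5.0, ram_ns=100.0):
    """Estimate the average time per memory access, in nanoseconds.

    Simplified model (an assumption, not how any specific CPU behaves):
    every access pays the cache lookup, and each miss additionally
    pays one full trip to main RAM.
    """
    miss_rate = 1.0 - hit_rate
    return cache_ns + miss_rate * ram_ns


if __name__ == "__main__":
    # Compare a few hit rates under the assumed latencies.
    for rate in (0.0, 0.5, 0.95, 0.99, 1.0):
        print(f"hit rate {rate:4.2f}: about {average_access_ns(rate):6.1f} ns per access")
```

Under these assumed numbers, going from 95% to 99% hits takes the average from roughly 10ns to roughly 6ns, which is why the last few percent of misses still matter.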
These caches are entirely transparent: a user, or even a programmer, should not have to care how they do their thing. You could completely ignore their presence, and arguably you shouldn't be able to control what they do at all.
As a programmer, you may still like a general idea of how they work, because designing for caches in general (rather than for one specific CPU) can help speed for longer.
Optimizing for a specific CPU's cache construction, while possible, is often barely worth it, and may even prove counterproductive on other CPUs, or even on the same brand's CPUs a few years later. If you remember just one thing, make it 'small data is a little likelier to stay in cache', and even that is less true if there are a lot of programs vying for CPU time.
It can also give slightly better spatial locality for individual programs.
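The 'small data stays in cache' point can be illustrated with a toy model. The sketch below simulates a fully associative cache with least-recently-used eviction and repeatedly loops over a working set. This is a deliberate simplification: real CPU caches are set-associative, use approximate replacement policies, and prefetch, so the sharp cliff shown here is softer in practice. The function name and parameters are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(working_set_lines, cache_lines, passes=10):
    """Loop over `working_set_lines` distinct cache lines `passes` times
    and return the fraction of accesses served from a toy LRU cache
    that can hold `cache_lines` lines."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    hits = 0
    accesses = 0
    for _ in range(passes):
        for line in range(working_set_lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                if len(cache) >= cache_lines:
                    cache.popitem(last=False)  # evict least recently used
                cache[line] = True
    return hits / accesses


if __name__ == "__main__":
    # Working set fits: only the first pass misses.
    print("fits in cache:   ", lru_hit_rate(64, 128))
    # Working set is twice the cache: each line is evicted before reuse.
    print("too big for cache:", lru_hit_rate(256, 128))
```

In this model, a working set that fits misses only during the first pass, while one that is larger than the cache and accessed cyclically gets no hits at all, because every line is evicted just before it is needed again.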
Other things, like branch locality, can help, but are largely up to the compiler.
A few things, such as arrays having sequential locality that e.g. trees do not, come down more to algorithm and data structure choice, though that choice is usually out of your hands.
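The array-versus-tree difference can be made concrete by counting how many distinct cache lines a traversal touches. The sketch below assumes 64-byte cache lines and 8-byte elements (common, but assumptions), and uses randomly scattered addresses as a stand-in for separately allocated tree nodes. Real allocators often place nodes closer together than this, so treat it as a worst-case illustration; all names are hypothetical.

```python
import random

LINE_SIZE = 64   # bytes per cache line (assumed)
ELEM_SIZE = 8    # bytes per element (assumed)


def lines_touched(addresses, line_size=LINE_SIZE):
    """Count the distinct cache lines covered by a collection of byte addresses."""
    return len({addr // line_size for addr in addresses})


n = 1024

# An array: elements sit back to back, so eight of them share each line.
array_addrs = [i * ELEM_SIZE for i in range(n)]

# Stand-in for tree nodes: the same number of elements, scattered over 1 GiB.
# The fixed seed only makes the illustration repeatable.
rng = random.Random(0)
node_addrs = rng.sample(range(0, 1 << 30, ELEM_SIZE), n)

if __name__ == "__main__":
    print("array lines touched:    ", lines_touched(array_addrs))
    print("scattered lines touched:", lines_touched(node_addrs))
```

The array needs one line fetch per eight elements, while the scattered nodes need close to one fetch per element, which is the gap that sequential locality buys you.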
And in high-level, reflective, OO-style languages, you may have little control anyway.
Avoiding caches getting flushed more than necessary helps, as can avoiding cache contention, so it helps to know what that is, and why and when it happens.