Some understanding of memory hardware
The lower-level parts of computers
"What Every Programmer Should Know About Memory" is a good overview of memory architectures, RAM types, and the reasons bandwidth and access speeds vary.
RAM types
DRAM - Dynamic RAM
- lower component count per cell than most (transistor+capacitor mainly), so high-density and cheaper per storage size
- yet capacitor leakage means it forgets its state, so it has to be refreshed regularly,
- also meaning you need a DRAM controller, more complexity (not something you'd DIY), and higher latency than some
- (...some latency is less of an issue when you have multiple chips)
- this or a variant is typical as main RAM, due to low cost per bit
SDRAM - Synchronous DRAM - is mostly a practical design consideration
- ...that of coordinating the DRAM via an external clock signal (previous DRAM was asynchronous, manipulating state as soon as lines changed)
- This allows the interface to that RAM to be a predictable state machine, which allows easier buffering, and easier interleaving of internal banks
- which makes higher data rates a bunch simpler (though not necessarily lower latency)
- SDR/DDR:
- DDR doubled the bus rate by widening the (minimum) units they read/write (double that of SDR), which they can do from a single DRAM bank (verify)
- similarly, DDR2 is 4x larger units than SDR and DDR3 is 8x larger units than SDR
- DDR4 uses the same width as DDR3, instead doubling the bus rate by interleaving from banks
- unrelated to latency, it's just that the bus frequency also increased over time.
SRAM - Static RAM
- Has a higher component count per cell (6 transistors) than e.g. DRAM
- Retains state as long as power is applied to the chip, no need for refresh, also making it a little lower-latency
- no external controller, so simpler to use
- the higher component count per cell makes it more expensive per storage size
- e.g used in caches, due to speed, and acceptable cost for lower amounts
PSRAM - PseudoStatic RAM
- A design tradeoff, somewhere between SRAM and DRAM
- it's like DRAM with built-in refresh, so functionally it's as "don't think about it" as SRAM
- (yes, DRAM technically can have built-in refresh, but that often points at a sleep mode that retains state without requiring an active DRAM controller, not something for active use)
- it's slower than DRAM, and cheaper than SRAM
- SRAM makes sense for internal RAM, PSRAM makes sense for extended RAM in situations where DRAM is not necessary
Non-volatile RAM
While the concept of Random Access Memory (RAM) only tells you that you can access any part of it with comparable ease (contrasted with e.g. tape storage, where more distance meant more time, so more storage meant more time)...
...we tend to think about RAM as volatile: only useful as an intermediate scratchpad between storage and use, and losing its contents as soon as it is unpowered. Probably because the commonly chosen designs have that property.
Yet there are various designs that are both easily accessible and keep their state.
And there is a sliding scale of various properties in that area as well.
We may well call it NVM (non-volatile memory) when we haven't yet gotten to some more specific properties,
like how often we may read or write, or how difficult that is.
Say, some variants of EEPROM aren't the easiest to deal with. We like Flash more, even though it's basically a development of EEPROM. But both wear out.
NVRAM on the other hand tends to be easier and more reusable, like FRAM, MRAM, and PRAM, or nvSRAM or even BBSRAM.
nvSRAM - SRAM and EEPROM stuck on the same chip.
- seems intended as a practical improvement on BBSRAM
- and/or for an "access common stuff quickly, occasionally write a chunk to EEPROM" style of data logging - black boxes, that sort of thing
- https://en.wikipedia.org/wiki/NvSRAM
BBSRAM - Battery Backed SRAM
- basically just SRAM alongside a lithium battery, so that it'll live a good while
- is sort of cheating, but usefully so.
FRAM - Ferroelectric RAM
- functions more like flash, also limited in amount of use (but with many more cycles)
- read process is destructive (like e.g. DRAM), so you need a write-after-read to keep data around
- so it's great for things like round-robin logging (which would be pretty bad for Flash)
- https://electronics.stackexchange.com/questions/58297/whats-the-catch-with-fram
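To make that round-robin logging concrete: it just means overwriting the oldest entry in a fixed region, which spreads writes evenly over all cells. A minimal sketch (the slot count is an arbitrary example, and this stands in for a fixed FRAM region):

```python
# Minimal round-robin logger: a fixed-size region where each new entry
# overwrites the oldest. Writes spread evenly over all slots, which suits
# FRAM's high (but finite) endurance. The sizes here are arbitrary examples.
class RoundRobinLog:
    def __init__(self, slots):
        self.buf = [None] * slots   # stands in for a fixed memory region
        self.head = 0               # next slot to (over)write

    def append(self, entry):
        self.buf[self.head] = entry
        self.head = (self.head + 1) % len(self.buf)

    def entries(self):
        # oldest-to-newest, skipping never-written slots
        ordered = self.buf[self.head:] + self.buf[:self.head]
        return [e for e in ordered if e is not None]

log = RoundRobinLog(4)
for i in range(6):       # 6 writes into 4 slots: the 2 oldest get overwritten
    log.append(i)
print(log.entries())     # [2, 3, 4, 5]
```

On Flash this same pattern would concentrate erase cycles on whole blocks; on FRAM each cell just sees one write per pass.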
PRAM - Phase-change RAM
DRAM stick types
ECC RAM ('Error correction code')
- can detect many (and correct some) hardware errors in RAM
- The rate of bit-flips is low, but they will happen. If your computations or data are very important to you, you want ECC rather than the regular, non-ECC type.
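The "detect and correct" part works via extra check bits. ECC DIMMs typically use a SECDED code over 64 data bits; the much smaller Hamming(7,4) code below shows the same principle, that a single flipped bit can be located (and therefore corrected) from the parity syndrome:

```python
# Hamming(7,4): 4 data bits get 3 parity bits, enough to locate (and thus
# correct) any single flipped bit. Real ECC DIMMs use a wider SECDED code
# over 64 data bits, but the principle is the same.
def encode(d):  # d: list of 4 data bits
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]     # codeword positions 1..7

def correct(c):  # c: 7-bit codeword, possibly with one bit flipped
    c = list(c)
    s = 0
    for pos in range(1, 8):                 # syndrome: XOR of 1-based
        if c[pos - 1]:                      # positions of set bits
            s ^= pos
    if s:                                   # nonzero syndrome = position
        c[s - 1] ^= 1                       # of the flipped bit
    return [c[2], c[4], c[5], c[6]]         # extract the data bits

word = encode([1, 0, 1, 1])
word[5] ^= 1                                # simulate a single bit flip
print(correct(word))                        # recovers [1, 0, 1, 1]
```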
Registered RAM (sometimes buffered RAM) basically places a buffer on the DRAM modules (register as in hardware register)
- offloads some electrical load from the main controller onto these buffers, making it easier for designs to stably connect more individual memory sticks/chips.
- ...at a small latency hit
- typical in servers, because they can accept more sticks
- Must be supported by the memory controller, which means it is a motherboard design choice to go for registered RAM or not
- pricier (more electronics, fewer units sold)
- because of this correlation with server use, most registered RAM is specifically registered ECC RAM
- yet there is also unregistered ECC, and registered non-ECC, which can be good options in specific designs of simpler servers and beefy workstations.
- sometimes called RDIMM -- in the same context UDIMM is used to refer to unbuffered
- https://en.wikipedia.org/wiki/Registered_memory
FB-DIMM, Fully Buffered DIMM
- same intent as registered RAM - more stable sticks on one controller
- the buffer is now between stick and controller rather than on the stick
- physically different pinout/notching
SO-DIMM (Small Outline DIMM)
- Physically more compact. Used in laptops, some networking hardware, some Mini-ITX
EPP and XMP (Enhanced Performance Profile, Extreme Memory Profiles)
- basically, one-click overclocking for RAM, by storing overclocked timing profiles
- so you can configure faster timings (and Vdimm and such) according to the modules, rather than by trial and error
- normally, memory timing is configured according to a table in the SPD, which are JEDEC-approved ratings and typically conservative.
- EPP and XMP basically mean running them as fast as they can go (and typically at a higher voltage)
In any case, the type of memory must be supported by the memory controller
- DDR2/3/4 - physically won't fit
- Note that while some controllers (e.g. those in CPUs) support two generations, a motherboard will typically have just one type of memory socket
- registered or not
- ECC or not
Historically, RAM controllers were a thing on the motherboard near the CPU, while there are now various cases where the controller is on the CPU.
More on...
DRAM versus SRAM
Separately, capacitors slowly leak charge anyway (related to nearby cells and to their bulk addressing; note that the higher the memory density, the smaller the capacitor, so the sooner this all happens), so DRAM only makes sense with refresh: something going through, reading every cell and writing it back, just to keep the state over time.
The DRAM controller will refresh each DRAM row within (typically) 64ms, and there are on the order of thousands to tens of thousands of them in a DRAM chip.
Yes, this means you randomly incur some extra latency.
Larger chips effectively have longer refresh overhead.
Each chip is slower-than-ideal, which can be made irrelevant by having the same amount of RAM in more chips on a memory stick. (Seems to also be part of why servers often have more slots(verify))
It also means DRAM will require more power than most others, even when it's not being used.
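An order-of-magnitude sketch of that refresh overhead, with assumed, typical-ish numbers (not from any particular datasheet):

```python
# Rough refresh-overhead estimate. The numbers are assumed, typical-ish
# values, not from a specific datasheet: 8192 rows refreshed every 64 ms,
# each refresh command keeping the bank busy for ~350 ns (tRFC).
rows = 8192
refresh_window_s = 64e-3
t_rfc_s = 350e-9

busy = rows * t_rfc_s              # time spent refreshing per 64 ms window
fraction = busy / refresh_window_s
print(f"~{fraction:.1%} of time spent refreshing")   # ~4.5%
```

Real controllers spread these refreshes out and can sometimes defer them, but the estimate shows why larger (more-row) chips see more overhead.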
With all these footnotes, DRAM seems clunky, so why use it?
Mainly because it's rather cheaper per bit (even with economy of scale in production), and as mentioned, you can alleviate the performance part fairly easily.
The first thing you'd compare DRAM to is often SRAM (Static RAM), or some variant of it.
SRAM cells are more complex per bit, but don't need refresh, are fundamentally lower-latency than DRAM, and take less power when idle.
(with some variation; lower speed SRAM can be low power, whereas at high speeds and use, power can be comparable to DRAM)
The main downside is that due to their complexity, they are lower density, and cost more silicon (and therefore money) per bit.
There are a lot of high-speed cases, or devices, where a little SRAM makes a lot of sense, like network switches, and also L1, L2, and L3 caches in your computer.
SRAM is electrically easier to access (also means you need less of a separate controller),
so simple microcontrollers may prefer it, also because it's easier to embed on the same IC.
Since SRAM uses noticeably more silicon than DRAM per cell, SRAM is often under a few hundred kilobytes - in part because you'd probably use SRAM for important bits, alongside DRAM for bulkier storage.
Pseudostatic RAM (PSRAM, a.k.a. PSDRAM) are ICs that contain both DRAM and a controller, so have DRAM speeds, but are as easy to use as SRAM, at a price somewhere in between.
There are even variants that are basically DRAM with an SRAM cache in front
so that well controlled access patterns can be quite fast.
More DRAM notes:
For a few reasons (including that there are a lot of bits in the address, to save dozens of pins as well as silicon on internal demultiplexing), DRAM is typically laid out as a grid, and the address is essentially sent in two parts, the row and the column, sent one after the other.
This is what RAS and CAS are about - the first is a strobe that signals the row address can be used, the second that the column can be.
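A sketch of that address multiplexing (the bit widths here are arbitrary examples, not any particular chip):

```python
# Splitting a flat address into row and column parts, as DRAM's address
# multiplexing does. 10 column bits is an arbitrary example width.
COL_BITS = 10

def split_address(addr):
    col = addr & ((1 << COL_BITS) - 1)   # low bits select the column
    row = addr >> COL_BITS               # high bits select the row
    return row, col                      # row sent first (RAS), then column (CAS)

print(split_address(0x2A7F3))   # (169, 1011)
```

Sharing one set of address pins for both halves is what saves the pins and demultiplexing silicon mentioned above.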
And, because capacitors are not instant, there needs to be some time between RAS and CAS, and between CAS and data coming out. This, and other details (e.g. precharge), are a property of the particular hardware, and must be adhered to for reliable use.
Setting these parameters by hand would be annoying, so on DDR DRAM sticks there is a small chip (the SPD) that tells the BIOS the timing options.
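These parameters are usually quoted in clock cycles (e.g. 9-9-9-24 for CL-tRCD-tRP-tRAS), so the wall-clock delay depends on the clock. A sketch of the conversion; the example figures are a common DDR3-1333 CL9 rating, used purely for illustration:

```python
# Convert cycle-count timings to nanoseconds. The figures are a common
# DDR3-1333 CL9 rating, used only as an example.
transfers_per_s = 1333e6
clock_hz = transfers_per_s / 2          # DDR: two transfers per I/O clock
t_ck_ns = 1e9 / clock_hz                # ~1.5 ns per clock

cl, t_rcd = 9, 9                        # CAS latency, RAS-to-CAS delay (clocks)
first_word_ns = (t_rcd + cl) * t_ck_ns  # row open + column access, roughly
print(f"tCK = {t_ck_ns:.2f} ns, first word after ~{first_word_ns:.1f} ns")
```

This is also why a "faster" stick with looser timings can have roughly the same first-word latency as a slower one with tight timings.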
DRAM is also so dense that it has led to some electrical issues, e.g. the row hammer exploit.
Because you still spend quite a bit of time on addressing, before the somewhat-faster readout of data,
a lot of DRAM systems do prefetch/burst (what you'd call readahead in disks).
That is, instead of fetching a cell, it fetches (burst_length*bus_width), with burst_length tied to DDR type - 64 bytes for DDR3 and DDR4 (also because that's a common CPU cache line size).
This is essentially a forced locality assumption, but it's relatively cheap and frequently useful.
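The resulting fetch size is just burst length times bus width; for DDR3/DDR4 figures (burst length 8, 64-bit bus):

```python
# Fetch granularity = burst length * bus width.
burst_length = 8            # transfers per burst (DDR3/DDR4)
bus_width_bytes = 64 // 8   # 64-bit bus
fetch_bytes = burst_length * bus_width_bytes
print(fetch_bytes)          # 64, matching a common CPU cache line size
```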
"RAS Mode"
- lockstep mode, 1:1 ratio to DRAM clock
- more reliable
- independent channel mode, a.k.a. performance mode, 2:1 to DRAM clock
- more throughput
- also allows more total DIMMs (if your motherboard is populated with them)
- mirror - seems to actually refer to memory mirroring.
Note this is about the channels, not the DRAM.
In PCs, the evolution from SDR SDRAM to DDR SDRAM to DDR2 SDRAM to DDR3 SDRAM is a fairly simple one.
SDR:
- single pumped (one transfer per clocktick)
- 64-bit bus
- speed is 8 bytes per transfer * memory bus rate
DDR (1998):
- double pumped (two transfers per clocktick, using both the rising and falling edge)
- 64-bit bus
- speed is 8 bytes per transfer * 2 * memory bus rate
DDR2 (~2003):
- double pumped
- 64-bit bus (verify)
- effective bus to memory is clocked at twice the memory speed
- No latency reduction over DDR (at the same speed) (verify)
- speed is 8 bytes per transfer * 2 * 2 * memory bus rate
DDR3 (~2007):
- double pumped
- 64-bit bus (verify)
- effective bus to memory is clocked at four times the memory speed
- No latency reduction over DDR2 (at the same speed) (verify)
- speed is 8 bytes per transfer * 2 * 4 * memory bus rate
DDR4 (~2014)
- double pumped
DDR5 (~2020)
Each generation also lowers voltage and thereby power (per byte).
(Note: Quad pumping exists, but is only really used in CPUs)
The point of clocking the memory bus higher than the speed of individual memory cells
is that as long as you are accessing data from two distinctly accessed cells,
you can send both on the faster external (memory) bus. (verify)
It won't be twice, but depending on access patterns might sometimes get close(verify).
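The "8 bytes per transfer * ... * memory bus rate" lines above all reduce to transfers per second times bus width; checking that against DDR-400's advertised peak of 3.2 GB/s:

```python
# Peak transfer rate = transfers per second * bus width in bytes.
def peak_bytes_per_s(mtransfers_per_s):
    bus_width_bytes = 8                       # 64-bit bus
    return mtransfers_per_s * 1e6 * bus_width_bytes

print(peak_bytes_per_s(400) / 1e9)            # DDR-400: 3.2 (GB/s)
print(peak_bytes_per_s(800) / 1e9)            # DDR2-800: 6.4 (GB/s)
```

Note these are peak figures for fully sequential bursts; real-world throughput is lower.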
Dual channel memory is different yet - it refers to using an additional 64-bit bus to memory in addition to the first 64 bits, so that you can theoretically transfer at twice the speed.
The effect this has on everyday usage depends a lot on what that use is, though.
It seems that even the average case is not a very noticeable improvement.
In DDR2, the internal array feeds an I/O bus clocked at twice the cell rate, so four bits of data can be transferred per memory cell cycle. Thus, without changing the memory cells themselves, DDR2 can effectively operate at twice the data rate of DDR.
Within a type (within SDR, within DDR, within DDR2, etc.), the different speeds do not point to different design. Like with CPUs, it just means that the memory will work under that speed. Cheap memory may fail if clocked even just a little higher, while much more tolerant memory also exists, which is interesting for overclockers.
Note that the bus speed a particular piece of memory will work under depends on how c
Transfers per clocktick (of the internal memory-cell clock, i.e. the prefetch length):
- 1 for SDR/basic SDRAM
- 2 for DDR SDRAM
- 4 for DDR2 SDRAM
- 8 for DDR3 SDRAM
- 8 for DDR4 SDRAM (higher rates come from bank grouping instead)
- 16 for DDR5 SDRAM
SDRAM was available as:
66 MHz, 100 MHz, and 133 MHz
DDR used to have some commonly used aliases, e.g.
alias     standard name   speed
PC-1600   DDR-200         100MHz, 200Mtransfers/s, peak 1.6GB/s
PC-2100   DDR-266         133MHz, 266Mtransfers/s, peak 2.1GB/s
PC-2700   DDR-333         166MHz, 333Mtransfers/s, peak 2.7GB/s
PC-3200   DDR-400         200MHz, 400Mtransfers/s, peak 3.2GB/s
As of right now (late 2008), DDR3 is not worth it money/performance-wise, but DDR2 is interesting over DDR.
PC2-3200  DDR2-400        100MHz, 400Mtransfers/s, peak 3.2GB/s
PC2-4200  DDR2-533        133MHz, 533Mtransfers/s, peak 4.2GB/s
PC2-5400  DDR2-667        166MHz, 667Mtransfers/s, peak 5.4GB/s
PC2-6400  DDR2-800        200MHz, 800Mtransfers/s, peak 6.4GB/s
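Those PC-/PC2- numbers are just the peak rate in MB/s (transfers per second times the 8-byte bus width), rounded a bit in marketing - e.g. DDR2-533's 533 * 8 = 4264 is sold as PC2-4200:

```python
# The module name encodes peak MB/s: transfer rate (MT/s) * 8 bytes.
# Marketing rounds the result, e.g. 533 * 8 = 4264 becomes PC2-4200.
for mts in (400, 533, 667, 800):
    print(f"DDR2-{mts} -> peak {mts * 8} MB/s")
```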
ECC
Buffered/registered RAM
EPROM, EEPROM, and variants
PROM is Programmable ROM
- can be written exactly once
EPROM is Erasable Programmable ROM.
- often implies UV EPROM, erased with UV light shone through a quartz window. You would e.g. tape that window over to avoid later corruption.
EEPROM's extra E means Electrically Erasable
- meaning erasing is now a command, and not a window.
- early EEPROM read, wrote, and erased (verify) a single byte at a time. Modern EEPROM can work in larger chunks.
- you only get a limited number of erases (much like Flash - Flash is arguably just an evolution of EEPROM)