On computer memory


CPU cache notes

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

CPU caches put a small amount of faster-but-costlier SRAM (or similar) between the CPU (registers are even faster) and main RAM (slowish, often DRAM).


CPU caches mirror fragments of main RAM, and remember where they came from. Whenever accesses to main RAM can be served from cache, they are served faster

Today that means on the order of 1 to 10ns instead of on the order of 100ns (see Computer / Speed notes), but the idea has been worth implementing in CPUs since they ran at a dozen MHz or so(verify).


These caches are entirely transparent, in that a user or even a programmer should not have to care about how they do their thing; you could completely ignore their presence, and arguably shouldn't be able to control what they do at all.


As a programmer, you may like a general idea of how they work, because designing for caches in general (rather than for one specific CPU) tends to keep code fast for longer.

Optimizing for a specific CPU's cache construction, while possible, is often barely worth it, and may even prove counterproductive for other CPUs, or even the same brand's a few years later. If you remember just one thing: small data is a little likelier to stay in cache - and even that is less true if there are a lot of programs vying for CPU time.


Keeping data compact can also give slightly better spatial locality within individual programs.

Other things, like branch locality, can help, but are largely up to the compiler.

A few things, like the fact that arrays have sequential locality that e.g. trees do not, are more down to algorithm and data structure choice - though often that choice is out of your hands.

And, in high level reflective OO style languages, you may have little control anyway.
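To make the spatial locality point concrete, here is a minimal C sketch (the size of the effect depends entirely on your CPU's cache sizes): both functions sum the same matrix, but the row-order walk touches memory sequentially, while the column-order walk strides across it and typically misses cache far more often once the matrix is larger than the cache.

/* Minimal sketch: same work, different access order. */
#include <stdlib.h>

#define N 4096

double sum_row_major(double (*m)[N]) {
    double s = 0.0;
    for (int i = 0; i < N; i++)          /* walks memory sequentially */
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

double sum_col_major(double (*m)[N]) {
    double s = 0.0;
    for (int j = 0; j < N; j++)          /* strides N doubles per step: cache-unfriendly */
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}

int main(void) {
    double (*m)[N] = calloc(N, sizeof *m);   /* ~128MB of zeroes */
    if (!m) return 1;
    volatile double s = sum_row_major(m) + sum_col_major(m);
    (void)s;
    free(m);
    return 0;
}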


Avoiding caches getting flushed more than necessary helps, as can avoiding cache contention - so it helps to know what those are, why they happen, and when.


On virtual memory

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Virtual memory ended up doing a number of different things, which for the most part can be explained separately.


Intro

Overcommitting RAM with disk: Swapping / paging; trashing

Page faults

See also

Swappiness

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Practical notes

Linux

"How large should my page/swap space be?"

On memory scarcity

oom_kill

oom_kill is linux kernel code that starts killing processes when there is enough memory scarcity that memory allocations cannot happen within a reasonable time - as this is a good indication that it's gotten to the point that we are thrashing.


Killing processes sounds like a poor solution.

But consider that an OS can deal with completely running out of memory in roughly three ways:

  • deny all memory allocations until the scarcity stops.
This isn't very useful because
it will affect every program until scarcity stops
if the cause is one flaky program - and it usually is just one - then the scarcity may not stop
programs that do not actually check every memory allocation will probably crash.
programs that do such checks well may have no option but to stop completely (maybe pause)
So in the best case, random applications will stop doing useful things - probably crash, and in the worst case your system will crash.
  • delay memory allocations until they can be satisfied
This isn't very useful because
this pauses all programs that need memory (they cannot be scheduled until we can give them the memory they ask for) until scarcity stops
again, there is often no reason for this scarcity to stop
so this typically means a large-scale system freeze (indistinguishable from a system crash in the practical sense of "it doesn't actually do anything")
  • killing the misbehaving application to end the memory scarcity.
This makes a bunch of assumptions that have to be true -- but it lets the system recover
assumes there is a single misbehaving process (not always true; e.g. two programs each configured to allocate most of RAM may each be fine individually - that needs an admin to configure them better)
...usually the process with the most allocated memory, though oom_kill logic tries to be smarter than that.
assumes that the system has had enough memory for normal operation up to now, and that there is probably one haywire process (misbehaving or misconfigured, e.g. (pre-)allocates more memory than you have)
this could misfire on badly configured systems (e.g. multiple daemons all configured to use all RAM, or having no swap, leaving nothing to catch incidental variation)


Keep in mind that

  • oom_kill is sort of a worst-case fallback
generally
if you feel the need to rely on the OOM, don't.
if you feel the wish to overcommit, don't
oom_kill is meant to deal with pathological cases of misbehaviour
but even then might pick some random daemon rather than the real offender, because in some cases the real offender is hard to define
Tweak likely offenders, tweak your system (a small example follows below this list).
note that you can isolate likely offenders via cgroups now.
and apparently oom_kill is now cgroups-aware
  • oom_kill does not always save you.
It seems that if your system is thrashing heavily already, it may not be able to act fast enough.
(and possibly go overboard once things do catch up)
  • You may wish to disable oom_kill when you are developing
...or at least treat an oom_kill in your logs as a fatal bug in the software that caused it.
  • If you don't have oom_kill, you may still be able to get a reboot instead, by setting the following sysctls:
vm.panic_on_oom=1

and a nonzero kernel.panic (seconds to show the message before rebooting)

kernel.panic=10
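As a small, Linux-specific example of "tweak likely offenders": a process can change how attractive it is to the OOM killer by writing to /proc/<pid>/oom_score_adj (range -1000..1000; -1000 effectively exempts it, positive values make it a preferred victim; lowering it below the current value needs privileges). A minimal sketch:

/* Sketch: mark the current process as a preferred OOM victim
   (e.g. useful for a cache-like helper you would rather lose first). */
#include <stdio.h>

static int set_oom_score_adj(int value) {
    FILE *f = fopen("/proc/self/oom_score_adj", "w");
    if (!f) return -1;
    fprintf(f, "%d\n", value);
    return fclose(f);
}

int main(void) {
    if (set_oom_score_adj(500) != 0)   /* positive: kill me before others */
        perror("oom_score_adj");
    return 0;
}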


See also



Glossary

On memory fragmentation

Fragmentation in general

Slab allocation

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


The slab allocator manages caches of fixed-size objects.

Slab allocation is often used in kernel modules/drivers that are perfectly fine allocating only uniform-sized and potentially short-lived structures - think task structures, filesystem internals, network buffers.

Fixed size, and often separated per specific type, makes it easier to write an allocator that guarantees allocation within a very small timeframe (by avoiding "hey let me look at RAM and all the allocations currently in there" - you can keep track of slots being taken or not with a simple bitmask, and it cannot fragment).

There may also be arbitrary allocation not for specific data structures but for fixed sizes like 4K, 8K, 32K, 64K, 128K, etc, used for things that have known bounds but not precise sizes, for similar lower-time-overhead allocation at the cost of some wasted RAM.
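To illustrate the "slots tracked by a simple bitmask" idea - this is a toy sketch in C, not how the kernel's slab allocator is actually implemented - consider a pool of fixed-size objects where allocating is just finding a clear bit:

/* Toy fixed-size-object pool: a sketch of the bitmask idea only. */
#include <stdint.h>
#include <stddef.h>

#define SLOTS 64                          /* one 64-bit mask worth of slots */

struct pool {
    uint64_t used;                        /* bit i set = slot i taken */
    size_t   obj_size;                    /* all objects in this pool are this size */
    unsigned char storage[SLOTS * 256];   /* room for 64 objects up to 256 bytes each */
};

void *pool_alloc(struct pool *p) {
    for (int i = 0; i < SLOTS; i++) {
        if (!(p->used & (UINT64_C(1) << i))) {
            p->used |= UINT64_C(1) << i;
            return p->storage + (size_t)i * p->obj_size;
        }
    }
    return NULL;                          /* pool exhausted */
}

void pool_free(struct pool *p, void *obj) {
    size_t i = ((unsigned char *)obj - p->storage) / p->obj_size;
    p->used &= ~(UINT64_C(1) << i);       /* no search, no fragmentation */
}

int main(void) {
    struct pool p = { .used = 0, .obj_size = 128 };
    void *a = pool_alloc(&p);
    void *b = pool_alloc(&p);
    pool_free(&p, a);
    void *c = pool_alloc(&p);             /* reuses a's slot */
    (void)b; (void)c;
    return 0;
}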


Upsides:

Each such cache is easy to handle
avoids the fragmentation (that the otherwise-typical buddy system still has) because all holes are of the same size
this makes slab allocation/free simpler, and thereby a little faster
easier to fit them to hardware caches

Limits:

It still deals with the page allocator under the covers, so deallocation patterns can still mean that pages belonging to the same cache become sparsely filled - which wastes space.


SLAB, SLOB, SLUB:

  • SLOB: K&R allocator (1991-1999), aims to allocate as compactly as possible. But fragments faster than various others.
  • SLAB: Solaris type allocator (1999-2008), as cache-friendly as possible.
  • SLUB: Unqueued allocator (2008-today): Execution-time friendly, not always as cache friendly, does defragmentation (mostly just of pages with few objects)


For some indication of what's happening, look at slabtop and slabinfo

See also:


There are some similar higher-level "I will handle things of the same type" allocators, from custom pool allocators in C, to object allocators in certain languages, arguably even just the implementation of certain data structures.

Memory mapped IO and files

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Note that

memory mapped IO is a hardware-level construction, while
memory mapped files are a software construction (...because files are).


Memory mapped files

Memory mapping of files is a technique (OS feature, system call) that pretends a file is accessible at some address in memory.

When the process accesses those memory locations, the OS will scramble for the actual contents from disk.

Whether this will then be cached depends a little on the OS and details(verify).


For caching

In e.g. linux you get that interaction with the page cache, and the data is and stays cached as long as there is RAM for it.


This can also save memory, compared to the easy choice of manually caching the entire thing in your process.

With mmap you may cache only the parts you use, and if multiple processes want this file, you may avoid a little duplication.


The fact that the OS can flush most or all of this data can be seen as a limitation or a feature - it's not always predictable, but it does mean you can deal with large data sets without having to think about very large allocations, and how those aren't nice to other apps.
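A minimal POSIX sketch of using a memory-mapped file (error handling kept terse; counting newlines is just an excuse to touch the mapped pages):

/* Sketch: map a whole file read-only and walk through it. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file; pages are faulted in on first access and
       live in the page cache, so repeated access is cheap. */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    long lines = 0;
    for (off_t i = 0; i < st.st_size; i++)   /* touching memory triggers disk reads as needed */
        if (data[i] == '\n') lines++;
    printf("lines: %ld\n", lines);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}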


shared memory via memory mapped files

Most kernel implementations allow multiple processes to mmap the same file - which effectively shares memory, and is probably one of the simplest ways to do so in a protected-mode system. (Some methods of inter-process communication work via mmapping.)


Not clobbering each other's memory is still something you need to do yourself.

The implementation, limitations, and method of use varies per OS / kernel.

Often relies on demand paging to work.
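A minimal sketch of the idea, sharing a counter between a parent and a forked child through a file-backed MAP_SHARED mapping (POSIX; the file path is an arbitrary choice for illustration, and wait() stands in for real synchronization):

/* Sketch: two related processes see the same page via MAP_SHARED. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd = open("/tmp/shared-counter", O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, sizeof(int)) < 0) { perror("ftruncate"); return 1; }

    int *counter = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);      /* MAP_SHARED: writes visible to others */
    if (counter == MAP_FAILED) { perror("mmap"); return 1; }
    *counter = 0;

    if (fork() == 0) {        /* child */
        *counter = 42;        /* written through the shared mapping */
        _exit(0);
    }
    wait(NULL);               /* crude ordering, not a general locking scheme */
    printf("parent sees %d\n", *counter);        /* prints 42 */

    munmap(counter, sizeof(int));
    close(fd);
    return 0;
}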

Memory mapped IO

Map devices into memory space (statically or dynamically), meaning that memory accesses to those areas are actually backed by IO accesses (...that you can typically also do directly).

This mapping is made and resolved at hardware level, and only works for DMA-capable devices (which is many).

It seems to often be done to have a simple generic interface (verify) - it means drivers and software can avoid many hardware-specific details.


See also:

DMA

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Direct Memory Access comes down to additional hardware that can be programmed to copy bytes from one memory address to another, meaning the CPU doesn't have to do this.

DMA is independent enough at hardware level that its transfers can work at high clocks and throughputs (and without interrupting other work), comparable to CPU copies (CPU may be faster if it was otherwise idle. When CPU is not idle the extra context switching may slow things down and DMA may be relatively free. Details vary with specific designs, though).


They tend to work in smaller chunks, triggered by DRQ (similar in concept to IRQs, but triggering only a smallish copy, rather than arbitrary code), so that it can be coordinated as small chunks.

The details look intimidating at first, but mostly because they are low-level. The idea is actually relatively simple.


Aside from memory-to-memory use, it also allows memory-to-peripheral copies (if a specific supporting device is memory mapped(verify)).




Memory limits on 32-bit and 64-bit machines

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


tl;dr:

  • If you want to use significantly more than 4GB of RAM, you want a 64-bit OS.
  • ...and since that is now typical, most of the details below are irrelevant


TODO: the distinction between (effects from) physical and virtual memory addressing should be made clearer.


Overall factoids

OS-level and hardware-level details:

From the I want my processes to map as much as possible angle:

  • the amount of memory a single process could hope to map is typically limited by its pointer size, so ~4GB on 32-bit OS, 64-bit (lots) on a 64-bit OS.
Technically this could be entirely about the OS, but in reality it is tied intimately to what the hardware natively does, because anything else would be slooow.
  • Most OS kernels have a split (for their own ease) that means that of the area a program can map, less is allocatable - perhaps 3GB, 2GB, sometimes even 1GB
this is partly a pragmatic implementation detail from back when 32 megabytes was a lot of memory, and leftover ever since


  • Since the OS is in charge of virtual memory, it can map each process to memory separately, so in theory you can host multiple 32-bit processes that together use more than 4GB
...even on 32-bit OSes: you can for example compile the 32-bit linux kernel to use up to 64GB this way
a 32-bit OS can only do this through PAE, which has to be supported and enabled in the motherboard, and supported and enabled in the OS.
Note: both 32-bit and 64-bit PAE-supporting motherboards may have somewhat strange limitations, e.g. the amount of memory they will actually allow/support (mostly a problem in early PAE motherboards)
and PAE was problematic anyway - it's a nasty hack in nature, and e.g. drivers had to support it. It was eventually disabled in consumer windows (XP) for this reason. In the end it was mostly seen in servers, where the details were easier to oversee.


  • device memory maps would take mappable memory away from within each process, which for 32-bit OSes would often mean that you couldn't use all of that installed 4GB



On 32-bit systems:

Process-level details:

  • No single 32-bit process can ever map more than 4GB as addresses are 32-bit byte-addressing things.
  • A process's address space has reserved parts, to map things like shared libraries, which means a single app can actually allocate less (often by at most a few hundred MBs) than what it can map(verify). Usually no more than ~3GB can be allocated, sometimes less.


On 64-bit systems:

  • none of the potentially annoying limitations that 32-bit systems have apply
(assuming you are using a 64-bit OS, and not a 32-bit OS on a 64-bit system).
  • The architecture lets you map 64-bit addresses
...in theory, anyway. The instruction set is set up for 64-bit everything, but current x86-64 CPU implementations have 48-bit address lines (for 256TiB), mainly because that can be increased later without breaking compatibility, and right now it saves copper and silicon that 99% of computers won't use
...because in practice it's still more than you can currently physically put in most systems. (there are a few supercomputers for which this matters, but arguably even there it's not so important because horizontal scaling is generally more useful than vertical scaling. But there are also a few architectures designed with a larger-than-64-bit addressing space)


On both 32-bit (PAE) and 64-bit systems:

  • Your motherboard may have assumptions/limitations that impose some lower limits than the theoretical one.
  • Some OSes may artificially impose limits (particularly the more basic versions of Vista seem to do this(verify))


Windows-specific limitations:

  • 32-bit Windows XP (since SP2) gives you no PAE memory benefits. You may still be using the PAE version of the kernel if you have DEP enabled (no-execute page protection) since that requires PAE to work(verify), but PAE's memory upsides are disabled (to avoid problems with certain buggy PAE-unaware drivers, possibly for other reasons)
  • 64-bit Windows XP: ?
  • the /3GB switch moves the user/kernel split, but for a single process to map more than 2GB it must be 3GB-aware
  • Vista: different versions have memory limits that seem to be purely artificial (8GB, 16GB, 32GB, etc.) (almost certainly market segmentation)

Longer story / more background information

A 32-bit machine implies memory addresses are 32-bit, as is the memory address bus to go along with it. It's more complex than that, but the net effect is still that you can ask for 2^32 bytes of memory at byte resolution, which technically allows you to access up to 4GB.


The 'but' you hear coming is that 4GB of address space doesn't mean 4GB of memory use.


The device hole (32-bit setup)

One of the reasons the limit actually lies lower is devices. The top of the 4GB memory space (usually directly under the 4GB position) is used to map devices.

If you have close to 4GB of memory, this means part of your memory is not addressable by the CPU, and effectively missing. The size of this hole depends on the actual devices, chipset, BIOS configuration, and more(verify).


The BIOS settles the memory address map(verify), and you can inspect the effective map (Device Manager in windows, /proc/iomem in linux) if you want to know whether it's hardware actively using the space (the hungriest devices tend to be video cards - at the time, two 768MB nVidia 8800s in SLI was one of the worst cases) or whether your motherboard just doesn't support more than, say, 3GB at all. Both of these can be the reason some people report seeing as little as 2.5GB of the 4GB they plugged in.


This problem goes away once you run a 64-bit OS on a 64-bit processor -- though there were some earlier motherboards that still had old-style addressing leftovers and hence some issues.


Note that the subset of these issues caused purely by limited address space on 32-bit systems could also be alleviated, using PAE:

PAE

It is very typical to use virtual memory systems. While the prime upside is probably the isolation of memory, the fact that a memory map is kept for each process also means that on 32-bit, each application has its own 4GB memory map without interfering with anything else (virtual mapping practice allowing).

Which means that while each process could use 4GB at the very best, if the OS could see more memory, it might map distinct 4GBs to each process so that collectively you can use more than 4GB (or just your full 4GB even with device holes).


Physical Address Extension is a memory mapping extension (not a hack, as some people think) that does roughly that. PAE needs specific OS support, but doesn't need to break the 32-bit model as applications see it.

It allowed mapping 32-bit virtual memory into the 36 bit hardware address space, which allows for 64GB (though most motherboards had a lower limit)


PAE implies some extra work on each memory operation, but because there's hardware support it only kicked a few percent off memory access speed.


All newish linux and windows versions support PAE, at least technically. However:

  • The CPU isn't the only thing that accesses memory. Although many descriptions I've read seem kludgey, I easily believe that any device driver that does DMA and is not aware of PAE may break things -- such drivers are broken in that they are not PAE-aware: they do not know the 64-bit pointers that are used internally should be limited to 36-bit use.
  • PAE was disabled in WinXP's SP2 to increase stability related to such issues, while server windowses are less likely to have problems since they tend to use more standard hardware and thereby drivers.

Kernel/user split

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

The kernel/user split, specific to 32-bit OSes, refers to an OS-enforced formalism splitting the mappable process space between kernel and each process.


It looks like windows by default gives 2GB to both, while (modern) linuces apparently split into 1GB kernel, 3GB application by default (which is apparently rather tight on AGP and a few other things).

(Note: '3GB for apps' means that any single process is limited to map 3GB. Multiple processes may sum up to whatever space you have free.)


In practice you may want to shift the split, particularly in Windows since almost everything that would want >2GB memory runs in user space - mostly databases. The exception is Terminal Services (Remote Desktop), that seems to be kernel space.

It seems that:

  • linuxes tend to allow 1/3, 2/2 and 3/1,
  • BSDs allow the split to be set to whatever you want(verify).
  • It seems(verify) windows can only shift its default 2/2 split to 1GB kernel, 3GB application, using the /3GB boot option (the feature is somewhat confusingly called 4GT), but it seems that windows applications are normally compiled with the 2/2 assumption and will not be helped unless coded to use it. Exceptions seem to primarily include database servers.
  • You may be able to work around it with a 4G/4G split patch, combined with PAE - with some overhead.

See also



Some understanding of memory hardware

"What Every Programmer Should Know About Memory" is a good overview of memory architectures, RAM types, reasons bandwidth and access speeds vary.


RAM types

DRAM - Dynamic RAM

lower component count per cell than most (transistor+capacitor mainly), so high-density and cheaper
yet capacitor leakage means this has to be refreshed regularly, meaning a DRAM controller, more complexity and higher latency than some
(...which can be alleviated and is less of an issue when you have multiple chips)
this or a variant is typical as main RAM, due to low cost per bit


SDRAM - Synchronous DRAM - is mostly a practical design consideration

...that of coordinating the DRAM via an external clock signal (previous DRAM was asynchronous, manipulating state as soon as lines changed)
This allows the interface to that RAM to be a predictable state machine, which allows easier buffering, and easier interleaving of internal banks
...and thereby higher data rates (though not necessarily lower latency)
SDR/DDR:
DDR doubled busrate by widening the (minimum) units they read/write (double that of SDR), which they can do from a single DRAM bank(verify)
similarly, DDR2 is 4x larger units than SDR and DDR3 is 8x larger units than SDR
DDR4 uses the same width as DDR3, instead doubling the busrate by interleaving from banks
unrelated to latency, it's just that the bus frequency also increased over time.


Graphics RAM refers to varied specialized RAM aimed at video cards

Earlier versions would e.g. allow reads and writes (almost) in parallel, making for lower-latency framebuffers
"GDDR" is a somwhat specialized form of DDR SDRAM



SRAM - Static RAM

Has a higher component count per cell (6 transistors) than e.g. DRAM
Retains state as long as power is applied to the chip, no need for refresh, also making it a little lower-latency
no external controller, so simpler to use
e.g used in caches, due to speed, and acceptable cost for lower amounts


PSRAM - PseudoStatic RAM

A tradeoff somewhere between SRAM and DRAM
in that it's DRAM with built-in refresh, so functionally it's as standalone as SRAM, and slower, but you can have a bunch more of it for the same price - SRAM tends to be much pricier per bit
(yes, DRAM can have built-in refresh, but that often points at a sleep mode that retains state without requiring an active DRAM controller)




Memory stick types

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


ECC RAM

can detect many (and correct some) hardware errors in RAM
The rate of bit-flips is low, but they will happen. If your computations or data are very important to you, you want ECC.
See also:
http://en.wikipedia.org/wiki/ECC_memory
DRAM Errors in the Wild: A Large-Scale Field Study


Registered RAM (sometimes buffered RAM) basically places a buffer on the DRAM modules (register as in hardware register)

offloads some electrical load from the main controller onto these buffers, making it easier for designs to stably connect more individual memory sticks/chips.
...at a small latency hit
typical in servers, because they can accept more sticks
Must be supported by the memory controller, which means it is a motherboard design choice to go for registered RAM or not
pricier (more electronics, fewer units sold)
because of this correlation with server use, most registered RAM is specifically registered ECC RAM
yet there is also unregistered ECC, and registered non-ECC, which can be good options on specific designs of simpler servers and beefy workstations.
sometimes called RDIMM -- in the same context UDIMM is used to refer to unbuffered
https://en.wikipedia.org/wiki/Registered_memory

FB-DIMM, Fully Buffered DIMM

same intent as registered RAM - more stable sticks on one controller
the buffer is now between stick and controller rather than on the stick
physically different pinout/notching


SO-DIMM (Small Outline DIMM)

Physically more compact. Used in laptops, some networking hardware, some Mini-ITX


EPP and XMP (Enhanced Performance Profile, Extreme Memory Profiles)

basically, one-click overclocking for RAM, by storing overclocked timing profiles
so you can configure faster timings (and Vdimm and such) according to the modules, rather than by trial and error
normally, memory timing is configured according to a table in the SPD, which are JEDEC-approved ratings and typically conservative.
EPP and XMP basically means running them as fast as they could go (and typically higher voltage)



On pin count

SO-DIMM tends to have a different pin count
e.g. DDR3 has 240 pins, DDR3 SO-DIMM has 204
e.g. DDR4 has 288 pins, DDR4 SO-DIMM has 260
Registered RAM has the same pin count
ECC RAM has the same pin count


In any case, the type of memory must be supported by the memory controller

DDR2/3/4 - physically won't fit
Note that while some controllers (e.g. those in CPUs) support two generations, a motherboard will typically have just one type of memory socket
registered or not
ECC or not

Historically, RAM controllers were a thing on the motherboard near the CPU, while there are now various cases where the controller is on the CPU.

More on DRAM versus SRAM

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


On ECC

Buffered/registered RAM

EPROM, EEPROM, and variants

PROM is Programmable ROM

can be written exactly once

EPROM is Erasable Programmable ROM.

often implies UV-EPROM, erased with UV shone through a quartz window.

EEPROM's extra E means Electrically Erasable

meaning erasing is now an electrical command, rather than UV light.
early EEPROM read, wrote, and erased (verify) a single byte at a time. Modern EEPROM can work in larger chunks.
you only get a limited amount of erases (much like Flash. Flash is arguably just an evolution of EEPROM)


Flash memory (intro)

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


PRAM

Flash memory

Memory card types

📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.


For different kinds of memory cards, see Common plugs and connectors#Memory_cards


Secure Digital (SD, miniSD, microSD), and MMC details

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Capacity types / families

  • SD (now named SDSC, 'standard capacity', to distinguish it)
    • size (somewhat artificially) limited to 1-4GB
  • SDHC (high capacity), since approx 2006
    • physically identical but conforming to a new standard that allows for higher capacity and speed.
    • addressing limited to 32GB
  • SDXC (eXtended-Capacity), since approx 2009
    • successor to SDHC that allows for higher capacity and speed
    • UHS was introduced since roughly then. Note that only some cards use UHS.
    • addressing limited to 2TB
  • Ultra-Capacity (SDUC), since approx 2018
    • limited to 128TB
  • SDIO
    • allows more arbitrary communication, basically a way to plug in specific accessories on supporting hosts - not really an arbitrarily usable bus for consumers
    • (supports devices like GPS, wired and wireless networking)



The above is partly about capacity, and partly about function. It's also not entirely aligned with SD versions; protocol-wise it's even more interesting, particularly with the extra buses for the faster (UHS and Express) modes.

I think most people have lost track of the details by now.




Electrically

Power is 3.3V, though there are some lower-voltage details - in particular the LVDS being lower voltage (1.8V(verify)).

(MMC had 7 pins)

SD has 9 pins up until UHS-II

SD with UHS-II adds 8 pins for a total of 17 pins

two more LVDS pairs, and more power and ground


MicroSD has 8 pins

MicroSD with UHS-II has 17 pins



Protocol (and DIY options)

Since there are a few card types and more transfer modes over the years, supporting all the possible things a card can do is fairly involved.

Even detecting what kind of card is there is interesting. You'd think this is part of negotiation, but for historical reasons you need some fallback logic even in the initialisation commands.


Since you're talking to the flash controller, there is a minimal mode, namely to start talking SPI.

There are a handful of protocol variations, that are basically negotiated from the most basic one. Any fancy device will want to do that for speed, but for DIY that choice of SPI is much simpler. (note there are some recent cards where SPI mode is optional, though(verify))

In SPI mode the pins are mostly just SPI's MOSI, MISO, SCLK, chip select, plus ground and Vcc.

Code-wise, you'll want to find a library. If you don't, you'll probably end up writing much of one anyway.
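For a feel of what such a library does at the lowest level, here is a hedged sketch of just the first init step in SPI mode: send 74+ clocks with CS high, then CMD0 (GO_IDLE_STATE) and wait for the R1 "idle" response. spi_txrx(), cs_high() and cs_low() are hypothetical platform hooks you would supply for your own board; a real init then continues with CMD8, ACMD41 and more.

/* Sketch only: first step of SD init in SPI mode. */
#include <stdint.h>

extern uint8_t spi_txrx(uint8_t out);    /* hypothetical: exchange one byte on SPI */
extern void cs_high(void), cs_low(void); /* hypothetical: drive the card's chip select */

static uint8_t sd_command(uint8_t cmd, uint32_t arg, uint8_t crc) {
    spi_txrx(0x40 | cmd);                /* start bits + command index */
    spi_txrx(arg >> 24); spi_txrx(arg >> 16);
    spi_txrx(arg >> 8);  spi_txrx(arg);
    spi_txrx(crc);                       /* CRC only matters for CMD0/CMD8 in SPI mode */
    for (int i = 0; i < 8; i++) {        /* poll for the R1 response byte */
        uint8_t r = spi_txrx(0xFF);
        if (!(r & 0x80)) return r;
    }
    return 0xFF;                         /* no response */
}

int sd_spi_enter_idle(void) {
    cs_high();
    for (int i = 0; i < 10; i++) spi_txrx(0xFF);   /* >= 74 dummy clocks */
    cs_low();
    uint8_t r1 = sd_command(0, 0, 0x95);           /* CMD0 with its fixed CRC */
    cs_high();
    return r1 == 0x01 ? 0 : -1;                    /* 0x01 = card is in idle state */
}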


SD Speed rating

Actual performance


There are two gotchas to speed ratings:

  • due to the nature of flash, it will read faster than it will write.
how much faster/slower depends, but it's easily a factor 2
if marketers can get away with it, they will specify the read speed
note that the differences vary, due to differences in controllers - e.g. external card readers tend to be cheap shit.
  • writes can be faster in short bursts.
because you're actually talking to a storage controller, managing the flash
You usually care about sustained average write instead
And sometimes about the guaranteed speed, i.e. the minimum per second-or-so


(Example: a card marked with the UHS-I logo, V10, U1, and a circled 10 is a UHS-I card, video speed class 10, UHS speed class 1, and speed class 10 - see the marking explanations below.)

Marking-wise

  • Speed class - looks like a circle with a number in it - one of 2, 4, 6, or 10
(and a class 0, which doesn't specify performance so is meaningless)
that figure is the speed in MB/s
apparently this was intended as a minimum sustained write speed, but practice proves not everyone keeps to this, so if specs look optimistic, they probably are. It seems to vary with honesty, so a good Class 6 card may well perform better than a bad Class 10 one.
there is no larger-than-10 class, no matter how fast it actually is
those details (and the fact that these days most SD can sustain 10MB/s) means this class system is no longer informative
  • Video speed class - V with a number: V6, V10, V30, V60, or V90.
again, it's just MB/s
These were introduced because realtime-not-too-compressed video tends to want
perhaps 10MByte/s for standard definition
perhaps 30MByte/s for FHD (1080p)
perhaps 60MByte/s for 4k
perhaps 90MByte/s for 8k
These are apparently required to be sustained speeds(verify)
  • UHS speed class - looks like a U with a number in it (1, 2, or 3)
1 is 10MB/s
3 is 30MB/s
...so UHS speed class has very little to do with UHS version
  • Packaging may try to stunt with speed, but tend to say "up to" (possibly in tiny print)
For example, I have a card that says writes up to 80MB/s and reads up to 170MB/s, yet all the logos on it suggest it can't guarantee sustaining more than 30MB/s. Curious...
so assume this is marketing bullshit in general


For video, you probably want 10MB/s for standard definition, 30MB/s for 1080p, 60MB/s for 4K, and 90MB/s for 8K




💤

Bus speed:

Bus speed is how much data the wiring can carry. Note this says nothing about whether the card actually will, so this is mostly unimportant

Classically there's

  • Standard
12MB/s max
  • High-speed - clocks SDSC and SDHC at double the rate
25MB/s max


  • UHS-I
Introduced in version 3.01 (~2010, basically around when SDXC was introduced)
for SDHC and SDXC
one LVDS pair on the same row, bus speed specced at ~100MB/s max
  • UHS-II
Introduced in version 4.0 (~2011)
for SDHC and SDXC?
an extra row of eight pins: two extra LVDS pairs, and more power and ground
bus speed specced at ~300MB/s max
  • UHS-III is only part of SDUC(verify)
introduced in version 6.0
bus speed specced at ~600MB/s max
Also introduced "Video Speed Class" rating
Express, which is primarily about extra speed:
  • SD Express (introduced in version 7.0)
bus speed specced at ~900MB/s max

I'm still confused about how SD Express and UHS-III relate


Latency

On fake flash

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Fake flash refers to a scam where a card's controller reports a larger size than the actual storage has.


These seem to come in roughly two variants:

addressing storage that isn't there will fail,
or it will wrap back on itself and write over existing areas.

...which isn't an important distinction, in that the result is just that it appears to be broken. It will seem to work for a little while, and in both cases it will corrupt later.


There are some tools to detect fake flash. You can e.g. read out what flash memory chips are in there and whether that adds up - scammers don't go so far as to fake this.

But the more thorough check is a write-and-verify test, see below.

Memory card health

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

While memory cards and USB sticks are flash memory much like SSDs, most lack all wear leveling and health introspection.

So you should assume that they will fail completely, without warning.


At lower level, even if they are still completely readable (and that's not a guarantee), filesystems are not made to deal with write failure, so you may need special tools and/or a technical friend for recovery.


You can check whether it's still readable (non-destructive) with a test consisting of "read all disk surface" (for chkdsk it's the 'scan for and attempt recovery of bad sectors' checkbox)

The only real test of whether it's fully writable is to write to all of it (necessarily destructive). But this only proves that it hasn't failed already, not that it won't soon.


One useful tool is H2testw, which creates a file in free space (if the card is empty, that's almost all of it)

It will also tell you actual average write and read speed, not the potential lie on the front.

And it is implicitly a fake flash test.
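A minimal C sketch of the write-and-verify idea (roughly what such tools do): write a deterministic pseudorandom pattern, read it back, and compare. The path and test size below are placeholders; note a serious test also has to defeat the OS page cache (e.g. remount or O_DIRECT), otherwise the read-back may come from RAM rather than the card.

/* Sketch: write a reproducible pattern, then verify it. Destructive to free space. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (1u << 20)                          /* work in 1 MiB chunks */

static void fill_pattern(uint8_t *buf, size_t n, uint64_t *state) {
    for (size_t i = 0; i < n; i++) {              /* xorshift64: cheap deterministic bytes */
        *state ^= *state << 13;
        *state ^= *state >> 7;
        *state ^= *state << 17;
        buf[i] = (uint8_t)*state;
    }
}

int main(void) {
    const char *path = "/mnt/sdcard/testfile.bin";   /* placeholder path on the card */
    const uint64_t total = 64ull << 20;              /* placeholder: test 64 MiB */
    uint8_t *buf = malloc(CHUNK), *check = malloc(CHUNK);
    uint64_t seed;
    if (!buf || !check) return 1;

    FILE *f = fopen(path, "wb");
    if (!f) { perror("open for write"); return 1; }
    seed = 0x12345678;
    for (uint64_t done = 0; done < total; done += CHUNK) {
        fill_pattern(buf, CHUNK, &seed);
        if (fwrite(buf, 1, CHUNK, f) != CHUNK) { perror("write"); return 1; }
    }
    fclose(f);

    f = fopen(path, "rb");
    if (!f) { perror("open for read"); return 1; }
    seed = 0x12345678;                               /* regenerate the same pattern */
    for (uint64_t done = 0; done < total; done += CHUNK) {
        fill_pattern(buf, CHUNK, &seed);
        if (fread(check, 1, CHUNK, f) != CHUNK) { perror("read"); return 1; }
        if (memcmp(buf, check, CHUNK) != 0) {
            printf("mismatch in chunk at offset %llu\n", (unsigned long long)done);
            return 1;
        }
    }
    fclose(f);
    printf("verified %llu bytes OK\n", (unsigned long long)total);
    free(buf); free(check);
    return 0;
}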

What's stored

History

Core memory