Computer data storage - Semi-sorted

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.



Interconnection speed comparison

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

The figures below are meant to be real-world speeds, of the "sequential-write figures I've seen personally on decent hardware, or seen in halfway convincing benchmarks" sort.

At the higher end, this means some serious RAID (probably RAID10). Such setups also vary a lot with configuration details and hardware quality, so above ~300MB/s, take the speed figures with a grain of salt.


For each interface: rated interface speed, theoretical data speed, typical real-world speed, and notes.

  • Firewire-400: 50MByte/s interface (400Mbit); real-world ~40MB/s(verify)
  • Firewire-800: 100MByte/s interface (800Mbit); real-world ~85MB/s(verify)
  • USB2: ~60MByte/s interface (480Mbps), 48MB/s in theory (encoding/protocol overhead); real-world ~35MByte/s, but it varies (often closer to 25MB/s). Relies on the CPU more than most others on this list, which is probably the reason for that ~35MB/s and for the variation between computers and implementations. Expect USB latency to be higher than FireWire or eSATA, largely from the protocol wrapping.
  • USB3: ~625MByte/s interface (5Gbps), 500MB/s in theory (because of 8b/10b); real-world: early devices were slower than eSATA and may not easily cross ~150MB/s(verify) (and I haven't tested RAID boxes with USB3). May prove quite comparable to eSATA speeds, with slightly higher latency(verify)
  • eSATA: ~350MByte/s interface (3Gbit); real-world up to ~250MByte/s, typically limited by drive speed. I've seen 250MByte/s from a decent-quality consumer RAID box, and the same box doing 150MByte/s on a cheap controller. Speed can vary a lot with the hardware on both sides.
  • Gigabit Ethernet: 1Gbit/s interface, ~120MByte/s in theory; real-world ~105MByte/s, varying a bunch with protocol overhead and the type of copy; 80 or 90MB/s may be more common.
  • 10-gig Ethernet: 10Gbit/s interface; ~900MByte/s, so rarely the bottleneck.
  • Thunderbolt: 1.2GByte/s interface (10Gbps); real-world ~500MB/s(verify) (which as of this writing is a fairly arbitrary figure, because there are few test cases)
  • SAS: 1.5GByte/s interface (12Gbit from 4-channel 3Gbit); real-world ~1200MB/s. Higher-speed SAS exists, but you start running into the computer's various buses.
  • Fibre Channel: (no figures here yet)


Further notes:

  • USB depends on doing more work in the CPU (compared to most others in the list), which can mean more speed variation and higher latency
  • Most of these are sustained-write or sustained-read tests. For some purposes, access time overhead may prove equally or more important.
  • Some of these technologies can do that speed bidirectionally, some cannot. For storage this often doesn't matter much.

Drive wear - platter drives

For platter drives, the amount of work that it's doing has some effect on its lifetime, though apparently fairly little.


It seems that on a drive that isn't pushing the limits on magnetic reliability (a design thing), the components likely to wear first are the spindle and head assembly bearings(verify). When the drive is spun down, or even off, those don't wear.


See spindown



Unsorted

Microsoft's NFI.exe utility can tell you which file occupies a particular sector, which can be handy for seeing which files in a rescued image are damaged.


To make the kernel rescan a port (hostN), e.g. to pick up a device it missed or that you just attached:

echo "- - -" > /sys/class/scsi_host/hostN/scan
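
If you don't know which hostN corresponds to the port, a blunt but common approach is to rescan all of them (a sketch; needs root, and host numbering is system-specific):

# "- - -" is a wildcard for channel/target/LUN, so this rescans everything on each host
for scanfile in /sys/class/scsi_host/host*/scan; do
    echo "- - -" > "$scanfile"
done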


Is a full hard drive slower?

diagnosis and recovery

Trouble check

Getting data from the drive

Recovering soft errors

Some utility notes

dd, ddrescue and such
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

These all copy the readable data from a drive to an image file (or another drive), byte for byte and regardless of structure, for example:

dd if=/dev/sda of=drive.img

This will take a while. Divide the drive size by the average read speed (not the initial speed - the start of the drive reads faster); assume on the order of 3 hours per TB.
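
As a rough worked example (a sketch - the 100MB/s average is an assumption, substitute whatever your drive actually sustains):

# drive size in bytes (needs root), divided by an assumed average of 100MB/s, shown in minutes
SIZE=$(blockdev --getsize64 /dev/sda)
echo "very roughly $(( SIZE / (100*1000*1000) / 60 )) minutes"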


Recovery

In a pinch you may want to use something like:

dd if=/dev/sda of=drive.img conv=noerror,sync bs=64k
noerror
means 'do not stop on read errors',
sync
tells it to pad with zeroes when it can't read (instead of omitting data, which would be Bad for most interpretation).

I do not yet know whether noerror will skip the rest of the block (if so, you do not want to set a very large block size).(verify)


There are better utilities for data recovery.

Most people seem to prefer GNU ddrescue because it's a little smarter than most other CLI alternatives: it reads the error-free areas quickly first, then moves on to the problem areas, and tries to minimize the amount of missing data by reading those in increasingly smaller pieces.

(note: myrescue is similar, but doesn't have quite as many features. dd_rescue, with an underscore, is probably not what you want)


When using ddrescue, you probably always want to use a logfile, e.g.

ddrescue /dev/sda /mnt/bigdisk/rescueimage /mnt/bigdisk/rescueimage.log

When you use a logfile, it keeps track of what it has tried and what succeeded, which means you can run it repeatedly and it will only try the parts it has not yet successfully copied.

This lets you retry with different behaviour. The most interesting arguments seem to be:

  • -a / --min-read-rate: "when the transfer rate drops under this, skip this section for now".
A decent "forget the harder/damaged areas, get the easy stuff off ASAP" option, useful for a quick first pass.
  • -d / --direct: disable readahead ('use direct disk access', bypassing the kernel's readahead).
Without this, reads of good sectors just before a bad sector will probably be much slower, because readahead already runs into the bad sectors.
Some suggest -d is a good option for a second, now-try-the-harder-areas pass; some argue it's useful to specify always, since readahead makes little speed difference for the continuous sequential reads this tool effectively does on all the healthy ranges anyway. (A sketch of such a two-pass run follows below.)
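
A two-pass run along those lines might look something like this (a sketch, not gospel: option spellings differ a little between ddrescue versions, so check ddrescue --help on yours, and the 1MB/s rate threshold is just an example value):

# pass 1: grab the easy areas fast - skip anything reading slower than ~1MB/s, don't scrape bad areas yet
ddrescue -n -a 1000000 /dev/sda /mnt/bigdisk/rescueimage /mnt/bigdisk/rescueimage.log

# pass 2: come back for the remaining areas, with direct access and a few retries
ddrescue -d -r3 /dev/sda /mnt/bigdisk/rescueimage /mnt/bigdisk/rescueimage.log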


See also:



"sending ioctl 1261 to a partition"

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

The message is produced by some recent kernels' ioctl processing. It is more informational than a problem, and possibly a temporary thing, in the way of a programmer's reminder-to-self/others.

Apparently this has to do with whether ioctls should be passed on between logical volumes/partitions and the underlying devices, which is why you tend to see it around software and hardware RAID.

ioctl 1261 seems to be BLKFLSBUF, which seems to mean "flush the page cache"
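
For reference, this seems to be the same ioctl that blockdev --flushbufs issues, so you can trigger it by hand (a sketch; /dev/sda is just an example device):

# flush this block device's buffer cache (sends BLKFLSBUF)
blockdev --flushbufs /dev/sda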


DMA intr status 0x51, error 0x84

Seeing the following in your log:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

Or, in libata something like:

ata2.00: tag 0 cmd 0xc8 Emask 0x10 stat 0x51 err 0x84 (ATA bus error)


...means high-speed UDMA transfers are failing. These are warnings, not errors: nothing actually went bad unless the fallbacks also fail, which is rare, and you would see those errors almost immediately after these lines in the logs.

If you see only pairs of lines like the above, all that happened is that the drive was stepped down to a lower speed (a lower UDMA mode, possibly even PIO), which is also logged (immediately after) until the operation succeeds.

Most of the time you will see these lines when the bootup process enables DMA transfers on specific drives.


Apparently the main cause is too much noise on the (parallel IDE) cable for a given UDMA mode, most often because the cable is not rated for that speed, or is simply low quality.


(Other causes may or may not include incompatible controllers on the same cable, too little power supplied to the hard drive, or a failing hard drive.(verify))
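
If you want to check which mode a (P)ATA drive ended up in, hdparm can show it (assuming hdparm is installed; /dev/hda is just the example device from the log lines above - the currently selected mode is marked with a *):

hdparm -i /dev/hda | grep -i modes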


libata messages

EH refers to error handling.

EH actually runs frequently, but only logs anything when there is something worth mentioning.


When you see it in logs, it does not necessarily refer to an error. It could e.g. be that EH is choosing a slower, more basic and robust transfer method, or is handling an interface reset, both of which are generally transparent to apps (except in the delay).

But yes, in other cases EH may be verbosely mentioning the various details about a drive failing. Or the results of a bad cable. Or of some bad interaction between controller/driver/drive.




"Is this information, a warning, or an error?"

A drive initializing, e.g. being recognized around bootup, looks something like:

SCSI device sda: 976773168 512-byte hdwr sectors (500108MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back

This is just telling you things about the drive and its access mechanisms.



Similarly, a port (re)initializing looks something like:

ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: configured for UDMA/133

Happening once is just normal initialization.

If it happens more often, there will probably be an error handling (EH) message near it explaining why the port was just reset (look for 'EH complete' and possibly 'soft resetting port' in the logs).
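
Something like the following pulls those lines out of the kernel log (just a grep sketch - adjust the patterns to whatever your logs actually say):

dmesg | grep -iE 'hard resetting|soft resetting|EH complete|exception Emask'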

If the reason seems valid (e.g. you unplugged a device, and possibly plugged it back in), this is not a hardware problem (though if the device was being written to when you yanked it, the file and/or filesystem won't be in a happy state).



DRDY itself means "device ready", and seems to appear in logs when a device wasn't immediately responsive for some other reason(verify).

It's not an error in itself, but read the lines around it, particularly if it combines with ERR.


For example, if you see:

Emask 0x9 (media error)
status: { DRDY ERR }
error: { UNC }

That's an error, and UNC points to bad sectors. Check smartctl to see whether you need to replace the drive. (The command that led to this, mentioned just before these lines, will often be some variant of READ.(verify))

(If a WRITE fails, data loss is fairly likely -- stop things now and diagnose)
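
A quick way to check for developing bad sectors with smartctl (a sketch - attribute names vary a little between drive vendors, so adjust the grep to what your drive reports):

# nonzero raw values on these attributes usually mean the drive is growing bad sectors
smartctl -A /dev/sda | grep -Ei 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'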


If it's

status: { DRDY ERR }
error: { ABRT }

it's something like the drive aborting the command, often because it did not become ready for some reason (the reasons vary quite a bit; flaky/old controller power management failing to spin the drive up after resume is one that seems to come up in discussions).


If you see 10B8B and/or BadCRC, data transfer checksums (not surface checksums) are failing (verify). The first thing to check is the cable.



If you see something like:

failed command: SMART
cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 30 pio 512 in
res 40/00:00:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
status: { DRDY }
hard resetting link

...and it's an Intel SSD, this seems to be because these drives choose not to support old-style SMART logs. It's safe (verify), though the reset (of which there may be several due to multiple queries) may take half a second(verify).

The tl;dr seems to be "use -x instead of -a"(verify) - though note that tools which parse smartctl output would then need their parsing adjusted.
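
That is, something like:

smartctl -x /dev/sda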



Addressing

Benchmarking

SATA mode - IDE or AHCI

'IDE mode' exists for older OSes that do not understand AHCI.

AHCI is needed for newer features such as command queueing (NCQ) and hotplugging.


You can switch from IDE to AHCI after you have installed your OS, without reinstalling, but it may take some preparation and/or cleanup.

For example, Win7 will bluescreen (STOP 0x7B, 'can't find boot device') if you change this BIOS setting without any preparation, and there's no automatic resolution.

Apparently you can work around this by telling Windows to use the most generic AHCI driver for one boot, and changing the BIOS setting during that same reboot; it will then install the driver for whatever controller it sees.
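
One commonly described way of doing that on Win7 is to tell it to load its built-in standard AHCI driver at the next boot via the registry, then reboot and flip the BIOS setting (a sketch; the relevant service is named msahci on Win7, storahci on later versions - run as administrator):

rem have Windows load the standard AHCI driver at next boot
reg add HKLM\SYSTEM\CurrentControlSet\services\msahci /v Start /t REG_DWORD /d 0 /f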

Hard drive showing up as removable (AHCI)

AHCI, as an interface, supports hotplugging.

In some cases, all drives will be considered removable. Sometimes this makes sense, sometimes you'll simply never use it (and it clutters the 'safely remove hardware' list), and in some cases it's nonsense (e.g. listing the drive Windows is running from, which you can never actually remove).

If you're seeing this in Win7, it's quite possible you're using the generic Microsoft AHCI driver. Usually, installing the driver for the hardware you actually have (look for your motherboard's drivers) will make it more considerate about which ports it marks as removable.

If that's not it, most drivers let you disable the hotplug treatment, either per channel or globally; the details depend on the driver in use.(verify)

See also: