Computer data storage - RAID - mdadm notes
md is Linux's software RAID implementation. It is often referred to as mdadm, which is actually the name of md's administration utility.
Mostly, see the man page, it's pretty decent.
Inspect / watching
State of arrays
Reading /proc/mdstat
You can look at the known md arrays and what they're up to via:
cat /proc/mdstat
The first line for a device mostly just mentions the disks:
- (S) means spare
- (F) means failed
The second line (starts with blocks) most importantly mentions something like
- [2/1]
- the first is the total number of drives assigned
- the second is the number of currently active drives
- [U_]
- U means up (active and in sync)
- _ means down: failed, missing, or not yet synced
Some examples:
md0 : active raid5 sdc1[0] sdd1[1] sdh1[6] sde1[2] sdf1[3] sdg1[4]
      9767552000 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
...is a 6-disk RAID5 array with no failed disks, and no spare.
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      5860527104 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  1.5% (44144492/2930263552) finish=519.7min speed=92552K/sec
...is a 3-disk RAID5 set (no spare), which was just --created and is currently being built.
md1 : active raid5 sdc1[0] sde1[2] sdf1[4](F)
      5860535808 blocks level 5, 64k chunk, algorithm 2 [4/2] [U_U_]
...is a 4-disk RAID5 array that has had two drives drop from it. The fact that it's effectively dead isn't mentioned explicitly (it was still mounted at this stage, and showed I/O errors on a lot of actual data accesses. It probably won't mount again). The fourth member was sdd. I'm guessing it isn't in the list because this server had rebooted after sdd failed, while sdf failed in the current boot.
md0 : inactive sdc1[1](S) sdd1[3](S) sdb1[4](S)
      8790792192 blocks super 1.2

unused devices: <none>
This used to be a 3-disk RAID5; now it's inactive with every member marked as a spare, which is apparently how mdstat shows members of an array it could not assemble.
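The [n/m] field in examples like the above can also be checked from a script, e.g. for a quick health check. A minimal sketch (POSIX sh plus awk; the parsing assumes the mdstat layout shown above, and the function name is my own):

```shell
#!/bin/sh
# Flag arrays whose active-member count is below the assigned count,
# by parsing the [n/m] field from mdstat-formatted text on stdin.
check_mdstat() {
  awk '
    /^md[0-9]+ *:/ { name = $1 }          # device line: "md0 : active raid5 ..."
    /blocks/ {                            # second line carries [n/m] and [U_]
      for (i = 1; i <= NF; i++)
        if ($i ~ /^\[[0-9]+\/[0-9]+\]$/) {
          split(substr($i, 2, length($i) - 2), c, "/")
          if (c[2] + 0 < c[1] + 0)
            print name " degraded: " c[2] "/" c[1] " active"
        }
    }
  '
}

# Demonstrated on sample text; in practice: check_mdstat < /proc/mdstat
check_mdstat <<'EOF'
md0 : active raid5 sdc1[0] sdd1[1] sde1[2]
      9767552000 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
md1 : active raid5 sdc1[0] sde1[2] sdf1[4](F)
      5860535808 blocks level 5, 64k chunk, algorithm 2 [4/2] [U_U_]
EOF
```

On the sample above it prints md1 degraded: 2/4 active and nothing for the healthy md0.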
mdadm --detail
...gives a lot more detail. The below only shows some of the more interesting bits.
For example, the 3-disk RAID5 example mentioned above is reported as:
     Raid Level : raid5
     Array Size : 5860527104 (5589.03 GiB 6001.18 GB)
   Raid Devices : 3
  Total Devices : 3
          State : clean, degraded, recovering
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
     Chunk Size : 512K
 Rebuild Status : 7% complete
           Name : hostname:0  (local to host hostname)
           UUID : 1706dedb:0a47bad3:52704024:0489eb4b
         Events : 2

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       3       8       49        2      spare rebuilding   /dev/sdd1
The failed array as:
/dev/md1:
        Version : 0.90
  Creation Time : Wed Jul 28 19:39:40 2010
     Raid Level : raid5
     Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Apr 10 20:04:09 2013
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : ff86440d:eff102a9:253bf10a:15a680f2
         Events : 0.6579830

    Number   Major   Minor   RaidDevice State
       0       8      33        0      active sync   /dev/sdc1
       1       0       0        1      removed
       2       8      65        2      active sync   /dev/sde1
       3       0       0        3      removed

       4       8      81        -      faulty spare   /dev/sdf1
Monitor/follow
You can use mdadm --monitor / mdadm --follow to watch for state changes, such as a disk being marked failed (not a drive that is merely starting to show signs of failure; and since RAID0 has no redundancy, this isn't meaningful there)
- mdadm --monitor /dev/md0, (or --scan?)
- -m admin@example.com if you want changes mailed to you
- -1 checks one instead of continuously. Useful for quick checks, and for cron jobs like mdadm --monitor --scan -1
- you'll often want (or be required) to specify one or more of:
- a mail address to send errors to
- a program to run
SparesMissing means either that your configuration says you'd add spares and you didn't, or that a spare became part of the active set so that you don't have a spare anymore.
If a disk moved from spare to active you probably want to replace it sometime.
If you accidentally set the amount of spares too high at creation time, you can lower it in mdadm.conf
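A one-shot check works well from cron. A sketch (the file path and mail address are made-up examples; assumes mdadm is installed and the machine can deliver mail):

```
# /etc/cron.d/mdadm-check  -- example path; adjust to taste
# Once an hour, scan all arrays once and mail any problems found.
0 * * * *  root  /sbin/mdadm --monitor --scan -1 -m admin@example.com
```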
Maintenance and recovery
Replacing a drive
Assuming we're talking RAID-as-in-redundancy (i.e. this wasn't RAID0 or linear), you can do an online replace.
Basically
- Look at /proc/mdstat or mdadm --detail /dev/md0
- If md already considers it failed, skip this. If it doesn't, you can do that yourself like:
mdadm --manage /dev/md0 --fail /dev/sdb
- Use mdadm to remove the faulty drive from the array (which you can only do to failed and spare drives), e.g.
mdadm --manage /dev/md0 --remove /dev/sdb
- Physically replace the drive with the new one.
- If you hand partitions to mdadm, set those up first.
- Add the drive to the array
mdadm --manage /dev/md0 --add /dev/sdb
- (or sdb1 if you use partitions)
- Look at /proc/mdstat. It will mention it is doing a rebuild/recovery, and how long that will take.
watch cat /proc/mdstat
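The steps above, as one command sequence (device names are examples; substitute your own, and use whole-disk names like /dev/sdb if you don't partition):

```
mdadm --manage /dev/md0 --fail   /dev/sdb1   # skip if md already marked it failed
mdadm --manage /dev/md0 --remove /dev/sdb1
# ...physically swap the drive; recreate the partition layout if you use one...
mdadm --manage /dev/md0 --add    /dev/sdb1
watch cat /proc/mdstat                       # follow the rebuild
```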
Altering (create, grow, delete, etc.)
Semi-sorted
Speed limit
Resync and rebuilds try to adhere to a goal speed, a KByte/s figure settable via:
- sysctl dev.raid.speed_limit_min or /proc/sys/dev/raid/speed_limit_min
- defaults to 1000 (1MByte/s)
- meant as a "try to do this much even when people are using the drive" (verify)
- dev.raid.speed_limit_max or /proc/sys/dev/raid/speed_limit_max
- defaults to something like 100000 or 200000 (100 or 200 MByte/s)
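For example, to temporarily raise the ceiling while a rebuild is running (values are in KByte/s; the figure here is arbitrary, and settings reset at reboot unless persisted in /etc/sysctl.conf):

```
sysctl -w dev.raid.speed_limit_max=500000     # ~500 MByte/s ceiling
# equivalently:
echo 500000 > /proc/sys/dev/raid/speed_limit_max
```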
Preventive maintenance
Scrubbing
To start:
echo check > /sys/block/mdX/md/sync_action
You probably want to put that in crontab.
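For example, a monthly scrub via cron (md0 and the file path are examples; Debian-style distributions ship a similar job already):

```
# /etc/cron.d/md-scrub  -- example; scrub md0 on the first of each month, at night
30 2 1 * *  root  echo check > /sys/block/md0/md/sync_action
```

Once it finishes, /sys/block/md0/md/mismatch_cnt reports how many mismatches were found.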
Recovery
Example
Unsorted
Config and details
mdadm.conf
mdadm.conf is not strictly necessary.
That is, md can just scan devices and assemble everything based on the metadata on the devices. If you've got a single array, this is very predictable.
mdadm.conf can be useful when managing multiple arrays, though.
To generate the details of your current arrays, you can do an...
mdadm --examine --scan
You'll get something like:
ARRAY /dev/md/2 metadata=1.2 UUID=65a5ef0a:b45fd96c:cbd2d6c6:0569ec26 name=callisto:2
ARRAY /dev/md/1 metadata=1.2 UUID=ad1c8195:1bcc805e:bdebea7e:f86f9de8 name=callisto:1
ARRAY /dev/md/0 metadata=1.2 UUID=7f722f3d:1112966b:28a7ba07:720f6a80 name=callisto:0
If you have drives with old metadata lying around, you can expect incomplete entries, duplicate entries, or other oddities.
A number of details, such as level=raid5, num-devices=4, and metadata=1.2, may be added, but would be read from the metadata if omitted.
In theory, that's all you need.
If you want to update-and-overwrite the current mdadm.conf, you can do:
mdadm -Es --config=/etc/mdadm.conf > /etc/mdadm.conf
md device numbers and names
The /dev/md devices are created based on init reading the lines in mdadm.conf (verify)
md devices can always be referenced by number, e.g. /dev/md0 or /dev/md127
Recent kernels additionally add name references (which are symlinks to the numbered devices), for example /dev/md/hostname:arrayname
If you didn't hand a name to mdadm --create, it will enumerate it like hostname:0
Names won't change, so when you have more than one array in a host, this can help clarity in your management.
Names that do not match the hostname (gethostname) will get an underscore and a digit, like /dev/md/home_0. This is relevant when you move arrays between hosts, e.g. for recovery.