Computer data storage - RAID - mdadm notes
md is Linux's software RAID implementation. It is often referred to as mdadm, which is actually the name of md's administration utility.
Mostly, see the man page, it's pretty decent.
Inspect / watching
State of arrays
Reading /proc/mdstat
You can look at the known md arrays and what they're up to via:
cat /proc/mdstat
The first line for a device mostly just mentions the disks:
- (S) means spare
- (F) means failed
The second line (starts with blocks) most importantly mentions something like
- [2/1]
- the first is the total number of drives assigned
- the second is the number of currently active drives
- [U_]
- U means up (active and in sync)
- _ means down: failed, missing, or not yet synced
Some examples:
md0 : active raid5 sdc1[0] sdd1[1] sdh1[6] sde1[2] sdf1[3] sdg1[4]
      9767552000 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
...is a 6-disk RAID5 array with no failed disks, and no spare.
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      5860527104 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  1.5% (44144492/2930263552) finish=519.7min speed=92552K/sec
...is a 3-disk RAID5 set (no spare), which was just --created and is currently being built.
md1 : active raid5 sdc1[0] sde1[2] sdf1[4](F)
      5860535808 blocks level 5, 64k chunk, algorithm 2 [4/2] [U_U_]
...is a 4-disk RAID5 array that has had two drives drop from it. The fact that it's effectively dead isn't mentioned explicitly (it was still mounted at this stage, and showed I/O errors on a lot of actual data accesses. It probably won't mount again). The fourth member was sdd. I'm guessing it isn't in the list because this server had rebooted after sdd failed, while sdf failed in the current boot.
md0 : inactive sdc1[1](S) sdd1[3](S) sdb1[4](S)
      8790792192 blocks super 1.2

unused devices: <none>
This used to be a 3-disk RAID5; now it's inactive with every member marked as a spare, which is apparently how mdstat shows members of an array it could not assemble.
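The [n/m] field in examples like the above can also be checked from a script, e.g. for a quick health check. A minimal sketch (POSIX sh plus awk; the parsing assumes the mdstat layout shown above, and the function name is my own):

```shell
#!/bin/sh
# Flag arrays whose active-member count is below the assigned count,
# by parsing the [n/m] field from mdstat-formatted text on stdin.
check_mdstat() {
  awk '
    /^md[0-9]+ *:/ { name = $1 }          # device line: "md0 : active raid5 ..."
    /blocks/ {                            # second line carries [n/m] and [U_]
      for (i = 1; i <= NF; i++)
        if ($i ~ /^\[[0-9]+\/[0-9]+\]$/) {
          split(substr($i, 2, length($i) - 2), c, "/")
          if (c[2] + 0 < c[1] + 0)
            print name " degraded: " c[2] "/" c[1] " active"
        }
    }
  '
}

# Demonstrated on sample text; in practice: check_mdstat < /proc/mdstat
check_mdstat <<'EOF'
md0 : active raid5 sdc1[0] sdd1[1] sde1[2]
      9767552000 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
md1 : active raid5 sdc1[0] sde1[2] sdf1[4](F)
      5860535808 blocks level 5, 64k chunk, algorithm 2 [4/2] [U_U_]
EOF
```

On the sample above it prints md1 degraded: 2/4 active and nothing for the healthy md0.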
mdadm --detail
...gives a lot more detail. The below only shows some of the more interesting bits.
For example, the 3-disk RAID5 example mentioned above is reported as:
     Raid Level : raid5
     Array Size : 5860527104 (5589.03 GiB 6001.18 GB)
   Raid Devices : 3
  Total Devices : 3
          State : clean, degraded, recovering
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
     Chunk Size : 512K
 Rebuild Status : 7% complete
           Name : hostname:0  (local to host hostname)
           UUID : 1706dedb:0a47bad3:52704024:0489eb4b
         Events : 2

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       3       8       49        2      spare rebuilding   /dev/sdd1
The failed array as:
/dev/md1:
        Version : 0.90
  Creation Time : Wed Jul 28 19:39:40 2010
     Raid Level : raid5
     Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Apr 10 20:04:09 2013
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : ff86440d:eff102a9:253bf10a:15a680f2
         Events : 0.6579830

    Number   Major   Minor   RaidDevice State
       0       8      33        0      active sync   /dev/sdc1
       1       0       0        1      removed
       2       8      65        2      active sync   /dev/sde1
       3       0       0        3      removed

       4       8      81        -      faulty spare   /dev/sdf1
Monitor/follow
You can use mdadm --monitor / mdadm --follow to watch for state changes, such as a disk being marked failed (not a drive that is merely starting to show signs of failure; and since RAID0 has no redundancy, this isn't meaningful there)
- mdadm --monitor /dev/md0, (or --scan?)
- -m admin@example.com if you want changes mailed to you
- -1 checks one instead of continuously. Useful for quick checks, and for cron jobs like mdadm --monitor --scan -1
- you'll often want (or be required) to specify one or more of:
- a mail address to send errors to
- a program to run
SparesMissing means either that your configuration says you'd add spares and you didn't, or that a spare became part of the active set so that you don't have a spare anymore.
If a disk moved from spare to active you probably want to replace it sometime.
If you accidentally set the amount of spares too high at creation time, you can lower it in mdadm.conf
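A one-shot check works well from cron. A sketch (the file path and mail address are made-up examples; assumes mdadm is installed and the machine can deliver mail):

```
# /etc/cron.d/mdadm-check  -- example path; adjust to taste
# Once an hour, scan all arrays once and mail any problems found.
0 * * * *  root  /sbin/mdadm --monitor --scan -1 -m admin@example.com
```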
Maintenance and recovery
Replacing a drive
Assuming we're talking RAID-as-in-redundancy (i.e. this wasn't RAID0 or linear), you can do an online replace.
Basically
- Look at /proc/mdstat or mdadm --detail /dev/md0
- If md already considers it failed, skip this. If it doesn't, you can do that yourself like:
mdadm --manage /dev/md0 --fail /dev/sdb
- Use mdadm to remove the faulty drive from the array (which you can only do to failed and spare drives), e.g.
mdadm --manage /dev/md0 --remove /dev/sdb
- Physically replace the drive with the new one.
- If you hand partitions to mdadm, set those up first.
- Add the drive to the array
mdadm --manage /dev/md0 --add /dev/sdb
- (or sdb1 if you use partitions)
- Look at /proc/mdstat. It will mention it is doing a rebuild/recovery, and how long that will take.
watch cat /proc/mdstat
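The steps above, as one command sequence (device names are examples; substitute your own, and use whole-disk names like /dev/sdb if you don't partition):

```
mdadm --manage /dev/md0 --fail   /dev/sdb1   # skip if md already marked it failed
mdadm --manage /dev/md0 --remove /dev/sdb1
# ...physically swap the drive; recreate the partition layout if you use one...
mdadm --manage /dev/md0 --add    /dev/sdb1
watch cat /proc/mdstat                       # follow the rebuild
```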
Altering (create, grow, delete, etc.)
Semi-sorted
Speed limit
Resync and rebuilds try to adhere to a goal speed, a KByte/s figure settable via:
- sysctl dev.raid.speed_limit_min or /proc/sys/dev/raid/speed_limit_min
- defaults to 1000 (1MByte/s)
- meant as a "try to do this much even when people are using the drive" (verify)
- dev.raid.speed_limit_max or /proc/sys/dev/raid/speed_limit_max
- defaults to something like 100000 or 200000 (100 or 200 MByte/s)
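For example, to temporarily raise the ceiling while a rebuild is running (values are in KByte/s; the figure here is arbitrary, and settings reset at reboot unless persisted in /etc/sysctl.conf):

```
sysctl -w dev.raid.speed_limit_max=500000     # ~500 MByte/s ceiling
# equivalently:
echo 500000 > /proc/sys/dev/raid/speed_limit_max
```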
Preventive maintenance
Scrubbing
To start:
echo check > /sys/block/mdX/md/sync_action
You probably want to put that in crontab.
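For example, a monthly scrub via cron (md0 and the file path are examples; Debian-style distributions ship a similar job already):

```
# /etc/cron.d/md-scrub  -- example; scrub md0 on the first of each month, at night
30 2 1 * *  root  echo check > /sys/block/md0/md/sync_action
```

Once it finishes, /sys/block/md0/md/mismatch_cnt reports how many mismatches were found.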
Recovery
Example
Unsorted
Config and details
mdadm.conf
mdadm.conf is not strictly necessary.
That is, md can just scan devices and assemble everything based on the metadata on the devices. If you've got a single array, this is very predictable.
mdadm.conf can be useful when managing multiple arrays, though.
To generate the details of your current arrays, you can do an...
mdadm --examine --scan
You'll get something like:
ARRAY /dev/md/2 metadata=1.2 UUID=65a5ef0a:b45fd96c:cbd2d6c6:0569ec26 name=callisto:2
ARRAY /dev/md/1 metadata=1.2 UUID=ad1c8195:1bcc805e:bdebea7e:f86f9de8 name=callisto:1
ARRAY /dev/md/0 metadata=1.2 UUID=7f722f3d:1112966b:28a7ba07:720f6a80 name=callisto:0
If you have drives with old metadata lying around, you can expect incomplete entries, duplicate entries, or other oddities.
A number of details, such as level=raid5, num-devices=4, and metadata=1.2, may be added, but would be read from the metadata if omitted.
In theory, that's all you need.
If you want to update-and-overwrite the current mdadm.conf, you can do:
mdadm -Es --config=/etc/mdadm.conf > /etc/mdadm.conf
md device numbers and names
The /dev/md devices are created based on init reading the lines in mdadm.conf (verify)
md devices can always be referenced by number, e.g. /dev/md0 or /dev/md127
Recent kernels additionally add name references (which are symlinks to the numbered devices), for example /dev/md/hostname:arrayname
If you didn't hand a name to mdadm --create, it will enumerate it like hostname:0
Names won't change, so when you have more than one array in a host, this can help clarity in your management.
Names that do not match the hostname (gethostname) will get an underscore and a digit, like /dev/md/home_0. This is relevant when you move arrays between hosts, e.g. for recovery.