Computer data storage - RAID - mdadm notes

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

md is Linux's software RAID implementation. It is often referred to as mdadm, which is actually the name of md's administration utility.

Mostly, see the man page; it's pretty decent.


Inspecting / watching

State of arrays


Reading /proc/mdstat

You can look at the known md arrays and what they're up to via:

cat /proc/mdstat


The first line for a device mostly just mentions the disks:

(S) means spare
(F) means failed

The second line (starts with blocks) most importantly mentions something like

  • [2/1]
    • the first is the number of devices the array was defined with
    • the second is the number currently active and in sync
  • [U_]
    • U means up: that member is active and in sync
    • _ means down: that slot's member is failed, missing, or not yet synced


Some examples:

md0 : active raid5 sdc1[0] sdd1[1] sdh1[6] sde1[2] sdf1[3] sdg1[4]
      9767552000 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]

...is a 6-disk RAID5 array with no failed disks, and no spare.


md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      5860527104 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  1.5% (44144492/2930263552) finish=519.7min speed=92552K/sec

...is a 3-disk RAID5 set (no spare), which was just --created and is currently being built.


md1 : active raid5 sdc1[0] sde1[2] sdf1[4](F)
      5860535808 blocks level 5, 64k chunk, algorithm 2 [4/2] [U_U_]

...is a 4-disk RAID5 array that has had two drives drop out of it. The fact that it's effectively dead isn't mentioned explicitly (it was still mounted at this stage, and showed I/O errors on a lot of actual data accesses; it probably won't mount again). The fourth member was sdd. It's presumably not in the list because this server had rebooted after sdd failed, while sdf failed during the current boot.


md0 : inactive sdc1[1](S) sdd1[3](S) sdb1[4](S)
      8790792192 blocks super 1.2
       
unused devices: <none>

This used to be a 3-disk RAID5; it is now inactive, with every member listed as a spare (which is how mdstat shows members of an array it couldn't start).


mdadm --detail

Running mdadm --detail /dev/md0 gives a lot more detail than /proc/mdstat does. The excerpts below show only some of the more interesting bits.

For example, the rebuilding 3-disk RAID5 shown above is reported as:

     Raid Level : raid5
     Array Size : 5860527104 (5589.03 GiB 6001.18 GB)
   Raid Devices : 3
  Total Devices : 3
          State : clean, degraded, recovering
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1
     Chunk Size : 512K
 Rebuild Status : 7% complete
           Name : hostname:0  (local to host hostname)
           UUID : 1706dedb:0a47bad3:52704024:0489eb4b
         Events : 2

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       3       8       49        2      spare rebuilding   /dev/sdd1

The failed array shows as:

/dev/md1:
        Version : 0.90
  Creation Time : Wed Jul 28 19:39:40 2010
     Raid Level : raid5
     Array Size : 5860535808 (5589.04 GiB 6001.19 GB)
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Apr 10 20:04:09 2013
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : ff86440d:eff102a9:253bf10a:15a680f2
         Events : 0.6579830

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       0        0        1      removed
       2       8       65        2      active sync   /dev/sde1
       3       0        0        3      removed
       4       8       81        -      faulty spare   /dev/sdf1

Monitor/follow


You can use mdadm --monitor (a.k.a. mdadm --follow) to watch for state changes such as a disk being marked failed. Note that this reports failures after the fact, not drives that are starting to show signs of failure, and that arrays without redundancy (RAID0, linear) have nothing useful to report.


  • mdadm --monitor /dev/md0 watches one array; --scan watches all arrays mentioned in mdadm.conf (examples below)
  • -m admin@example.com if you want changes mailed to you
  • -1 checks once instead of continuously. Useful for quick checks, and for cron jobs like mdadm --monitor --scan -1
  • you'll often want (or be required) to specify one or more of:
    • a mail address to send alerts to (-m / --mail)
    • a program to run (-p / --program)
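For example (the address and polling interval here are placeholders):

mdadm --monitor --scan --daemonise --delay=300 -m admin@example.com    # run in the background, poll every 5 minutes, mail alerts
mdadm --monitor --scan -1 -m admin@example.com                         # check once, e.g. from cron
mdadm --monitor --scan -1 -t -m admin@example.com                      # also sends a test message per array, to verify mail delivery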


SparesMissing means that an array has fewer spares than its mdadm.conf entry (spares=n) says it should: either you never added them, or a spare was pulled into the active set to replace a failed drive and so isn't a spare anymore.

If a disk moved from spare to active you probably want to replace it sometime.

If you accidentally set the number of spares too high at creation time, you can lower the spares= value in mdadm.conf.
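For example, an ARRAY line saying the array should have one spare might look like this (UUID copied from the earlier example):

ARRAY /dev/md0 UUID=1706dedb:0a47bad3:52704024:0489eb4b spares=1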

Maintenance and recovery

Replacing a drive

Assuming we're talking RAID-as-in-redundancy (i.e. this wasn't RAID0 or linear), you can do an online replace.

Basically (a combined sketch follows these steps):

  • Look at /proc/mdstat or mdadm --detail /dev/md0
If md already considers it failed, skip this. If it doesn't, you can do that yourself like:
mdadm --manage /dev/md0 --fail /dev/sdb
  • Use mdadm to remove the faulty drive from the array (which you can only do to failed and spare drives), e.g.
mdadm --manage /dev/md0 --remove /dev/sdb
  • physically replace the drive with the new one.
If you hand partitions to mdadm (rather than whole drives), set those up on the new drive first.
  • Add the drive to the array
mdadm --manage /dev/md0 --add /dev/sdb 
(or sdb1 if you use partitions)
  • Look at /proc/mdstat. It will mention it is doing a rebuild/recovery, and how long that will take.
watch cat /proc/mdstat
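Put together, a sketch of the whole sequence, assuming md0's failed member is the partition sdb1 and sda is a healthy member with the same layout (device names are placeholders):

mdadm --manage /dev/md0 --fail /dev/sdb1      # only needed if md hasn't already marked it failed
mdadm --manage /dev/md0 --remove /dev/sdb1
# ...physically swap the drive...
sfdisk -d /dev/sda | sfdisk /dev/sdb          # copy the partition layout from a healthy member (only if you use partitions)
mdadm --manage /dev/md0 --add /dev/sdb1
watch cat /proc/mdstat                        # shows rebuild progress and an ETA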


Altering (create, grow, delete, etc.)

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Semi-sorted

Speed limit

Resyncs and rebuilds try to adhere to a goal speed, a KByte/s figure settable via the following (example of adjusting it below the list):

  • sysctl dev.raid.speed_limit_min or /proc/sys/dev/raid/speed_limit_min
    • defaults to 1000 (1MByte/s)
    • meant as a "try to do this much even when people are using the drive" (verify)
  • sysctl dev.raid.speed_limit_max or /proc/sys/dev/raid/speed_limit_max
    • defaults to something like 100000 or 200000 (100 MByte/sec or 200 MByte/s)
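For example, to let a rebuild go faster at the cost of foreground I/O, you can temporarily raise the minimum (the value here is arbitrary):

sysctl dev.raid.speed_limit_min                  # show the current value
sysctl -w dev.raid.speed_limit_min=50000         # raise it to ~50 MByte/s for the duration of the rebuild
echo 50000 > /proc/sys/dev/raid/speed_limit_min  # equivalent via /proc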



Preventive maintenance


Scrubbing

To start:

echo check > /sys/block/mdX/md/sync_action

You probably want to put that in crontab.
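A sketch of such a cron entry (the file name and schedule are just an example; some distributions already ship a similar script):

# /etc/cron.d/md-scrub: kick off a check of every md array early Sunday morning
30 3 * * 0  root  for f in /sys/block/md*/md/sync_action; do echo check > "$f"; done

Afterwards, /sys/block/mdX/md/mismatch_cnt reports how many mismatches the last check found.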


Recovery

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Example

Unsorted

Config and details

mdadm.conf



mdadm.conf is not strictly necessary.

That is, mdadm can just scan devices and assemble everything based on the metadata on those devices. If you've got a single array, this is very predictable.
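For instance, assembling by hand based purely on scanning looks like:

mdadm --assemble --scan    # assemble every array it can find (uses mdadm.conf if present, device metadata otherwise)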


mdadm.conf can be useful when managing multiple arrays, though.

To generate the details of your current arrays, you can run:

mdadm --examine --scan

You'll get something like:

ARRAY /dev/md/2 metadata=1.2 UUID=65a5ef0a:b45fd96c:cbd2d6c6:0569ec26 name=callisto:2
ARRAY /dev/md/1 metadata=1.2 UUID=ad1c8195:1bcc805e:bdebea7e:f86f9de8 name=callisto:1
ARRAY /dev/md/0 metadata=1.2 UUID=7f722f3d:1112966b:28a7ba07:720f6a80 name=callisto:0

If you have drives with old metadata lying around, you can expect incomplete entries, duplicate entries, or similar.


A number of details, such as level=raid5, num-devices=4, and metadata=1.2, may be added, but would be read from the metadata if omitted.

In theory, that's all you need.
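A minimal mdadm.conf might then look something like this (a sketch; the ARRAY line is taken from the --examine --scan output above, MAILADDR and DEVICE are optional but common):

DEVICE partitions
MAILADDR root
ARRAY /dev/md/0 metadata=1.2 UUID=7f722f3d:1112966b:28a7ba07:720f6a80 name=callisto:0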


If you want to update-and-overwrite the current mdadm.conf, you can do:

mdadm -Es --config=/etc/mdadm.conf > /etc/mdadm.conf
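Note that the shell truncates /etc/mdadm.conf before mdadm gets to read it, so anything else in that file (MAILADDR, PROGRAM, DEVICE lines) is lost. Writing to a separate file and merging by hand is a little safer:

mdadm --examine --scan > /tmp/mdadm.conf.new
# inspect it, merge in any MAILADDR / PROGRAM / DEVICE lines you want to keep, then move it into place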

md device numbers and names


The /dev/md devices are created based on init reading the lines in mdadm.conf (verify)


md devices can always be referenced by number, e.g. /dev/md0 or /dev/md127


More recent setups (mdadm plus udev rules) additionally add name references, which are symlinks to the numbered devices, for example /dev/md/hostname:arrayname

If you didn't hand a name to mdadm --create, it will pick a numbered one like hostname:0


Names won't change (numbers can), so when a host has more than one array, referring to arrays by name can make management clearer.

Names whose recorded homehost does not match the current hostname (gethostname) get an underscore and a digit appended, like /dev/md/home_0. This is relevant when you move arrays between hosts, e.g. for recovery.
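For example, creating an array with an explicit name (a sketch; the devices, level, and the name 'data' are placeholders):

mdadm --create /dev/md/data --name=data --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
ls -l /dev/md/data          # a symlink...
readlink -f /dev/md/data    # ...to the numbered device, e.g. /dev/md127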