Friday, February 24, 2012

2 Disks failed simultaneously on a RAID 5 array - Disk, controller or software?

Hi there. My 1st post, so please be gentle.

I have a home server running Openfiler 2.3 x64 with a 4x1.5TB software RAID 5 array (more details on the hardware and OS later). All was working well for two years, until several weeks ago the array failed with two faulty disks at the same time.

Well, those things can happen, especially if one is using desktop-grade disks instead of enterprise-grade ones (way too expensive for a home server). Since it was most likely a false positive, I reassembled the array:
Code:

# mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
mdadm: forcing event count in /dev/sdb1(0) from 110 upto 122
mdadm: forcing event count in /dev/sdc1(1) from 110 upto 122
mdadm: /dev/md0 has been started with 4 drives.

and a reboot later all was back to normal.
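For anyone hitting the same thing: before trusting the array again, it is worth a quick check that all four members are back and in sync. This is just the check I would run, assuming the array is /dev/md0 as above:
Code:

# cat /proc/mdstat
# mdadm --detail /dev/md0 | grep -E 'State|Active|Working|Failed'

If State shows clean and Failed Devices is 0, the forced reassembly took.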



-------------------

http://en.wikipedia.org/wiki/Mdadm

Recovering from a loss of raid superblock

There are superblocks on the drives themselves and on the raid (apparently). If you have a power failure or hardware failure that does not involve the drives themselves, you cannot get the raid to recover in any other way, and you wish to recover the data, proceed as follows:

Get a list of the devices in the raid in question:

mdadm --detail /dev/md[x]

The result will look something like this:

/dev/md127:
Version : 1.2
Creation Time : Sun Aug 21 23:35:28 2011
Raid Level : raid6
Array Size : 7814047744 (7452.06 GiB 8001.58 GB)
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Raid Devices : 6
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Sun Jan 1 11:43:17 2012
State : clean, degraded
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : clop:1 (local to host clop)
UUID : 7ee1e93a:1b011f80:04503b8d:c5dd1e23
Events : 62
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 49 1 active sync /dev/sdd1
2 8 81 2 active sync /dev/sdf1
3 8 65 3 active sync /dev/sde1
4 0 0 4 removed
5 0 0 5 removed

RaidDevice order (sdc1,sdd1,sdf1,sde1) and Chunk Size are critical

Record all your raid member parameters:

mdadm --examine /dev/sd[abcde...]1 | egrep 'dev|Update|Role|State|Chunk Size'

Look carefully at the Update Time. If you have some raid members attached to the motherboard and others attached to a raid card, and the card fails but leaves enough members to keep the raid alive, you want to make a note of that. Look at the Array State and Update Time. For example:

/dev/sdc1:
Update Time : Wed Jun 15 00:32:35 2011
Array State : AAAA.. ('A' == active, '.' == missing)
/dev/sdd1:
Update Time : Thu Jun 16 21:49:27 2011
Array State : .AAA.. ('A' == active, '.' == missing)
/dev/sde1:
Update Time : Thu Jun 16 21:49:27 2011
Array State : .AAA.. ('A' == active, '.' == missing)
/dev/sdf1:
Update Time : Thu Jun 16 21:49:27 2011
Array State : .AAA.. ('A' == active, '.' == missing)
/dev/sdk1:
Update Time : Tue Jun 14 07:09:34 2011
Array State : ....AA ('A' == active, '.' == missing)
/dev/sdl1:
Update Time : Tue Jun 14 07:09:34 2011
Array State : ....AA ('A' == active, '.' == missing)

Devices sdc1, sdd1, sde1 and sdf1 are the last members in the array and will rebuild correctly. sdk1 and sdl1 left the array (in my case due to a raid card failure).

Also note each raid member's position (RaidDevice, starting with 0); the raid needs to be rebuilt in the same order. The chunk size is also important.
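Since the next step destroys the superblocks, it may be worth dumping the full examine output to a file first, so the member order and chunk size are preserved somewhere. This is an extra precaution, not part of the original procedure:

mdadm --examine /dev/sd[cdefkl]1 > /root/raid-members.txt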

Zero the drive superblocks

mdadm --stop /dev/md0 # to halt the array
mdadm --remove /dev/md0 # to remove the array
mdadm --zero-superblock /dev/sd[cdefkl]1
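If you want to be sure the old metadata is really gone before recreating the array, re-running examine on each member should now report that no md superblock was found (again an optional check, not in the original text):

mdadm --examine /dev/sd[cdefkl]1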

Reassemble the raid

mdadm --create /dev/md1 --chunk=512 --level=6 --raid-devices=6 /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sde1 missing missing

'missing' tells the create command to build the raid in a degraded state; sdk1 and sdl1 can be added later.
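Once the missing members (or the controller behind them) are sorted out, they can be re-added to the degraded array with something like:

mdadm --add /dev/md1 /dev/sdk1 /dev/sdl1

and the rebuild onto them will start automatically.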

Edit /etc/mdadm.conf and add an ARRAY line with a UUID. First get the UUID for your raid:

mdadm -D /dev/md1

then:

nano /etc/mdadm.conf

and add something similar to the file (notice there is no # in front of the active line you are adding):

#ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
#ARRAY /dev/md1 super-minor=1
#ARRAY /dev/md2 devices=/dev/hda1,/dev/hdb1
ARRAY /dev/md1 UUID=7ee1e93a:1b011f80:04503b8d:c5dd1e23

Save with Ctrl+O and exit nano with Ctrl+X.
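As an alternative to typing the ARRAY line by hand, mdadm can print it for you in the right format; assuming the array was created as /dev/md1 as above, appending it looks like:

mdadm --detail --brief /dev/md1 >> /etc/mdadm.conf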

Last, mark the array as possibly dirty with:

mdadm --assemble /dev/md1 --update=resync

Monitor the rebuild with

watch -n 1 cat /proc/mdstat

All your data should be recovered!
