RAID 5 two-drive failure question (Linux)

86reddawg

So I noticed last night that I had a drive failure on Tuesday - oddly enough, right after my eth0 card went down and then came right back up. I figured maybe a spike or a quick flicker of power, so I decided to hot-remove and hot-add the drive. 2.5 hours (of 3) into the recovery, a second drive died... dammit. So I was playing with some commands (RAID 5, Linux software RAID) and managed to force one of the drives to clear its faulty status, and now I have the array in recovery with one drive still marked faulty. How is this possible? I was under the impression that 2 dead drives in RAID 5 meant loss of all data. Or is the data gonna be messed up once the recovery finishes?
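For reference, the commands I was playing with were roughly along these lines - this is mdadm syntax (or the raidtools equivalents), and the device names are just examples, not my actual setup:

    # see what md thinks the array looks like
    cat /proc/mdstat
    mdadm --detail /dev/md1

    # stop the array, then force-assemble it; --force clears the
    # faulty flag on a member whose data is probably still intact
    mdadm --stop /dev/md1
    mdadm --assemble --force /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1

    # re-add the drive that actually needs rebuilding
    mdadm --manage /dev/md1 --add /dev/sdd1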

Rats, guess it's time to get some more drives into the server to replace the supposedly dead ones...
 
If you in fact completely lose two drives (and their data), then you are dead in the water. On the other hand, if some glitch marked them as failed but the data is still okay (and you force them back online), then the array software or controller doesn't know any better. As long as the data and the parity match up, it should work. If I were you, I would make sure your data is all good once you get the array back up. You may want to consider hot spares as well.
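With mdadm, verifying the data and parking a spare is something like this (device names are just examples):

    # read-only filesystem check; -n answers "no" to everything so
    # nothing gets written while you're still nervous about the data
    fsck -n /dev/md1

    # add a disk as a hot spare; md grabs it automatically on the
    # next failure and starts rebuilding without you touching anything
    mdadm --manage /dev/md1 --add /dev/sde1

    # confirm it shows up as a spare
    mdadm --detail /dev/md1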

I also ran into a situation like this with an Adaptec 2400A. It liked to fail drives randomly, then bring them back online after a bit. During a rebuild, it decided to randomly fail another drive... POS. I had to play with its Linux utilities to figure out how to clear the status, as the process was completely undocumented in the manual and on their site. Bleh.
 
Cool, I'm hoping for the best... Right now I've got 2 drives waiting to go into the machine as spares; I've just been too lazy to put them in. Well, I know what I'm gonna be doing this weekend.

*crosses fingers*
 
If the drives were totally dead, or the data itself was corrupted, then two drives lost = :( :mad:
But with drives just going offline, you can recover it.
Just make sure you run a consistency check once the recovery is done. I'd also advise staying out of the OS (leave the filesystem unmounted) while the recovery/consistency check runs.
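On kernels that expose md's sysfs interface, the consistency check looks something like this (md1 as an example):

    # kick off a full read-and-compare of data vs. parity
    echo check > /sys/block/md1/md/sync_action

    # watch progress
    cat /proc/mdstat

    # after it finishes, nonzero here means data and parity disagreed
    # somewhere and the array needs a closer look
    cat /sys/block/md1/md/mismatch_cnt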
 
It's not looking good - it's giving me the generic:

mount: wrong fs type, bad option, bad superblock on /dev/md1,
or too many mounted file systems

Eh, I guess I'm gonna try adding in a spare disk or two (the array still lists as active with 2 failed drives).
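Before I give up on the filesystem, I'm gonna try the backup superblocks - a rough sketch, assuming ext2/ext3 on /dev/md1:

    # list the backup superblock locations for this device's geometry
    # (-n only prints what it would do, nothing gets written)
    mke2fs -n /dev/md1

    # read-only check against a backup superblock (32768 is typical
    # for 4k-block filesystems; use whatever mke2fs -n printed)
    e2fsck -n -b 32768 /dev/md1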
 
RAID 5 is great, but it can't withstand 2 drive failures, if that's indeed what happened to you. Check to see if the software can support a hot spare; if not, you may consider using a hardware RAID card in conjunction with an extra drive already in place as a hot spare. That way, if a drive does die, the card can automatically start rebuilding onto the extra drive. I've only had a RAID 5 array die once: a drive failed, I slapped in the new one, and while it was rebuilding another disk failed. Very rare, but that's why the computing gods sent us tape drives :)
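For what it's worth, Linux software RAID can pull the same trick without a hardware card: a spare added through mdadm rebuilds automatically, and monitor mode will mail you when a drive drops instead of it sitting failed for days. A sketch (the array name and address are examples):

    # mail on failure events, checking every 300 seconds, as a daemon
    mdadm --monitor --daemonise --delay=300 --mail=root@localhost /dev/md1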
 