RAID 6: At What Point Do You Get Nervous?

parityOCP · Nov 11, 2010

This for all those girls and boys who run large arrays, in terms of number of drives. At what point do you stop making your RAID 6 array any bigger? I know with RAID 5, it's around 7-8 drives, and at that point you start looking at RAID 6.

But what about RAID 6? How many drives is a comfortable limit before you start looking at creating another array? 15? 30?

brutalizer · Nov 11, 2010

It should be a matter of the size of the drives. Not the number of drives.

If a drive is 10TB big, it will take months(?) before the RAID is repaired fully. During that time another disk can crash because of increased activity (lots of sysadmins report a secondary crash during repair).

If you have 200GB drives, it should take some hours before repair, and that time frame without redundancy is maybe acceptable.

Blue Fox · Nov 11, 2010

brutalizer said:
It should be a matter of the size of the drives. Not the number of drives.

If a drive is 10TB big, it will take months(?) before the RAID is repaired fully. During that time another disk can crash because of increased activity (lots of sysadmins report a secondary crash during repair).

If you have 200GB drives, it should take some hours before repair, and that time frame without redundancy is maybe acceptable.

As it was stated in another thread, you obviously seem to have never owned a decent RAID card and just like to preach to the choir. It will not take months to repair an array with 10TB drives. Nor will it take weeks. RAID 6 also retains redundancy after a drive failure if you didn't know. Please do us a service and stop spreading misinformation...go post on some ZFS forum. You'll be welcomed with open arms there.

parityOCP said:
This for all those girls and boys who run large arrays, in terms of number of drives. At what point do you stop making your RAID 6 array any bigger? I know with RAID 5, it's around 7-8 drives, and at that point you start looking at RAID 6.

But what about RAID 6? How many drives is a comfortable limit before you start looking at creating another array? 15? 30?

24 drives is as high as I go, but I am currently limited by my chassis to 20 drives (and I don't really want to span it across another chassis). I think it's a fairly reasonable number to be honest and it is the upper bound of any controller without using SAS expanders. Been running large arrays for many years now without a single problem (human error aside).

brutalizer · Nov 11, 2010

Blue Fox said:
As it was stated in another thread, you obviously seem to have never owned a decent RAID card and just like to preach to the choir. It will not take months to repair an array with 10TB drives. Nor will it take weeks.

Jesus. As I wrote:

"Look, if you have a raid in production, and you try to repair it, you will never reach full speed. And, when you repair a raid, what is important is random read and random write. Have you seen benchmarks for random read and random write? A 2TB disk, capable of 150MB/sec sequential workload, is down to maybe 0.4MB/sec when there is random workload. In worst case: 0.4MB/sec random workload during repair, and in addition, the raid is in production and in heavy use - well it is going to take a loooong time to repair the raid. In worst case, the users will write more than 0.4MB/sec to the raid, which is more than the speed which repairs the drive - the repair will never even finish! With 10TB disks, a month is not unrealistic"

Blue Fox · Nov 11, 2010

Have you ever actually used a RAID array in your life? Rebuild time scales linearly with drive size if the read speeds remain constant (they only go up with larger drives). Even with the rebuild occurring in the background, no modern RAID card will take that long...and surprise surprise, as RAID HBAs get faster, so will the rebuilds, so by the time 10TB drives are out, things will be a lot faster than today. Random reads and writes are not important for the rebuild itself at all. The RAID HBA rebuilds the array sequentially. You should create a large RAID array, pull a drive, and then see how long it takes to rebuild as I can assure you that it takes no more than a couple days even with 20 x 2TB drives in background mode while in use (because I've done this myself).

xeonMP · Nov 11, 2010

brutalizer said:
Jesus. As I wrote:

"Look, if you have a raid in production, and you try to repair it, you will never reach full speed. And, when you repair a raid, what is important is random read and random write. Have you seen benchmarks for random read and random write? A 2TB disk, capable of 150MB/sec sequential workload, is down to maybe 0.4MB/sec when there is random workload. In worst case: 0.4MB/sec random workload during repair, and in addition, the raid is in production and in heavy use - well it is going to take a loooong time to repair the raid. In worst case, the users will write more than 0.4MB/sec to the raid, which is more than the speed which repairs the drive - the repair will never even finish! With 10TB disks, a month is not unrealistic"

Maybe on a 3ware controller that's true but on a good controller like arecas that's not true at all.

Your original post is also not accurate my areca controller rebuilt my 23x 2TB raid 6 in a little over 4 hours certainly not months.

brutalizer · Nov 11, 2010

Great. Have you done this in production, when the raid is in heavy use?

xeonMP · Nov 11, 2010

brutalizer said:
Great. Have you done this in production, when the raid is in heavy use?

The array was in use but I wouldn't say heavy use.

However anyone who really depends on their performance/reliability using this in a heavy workload environment is going to have multiple servers and not just rely on a single machine.

houkouonchi · Nov 11, 2010

24 drives is the point where I no longer feel safe with raid6 and any more and I would want triple parity raid. Actually if triple parity raid was available on controllers (not just ZFS) then I think I would go triple parity raid at about 16+ drives but I would still happily go up to 24 if the controller did not support triple parity.

It is true that a server that is being maxed out on I/O 24/7 can take a very long time to rebuild even with a fast controller.

To make sure that disks were compatible I totally maxed out the I/O on an ARC-1880ix with eight 1.5 TB disks. Maxing I/O and doing a rebuild made the rebuild take around 90 hours so yes rebuild times can get high if the machine has its I/O maxed out.

That being said you shouldn't be running production machines at MAX I/O all the time like that and even then I think it taking a month is unlikely. I find it very rare for rebuilds to take over 1 day with 1.5 or 2 TB disks even when there is just normal 'heavy' I/O. We have around 800 machines with a raid array. All have 8-24 disks and these are shared web-servers (lots of random reads with some random/sequential writes) which are heavily loaded and again I almost never see >1 day rebuild times.

pjkenned · Nov 11, 2010

brutalizer said:
Jesus. As I wrote:

"Look, if you have a raid in production, and you try to repair it, you will never reach full speed. And, when you repair a raid, what is important is random read and random write. Have you seen benchmarks for random read and random write? A 2TB disk, capable of 150MB/sec sequential workload, is down to maybe 0.4MB/sec when there is random workload. In worst case: 0.4MB/sec random workload during repair, and in addition, the raid is in production and in heavy use - well it is going to take a loooong time to repair the raid. In worst case, the users will write more than 0.4MB/sec to the raid, which is more than the speed which repairs the drive - the repair will never even finish! With 10TB disks, a month is not unrealistic"

If you are doing heavy RANDOM small file I/O to get 0.4MB/s, there is a good chance you are also using onboard cache of a RAID card or OS cache to serve requests. This works on writes too which you do not get with software RAID since in SW RAID you do not get BBWC.

Just as a thought... nobody can afford to have a 24 spindle array running at 0.4MB/s. Companies like Adaptec have had SSD caching abilities for well over a year on hardware RAID cards. Using 2.5" 15k rpm drives and lots of spindles helps a lot. Random I/O is a big reason for 2.5" enterprise form factors because you get faster access times (less distance for head to travel) and higher drive density per U. Plus, SAS helps a bit. If you are trying to serve a super high random I/O workload solely from consumer SATA these days... you probably should not be making storage decisions.

dandragonrage · Nov 11, 2010

Personally, RAID-5 up to 4-5 drives, RAID-6 up to 10-12 drives... if the data is really important, anyway.

drescherjm · Nov 11, 2010

I will do RAID 5 from 4 to 8 drives if I have a hot spare and rebuild times are less than 10 hours. I will do raid 6 from 6 to 20 drives without a hot spare and less than 1 day rebuild times.

pjkenned · Nov 11, 2010

On topic, I am a bit paranoid so I generally like to do:
1. OS Drive RAID 1 w/ backups to RAID 6 array.
2. 2 hotspares in chassis
3. 18 drive absolute max RAID 6.

Just for reference, NetApp's RAID-DP implementation with SATA drives is spec'd for max 20 drives (including the two parity) and for FC drives 28 drives. See the FAS3200 series tech specs (FAS3200 series = mid-range filers BTW).

parityOCP · Nov 11, 2010

Many thanks for the replies. It seems there are at least three factors in play here:

1) Number of drives in the array. The more drives in the array, the more chance of one failing.

2) The size of each drive. The larger capacity each unit is, the longer the rebuild time will be and hence the greater chance of a drive failing during the rebuild.

3) The size of each drive. The larger capacity each unit is, the greater chance of hitting a URE.

Would this be a fair summary?

@pjkenned

As well as having more than one hotspare, is it possible to mirror the parity drives in some way, like making a RAID 1 part of a RAID 6 array? I assume this can be done in Linux, but what about hardware RAID controllers?

Blue Fox · Nov 11, 2010

RAID 6 has distributed parity, so there is no parity drive and that isn't really possible. Only RAID 2, 3, and 4 have a dedicated parity drive (or drives in the RAID 2 case). You could theoretically mirror one of the two (or both) parity sets however, but I haven't really seen that done.

dandragonrage · Nov 11, 2010

In my opinion, if you want more than RAID-6 then RAID-Z3 is the way to go. Obviously that limits you on the OS front.

nitrobass24 · Nov 11, 2010

dandragonrage said:
In my opinion, if you want more than RAID-6 then RAID-Z3 is the way to go. Obviously that limits you on the OS front.

RAID60

xeonMP · Nov 11, 2010

nitrobass24 said:
RAID60

RAID 60 is not going to offer any guaranteed added redundancy.

Blue Fox · Nov 11, 2010

xeonMP said:
RAID 60 is not going to offer any guaranteed added redundancy.

How do you figure that? You're looking at a minimum of 4 parity drives, so double RAID 6 (though it doesn't offer double the redundancy).

xeonMP · Nov 11, 2010

Blue Fox said:
How do you figure that? You're looking at a minimum of 4 parity drives.

Chances are better but it's still not guaranteed since it all depends on which drives fail.

dandragonrage · Nov 11, 2010

This is all about lowering the risk, but when it comes down to it, if RAID-6 isn't enough for me then RAID-60 won't be enough for me.

haileris · Nov 11, 2010

Not that there is much science behind this but I have RAID-6 sets of 12 x 2 TB disks. Nos 1, I am comfortable with that number in my wacky head. Nos 2, the Norco 4224 supports 24 disks so its a nice fit and 3, the HP SAS expander also supports 24 disks which mean I have can 2 complete RAID sets per enclosure. Like I said, makes sense to me.

Blue Fox · Nov 11, 2010

xeonMP said:
Chances are better but it's still not guaranteed since it all depends on which drives fail.

Naturally. It's still more resilient however.

dandragonrage said:
This is all about lowering the risk, but when it comes down to it, if RAID-6 isn't enough for me then RAID-60 won't be enough for me.

Should be the other way around as that logic isn't exactly sound.

dandragonrage · Nov 11, 2010

Blue Fox said:
Should be the other way around as that logic isn't exactly sound.

It's true the other way around as well, yes. My point was that 60 isn't that much more secure than 6, in my opinion. In RAID-6, 2 drives can die. In RAID-60, 2 to 4 drives can die, depending on which ones.

brutalizer · Nov 11, 2010

Here is an article about triple parity by Adam Leventhal
http://queue.acm.org/detail.cfm?id=1670144

His point is that discs are greater and greater, but not faster. So we will reach a point where discs will take a long time to rebuild. And yes, I talk about servers in production that are used. That is the reason people say raid-5 is soon obsolete, it will take to long time to rebuild during you have no extra redundancy with raid-5. Raid-6 and above are needed.

pjkenned · Nov 11, 2010

parityOCP said:
@pjkenned

As well as having more than one hotspare, is it possible to mirror the parity drives in some way, like making a RAID 1 part of a RAID 6 array? I assume this can be done in Linux, but what about hardware RAID controllers?

I have actually never tried this. Do you mean have independent RAID 1 disk pairs then RAID 6 those pairs so two pairs are used for "parity".

I guess you could but you are effectively losing 2+0.5*n drives worth of capacity which is really high.

Honestly, you are probably OK with one parity drive. For $90 2TB Hitachi's... an extra one is not bad. There are times when I am out of the country for a week or two and I sleep better. That sleep is worth the $90 extra that practically I have never had a need for.

parityOCP · Nov 11, 2010

pjkenned said:
I have actually never tried this. Do you mean have independent RAID 1 disk pairs then RAID 6 those pairs so two pairs are used for "parity".

Well first of all, I made a huge n00b error by talking about parity disks in RAID 6 when I KNEW that the parity is woven among all of the disks.

But yeah, you are on the right track. Kind of like RAID 6+1, but not, if you know what I mean?

pjkenned · Nov 11, 2010

Well the other option is that you mirror two RAID 6 arrays ;-)

SirMaster · Nov 12, 2010

So what about nested RAID arrays if you want more redundancy? Like RAID 51 or RAID 60.

Those always seemed like a pretty good option if you have a lot of drives.

odditory · Nov 12, 2010

The nested raid types are a performance multiplier, not a redundancy multiplier. But like all raid levels its usefulness depends on the application. Consider this scenario:

- RAID6: three separate RAID6 arrays of 8 drives, each in their own boat. more than two drives failing in any array means only one array sinks.
- RAID60: one boat with three RAID6 arrays of 8 drives. more than two drives failing in any array and ALL arrays sink.

So to my mind RAID60 is a waste of usable space except in performance scenarios, and mathematically riskier than running separate smaller arrays. Add to that, data recovery from a RAID60 is difficult to impossible short of a data recovery house with their own custom software. No publicly available data recovery tools exist (that I'm aware) that let you recover data from individual disks of a failed or corrupted RAID60, whereas you can use software like R-Studio to recover from individual disks of a failed or corrupted RAID6.

Ultimately if you want more redundancy than RAID6 then go with ZFS and raidz3 (triple parity), or Flexraid with infinite parity. Or if you can afford it just keep a duplicate copy of all your data. Or hang it up and go back to pencil and paper. I hear the Hare Krishna's are hiring.

SirMaster · Nov 12, 2010

What about RAID 55 with 25 disks.

5 RAID 5 arrays of 5 disks each. RAID 5 them all together.

Normally in a single RAID 5 with 25 disks you can only lose 1 drive obviously and be fine.

If you put them in RAID 55 though you can lose a maximum of 9 drives and be fine if you lost one from 4 of the arrays and another entire array. Though you could lose a minimum of 4 drives and lose everything by losing 2 drives in 2 separate arrays.

Still, losing 4 drives is twice as safe as losing 2 in a single 25-drive RAID 5. Plus the fact that they have to be only a certain combination of 4 drives makes it even less likely to happen.

Say they were all 1TB drives. In the single array you have 24TB of space. In the nested RAID55 array you have 16TB of space but it is a lot more secure. and in RAID3z you would have 22TB. So it's even more secure than RAID3z because if you lose any 4 in RAID3z you are dead, but in the RAID55 you have to lose a specific subset of 4 drives (2 in 2 different arrays.) which is obviously much less likely.

Though your probably right about the difficulties in management and recovery of such an array. Also, I bet read speeds would be great, but write speeds may be abysmal in RAID 55. Still fun to think about

odditory · Nov 12, 2010

Nobody's realistically going to run RAID55, or RAID555, (but I would run RAID666 for bragging rights, and SO WOULD YOU - really spruces up a sig). I think after double and triple parity you're wandering ever deeper into that no-man's land called diminishing returns, and it becomes trivial when you consider you should have a separate duplicate copy of the data anyway, since even N+infinity parity will never be as reliable as a physically separate duplicate, due to other single points of failure besides the controller and drives.

I saw a study along with some stats comparing different parity schemes - which I am trying to find again - and triple parity (raidz3) was the only one that - assuming a hotspare is always available - the array can statistically never fail due to drive failure. It wasn't even .001% chance, it was like .000. I think the quote was "you'll sooner see Michael Jackson spring forth from the grave, twirl around a few times, then go catch a movie and maybe some mexican food afterward." At least that's how I remember it.

That's not to say there isn't room for improvement though. For example hardware RAID controllers can add a triple parity scheme. They'll have to call it something other than Raid7 since apparently SCC has it trademarked, and they can also develop filesystem awareness (starting with NTFS, and whatever the major *nix's run - that covers most of the world) so they only have to deal with blocks actually in use and be able to do end-to-end checksumming (something ZFS already does), which also opens up the ability to do things like TRIM for SSD's. I've been nagging Areca about doing something like that, but things move slow, as it requires a major overhaul of the host O/S driver.

dandragonrage · Nov 12, 2010

RAID-5 and 6 are sometimes combined with RAID-0 (50/60) mostly for performance reasons rather than getting much extra data protection. RAID-55/66 would be sooooo sloooow at random writes.

Red Squirrel · Nov 12, 2010

lol raid 666. So 4 drives in raid 6, times 4, then those 4 arrays raided 6 again, then all that times 4, then those raided 6 again? (where 4 can be a bigger number)

I want to do that, just because. LOL

Blue Fox · Nov 12, 2010

Nested RAID levels with parity on all levels are kinda useless. With RAID 666, you only have 12.5% of the total space usable and the rest is lost to parity.

dandragonrage · Nov 12, 2010

Blue Fox said:
Nested RAID levels with parity on all levels are kinda useless. With RAID 666, you only have 25% of the total space usable and the rest is lost to parity.

Actually, RAID-666 would go as low as 12.5% usable, and that's with the minimum 64 disks. It would go up from there as you added disks to the innermost RAID sets. It's obviously not practical (not that anyone here ever said it was).

4^3 disks minimum, 2^3 disks usable at that point. The amount of disks that you can put in the array (without wasting any) is n^3, where n is at least 4. Usable space would be (n-2)^3.

Blue Fox · Nov 12, 2010

I'm good at ninja edits. I computed it for RAID 66 first.

odditory · Nov 12, 2010

Blue Fox said:
Nested RAID levels with parity on all levels are kinda useless. With RAID 666, you only have 12.5% of the total space usable and the rest is lost to parity.

No, the rest is lost to the devil. He wants your dataz. Really the whole thing is lost to the devil. And that's the point. It is what we like to call in the business.. "The Devil's Array", All Rights Reserved, that little c with a circle around it.

Blue Fox · Nov 12, 2010

Lost indeed...I don't even see it being possible with any of today's hardware.

RAID 66 is possible however.

dandragonrage · Nov 12, 2010

You could do it with a mix of hardware and software... might even be able to do it with just software in Linux. Not that it being possible makes it desirable.

RAID 6: At What Point Do You Get Nervous?

Limp Gawd

[H]ard|Gawd

[H]F Junkie

[H]ard|Gawd

[H]F Junkie

Weaksauce

[H]ard|Gawd

Weaksauce

RIP

[H]ard|Gawd

[H]F Junkie

[H]F Junkie

[H]ard|Gawd

Limp Gawd

[H]F Junkie

[H]F Junkie

[H]ard|DCer of the Month - December 2009

Weaksauce

[H]F Junkie

Weaksauce

[H]F Junkie

Limp Gawd

[H]F Junkie

[H]F Junkie

[H]ard|Gawd

[H]ard|Gawd

Limp Gawd

[H]ard|Gawd

2[H]4U

Supreme [H]ardness

2[H]4U

Supreme [H]ardness

[H]F Junkie

[H]F Junkie

[H]F Junkie

[H]F Junkie

[H]F Junkie

Supreme [H]ardness

[H]F Junkie

[H]F Junkie