8x 4TB Array -- RAID 5, 6, or 50?

nellie7979

I am using an Areca ARC-1223-8i to run 8x 4TB Hitachi 7K4000 Ultrastars in RAID 5. The array is used for Blu-ray and DVD ISOs, so performance isn't necessarily the number one priority. I like that RAID 5 offers the most usable space, but from what I read online almost no one uses RAID 5 in arrays this large--though perhaps that's because those arrays are usually for enterprise purposes.

The array is backed up with an external JBOD system on a weekly basis, so even in the worst case scenario, failure would not necessarily be catastrophic.

Metadata and artwork are stored on the array, so random access speed is somewhat important. Right now it takes 5-7 seconds to load a screen of two dozen movie covers. I've considered upgrading to a more powerful dual-core RAID card to help with this, but they are extremely expensive.

The entire array is inside the media server itself, with the RAID card plugged into a PCI-Express slot on an X58A motherboard with a Q6600 CPU.
 
You're right; for a setup that large, RAID 5 is unconventional because of the risk and time cost of rebuilding in the event of a failure. I would recommend at least RAID 6, preferably with a hot spare. RAID 50 sounds nice, but since this seems like it's mostly archival, I'm not sure cutting your usable space in half is worth the speed increase.

This is just what I gathered from reading similar posts on here. Someone more knowledgeable may provide better advice.
 
RAID 50 sounds nice, but since this seems like it's mostly archival, I'm not sure cutting your usable space in half is worth the speed increase.

A RAID 50 array containing 8 drives would be the same size as a RAID 6 array containing the same 8 disks: (N-2) x drive size either way.
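
A quick back-of-the-envelope sketch of that arithmetic, assuming the 8x 4TB drives from this thread (the drive count and size are the only inputs; the formulas are the generic parity-overhead rules, nothing Areca-specific):

Code:
# Usable-capacity comparison for an 8x 4TB array at different RAID levels.
DRIVES = 8
DRIVE_TB = 4

def usable_tb(level, drives=DRIVES, size_tb=DRIVE_TB):
    """Usable capacity in TB for a single array at the given RAID level."""
    if level == "raid5":
        return (drives - 1) * size_tb      # one drive's worth of parity
    if level == "raid6":
        return (drives - 2) * size_tb      # two drives' worth of parity
    if level == "raid50":
        # Two RAID 5 sub-arrays of drives/2 disks each, striped together;
        # each sub-array gives up one drive to parity.
        sub = drives // 2
        return 2 * (sub - 1) * size_tb
    raise ValueError(level)

for level in ("raid5", "raid6", "raid50"):
    print(level, usable_tb(level), "TB usable")
# raid5 28 TB, raid6 24 TB, raid50 24 TB: RAID 6 and RAID 50 come out equal.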
 
Go raid 6. You lose 2 drives to parity but can handle losing 2 drives before data loss.
 
Definitely RAID6. Once you pass ~6 large disks or ~12TB you should move from RAID 5 to RAID 6 when using consumer drives.
 
With 8 drives and storing media, my rule is RAID 6 for hardware RAID, and RAID 5 *minimally* for software RAID. By software RAID I mean a non-striping, software-based RAID scheme like SnapRAID or FlexRAID; the benefit there is that you break the interdependence that drives have in striped RAID, which introduces extra and arguably unnecessary risk if you don't need more than a single disk's worth of throughput and the files are more archival in nature, like photos and movies. By interdependence in striped RAID I mean that drive failures or problems have a ripple effect on one another; not so with non-striped RAID. Lose 2 drives in hardware RAID 5, lose everything. Lose 2 drives in non-striping RAID 5, lose only 1 drive's worth of data. Lose 3 drives in hardware RAID 6, lose everything. Lose 3 drives in non-striped RAID 6, lose only 1 drive's worth of data. There is much less time to recovery when non-striping; it could mean the difference between re-ripping 50 Blu-ray discs and re-ripping 400 (see the rough sketch at the end of this post).

You've invested in easily the best 4TB drives money can buy, enterprise Hitachis, and understandably they're incredibly pricey for home use, but you'll have to ask yourself whether the expense of an additional parity drive for RAID 6 is too much to bear versus having to restore the whole array in the unlikely event of a double disk failure. Multi-disk failures can easily happen, and it's not because of "drives of the same batch" or other such internet-borne FUD that tries to explain spontaneous failures happening close to one another. It's almost always temperature, power, or human error related: the cooling system goes out, the power supply freaks out, or a single drive fails and someone starts panicking or fiddling with disks or settings or unplugs the wrong drive, etc. All things to keep in mind.

If you didn't have a backup I'd say you're crazy to stay on RAID 5, but since you do and don't want to buy another drive for additional parity, then with hardware RAID 5 at least configure scheduled volume checks (scrubbing) on that controller, every 2-4 weeks.
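
A rough sketch of the striped vs. non-striped numbers above (the drive count and 4TB size are from this thread; the helper function and its assumptions are mine, just to show the ripple effect):

Code:
# TB of data lost when `failed` drives die, for striped parity RAID vs. a
# non-striping scheme (snapraid/flexraid-style) with the same parity count.
def data_lost_tb(failed, parity, striped, drives=8, size_tb=4):
    if failed <= parity:
        return 0                              # parity absorbs the failures
    if striped:
        return (drives - parity) * size_tb    # whole array is gone
    # Non-striped: only the failed data drives beyond what parity can rebuild.
    return (failed - parity) * size_tb

print(data_lost_tb(failed=2, parity=1, striped=True))    # 28 TB -> everything
print(data_lost_tb(failed=2, parity=1, striped=False))   #  4 TB -> one drive's worth
print(data_lost_tb(failed=3, parity=2, striped=True))    # 24 TB -> everything
print(data_lost_tb(failed=3, parity=2, striped=False))   #  4 TB -> one drive's worth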
 
Thank you for the advice everyone! I appreciate the input.

Definitely RAID6. Once you pass ~6 large disks or ~12TB you should move from RAID 5 to RAID 6 when using consumer drives.

I'm using Hitachi's newest line of enterprise-class drives. I've used the same exact hardware for about a year with 8x 2TB WD RE4 enterprise-class drives in RAID 5 without a hitch, so maybe I'm overconfident. That's without doing volume checks or anything of that nature. Read speeds are around 470 MB/s and writes around 120-170 MB/s with the current setup. Tomorrow I'll be upgrading to the 4TB Hitachi drives.

odditory said:
If you didn't have a backup I'd say you're crazy to stay on RAID 5, but since you do and don't want to buy another drive for additional parity, then with hardware RAID 5 at least configure scheduled volume checks (scrubbing) on that controller, every 2-4 weeks.

I'm curious--do you suggest scheduling volume checks to foresee a conventional failure down the road, or is there something more specific about RAID 5 itself that makes it less stable? I would need two (supposedly) very reliable drives to fail within a short span of one another for complete data loss. In that case, there is the external backup.

It just seems so unlikely for that to occur, especially knowing that all the other elements in the system (RAID card, cabling, mobo, etc.) are stable (so it would have to be a genuine drive failure as opposed to a faulty Molex splitter or some such), that I'm still sort of leaning towards RAID 5 at the moment. Am I really that crazy?
 
I said if you didn't have a backup I'd consider it crazy not to at least go RAID 6. And the reason you scrub RAID 5 isn't about foreseeing disk failures, it's about mitigating silent data corruption, which happens for many reasons and is just the nature of these magnetic devices. With RAID 5, if a drive fails you can't heal the array back 100% if there's any silent data corruption on any of the other disks. You can read more about it here: http://en.wikipedia.org/wiki/Data_corruption

As areal density has gotten greater, the likelihood of silent data corruption also theoretically increases. All the textbook stuff aside, it's arguable that you'd ever even notice a few undetected bit flips here and there, especially if we're talking video files. But all of this is good to learn about so you can be better informed when making storage decisions.
 
Thank you for all of this information. I think I will go with RAID 6 for a while--there is only about 20TB of data at the moment. It could be more than a year until more space is needed, so the added protection couldn't hurt.
 
Of course, you can only do RAID level migration when adding a disk, so if you wanted to stay at 8 disks you'd have to back up, delete the array, re-create it as RAID 6, and restore.
 
Would RAID 50 in this configuration give you a speed bump (compared to RAID 5 or RAID 6) with the added advantage of smaller RAID 5 pairs (so smaller risk)?
 
The risk with RAID 50 is slightly greater than RAID 6 with the same number of drives. In RAID 6 you can lose any 2 drives without data loss; in RAID 50, losing 2 drives from the same RAID 5 sub-array means complete disaster.
Speed vs. reliability, as usual.
 
Thank you for the advice everyone! I appreciate the input.

I'm curious--do you suggest scheduling volume checks to foresee a conventional failure down the road, or is there something more specific about RAID 5 itself that makes it less stable? I would need two (supposedly) very reliable drives to fail within a short span of one another for complete data loss. In that case, there is the external backup.

It just seems so unlikely for that to occur, especially knowing that all the other elements in the system (RAID card, cabling, mobo, etc.) are stable (so it would have to be a genuine drive failure as opposed to a faulty Molex splitter or some such), that I'm still sort of leaning towards RAID 5 at the moment. Am I really that crazy?

Two full disk failures aren't really the only problem. Ask yourself this.

What would your RAID controller do in this situation?

1 of your disks has failed. During the (long) rebuild, your controller needs to read every single sector from every single disk to recreate the missing disk.

What happens when even only a single sector is unreadable (URE) on a remaining disk during the rebuild?

Does your controller stop the rebuild? Does it just corrupt the one file that contains that stripe? Does the whole array crash?

Best case scenario is you have one corrupt file. Worst case is your whole array crashes.

Doing a verify of your array while it's healthy minimizes the above problem, because when you force the array to read every sector, if it encounters an unreadable one there is parity available that it can use to reconstruct the value and remap the unreadable sector to a reallocated one.

RAID 6 also pretty much completely mitigates this problem because you have a whole extra parity disk, so during the rebuild of one failed drive, if another drive encounters a URE it can use parity 2 to determine that sector. You would have to have a URE on 2 disks in the same stripe during a rebuild in RAID 6 to run into a problem. Or 2 completely failed disks plus a URE on a remaining drive.

UREs only become more and more common as the TBs increase. The difference between consumer and enterprise drives is that enterprise drives are rated to encounter UREs less frequently.
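
To put rough numbers on that, here is a simple sketch treating the datasheet URE rating as an independent per-bit probability (a big simplification; the 1e14/1e15 figures are typical consumer vs. enterprise ratings, not measured values for these particular drives):

Code:
# Odds of hitting at least one URE while reading the surviving drives during
# a RAID 5 rebuild of this 8x 4TB array (7 surviving drives read end to end).
def p_ure_during_rebuild(read_tb, ure_rate_bits):
    bits_read = read_tb * 1e12 * 8
    return 1.0 - (1.0 - 1.0 / ure_rate_bits) ** bits_read

surviving_tb = 7 * 4
print("consumer   (1 in 1e14 bits): %.0f%%" % (100 * p_ure_during_rebuild(surviving_tb, 1e14)))
print("enterprise (1 in 1e15 bits): %.0f%%" % (100 * p_ure_during_rebuild(surviving_tb, 1e15)))
# Roughly 89% vs. 20%, which is why the enterprise rating (and a second parity
# drive) matters more as arrays get bigger.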
 
Would RAID 50 in this configuration give you a speed bump (compared to RAID 5 or RAID 6) with the added advantage of smaller RAID 5 pairs (so smaller risk)?

I don't think <bandwidth> speed is any problem for what he's doing. It sounds like access times are more his complaint (cover art + metadata loading). I think that's going to be a limitation of platter drives.
 
I like RAID-10, personally. But between 5, 6 and 50, I'd also agree with 6, given that you don't need tons of performance. For sensitive data, I like the idea of RAID-15 (a RAID-5 of mirrored groups) more than anything involving RAID-0. RAID-15 data drives = n/2-1. Not quite the solution for your array or anything... just thinking aloud.
 
For instance, software RAID like ZFS can take a long time to repair an array if a large disk crashes. With future 6TB disks or even larger, the repair time might take one week or even longer, maybe. During that time, one single read error on another disk might corrupt the entire RAID. You don't want corruption of the entire RAID.

Go raid-6.
 
With future 6TB disks or even larger, the repair time might take one week or even longer

Only on very slow raid systems.

A 10-drive x 2TB software RAID 6 array I have here at work, using a Core 2 Quad CPU, takes 8.5 hours total to rebuild after a failed disk.
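
For a rough sense of scale, a rebuild has to stream each drive end to end, so the floor is roughly drive size divided by sustained rebuild rate; the 8.5-hour figure above works out to about 65 MB/s per drive. A sketch (the rates are illustrative, not measured):

Code:
# Minimum rebuild time: one drive's capacity divided by the sustained rate.
def rebuild_hours(drive_tb, mb_per_s):
    return drive_tb * 1e6 / mb_per_s / 3600

print("2 TB @ 65 MB/s: %.1f h" % rebuild_hours(2, 65))   # ~8.5 h, matches the array above
print("4 TB @ 65 MB/s: %.1f h" % rebuild_hours(4, 65))   # ~17 h for the OP's 4TB drives
print("6 TB @ 65 MB/s: %.1f h" % rebuild_hours(6, 65))   # ~26 h, nowhere near a week
# Real-world rebuilds run longer when the controller throttles them to keep
# serving normal I/O, which is where week-long estimates come from.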
 
For instance, software RAID like ZFS can take a long time to repair an array if a large disk crashes. With future 6TB disks or even larger, the repair time might take one week or even longer, maybe. During that time, one single read error on another disk might corrupt the entire RAID. You don't want corruption of the entire RAID.

Go raid-6.

ZFS is still better. I'd take RAID-Z2 over any hardware card's RAID-6, regardless of cost. There is literally nothing his card can do that ZFS couldn't do better. But he's already got the card and I'm not really in here to convert him to ZFS. I do think it would be a better idea, but I'm not going to push for it.
 
Only on very slow raid systems.

A 10-drive x 2TB software RAID 6 array I have here at work, using a Core 2 Quad CPU, takes 8.5 hours total to rebuild after a failed disk.

Yeah, that would indeed have to be a slow system. I used to run software RAID 5, and my array of 5x 2TB 5400RPM drives only took 10 hours to rebuild on a 1.6GHz Intel Atom CPU with just 1GB of DDR2.
 
There is literally nothing his card can do that ZFS couldn't do better. But he's already got the card and I'm not really in here to convert him to ZFS. I do think it would be a better idea, but I'm not going to push for it.

"Literally nothing", except 1-disk-at-a-time online capacity expansion, more data recovery options and available software tools in the event an array becomes corrupted (raidz is a black curtain abstraction layer and if your pool decides it doesnt want to mount one day and your backup isn't current, you're SOL), and most importantly HW raid is independent of host O/S and doesn't require one to run the array.

I gotta say the enthusiasm shared by ZFS fans is usually helpful, but it gets a bit tiresome when it's in every thread, as it can mislead people just trying to weigh all the options. ZFS is great but not nearly one size fits all for storage.
 
(raidz is a black curtain abstraction layer and if your pool decides it doesn't want to mount one day and your backup isn't current, you're SOL)
Do your homework and buy/use quality drives. I haven't heard of this happening with stable-code ZFS implementations in a long time.

You also mentioned single-drive expansion. Sure, that is possible with hardware RAID, and then you have to rebuild the entire array. ZFS can do single-device expansion with as few as two drives without requiring a rebuild/resilver. Granted, it won't balance data/IO across the new disks without moving the data in/out of the pool or just moving it to a new directory. I see this as mainly a niche and/or home-build feature though. Typically with servers you either build out the server with all drive slots used, or when you expand, you expand at least 2 disks at a time.

Another thing you aren't mentioning is the process of online expansion after you add this new disk. These days most OSes handle this fairly well, but with ZFS it just automatically has more space. No growing the filesystem, no LVM BS, no Windows disk mangler... it just works.

and most importantly HW raid is independent of host O/S and doesn't require one to run the array.
Technically so is ZFS, with the caveat that the destination host/OS needs to read/understand the source's partition table. If that's true, then you just import the pool.

ZFS is great but not nearly one size fits all for storage.
It really is one size fits all, though. There is technically nothing 'wrong' with using ZFS on top of hardware RAID. You can do this, even without exporting a bunch of single RAID 0s or exporting as JBOD. Yes, you lose some of what makes ZFS great, but you 'can' do it.

I would use ZFS under Linux in a heartbeat over the god-awful mess that is LVM + (insert filesystem here). Yes, I am aware of ZFS on Linux, but it isn't stable yet and I won't run beta code in production. Oh, and btrfs is still crappy.

Just sayin', you don't have to run the 'ideal' setup to make great use of ZFS.
 
and most importantly HW raid is independent of host O/S and doesn't require one to run the array.

THE biggest disadvantage of HW RAID cards is that you need that card (or with some companies you'll have a few options for cards that use the same layout, but you can never be sure of that). Your card dies and then you've gotta go track down the same card and possibly make sure it has the same firmware version as yours. What if it's 5 years down the line and your RAID card is hard to find a replacement for?

I don't see that big a need to make the array OS independent, because for a STORAGE SERVER, you use an OS that is suited to storage, like Solaris (I personally use Illumian in a home environment, but would use Solaris for enterprise). madrebel pointed out some advantages of the tight integration ZFS has that actually does make certain things easier on ZFS than on other file systems.
 
I am using an Areca ARC-1223-8i to run 8x 4TB Hitachi 7K4000 Ultrastars in RAID 5. The array is used for Blu-ray and DVD ISOs, so performance isn't necessarily the number one priority. I like that RAID 5 offers the most usable space, but from what I read online almost no one uses RAID 5 in arrays this large--though perhaps that's because those arrays are usually for enterprise purposes.

The array is backed up with an external JBOD system on a weekly basis, so even in the worst case scenario, failure would not necessarily be catastrophic.

Metadata and artwork are stored on the array, so random access speed is somewhat important. Right now it takes 5-7 seconds to load a screen of two dozen movie covers. I've considered upgrading to a more powerful dual-core RAID card to help with this, but they are extremely expensive.

The entire array is inside the media server itself, with the RAID card plugged into a PCI-Express slot on an X58A motherboard with a Q6600 CPU.

A resync on that will take what, a day? I think there is no question that this should be RAID 6.

Keep in mind that there is additional stress on the drives during the resync, raising the probability either of another drive dying, or of discovering a drive that was already wounded weeks ago but that you hadn't identified due to the low load.
 
Only on very slow raid systems.

A 10-drive x 2TB software RAID 6 array I have here at work, using a Core 2 Quad CPU, takes 8.5 hours total to rebuild after a failed disk.

Depends on the number of HDDs in the array and how much data is on it, too.
I have 16 drives x 2TB in hardware-based RAID 6, 80% filled with data; it takes 24 hrs to rebuild at 50% background task priority.
 
Depends on the number of HDDs in the array and how much data is on it, too.
I have 16 drives x 2TB in hardware-based RAID 6, 80% filled with data; it takes 24 hrs to rebuild at 50% background task priority.

I think that means your array tracks which stripes are used and only rebuilds the ones that have been written to. I do not believe Linux software RAID does that by default, although it might if I had enabled the write-intent bitmap. For me the rebuild is a full rebuild whether the array is 1% full or 100% full (the array is > 80% full); however, I am doing rebuilds at 100% background task priority, in effect, by telling the array I want it to rebuild at 100MB/s minimum.
 
ZFS is still better. I'd take RAID-Z2 over any hardware card's RAID-6, regardless of cost. There is literally nothing his card can do that ZFS couldn't do better. But he's already got the card and I'm not really in here to convert him to ZFS. I do think it would be a better idea, but I'm not going to push for it.

I can always sell the card and get most of my money back. I want the best array possible, and frankly I hadn't even considered any RAID-Z options. Could you elaborate on your position?

First priority is usable storage space; second priority is random access times (for loading covers + metadata). If RAID-Z2 really outperforms my hardware RAID 6, I'd be interested in checking it out.

Unfortunately I'm limited to the Windows environment due to the crappy DVD disc management software. Being OS-independent would be nice but it's not really that important for this project.
 
"Literally nothing", except 1-disk-at-a-time online capacity expansion, more data recovery options and available software tools in the event an array becomes corrupted (raidz is a black curtain abstraction layer and if your pool decides it doesnt want to mount one day and your backup isn't current, you're SOL), and most importantly HW raid is independent of host O/S and doesn't require one to run the array.
But then you're ENTIRELY dependent on the RAID card as the sole means to access the contents of the drive. It's an even worse 'black curtain' than ZFS. At least with ZFS a pool (using a standard version) is portable between different controllers and even operating systems. So I can take a ZFS pool from a Solaris box to one running FreeBSD usually without doing anything other than 'zpool import'. Can't do that from one RAID card to another, especially not different brands. Sometimes between different models within a given vendor, but firmware versions are often critical.

Yes, tools have evolved to help dig into corrupted hardware arrays. The lack of them for ZFS says more about ZFS not needing them as you get a lot more realtime detection of errors. ZFS can see that a drive is not storing data properly and report this. Most hardware cards do not do this per-block detection.

I don't disagree with the concern over the 'black curtain', just your characterization of it as being a ZFS issue. It's just as much a problem (and possibly worse) with a hardware card.

I gotta say the enthusiasm shared by ZFS fans is usually helpful, but it gets a bit tiresome when it's in every thread, as it can mislead people just trying to weigh all the options. ZFS is great but not nearly one size fits all for storage.

I don't disagree with you regarding the tiresome misinformation. It cuts both ways.
 
I am using an Areca ARC-1223-8i to run 8x 4TB Hitachi 7K4000 Ultrastars in RAID 5.

Metadata and artwork are stored on the array, so random access speed is somewhat important. Right now it takes 5-7 seconds to load a screen of two dozen movie covers. I've considered upgrading to a more powerful dual-core RAID card to help with this, but they are extremely expensive.

So the first question is: why are you getting such poor performance accessing the metadata and covers? What about your configuration could be improved first?

Which model card is this, specifically, and how are the drives connected to it?

Several things come to mind, one being the card. Are you certain it's in a PCIe slot configured to operate at full speed? Just because a card fits doesn't mean it'll be running at full speed. You can see what PCIe link width and speed the card is negotiating during the firmware boot. Look at the top of the screen during boot and make sure. Some motherboards have issues with how PCIe performance is distributed.

Then, what operating system and host software are you using to share the media?

When you asked this question on another forum you mentioned problems with the software handling metadata. Having a sharing setup that cached more of it in memory would do a lot to mitigate that problem. As in, a ZFS server with a lot of RAM (8GB+) and possibly an SSD L2ARC. That way the often-used metadata files would be cached and served out of faster media instead of being pulled from slower rotating media.
 
I don't disagree with you regarding the tiresome misinformation. It cuts both ways.

I think (barring the examples) he is referring more to the fact that any data storage thread here concerning anything larger than a single drive can provide always devolves into how ZFS is superior in every way and can solve every problem you'll ever encounter. I'm being sarcastic, but your earlier comment does confirm this worldview.

This is not to say that these claims aren't true or that you're wrong in this case, but alternatives have a tendency to be crowded out and marked as completely inferior, even if they can sufficiently solve the task.

So, no one here has heard of FlexRAID?

Isn't RAID 6 faster than FlexRAID in performance, since it's striped data transfer vs. single-disk transfer?
 
I followed FlexRAID for a while, but there are just too many questions surrounding the product. Development and support are pretty much a one-man show, with all the drawbacks that implies. Currently I am using WHS2011 with DriveBender. My next step is going to be either Windows Server 8 Essentials with Storage Spaces or a full-blown ZFS solution. I am just waiting to save up some cash and see how Storage Spaces ends up on release.
 
I can always sell the card and get most of my money back. I want the best array possible, and frankly I hadn't even considered any RAID-Z options. Could you elaborate on your position?

...

Unfortunately I'm limited to the Windows environment due to the crappy DVD disc management software. Being OS-independent would be nice but it's not really that important for this project.

If you're definitely sticking with Windows then it's a moot point anyway. You can use ZFS in Solaris/OpenSolaris/Nexenta/Illumos/Illumian/other derivatives, FreeBSD, or even Linux (Linux isn't the best choice for ZFS though... you either need to use FUSE (main disadvantage of FUSE: SLOW) or some third-party kernel modules that I'm not sure I'd trust at this point). Unfortunately not Windows.

It's not like your card sucks... I just like how well ZFS can fix problems and recover data better than other filesystems, how it can take snapshots easily, how flexible the RAID options are, etc.
 
So the first question is: why are you getting such poor performance accessing the metadata and covers? What about your configuration could be improved first?

Which model card is this, specifically, and how are the drives connected to it?

Several things come to mind, one being the card. Are you certain it's in a PCIe slot configured to operate at full speed? Just because a card fits doesn't mean it'll be running at full speed. You can see what PCIe link width and speed the card is negotiating during the firmware boot. Look at the top of the screen during boot and make sure. Some motherboards have issues with how PCIe performance is distributed.

Then, what operating system and host software are you using to share the media?

When you asked this question on another forum you mentioned problems with the software handling metadata. Having a sharing setup that cached more of it in memory would do a lot to mitigate that problem. As in, a ZFS server with a lot of RAM (8GB+) and possibly an SSD L2ARC. That way the often-used metadata files would be cached and served out of faster media instead of being pulled from slower rotating media.

I think you misunderstood me. It doesn't take 5-7 seconds to load one piece of artwork; it takes 5-7 seconds for a whole screen of ~20 covers to load. Some of them load within a second or two, or instantly. That's with the old WD2003FYYS drives, anyway.

The drives are SATA drives plugged directly into the Areca ARC-1223-8i. The PCI-e slot the card is in is running at "8X/5G" according to the Areca BIOS program.

I'm confident that RAID array was running at full speed. It might take 5-7 seconds to load a full screen of covers, but that's because the program loads all 2000 covers at once instead of fetching only the covers the viewer can see on screen.
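
For what it's worth, here is a minimal sketch of the difference between loading everything up front and loading only the visible page; it isn't how My Movies works internally (I don't know its code), and the folder path and page size are made up for illustration:

Code:
# Load only the covers for the page currently on screen, instead of all ~2000.
from pathlib import Path

COVERS_DIR = Path(r"D:\covers")   # hypothetical artwork folder
PAGE_SIZE = 20                    # covers visible at once

def load_visible_page(page):
    """Read just the cover images for one on-screen page."""
    covers = sorted(COVERS_DIR.glob("*.jpg"))
    start = page * PAGE_SIZE
    return [p.read_bytes() for p in covers[start:start + PAGE_SIZE]]

# ~20 small reads per screen instead of ~2000 up front: the wait is dominated
# by the application's access pattern, not the array's throughput.
first_screen = load_visible_page(0)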
 
I do use FlexRAID. I used to use Linux md RAID, but I moved to FlexRAID because a non-striped solution is much more appealing.

The software has been working perfectly and the questions I've asked in the forum were answered by the developer himself within 2 hours.

So my little experience with support has been awesome.

Though the beauty of FlexRAID is that the data is non-striped and is still just normal NTFS or EXT4, so you aren't really running much risk by using his software.

If you lost a drive, I guess the worst-case scenario is his software would fail and your rebuild of the drive would fail. But since it's non-striped, you would only lose that one drive.

FlexRAID doesn't write to your data drives other than when you issue a rebuild. It only writes to the parity drive(s).

I feel fine with FlexRAID, especially since my whole array is also backed up offsite with CrashPlan.
 
I think you misunderstood me. It doesn't take 5-7 seconds to load one piece of artwork; it takes 5-7 seconds for a whole screen of ~20 covers to load. Some of them load within a second or two, or instantly. That's with the old WD2003FYYS drives, anyway.

The drives are SATA drives plugged directly into the Areca ARC-1223-8i. The PCI-e slot the card is in is running at "8X/5G" according to the Areca BIOS program.

I'm confident that RAID array was running at full speed. It might take 5-7 seconds to load a full screen of covers, but that's because the program loads all 2000 covers at once instead of fetching only the covers the viewer can see on screen.

Is the cache on your RAID controller running in write-back mode? If not, try that.
 
FYI - your "cover loading problem" isn't for lack of disk throughput. Which software are you using?
 
FYI - your "cover loading problem" isn't for lack of disk throughput. Which software are you using?

I am using the "My Movies" plugin for Windows Media Center. I've done some benchmarks and it seems like the array is running at full speed. Are there any specific tests I should do that might prove otherwise?
 
Were these random I/O tests? STR (sequential transfer rate) benchmarks will not measure random performance / IOPS.
 
Is the cache on your RAID controller running in write-back mode? If not, try that.

Could this be the "Disk Write Cache Mode" option on my RAID controller BIOS menu? Right now Disk Write Cache Mode is set to "Auto," and I could also choose "Disabled" or "Enabled". If it matters, I have the battery backup unit installed and working.

There is also an option for "HDD Read Ahead Cache" which is currently set to "Enabled."
 