can someone explain how raid backs up multiple drives using only 1 or 2?

ekuest

i think raid 5 and 6 do this? i am not an idiot, but i dont understand how you can have an array with like 10 drives and just 2 of those are for redundancy. does it sense when a drive is dying and write all the data before it pops out? im getting to the point where id like to build a dedicated server soon and try raid 5 or 6, but i want to understand it first. thanks!
 
Backup is the wrong term. Nothing is backed up, and using raid does not remove the need to make backups: there will always be a chance of the entire raid failing, part of it being corrupted, a virus, or someone deliberately deleting the entire array...

However this should help explain how raid5 works:
http://blog.open-e.com/how-does-raid-5-work/
 
Raid isn't backup. It provides data protection. A raid array with redundancy (level 1, 5, 6, 10, etc) allows you to continue operation without data loss after the loss of a drive. It prevents downtime and speeds recovery. It also allows you to replace the failed drive(s) without shutdown to recover your data protection.

But please don't confuse it with backup...even with raid you still need good backup/recovery plans.

Now - with the obligatory "raid is not backup" comments out of the way - an attempt to answer your question.

With a raid5 or raid6 array the data is spread across all of the disks, and each stripe holds N-1 data blocks plus one "parity" block (raid5) or N-2 data blocks plus two parity blocks (raid6), where "N" is the number of disk drives in the array. The data blocks and parity blocks are written in a rotating pattern across the disks. For example, if you have a 4 drive Raid5 the blocks are laid out across the disks this way:

1 - 2 - 3 - P
4 - 5 - P - 6
7 - P - 8 - 9
P - 10 - 11 - 12
etc.

As long as you can read a data block you are OK. If you can't, you can still recover it as long as you can read the "parity" block and all of the other data blocks in the same parity group (the same row above). Each row puts only one block on any given drive, so you can always recover any data block even with the loss of a single drive in a Raid5. Raid6 works the same way but with two parity blocks for each group, allowing you to lose two drives and still survive.
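A minimal sketch of that recovery, assuming the parity block is a plain XOR of the data blocks in its group (the block contents and the `xor_blocks` helper are made up for illustration):

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One parity group from a hypothetical 4-drive Raid5: three data blocks + parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive that held the second data block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)  # b'BBBB'
```

XOR-ing everything that survived gives back the missing block, and it does not matter which one of the four went missing.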

It would still work even if all of the parity blocks were written to a single "parity" drive (that layout is Raid 4). The parity blocks are rotated as shown above in Raid 5/6 so that parity writes are spread across all of the drives instead of hammering one of them, and so that the work of "recovering" blocks with parity calculations after losing a drive is spread evenly too, which keeps the performance loss in degraded mode down.
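The rotating pattern from the 4 drive example can be generated with a few lines. This is only a sketch of that one layout; the function name is made up, and real controllers and software raid may order the blocks differently:

```python
def raid5_layout(num_disks, num_rows):
    """Return rows of labels; 'P' marks the parity block in each row."""
    rows = []
    next_block = 1
    for row in range(num_rows):
        # Parity starts on the last disk and moves one disk left per row.
        parity_disk = (num_disks - 1) - (row % num_disks)
        labels = []
        for disk in range(num_disks):
            if disk == parity_disk:
                labels.append("P")
            else:
                labels.append(str(next_block))
                next_block += 1
        rows.append(labels)
    return rows


for labels in raid5_layout(num_disks=4, num_rows=4):
    print(" - ".join(labels))
```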
 
Raid 10(1+0) =

D1 + D2 = mirrors (raid 1) of each other, let's call this virtual disk A
D3 + D4 = mirrors (raid 1) of each other, let's call this virtual disk B

A + B = Striped pair (Raid 0)

So you could lose up to two disks without losing data, one from each mirrored pair. If you lose both disks in a mirrored pair though, you are done for.
The downside of this setup is that you lose one disk of each pair's capacity to redundancy. So a Raid 10 with 4 2TB disks is only 4TB.
The upside is that you usually see a performance gain since there is a stripe involved.

You can also do this with more than two pairs of mirrored disks and keep it raid 10, or you can run a raid 5 across the mirrored pairs (raid 1+5) as well.

Raid 1+5 =

D1 + D2 = mirrors (raid 1) of each other, let's call this virtual disk A
D3 + D4 = mirrors (raid 1) of each other, let's call this virtual disk B
D5 + D6 = mirrors (raid 1) of each other, let's call this virtual disk C

A + B + C = Raid 5

The downside of this is that you lose one drive for each pair of disks and also one whole pair to parity, so six drives give you the usable capacity of two.
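The capacity cost of each layout can be worked out as a quick sanity check. A rough sketch, assuming identical drives and ignoring filesystem and controller overhead (the function names are made up for illustration):

```python
def usable_raid5(num_disks):
    """One disk's worth of space goes to parity."""
    return num_disks - 1


def usable_raid6(num_disks):
    """Two disks' worth of space goes to parity."""
    return num_disks - 2


def usable_mirrored_stripe(num_disks):
    """Stripe across mirrored pairs: half the disks hold copies."""
    if num_disks % 2:
        raise ValueError("mirrored pairs need an even number of disks")
    return num_disks // 2


def usable_raid5_over_mirrors(num_disks):
    """Raid 5 across mirrored pairs: half for mirrors, one pair for parity."""
    return usable_mirrored_stripe(num_disks) - 1


print(usable_mirrored_stripe(4) * 2, "TB from 4 x 2TB as a mirrored stripe")  # 4 TB
print(usable_raid5_over_mirrors(6), "disks usable out of 6")                  # 2
print(usable_raid6(10), "disks usable out of 10 in raid 6")                   # 8
```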

Lastly
Raid 0+1 =

D1 + D2 = striped (raid 0), let's call this virtual disk A
D3 + D4 = striped (raid 0), let's call this virtual disk B

A + B = mirrored pair (Raid 1)

Same deal as Raid 10 for capacity, but if you lose one drive it effectively takes that entire striped set down, so losing any drive from the other set after that loses the array.
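To see the difference between the two four-disk layouts, here is a small sketch that tries every possible two-disk failure. It assumes the D1..D4 grouping used above and nothing about any particular controller:

```python
from itertools import combinations

DISKS = ["D1", "D2", "D3", "D4"]
# The same grouping serves as mirror pairs (Raid 10) or striped sets (Raid 0+1).
GROUPS = [{"D1", "D2"}, {"D3", "D4"}]


def raid10_survives(failed):
    # Every mirrored pair needs at least one working disk.
    return all(group - failed for group in GROUPS)


def raid01_survives(failed):
    # At least one striped set must be completely intact.
    return any(not (group & failed) for group in GROUPS)


two_disk_failures = [set(pair) for pair in combinations(DISKS, 2)]
raid10_ok = sum(raid10_survives(f) for f in two_disk_failures)
raid01_ok = sum(raid01_survives(f) for f in two_disk_failures)

print(f"Raid 10 survives {raid10_ok} of {len(two_disk_failures)}")   # 4 of 6
print(f"Raid 0+1 survives {raid01_ok} of {len(two_disk_failures)}")  # 2 of 6
```

Both layouts survive any single failure; the gap only shows up on the second one.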
 
ok that link explained it very quickly and easily. i googled but didnt find anything that clear. thanks a lot!

so you guys say that it provides redundancy but not backup and i still need backup. why do i need backup if i have this? i can just keep a spare drive on hand to rebuild the missing drive while i RMA it so i dont risk losing another drive right? whats the point in backing up say 9TB when i already have a parity drive? right now i have 3x3TB drives backed up to 3x3TB external drives, but manual backups are a pain which is whats prompting me to investigate raid5.
 
The problem is that you need two totally separate systems to prevent large failures from causing data loss. The redundancy keeps your array alive but if your controller decides to corrupt every drive or a power spike takes out 75% of your drives you are out of luck.

Several levels of backups are recommended. I have a NAS with Raid 10, which makes a backup of my data to an external drive daily. It also makes a backup to another external drive once a week on Sunday, which I lock in a fire safe on Monday and plug back in on Saturday night. This keeps most failures from costing me more than a week's data, short of a nuclear blast or a severe fire.
 
raid doesn't do anything against accidental file deletion, overwriting, OS corruption etc. Nor does it help if you have multiple drives fail at the same time, which happens more than you think. A raid array is under heavy stress for a long time while it rebuilds, and you can easily lose another drive in that process. If your data is valuable, back it up.
 
so you guys say that it provides redundancy but not backup and i still need backup. why do i need backup if i have this? i can just keep a spare drive on hand to rebuild the missing drive while i RMA it so i dont risk losing another drive right?
The problem is that another drive could die during the RAID rebuild. Or the RAID rebuild itself isn't successful for a variety of reasons.
whats the point in backing up say 9TB when i already have a parity drive?
Pretty much what dave99 and catogtp said. Shit happens.
 
so you guys say that it provides redundancy

Redundancy is perhaps not quite the best word. Resilience is better. RAID 5 provides resilience against the failure of a single drive; RAID 6, 2 drives.

And you still need to back it all up in case the whole lot fails - lightning strike, fire, hurricane etc.
 
i think the word is "parity." :) ok yeah i understand. so really the best thing to do is to just always have all my data backed up. if i'm going to do this then what's the benefit of using RAID at all? better performance? at the cost of introducing a new variable that can take out the entire array? i think i might just stick with manual backups and no raid. also can anyone recommend me a good itx board that plays nice with the IBM M5015? with my gigabyte h67n, it goes to sleep with the computer and then doesnt wake back up, which means i have to restart my computer every time it falls asleep. i took it out and am just using the mobo headers now.
 
A raid array can give you better performance than a single drive.
A striped or parity raid array will give you resilience - you can still work even after a drive fails.
A striped or parity raid array can recover itself when you replace the failed drive.
A raid array can include "spare" drives that add even more resilience - the spare automatically replaces the failed drive.
It's about uptime. Availability. Performance. Protection. Your ability to use the data on the drives without disruption.

With "only" backups you have to stop and manually recover as soon as a drive fails.

You should still do backups...
 
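-striped +mirrored. A plain stripe (raid 0) has no redundancy; it's the mirrored and parity levels that keep working after a drive fails.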

But otherwise yeah, like he said raid WILL protect you from one or more drive failures depending on the raid type.

Raid will NOT protect you from:

Virus or other malware corrupting or otherwise destroying data.
You being a dumbass and deleting then overwriting data.
You being a dumbass and dropping or otherwise damaging your server trying to do maintenance on it.
Your dog being a dick and peeing all over it.
A tornado.

That's why a proper backup plan will at least back up everything you'd rather not lose locally, and everything you just can't lose both locally and remotely. (By stuff you'd rather not lose I mean stuff you could re-create if necessary and don't use often; for me that's DVD rips, Steam downloads, TechNet downloads, etc.)
 
no antivirus so legit point
not a dumbass
not a dumbass
no dog
in hawaii so no tornado

hmmmmm...
 
I am concerned about your use of the phrase "manual backups". To me that screams that you just copy files to another drive. If so, I would argue this is not a backup.

1. Backups should be automatic. Humans make errors.

2. Backups should be incremental and versioned so that you can go back in time and retrieve older copies. What happens if something corrupts your data and you do not discover it until after you have backed up the corrupted data?

3. Backups should be disconnected from the machine being backed up, preferably in a separate location.


P.S. if you are human, trust me you are a dumbass. We all are. We all make those stupid mistakes.
 
OS drives in all my computers backup automatically to a non-os drive in my server. i also manually back up now and then everything on all my server drives to external drives.
 
I have never lost data to a failed drive. I have lost it to accidental deletion, to a flaky power connection on a hard drive cage, and, worst of all, to a 160 GB HDD on a system that recognized only 128 GB after a Windows update and totally messed up the filesystem (the LBA48 problem). RAID cannot protect you from events like these, so I always recommend automated incremental backup to a second drive instead of a raid1.
 
ive lost data due to a failed drive, but it was backed up to an external. :)

another question: i have my drives just JBOD right now, 3x3TB. is there a way to automate the process of backing up each one to a 3tb external so i just plug it in, it adds anything missing, then i can unplug it? i dont want to have to keep track of whats new between backups.
 
data - data - data - data - checksum
data - data - data - checksum - data
data - data - checksum - data - data
etc.

If the known data adds up to even and you know it's supposed to be odd, you know something is wacky.
 
Yes, any real backup software will handle incremental backups. NTBackup is one that most people have.
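For a feel of what "adds anything missing" means in practice, here is a rough sketch of a one-way copy that only touches files that are new or look changed. It is not how NTBackup or any other product works internally; the function name is made up, it compares only size and modification time, and it keeps no old versions, so treat it as an illustration rather than a replacement for real backup software:

```python
import shutil
import tempfile
from pathlib import Path


def copy_new_or_changed(source, destination):
    """Copy files that are missing from destination or look changed."""
    source, destination = Path(source), Path(destination)
    copied = []
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        dst_file = destination / src_file.relative_to(source)
        if dst_file.exists():
            s, d = src_file.stat(), dst_file.stat()
            # Crude change check: same size and source not newer than the copy.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves timestamps
        copied.append(dst_file)
    return copied


# Demo on throwaway folders standing in for the internal and external drives.
with tempfile.TemporaryDirectory() as tmp:
    internal, external = Path(tmp) / "internal", Path(tmp) / "external"
    (internal / "movies").mkdir(parents=True)
    (internal / "movies" / "a.txt").write_text("hello")
    first_run = copy_new_or_changed(internal, external)
    second_run = copy_new_or_changed(internal, external)

print(len(first_run), len(second_run))  # 1 0
```

The second run copies nothing because the file is already on the "external" side, which is the behaviour being asked for.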
 
The short answer is "XOR". XOR is a binary operator that takes two inputs and produces one output. The rules are:

1 xor 1 = 0
1 xor 0 = 1
0 xor 1 = 1
0 xor 0 = 0

Parity in RAID 5 involves reserving some space for parity information. Parity data is an additional digit of information that helps you recover lost data.

Another way to describe this parity is "even parity". That means we try to keep the number of "1" bits even. If there are 2 "1"s, the parity is "0". If there is only 1 "1", the parity is "1". In short:

1 1 parity 0
1 0 parity 1
0 1 parity 1
0 0 parity 0

In practical terms, picture the data on each disk as divided into equal-sized pieces; I'll call them cylinders here. (Strictly speaking, the data that passes under a disk head in one revolution is a track, and real RAID implementations use fixed-size chunks rather than physical cylinders.) Cylinders are grouped into stripes, which are the same-numbered cylinder across all the drives. So, stripe 1 is the group of all the cylinder 1s across all the disks. (Note that this is the idealized RAID. Actual RAID is less tidy, but the stripe is the basic set of data that is protected.)

RAID calculates parity across cylinders, within a stripe. So if you have a three-disk RAID 5 array, and your data is on stripe 0, two of the cylinders hold data, and the third cylinder holds the parity.

Parity is calculated across the cylinders. (It's not calculated within a single byte, the way it is on networks.) So if cylinder 1 on disk 1 looks like this:

10010011101110110001101...
And cylinder 1 on disk 2 looks like this:

10000001000000010000000...
Then the parity looks like this:

00010010101110100001101...
Here are the three cylinders again, but closer together so it makes sense:

10010011101110110001101...
10000001000000010000000...
00010010101110100001101...
Notice how there are an even number of 1s in each column of bits. The parity is even.

That is how RAID can recover from a lost hard disk. If you replace a disk, you can rebuild it because you know that there should always be an even number of 1 bits in each column.

If you lost the second disk, your data suddenly looks like this:

10010011101110110001101...
_______________________...
00010010101110100001101...
You can rebuild the second disk by setting the 1 and 0 bits based on our parity rule, that there should be an even number of 1s. Here's an example of regenerating the first four bits:

10010011101110110001101...
1000___________________...
00010010101110100001101...
All we do is repeat this calculation several billion times, and our data is rebuilt.
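The same walk-through can be checked in a few lines, using the exact bit strings from the example. The `xor_bits` helper is made up for illustration and works on text strings of 0s and 1s rather than real disk blocks:

```python
disk1 = "10010011101110110001101"
disk2 = "10000001000000010000000"


def xor_bits(a, b):
    """XOR two equal-length strings of '0'/'1' characters."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


parity = xor_bits(disk1, disk2)
print(parity)  # 00010010101110100001101

# Disk 2 is lost: rebuild it from disk 1 and the parity.
rebuilt = xor_bits(disk1, parity)
print(rebuilt == disk2)  # True
```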

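Now, RAID 6 with two-disk parity is a different beast. It uses the same underlying concepts, but the second parity block needs more complex equations than simple XOR (typically Reed-Solomon style arithmetic), so that any two lost blocks in a stripe can be solved for.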
 
data - data - data - data - checksum
data - data - data - checksum - data
data - data - checksum - data - data
etc.

If the known data adds up to even and you know its supposed to be an odd, you know something is wacky.

Yeah but that wouldn't help you reconstructing the original data. It's not just a "checksum".
 
Yes any real backup software will handle incremental backups. NTbackup is one that most people have.

ok thanks, ill look into that one. are there any other programs anyone recommends, specifically of the free variety? :p
 
I recommend CrashPlan. The free version is for local and remote backups to your own computers. You only have to pay to use their cloud servers as a destination.

It supports everything you would want like keeping deleted files, file versioning, incremental backups, and much more.
 
I also recommend crashplan. Been using it with the paid plan for a couple of years now.

Of course, I also use NTBackup for bare-metal restores. If you are serious about backups, I highly recommend having two backups, with one of them offsite. It is also not a bad idea for them to use different services. While CrashPlan is great, if you use it for both backups and it does something very wrong for whatever reason, both backups are toast. Of course, I am a bit paranoid, so you just need to figure out your own level of paranoia.
 
The short answer is "XOR". XOR is a binary operator that takes two inputs and produces one output. The rules are:
...
Now, RAID 6 with 2 disc parity is a different beast. It uses the same underlying concepts, but uses more complex equations than simple XOR to calculate the parity.
Damn good explanation, certainly better than mine.
 
hey all, first off i wanna thank everyone who helped me understand backups and RAID and recommended some backup programs. im finally getting around to organizing my backups, and i cant figure out how to do what i want in either windows "backup and restore" (which i think is equivalent to NT backup, no?) or crashplan so im back asking for help again.

i have a 3TB drive for movies, and a 3TB drive for tv shows. i also have 2 external usb drives that are also 3TB each. i want to set it up so that every time i plug in either one, it will detect which one it is and back up the corresponding drive in my system. if i cant do that, then i would at least like to set it up so i can plug in my movies b/u drive, and tell the program to save any changes to my internal movies drive to the external. i dont want to leave my backup drives plugged in 24/7 so they can automatically back up to avoid the wire clutter.

tldr: i need to set up 2 separate backups from 2 internal drives to their corresponding external drives without leaving them plugged in 24/7 and just scheduling automatic backups.
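One possible way to handle the "detect which one it is" part, sketched under some loud assumptions: each external drive carries a small marker file (here a hypothetical `backup_id.txt` containing `movies` or `tv`), and a script reads it to decide which internal drive to back up. The drive letters and names below are placeholders, not anything Windows Backup or CrashPlan provides:

```python
import tempfile
from pathlib import Path

MARKER_NAME = "backup_id.txt"  # hypothetical marker file placed on each external drive

# Hypothetical mapping from marker contents to the internal drive to back up.
SOURCES = {
    "movies": Path("D:/"),
    "tv": Path("E:/"),
}


def source_for(external_root, sources=SOURCES):
    """Return the internal path that belongs to the plugged-in external drive."""
    marker = Path(external_root) / MARKER_NAME
    if not marker.is_file():
        raise FileNotFoundError(f"no {MARKER_NAME} on {external_root}")
    backup_id = marker.read_text().strip().lower()
    if backup_id not in sources:
        raise KeyError(f"unknown backup id {backup_id!r}")
    return sources[backup_id]


# Demo with a temporary folder standing in for the external movies drive.
with tempfile.TemporaryDirectory() as fake_drive:
    (Path(fake_drive) / MARKER_NAME).write_text("movies\n")
    chosen = source_for(fake_drive)

print(chosen)
```

Whatever the script returns would then be handed to the copy or backup tool of your choice, so the drives only need to be plugged in while that run happens.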
 