"RAID is not a backup" ..ok then, what is?

Dark Prodigy

I read this statement a lot on these forums... yet so many people use it.


According to this, the only true "backups" are discs (which can get damaged), offsite (pay subscription) and true tape drives (super expensive).


What's your "backup" solution since RAID is not/should not be one.
 
1. Use a real backup program.
2. Backup only what is needed
3. Make multiple backups of what is needed.

I have monthly fulls scheduled on my home network to two independent hard drives. On top of the monthly fulls I do daily incrementals and weekly differential backups. The system rotates which hard drive to use for backups monthly, so last month's backups are on one disk while this month's backups are on the second disk. For me, what is needed is only 100 to 200 GB out of the > 6TB of data that I have.
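A rotation like the one described above can be sketched in a few lines of Python. This is only an illustration under assumptions the post does not state: fulls run on the 1st of the month, differentials on Sundays, incrementals on every other day, and the two disks (hypothetically named disk_a and disk_b) alternate by calendar month.

```python
from datetime import date

def backup_plan(day: date) -> tuple[str, str]:
    """Return (backup_type, target_disk) for a given calendar day."""
    # Alternate the target disk every calendar month, so last month's
    # backups always sit on the other disk.
    month_index = day.year * 12 + day.month
    disk = "disk_a" if month_index % 2 == 0 else "disk_b"

    if day.day == 1:
        kind = "full"            # assumed: monthly full on the 1st
    elif day.weekday() == 6:     # assumed: weekly differential on Sunday
        kind = "differential"
    else:
        kind = "incremental"     # every other day
    return kind, disk

print(backup_plan(date.today()))
```

A real backup program handles this scheduling itself; the point of the sketch is only that the full set for any month lives on one disk and the previous month's set on the other.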
 
According to this, the only true "backups" are discs (which can get damaged), offsite (pay subscription) and true tape drives (super expensive).

That pretty much covers it. If you want cheap, back your important stuff up to DVD-RW's and take them off-site somewhere. It's all about minimizing the impact of potentially catastrophic events.
 
RAID is no more of a backup than a single hard disk (or a single copy of your data) is a backup. You can back up your data to a RAID array, but just having disk parity information does not protect against flood/fire/theft/virus/nukes/etc. What RAID does do is prevent downtime and complete data loss in the event of a single hardware failure. It's more protection than a single disk, but does not replace or equal a backup.
 
Once a month to an external drive, or when major changes occur. The external drive is just stored on a shelf.
I may be changing the way I back up this Christmas by getting 2 external drives and rotating them offsite, storing one at work.
 
RAID is no more of a backup than a single hard disk (or a single copy of your data) is a backup. You can back up your data to a RAID array, but just having disk parity information does not protect against flood/fire/theft/virus/nukes/etc. What RAID does do is prevent downtime and complete data loss in the event of a single hardware failure. It's more protection than a single disk, but does not replace or equal a backup.

You expect nukes? :eek: Honest to goodness nuclear warheads?
 
I took that as a joke but also to mean some other unexpected reason for data loss..
 
nukes or zombies or ze germans invading, take it as you will. RAID does not protect against any of them.
 
What's your "backup" solution since RAID is not/should not be one.

All my critical data is backed up to three external hard drives that are then taken off site to my family's house. The rest of my data, as well as copies of that critical data, is stored on my WHS server. In addition, I also have copies of both sets of data on my main rig.

So even if my family's house burns down, I still have two copies of that data stored on my WHS and PC.
If my house burns down, I still have a copy of my most important data located in those external drives.
If my gaming rig goes down for whatever reason, I still have two copies of data stored on my WHS and external drives.
If my WHS setup goes down for whatever reason, I still have two copies of data stored on my PC and external drives.
If my external drives go down for whatever reason, I still have two copies of data stored on my PC and WHS server.

That's backup. Having a single RAID array with the only copy of your data is not backup.
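The "if X goes down, I still have two copies" reasoning above can be checked mechanically. The following is a minimal sketch, not anyone's actual tooling; the location and data set names are made up for illustration and mirror the setup described in this post.

```python
# Hypothetical inventory: which locations hold a copy of which data set.
copies = {
    "critical": {"external_offsite", "whs_server", "main_pc"},
    "everything_else": {"whs_server", "main_pc"},
}

def surviving_copies(copies, lost_location):
    """Count the copies of each data set left after one location is lost."""
    return {name: len(locs - {lost_location}) for name, locs in copies.items()}

def worst_case(copies):
    """Smallest number of surviving copies over every single-location loss."""
    locations = set().union(*copies.values())
    return {
        name: min(len(locs - {lost}) for lost in locations)
        for name, locs in copies.items()
    }

print(worst_case(copies))
```

Any data set whose worst case drops to zero exists in only one place, which is exactly the "single RAID array with the only copy" situation.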
 
It's still more than that, as a RAID 1 mirrored over the Internet would not be a backup either. It's multiple snapshots of your important data. And I mean snapshots. Also, as above, some users take steps to make sure that some of these copies are offsite to guard against fire / theft ...
 
So, the medium you use to store the data on doesn't matter then. "Backup" is simply storing the same data in multiple locations... whether it's on a tape drive or disc or HDD.
 
To me it is a matter of how much the data is worth to you. Shades of gray if you will...

For example, I am currently working on moving my media archive from a raid 0 to a raid 6 array, because to me it is at least worth having that extra peace of mind that barring some sort of unforeseen catastrophe it will most likely be safe. Now, is this data worth having 2 mirrored raid 6 arrays? Or backing up to LTO4/5s? I can't quite justify that cost for that kind of data...
 
A backup is something that sits on another filesystem and shares no hardware with the original.

With RAID you have redundancy, but you share the same filesystem. Filesystem corruption, accidental deletion, or viruses that infect your files: many things can cause data loss.

Also, I believe the protection that traditional RAID gives you is overestimated; we all seem to assume that the RAID engine itself cannot fail, and that it is the perfect RAID engine. But the real world has actual implementations, rather than the theoretical RAID scheme which tells you "this should be possible when using RAID".

For example, the BER issue threatens traditional RAID systems. And TLER only aggravates this situation. We desperately need more modern filesystems (like ZFS and Btrfs) available to the mainstream. For now the best thing you can do to protect your data is: backups!

I use ZFS, but also use a full backup fileserver. Many people do not need this level of data security, but using an external drive to backup your most important stuff (family pictures, work, letters) is highly recommended!
 
So, single external independent disks are better than, say, an external RAID box like an EX-50 or Drobo (just examples), which uses its own software for RAID and can connect to any PC with its RAID data intact?
 
The external disk example I mentioned was meant to act as a backup, where you use something else, like an internal RAID array, for primary storage. Then you can disconnect your external disk, ruling out a number of risk factors already. A USB3 external casing + 2TB 5400rpm disk like the Samsung F4 would be a very good choice for this task.

You can make the backups yourself, or use some software to automate this, possibly using incremental backups. I'm not too familiar with what is available on Windows, however, but there should be many solutions that allow you to easily sync with your external disk, say every week or so.

Then you can use RAID and other stuff without too many headaches if something goes wrong: you have a backup! Isn't that comforting? Why put your data at risk with 2TB disks so cheap now? Buy more and use them as backup! Then, if you want, you can even use RAID0 on your primary storage, provided that you don't mind losing the changes since the last backup cycle and the effort required to recover from a disk failure.

My rule of thumb is: any disk currently in your possession, somewhere, should be able to fail, die completely, without you having a bad day.
 
A backup is something that sits on another filesystem and shares no hardware with the original.
My rule of thumb is: any disk currently in your possession, somewhere, should be able to fail, die completely, without you having a bad day.

There ya have it. :)
 
I figure that hard drives are dependable. Have not had a data loss in 20 years.

My backup is

Every day: Zipping changed files and mirroring the data drive to 3 hard drives. One drive is on our disaster escape route.

Every week: One of the backup hard drives is swapped with a backup hard drive that is kept off site.

Every month: A new mirror is started on each of the 3 on site and 1 off site hard drives. (2 years of backup data fits easily on a large hard drive.)

Each year: The previous 2 years of back up information is stored in 2 locations - in my detached garage in a 2 hour fire media safe and 20 miles away in a tornado shelter. Previous backups are tested and usually consolidated to newer and larger hard drives. It is amazing that I tend to have a large number of large hard drives that have been lightly used for a couple years and are available for this consolidation.

---

It takes under 15 minutes to restore operations if the server fails and takes the data files with it.
 
I figure that hard drives are dependable. Have not had a data loss in 20 years.

A single hard drive is not, however. It is expected that > 1% of your hard drives will fail each year. At work (where I usually have 200 to 400 drives spinning 24/7/365) we have had a 1% to 7% annual failure rate on drives over the last 15 years. And these are not just 5+ year old drives; a number of failures were from drives that were less than 2 years old.
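To put those rates in perspective, here is a rough calculation of how likely it is that at least one drive in a group fails within a year. It is a back-of-the-envelope sketch that assumes drives fail independently at the same annual rate, which real drives (same batch, same power supply, same room) do not strictly do.

```python
def p_any_failure(annual_failure_rate: float, drives: int) -> float:
    """Probability that at least one of `drives` fails within a year,
    assuming independent failures at the same annual rate."""
    return 1.0 - (1.0 - annual_failure_rate) ** drives

def expected_failures(annual_failure_rate: float, drives: int) -> float:
    """Average number of failed drives per year."""
    return annual_failure_rate * drives

# The rates and drive counts quoted in the post, plus a single home drive.
for rate in (0.01, 0.07):
    for count in (1, 200, 400):
        print(f"AFR {rate:.0%}, {count:3d} drives: "
              f"P(any failure) = {p_any_failure(rate, count):.1%}, "
              f"expected failures = {expected_failures(rate, count):.1f}")
```

For one drive the yearly risk equals the failure rate itself, which is why someone can go 20 years without a loss and still be one bad day away from it.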
 
You should keep backups, and you should keep backups in a RAID.

I guess I don't understand this question.

I have locations with about 1TB of business data on a raid.
That data is backed up incrementally on a different RAID.
Every month, new data is backed up to a third standalone drive which is stored offsite.
 
raid = 0 down time if a drive fails
backup = i accidentally delete something and i need to retrieve it off my backup medium.

if i accidentally delete something on my raid 1 array, it's gone across all drives.
 
and you should keep backups in a RAID.

I disagree with that part of what you said. There is no need that a backup be on a raid versus a single disk, tape or other media. You do however need to have more than 1 backup on different hardware and filesystem.

A single raid is not a good place to store 100% of your backups because raids do fail and they can corrupt the entire raid. I have seen this happen with expensive hardware raids more than once at work (when I was not in charge of the data).
 
DVD-DL's in a fireproof safe, inside a larger fireproof safe. Anything that won't fit on those I consider expendable.
 
I do amateur video work for my kids and their friends. And I'm paranoid...

My working PC has Raid for data integrity/single drive failures.
- Large data raid-6 for performance and simple data protection (8x 2TB Hitachi, raid6, Areca controller)
- System disk is 4xSSD raid 0 for performance.

System disk is ghost-imaged daily to a local, bootable copy on a 500GB hard drive, fully automated by Norton. I completely expect the raid-0 to fail.
A ghost image of the system disk is also made weekly to a separate WHS, just in case I screw something up and need to go backwards. I usually keep the last 6 months or so plus selected older snapshots.

Data array gets a "sync" backup daily to the WHS server, automated at 3am. I happen to use Goodsync for this but there are literally dozens of sync products on the marketplace.

Monthly, the same sync program is used to copy all "new" video files on the WHS to a hard drive. I define "new" as <62 days old. The drive is taken to a storage shed away from home. I'm generating less than 500GB/month right now and 500GB drives are dirt cheap.
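The "<62 days old" selection is easy to reproduce outside a sync product. The sketch below is an assumption about how such a filter could work (it is not how Goodsync does it): it picks files by modification time, and the function name and the path in the usage note are made up.

```python
import time
from pathlib import Path

def recent_files(root, max_age_days=62, now=None):
    """Return files under `root` modified within the last `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime > cutoff
    )
```

Calling recent_files("/path/to/video") on a monthly schedule gives a window of roughly two months, so each file should land on two consecutive monthly archive disks.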

Every 3 or 4 months, I copy what I consider "high value" files, mostly my wife's digital photos, onto a Blu-Ray and store them at the storage unit. I use Blu-Ray because, unlike DVD, the recordable disks do not contain any organic material and are considered "archival" quality. Theoretical readable life is 50-100 years (vs 4-5 years for DVD+/-R).

This somewhat paranoid protocol was put together after a raid-controller failure completely wiped a 5x1.5TB raid array that I thought was completely safe. Safe from single drive failures - yes. Safe from completely whacked raid controller that decided to randomly write all over all 5 drives, not so much. Since I've been doing this it has saved my A$$ several times, usually from human error rather than equipment failures (e.g., ah crap, why did I over-write that file).
 
I have the following setup (just including the storage part):

Two fileservers at home:
server 1: 8 x 1.5TB in RAID6 (hw)
server 2: 8 x 2TB in RAID6 (sw)

Then at my office desk at work (a real backup has to be offsite!) I have put a third fileserver, which is a dedicated backupserver. It accesses the internet through the local internet connection at work.
backupserver: 5 x 1TB in RAID6 (sw)

This backupserver makes a complete rsync snapshot (using --link-dest) of both of my servers at home every night. I back up everything on the storage disks except DVD and BD .iso files and a temp folder for things like torrents in download. In total, one snapshot is over 700GB. For now I keep backups for 8 months before deleting them, but I'll probably increase this as long as there is disk space. That means that I can go back to any day less than 8 months ago and get back my files.
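For readers unfamiliar with --link-dest: rsync writes each night's snapshot into a fresh directory and hard-links every unchanged file to the previous snapshot, so each snapshot looks complete but only changed files take new disk space. The sketch below only builds such a command line; the directory layout (one dated directory per night) and the host and paths in the usage note are assumptions, not the poster's actual script.

```python
from datetime import date
from pathlib import PurePosixPath

def rsync_snapshot_cmd(source, snapshot_root, today, previous=None):
    """Build an rsync command that writes a dated snapshot directory.

    Files unchanged since `previous` become hard links into that older
    snapshot instead of being copied again.
    """
    root = PurePosixPath(snapshot_root)
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps times, perms)
    if previous is not None:
        # Use an absolute path here: a relative --link-dest is interpreted
        # relative to the destination directory.
        cmd.append(f"--link-dest={root / previous.isoformat()}")
    cmd += [source, str(root / today.isoformat())]
    return cmd

print(" ".join(rsync_snapshot_cmd(
    "user@home:/srv/data/", "/backup/home",
    today=date(2024, 1, 2), previous=date(2024, 1, 1))))
```

The resulting list can be passed to subprocess.run; the first run has no previous snapshot and simply copies everything, which is why doing the first backup over a fast local link matters.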

This setup has no issues running over a slow ADSL with only 400kbit/s upload as long as the first backup is done over gigabit at home.
Now I have 20/20 Mbit/s fiber (and 40/40 at work) so it's usually over in seconds, or a few minutes (encrypted in an IPsec tunnel of course, or rsync over ssh if you prefer).

I have made a backup script which takes all the scenarios I can think of into account, so I will never think I have a complete backup without it actually being complete.
It also sends the backup status to my gmail address every day, so I can immediately see if something is wrong.

The backupserver establishes an IPsec tunnel to my home firewall, so I can easily reach it for management or fetching backup files (through samba file sharing or scp) even though it is protected behind two firewalls and double nat at work. Opening ports at work is obviously not an option so I went the tunnelling route using StrongSwan. All my servers are running Debian Squeeze.

All in all I have to lose 3 disks at home AND 3 disks at work at the same time to lose any data. Since I get the status by email I always know that I have a complete backup, so I don't need to worry about that.
And because I have a complete snapshot of every day for 8 months, I won't run into problems with things like accidentally deleted files, virus-infected files or something else that isn't detected at once, like I would if I only used a normal data synchronisation setup like many people do (which is still far better than nothing, of course).

In addition, using RAID with parity has the advantage that it can repair otherwise unrecoverable bit errors during the monthly (in my case) scrubbing of the RAID. If you aren't using any kind of redundant RAID (or ZFS equivalent), correcting this kind of disk error is not possible (unless you use backup software that handles this).

I also do nightly short S.M.A.R.T testing of all my drives, and weekly long tests, together with continuous monitoring of the servers with warnings sent to my gmail address. This will usually warn me if a drive is about to fail well in advance or tell me about any other hardware or reliability problems (fans, temperatures etc.). Making the backup solution fully automatic and "self-repairing" is a must! If not, your backup will probably not be updated as often as it should be.
 
I disagree with that part of what you said. There is no need that a backup be on a raid versus a single disk, tape or other media. You do however need to have more than 1 backup on different hardware and filesystem.

A single raid is not a good place to store 100% of your backups because raids do fail and they can corrupt the entire raid. I have seen this happen with expensive hardware raids more than once at work (when I was not in charge of the data).


Not everything in a single RAID, of course.

However, depending on the nature of your data I would certainly keep more redundancy... especially when storage is so cheap.

In the particular case I was referring to, the live system is on a RAID10, the daily backups are a complete image of the system on a RAID5.
This is for hardware redundancy and to speed up restoring.

Restoring the entire system from incremental offsite backups is a pain, and takes time. So the onsite backup is protected against hardware failures itself.
 
I read this statement a lot on these forums... yet so many people use it.


According to this, the only true "backups" are discs (which can get damaged), offsite (pay subscription) and true tape drives (super expensive).


What's your "backup" solution since RAID is not/should not be one.

RAID isn't a backup solution because the data is still in one place. It also tends to be live (online) and thus is vulnerable to viruses, data corruption and other potential risk factors. It isn't that disks or tapes are the only acceptable form of backup. The reasons people use media such as those two are numerous:

1.) These forms of media are cheap.
2.) Media such as tapes and CD's are easily and safely transported without as much risk of damage as hard disks are.
3.) Media like CD's and tapes are small and thus can be stored in offsite storage facilities while taking up minimal space. Thus reducing the cost of storage.
4.) Large amounts of removable storage for a low price.

There is actually a fifth benefit to using media like a CD or tape. While these forms of media are easily damaged, they are somewhat more resilient than, say, a dropped hard disk drive. Additionally, what kills a hard drive won't always kill removable media. CD's aren't terribly vulnerable to moisture, being dropped, etc., and scratches are easily repaired through resurfacing. They are also not affected by magnetic fields the way magnetic forms of media are. Tapes are generally destroyed by anything that would destroy a hard disk, except that tapes can be dropped with a little less risk than hard disks. Their advantage compared to CD, DVD, and Blu-Ray media is capacity, though. Tapes can now hold more than a terabyte if you go high-end enough.

The benefit of offsite storage should be self-explanatory. If there is a flood, fire or earthquake, the odds of data in two separate locations both being destroyed are very small. The further apart the copies are stored, the better. You don't necessarily need Iron Mountain. I leave data in a friend's gun safe. It isn't particularly sensitive, but it is important to me. If you don't want the data used in any way you can't predict or won't accept, then encryption is a possibility for added security.

You could store your data on another drive and leave it at a friend's house but that's riskier than other storage methods and costs more. See the pattern? RAID isn't a backup and never was intended to be. It simply maximizes uptime which matters to businesses with mission critical applications and servers. It also in some applications adds performance to the server where multiple disk reads are concerned.

1. Use a real backup program.
2. Backup only what is needed
3. Make multiple backups of what is needed.

I have monthly fulls scheduled on my home network to two independent hard drives. On top of the monthly fulls I do daily incrementals and weekly differential backups. The system rotates which hard drive to use for backups monthly, so last month's backups are on one disk while this month's backups are on the second disk. For me, what is needed is only 100 to 200 GB out of the > 6TB of data that I have.

It would depend on your goals. A real backup program isn't necessary if you just need a handful of files. Typically what's built into Windows is sufficient for automation if you need it. I store everything I need in a few folders and simply copy what I need to a CD or DVD. That's pretty much it for me. I'd reinstall the OS on any system that died and if I didn't want to have to do that, Ghost or any image cloning software would be enough. Having two hard drives on the system or even in the same house really isn't that good of a plan for backing up data. It leaves the disks in a partially online state, or fully online if both are connected full time. They are both in danger from power spikes and the same environmental conditions. Fire, flood etc.

Most home users' backup methods are atrocious when analyzed and thought out to any degree. Even IT professionals and enthusiasts generally don't place very much emphasis on proven backup strategies, thinking they've got it covered because they snagged a copy of Arcserve from the office. (Or whatever application they chose.) It's about location, minimizing risk to the data itself, and keeping it from harm. You can't do that if it's always accessible to viruses, hacking, theft, etc.
 
In my opinion, really good backup should always be offsite and online (as in always on and verified, not internet), not offline. It is okay to have a third level of backup that is offline, as an extra world war III backup.

The reason for this is that you never know if the DVD, tape or external hard drive actually works when you pull it out of the safe/storage and actually need the data on it. I feel much more comfortable at work since we started to use online backup (from a professional company) instead of the old manual tape routine from hell, where we were never sure the tapes would actually be okay when we needed them.
 
Two stages of backup is a good idea. Having something always online makes recovery much easier and I'm not arguing against that. Functionality of offline / offsite storage should be verified before you put it away. Simple as that.
 
Functionality of offline / offsite storage should be verified before you put it away. Simple as that.

Of course it worked when you put it away. If it didn't you wouldn't put it away as your backup.
But it is entirely possible that the external drive decides to die just when you power it up to retrieve your backup, or that the burned DVD is not readable because the disc is not very compatible with your current DVD player.

My father actually thought he had a backup of all his photos on DVD until he by chance checked it one day, and it was empty! It turned out that there was a software problem on his computer: it appeared to do everything as it was supposed to, including going through the complete burning process, but the data wasn't there when checked on any other computer. The burning software only cached it in some way.
This is of course a very special case, but it almost went wrong and backup is all about having thought of all these special cases IMO. It is probably not the obvious one that is going to hit you.

But as you can see a couple of posts up. I am paranoid when it comes to data integrity and backup.
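The empty-DVD story is an argument for verifying a backup by reading it back and comparing it with the source, ideally on a different machine so that nothing is served from a cache. The following is a minimal sketch of such a check using SHA-256 checksums; the function names are made up and this is not any particular backup product's verify feature.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(source_dir, backup_dir):
    """Return relative paths that are missing from, or differ in, the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(src) != sha256_of(copy):
            problems.append(str(rel))
    return problems
```

An empty result means every source file was found in the backup with identical content. It says nothing about whether the medium will still be readable in five years, so the check is worth repeating when the disc or drive comes back out of storage.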
 
Of course it worked when you put it away. If it didn't you wouldn't put it away as your backup.
But it is entirely possible that the external drive decides to die just when you power it up to retrieve your backup, or that the burned DVD is not readable because the disc is not very compatible with your current DVD player.

My father actually thought he had a backup of all his photos on DVD until he checked it one day, and it was empty! It turned out that there was a software problem on his computer: it appeared to do everything as it was supposed to, including going through the complete burning process, but the data wasn't there when checked on any other computer. The burning software only cached it in some way.
This is of course a very special case, but it almost went wrong and backup is all about having thought of all these special cases IMO. It is probably not the obvious one that is going to hit you.
Re: the backup drive dies on power-up. This is exactly why my monthly archive contains the last two months of data. Over time every file should appear on at least two separate archive disks.
 
Re: the backup drive dies on power-up. This is exactly why my monthly archive contains the last two months of data. Over time every file should appear on at least two separate archive disks.

It is smart to do it that way. It could be as simple as dropping and breaking your DVD or external drive when you fetch it from the safe. As I said, it's often the things you haven't thought about that are going to happen.
 
raid = 0 down time if a drive fails
backup = i accidentally delete something and i need to retrieve it off my backup medium.

if i accidentally delete something on my raid 1 array, it's gone across all drives.
this, except backup can include any kind of file loss. I see redundant arrays as more about downtime of the machine than anything else.
edit: you can definitely use redundant arrays as a backup solely against drive failure. But you have to remember that in this case it is the only thing it protects against, so it's hard to call it a complete back up solution since it doesn't protect against so many other things.
 
I use ZFS, but also use a full backup fileserver. Many people do not need this level of data security, but using an external drive to backup your most important stuff (family pictures, work, letters) is highly recommended!
Yeah whatever
Everyone who is on HardOCP should have a ZFSGuru box for their files, anything less is a disgrace

How many people have their music stored as flac these days?
How many people have ripped their movies from dvds into files?
How many people are taking RAW images at family parties?

I'm not going to bother with the bullshit of having DATA STORAGE be my problem
If you are at home doing this: buy four of the largest drives you can afford : 2 RAID1 ZFS arrays, snapshots enabled. A drive fails? Who cares you still got the array working PLUS ANOTHER DAMN ARRAY WITH YOUR DATA
Your mom deleted her photos? No Problem restore from snapshot, they are so lightweight you can make them EVERY FIVE MINUTES!
Fire? Just pull out one of the drives and run for cover. When you're finally back at a computer, plug the drive into any ZFS-capable box and zpool import it. DONE!
 
RAID (except RAID0) is mostly a tool to avoid downtime when a disk fails and to get speed, but there is more to it.

I obviously use it because it's much faster, but from a data integrity perspective it has more uses which can't really be replaced by a backup alone (in a disk failure scenario):

- RAID is always current. So if you lose a drive you won't lose the data that has changed since the last backup.
- RAID also reduces the need to actually use the backup. This is good because, once you actually need your backup, your data probably exists in only one physical place. If you have RAID you still have your data in two places even with a failed disk.
- RAID scrubbing protects your data much better from "bit-rot" over time, so your data will not go corrupt as easily. This is why the backup should also be using some kind of RAID or other error protection.
 
One of the biggest problems with rebuilding raid arrays is downtime
This is why ZFS has a major advantage. I'm not building 2 boxes just in case 1 array is down for rebuilding.
Chances of another drive failure during a rebuild are much higher too!
RAID is a double edged sword, more complexity = more things to go wrong.
 
One of the biggest problems with rebuilding raid arrays is downtime
This is why ZFS has a major advantage. I'm not building 2 boxes just in case 1 array is down for rebuilding.
Chances of another drive failure during a rebuild are much higher too!
RAID is a double edged sword, more complexity = more things to go wrong.

There is no downtime rebuilding RAID. Depending on your RAID selection there will be different performance hits, but no downtime.
Modern arrays with large multi-TB drives should definitely be able to survive a double failure, because a second failure caused by an unrecoverable bit error during a rebuild is statistically plausible. This is why I use RAID6 on all my servers, including the backup server.
RAID isn't complex. It is more complex than a single drive, but it is easily created and maintained.
ZFS is king. I (as a Windows and Linux guy) envy you. Really looking forward to btrfs for that reason.
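The "statistically plausible" claim about unrecoverable bit errors can be illustrated with a rough calculation. The sketch assumes a rate of one unrecoverable error per 1e14 bits read (a figure commonly quoted on consumer drive datasheets, not a number from this thread) and treats errors as independent, so it is an estimate of scale rather than a prediction for any real array.

```python
import math

def p_read_error(bytes_read: float, bit_error_rate: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    `bytes_read` bytes, assuming independent errors at `bit_error_rate`."""
    bits = bytes_read * 8
    # 1 - (1 - rate) ** bits, computed via log1p/expm1 to keep precision
    # when the rate is tiny and the bit count is huge.
    return -math.expm1(bits * math.log1p(-bit_error_rate))

# Hypothetical RAID5 rebuild: 5 x 2TB array, one drive dead,
# so the 4 surviving drives (8e12 bytes) must be read in full.
print(f"{p_read_error(4 * 2e12):.0%}")
```

With single parity, one such error during the rebuild means at least some data cannot be reconstructed; with double parity (RAID6) the array still has a second source to recover from.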
 
So, "backup solution" means multiple external locations of important data..RAIDed or unRAIDed...

No one here is talking much about tape backups... which I heard was the only real backup solution.
 
Tape is so yesterday. But until recent years it was the most economical way to store a lot of data offsite (you need a manual tape change routine; using a BIG robot alone isn't enough, as some people think). It is probably still the most economical way to store rarely accessed data long term in enterprises, because they don't use cheap 2TB SATA drives for storage.

For home users tape has no use because disk storage is so cheap for us. Tape is not.
 
I use tape drives for occasional backups with my Linux system, and use RAID 0 for added drive space. Ideally I would like to use RAID 5 on my PERC, since losing a disk in RAID 5 is OK because the array can rebuild. I'm not so big on RAID 0; it's not fun when one drive kills the array.

Tapes are ideal for mass storage, but they should be done on a regular schedule.

Do remember that tape drives, even the faster models are very slow, sometimes only 10-20MB/s, mine averages around 8-10MB/s.

So when backing up multiple TB of data, you want to make sure that the tape drives run throughout the night, rather than having them set for a schedule during the day when a SAN/NAS/DAS or server drives might have heavy usage.
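The scheduling concern is easy to quantify. The sketch below assumes a sustained transfer rate and an 8-hour nightly window; both are illustrative numbers, and real tape drives slow down further if the source cannot keep them streaming.

```python
import math

def transfer_hours(total_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to move `total_bytes` at a sustained rate."""
    return total_bytes / bytes_per_second / 3600

def nights_needed(total_bytes: float, bytes_per_second: float,
                  window_hours: float = 8) -> int:
    """Number of nightly windows a backup job would occupy."""
    return math.ceil(transfer_hours(total_bytes, bytes_per_second) / window_hours)

# 2 TB at 10 MB/s, the upper end of the drive speed mentioned above.
print(f"{transfer_hours(2e12, 10e6):.1f} hours, "
      f"{nights_needed(2e12, 10e6)} nights")
```

At these speeds a multi-terabyte full backup spans several nights, so it pays to back up only what is needed or to run incrementals between fulls.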
 