Is RAID useful in a personal backup environment?

Hello once again. After yesterday's post I received this comment: “Looks (at HLD at least) very simple…no RAIDs or such like to be seen”, and I felt it gave me an opportunity to explain why I am not using RAID in the design for my personal backup solution.

Some of the less enlightened (lucky souls!!) may not know what RAID is. RAID stands for Redundant Array of Inexpensive Disks or, in layman's terms, hard disks are relatively cheap so let's use them cleverly! For a brief overview I recommend looking at the Wikipedia page or this How Stuff Works video. RAID is characterised by using specific configurations on a number of disks to create a single ‘virtual’ disk, with the main configurations being RAID 0, 1, 5, 6 and 10. In my opinion these are best described using the pretty pictures below (credit to Wikipedia).

  • RAID 0, striping: data is split into blocks and spread across two or more disks. This improves performance but decreases resilience, because losing any one disk loses the whole array.
    [Diagram: RAID 0]
  • RAID 1, mirroring: an exact copy of the data is stored on two separate disks. This increases resiliency and read performance but not write performance.
    [Diagram: RAID 1]
  • RAID 5, this is called block-level striping with distributed parity. The RAID controller (which looks after the array) generates a parity block, which basically involves doing some maths like this: Disk 1 Data + Disk 2 Data = Parity (in practice the ‘+’ is a bitwise XOR, and RAID 5 needs a minimum of 3 drives). If Disk 2 fails then the data on it can be recalculated by doing Parity – Disk 1 Data = Disk 2 Data. All this clever maths means that you can recover from a single drive failure; however, generating the parity means that you take a performance hit on writes. The ‘distributed’ bit means that the parity is spread over all the disks in the array, like the diagram below.
    [Diagram: RAID 5]
  • RAID 6, this is double distributed parity. It builds on RAID 5 but provides the ability to recover from two disk failures for increased resilience.
    [Diagram: RAID 6]
  • RAID 10, this is also written as RAID 1 + 0 and involves striping across mirrored pairs of disks, and thus reaps the benefits of both performance and resiliency against a single drive failure. The main downside, however, is the expense, due to the requirement for a minimum of 4 disks for 2 disks' worth of capacity.
    [Diagram: RAID 10]
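
The RAID 5 parity maths above can be sketched in a few lines of Python. Real arrays use a bitwise XOR for this, which is its own inverse, so the same operation both creates the parity and rebuilds a lost block. This is only a toy illustration: the function name `xor_blocks` and the sample data are made up for the example, and a real controller works on fixed-size stripes and rotates the parity across the disks.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, one byte position at a time."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data blocks, as they might sit on disk 1 and disk 2 of a 3-disk array.
# (Hypothetical sample data, not taken from any real array.)
disk1 = b"backup-A"
disk2 = b"backup-B"

# The parity block, stored on the third disk, is the XOR of the data blocks.
parity = xor_blocks(disk1, disk2)

# Disk 2 fails: XOR the parity with the surviving data block to rebuild it.
rebuilt_disk2 = xor_blocks(parity, disk1)
assert rebuilt_disk2 == disk2
```

The same call works for wider arrays: XOR all the surviving blocks (data and parity) and what comes out is the missing one, which is why RAID 5 can only cope with a single failure.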

So RAID seems like a good idea to provide resilience (apart from RAID 0), doesn't it? Well, yes and no. I am going to go through each level and explain why I chose not to use any of them in my solution.

  • RAID 0, no resilience and expensive; personal backup storage doesn't need to be fast.
  • RAID 1, this is closest to what I am actually going to be implementing (an off-site mirror). However, doing it locally only protects you from a drive failure: if your power supply fries your computer, or you have a fire, flood or anything else, your resilience is useless. This is in fact the main reason for not picking RAID at all. RAID 1 is also wasteful in terms of hard disk space, because you only get half as much capacity displayed as you actually have. In other words, if I have two 3TB drives (like I do, oddly enough) totalling 6TB of storage, I would only see 3TB of it.
  • RAID 5, now this is an interesting one that I used extensively at IBM with what is termed a ‘Hot Spare’. Generally the servers we used would have 6 hard disks: 5 of them would sit actively in the array (displaying 4 drives' worth of capacity), and if one of these drives failed the hot spare automatically kicked in and the array was rebuilt onto it, once again providing resiliency. RAID 5's major weakness in my eyes, however, is that while an array is rebuilding (which can take hours, or even days with large SATA disks in the TB range) it is incredibly vulnerable to another drive failure. You may think that is probably a 1/1000 risk or something like that; however, as this blog post from John Cook explains, an array is often built from disks from the same batch, which have a higher probability of failing at the same time. Which brings me quite nicely onto RAID 6.
  • RAID 6, in my most recent job at Atos I discovered that RAID 6 is what the storage giant EMC recommend for use with high-capacity SATA drives, as it provides two drives' worth of redundancy and so still offers protection whilst rebuilding an array. This is what I would use if it weren't for one problem: RAID 6 needs a minimum of 4 disks to provide 2 disks' worth of capacity. Admittedly this gets better as you scale up to 6, 8 or even 10 disks, but for me it works out very expensive. If for the sake of argument I bought five 1TB disks (providing a total of 3TB of usable storage) it would cost me £250 at time of writing, versus the £80 I spent on a single 3TB disk, some 3 times more money.
  • RAID 10, again very expensive for not much tangible benefit.
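
To make the capacity and cost comparison above concrete, here is a rough sketch in Python. It assumes every disk in the array is the same size and uses the standard capacity rules for each level. The helper names (`usable_disks`, `cost_per_usable_tb`) are invented for this example, and the prices are the ones quoted in the RAID 6 bullet (£250 for five 1TB disks, £80 for a single 3TB disk).

```python
# Minimum number of disks for each RAID level discussed in this post.
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_disks(level: str, n: int) -> int:
    """Disks' worth of usable capacity for n equal-sized disks at a RAID level."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "0":
        return n        # striping only, nothing spent on redundancy
    if level == "1":
        return 1        # every disk holds the same copy
    if level == "5":
        return n - 1    # one disk's worth of parity
    if level == "6":
        return n - 2    # two disks' worth of parity
    if n % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return n // 2       # RAID 10: half the disks are mirror copies


def cost_per_usable_tb(total_cost: float, usable_tb: float) -> float:
    """Price paid for each terabyte you can actually store data on."""
    return total_cost / usable_tb


# Five 1TB disks in RAID 6, costing 250 GBP in total (figures from the post).
raid6_usable_tb = usable_disks("6", 5) * 1
raid6_cost = cost_per_usable_tb(250, raid6_usable_tb)

# The single 3TB disk bought for 80 GBP.
single_cost = cost_per_usable_tb(80, 3)

print(f"RAID 6: {raid6_cost:.2f} GBP per usable TB")
print(f"Single disk: {single_cost:.2f} GBP per usable TB")
print(f"RAID 6 costs {raid6_cost / single_cost:.3f}x as much per usable TB")
```

Both setups give 3TB of usable space, so the ratio is simply £250 against £80. The same function also covers the other cases mentioned above: two disks in RAID 1 expose one disk's worth, and five disks in RAID 5 expose four.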

So in conclusion I decided that RAID wasn't going to provide me with any real benefit in my environment, as all my data is going to be held in 3 different places (the originating device, the local backup server and the remote backup server), providing me with 2 drives' worth of redundancy (sound familiar?!). Anyway, that is all from me today.

 
