I have no doubt, as I start writing this post, that people will be thinking: not another post about backup! It is something we often don't like to think about, and is generally considered a relatively boring topic. I can honestly say I used to sit in this camp; however, during my placement at IBM and my current job at Atos my eyes have been opened to how important this topic is. During the day I help orchestrate backups of terabytes of data for a number of large and important clients, data that their businesses rely on. But wait a second, isn't my data just as important to me? My photos, documents, videos and music are all irreplaceable; if my data was lost I simply couldn't replace most of it. This is why I have backed up continually since IBM, using a variety of different techniques that I am going to outline below.
Method 1, USB Stick Galore!
This method is probably the most straightforward, and very much a KISS (keep it simple, stupid) idea: shove whatever data you consider important onto a USB stick. However, there are several problems with this:
- It requires you to actually think about doing something. When was the last time you promised yourself you would do a backup, and when did you actually get round to it?
- Who hasn't lost at least one USB stick in their life?
- Does the USB stick leave the house with you? If not, do you really have the rather necessary off-site backup that protects you from things like fire damage?
- Version control: once again, who hasn't accidentally overwritten a crucial document with an older version?
- Small storage capacity – fine for some but not for several people, including myself.
The main reason I stopped using this method was that I never actually got round to backing anything up! Capacity also started to become a concern, which effectively killed it.
Method 2, External Hard Disk
This method is a little more common than the above, owing to the advantages a hard disk can offer in terms of capacity. I feel the need to make an important distinction, though, between two types of setup: the permanently attached disk that is constantly plugged in and powered on (using a separate power adapter), and the 'as and when needed' disk that is plugged into the computer just to run a backup. For the latter, most of the USB stick problems apply, so I am going to highlight some problems with the former.
- If you suffer a power surge (and yes, I know most people, including myself, have surge protection kit) that fries your PC, in all likelihood it probably fried your backup hard disk as well.
- No crucial (I can't emphasise this enough!) off-site backup. Fire/tornado/flood, anybody?
My main problem with this was the lack of an off-site copy; I was getting older and more cynical and started to think this wasn't a great idea. I also (foolishly, in retrospect) treated the external disk like a USB stick, backing up manually. Today, if I were to use this method (which I don't recommend), I would use a tool like Windows Backup or Time Machine to give me a set-up-and-forget attitude.
Method 3, Local Rsync to ‘Backup Server’
Whilst at IBM, and using some of the Linux knowledge I had gained (plus an HP Microserver I had bought with my wages), I started to use a localised rsync. Rsync is a Unix tool that carries out what is called an incremental backup: in layman's terms, only the files that have changed are copied each time. This has huge advantages when you have to send your backup over a network. Rsync can also be scripted and set to run at set intervals using a scheduler (in Linux, referred to as the crontab), meaning that I had backups running every 15 minutes, so in the event of a hardware failure I would only lose 15 minutes of data.
- The problem with this method was simply the lack of off-site storage; a fire could easily have wiped out my data.
I ran with this solution for a little while until I decided I needed an offsite backup.
Method 4, The Cloud Provider (in my case Dropbox)
Firstly, I will say that I find Dropbox to be a fantastic tool that does its job very well. For those unfamiliar with it, Dropbox allows you to synchronise folders between all sorts of devices and Dropbox themselves, and is completely free up to a limited storage capacity. Dropbox provides this synchronisation with a clever application they developed that pushes your files to 'The Dropbox', which is actually a frontend to Amazon's highly popular S3 storage cloud. This is the solution I used throughout my final year at uni and the one I still use today; however, I had to upgrade from the basic package as I required more space (jumping to the 100GB package at $10 a month). After a budget review with my wife shortly after getting married, I started to wonder whether there was any way I could eliminate that cost, which is really what has spurred my current design thinking. Dropbox also has its little annoyances, like the fact that only things within the local Dropbox folder are synced and therefore backed up.
Before I continue to my new design idea, however, I would like to point out that Dropbox is a fantastic product that, for people with modest data requirements, can be an excellent backup solution.
Firstly, and in good IT practice, I want to set out the requirements I have for a backup solution for my home.
- Ability to back up Windows and Linux (me) and OS X (wife) environments, and potentially both our phones as well.
- Ability to store a sufficiently large amount of data (initially in the region of 100s of GBs).
- Off-site storage for increased resilience.
- Setup and forget mentality (automatic)
- Protection against drive failures.
What is clear from my requirements is that I would need some sort of off-site storage, whether it be cloud or a secondary site. Having scouted around the cloud providers, however, it quickly became clear that large data volumes such as those I wished to back up would start to get expensive, which is the exact opposite of what I am trying to achieve (for example, Dropbox has a 500GB package for $40 a month at the time of writing). So I would need some sort of second site to store data on.
Storing data off site
My thoughts now turned to where I could freely store my data as far away from my house as possible, and I came up with... my in-laws' house (or my parents' house, for that matter). Both have unlimited 20+Mb internet connections and would be happy to host a device that was left on 24/7, providing it was low powered. Instantly I thought of the Raspberry Pi, the highly publicised credit-card-sized computer that draws a maximum of 5W from the mains. This, coupled with a large-capacity external hard disk, could provide my off-site storage.
Local Backup Storage
After sussing out the off-site storage, it was time to switch back to the local end. Ideally I wanted a device that could consolidate a series of device backups before they were transmitted over the internet to my off-site storage. Thankfully, I already had this device: my trusty HP Microserver running VMware's popular ESXi virtualisation platform. It already provided NAS storage for my wife and me to share much of our data between each other. All it would really need to fulfil the role is a bigger hard disk to store the data.
It is all well and good having off-site and local storage, but what is the best way to send data from the local storage to the remote (and from the end devices to the local, for that matter)? There are multiple options for this. Rsync, which I mentioned earlier, would work nicely, but it isn't natively supported by Windows. What I was really looking for was a 'Dropbox-like' alternative without the cloud in the middle. It just so happened that towards the end of my degree I was also acting as a technical adviser to a low-budget film production called Lost Diagnosis. The problem they had was that they needed to transfer substantial amounts (400+GB) of 1080p footage between members of the post-production team. To this end (and with perfect timing) I stumbled upon the newly beta-released BitTorrent Sync. This utilises the infamous file-sharing protocol to allow folder syncing between machines without a cloud in the middle. The BitTorrent protocol itself is heavily used for large file transfers (of often questionable legality) in a many-to-many environment, allowing faster transfers by breaking files down and sending chunks between machines (like in the image below).
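For the Linux-to-Linux leg (Microserver to off-site Pi), rsync over SSH remains a perfectly workable fallback even if, like me, you prefer a sync tool for the end devices. As a sketch, a crontab entry on the Microserver could push the consolidated backup to the Pi nightly; the hostname and paths below are hypothetical examples, not my actual setup:

```
# Hypothetical crontab entry: mirror the local backup to the off-site Pi
# at 2am each night. -z compresses in transit, which matters on a home
# upload link; --delete keeps the remote copy an exact mirror.
0 2 * * * rsync -az --delete -e ssh /mnt/backup/ pi@offsite-pi:/mnt/usb/backup/
```

Running it overnight keeps the (slow) upload out of the way of daytime internet use.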
With all this in mind therefore I decided to design a solution incorporating this fantastic tool.
Some purchasing 🙂
Which brings me nicely forward to today, when I finally got round to buying the new hard disks from Ebuyer (I had already bought the Pi a couple of months earlier to play with), so my expenditure on the project is as follows:
- Raspberry Pi (with power supply/case): £40
- External USB Hard Disk 3TB: £85
- Internal SATA Hard Disk 3TB: £80
- Total: £205
An HLD (High Level Diagram)
Drawing to a close…
Well, that is all I really have time to talk about today. I look forward to keeping people updated with the progress of the project as it continues.