It starts with that sinking feeling and all sorts of questions running through your head:
When did I last back up?
Did I include everything?
Does the backup actually work?
When did I last try restoring it?
How much unrecoverable data could I have lost?
Losing data can be an absolutely stomach-churning experience. Those first moments when the penny drops and you realise you’ve got a big, big problem are nightmarish, and without a robust backup strategy it’s just a matter of time until you experience this firsthand.
This last happened to me a couple of years ago when a disk had a catastrophic failure and refused to boot. Slaving the disk off another machine didn’t do much good so it was off to the data recovery specialists and let me tell you, this is not cheap and it’s not fast. I used a group in Sydney called Payam Data Recovery and whilst their service was fantastic, I always felt like each conversation was akin to speaking to your doctor and expecting him to tell you you’ve got a terminal illness and that you need to “start making arrangements”.
Anyway, I got to thinking about the whole backup strategy thing again recently after both Jeff Atwood and Phil Haack suffered data loss on their blogs after a server hardware failure. Being pretty popular guys, this led to a lot of feedback and suggestions from the community as to how to both retrieve their data (some very innovative feedback there!) and how to implement a suitable backup strategy in the future. It brought the whole disaster recovery topic back into focus for me so I thought I’d share my approach which, personally, I think works pretty well.
Personal, not enterprise
Let me quickly contextualise this; I’m talking about personal backups, not enterprise data, which is probably more consistent with Jeff and Phil’s experience above, although many of the principles are the same. Let me typify what I mean by personal data:
- Documents such as Word, Excel, PDF.
- Audio content such as MP3s (could be from iTunes, ripped from CD, etc).
- Photos (predominantly JPG and possibly RAW) and video.
- Any other personal content you don’t want to lose, which for me, includes things like my Subversion repositories and database backups for personal projects.
Essentially I’m talking about the sort of content most of us accrue during our daily lives as civilians. It’s the same sort of data your parents or kids or cousins accrue regardless of whether they’re techie people or not.
Manual processes will fail; it’s just a question of when
Here’s a pretty typical backup strategy:
- Grab the thumb drive / USB hard drive / blank DVD from the cupboard / safe / relative’s house.
- Copy and paste the important stuff onto the device.
- Return said device to its storage location.
The biggest problem here is that it requires you to be proactive. You have to consciously think “Wow, haven’t done a backup for a while, might be a good idea” and the reality is this never happens as frequently as it needs to and you carry a lot of risk between backups. What’s more, the process will become increasingly lengthy as the data expands. I could already fill nearly 20 DVDs with my data and it would take a long time to copy the data over via USB to an external device. Inevitably, the process deteriorates and becomes suboptimal increasing the risk of losing important data.
Onsite backups are vulnerable
One thing you need to get your head around, and I don’t think most people do, is that if your backup strategy doesn’t involve getting your data outside of your house then you need to consider what happens in the event of your backups disappearing. This usually happens one of two ways:
- You get burgled and the thieves take anything that has a flashing light or looks valuable which might include your NAS device, your backup server or your carefully hidden thumb drive.
- You literally experience a disaster such as flood or fire. 11 months ago we had absolutely catastrophic fires in Australia with the loss of 173 lives and, I suspect, huge amounts of now unrecoverable data.
The point is that if your data and your backups are anywhere near each other then they are both at risk of the same events causing you to lose it. Irrecoverably.
“Last version” backups can still lead to data loss
A typical use case for restoring from backup is corrupted data. This could occur via program error or malicious activity but the net result is still the same; you need to restore from an uncompromised copy. The problem is, if you’re using one of the typical approaches mentioned above and you’re actually using it frequently, there’s a good chance your corrupted copy has overwritten a previous good version. The bottom line is that without the ability to restore from point in time you are vulnerable if files become corrupted. You could take the approach of continually making new copies, rather than overwriting existing ones, but the feasibility of doing this decreases dramatically with increasing content size and frequency of backups.
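To make the “last version” problem concrete, here’s a minimal sketch of the alternative: keep a handful of numbered, timestamped copies rather than overwriting a single one. The function name, the naming scheme and the `keep` parameter are all my own inventions for illustration; this isn’t how any particular backup product works.

```python
import re
import shutil
import time
from pathlib import Path


def backup_with_versions(source: Path, backup_dir: Path, keep: int = 5) -> Path:
    """Copy `source` into `backup_dir` as a new numbered version.

    Hypothetical naming scheme: <name>.<6-digit sequence>.<timestamp>
    Only the newest `keep` versions are retained.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Find the versions of this file that already exist.
    pattern = re.compile(re.escape(source.name) + r"\.(\d{6})\.\d{8}T\d{6}")
    versions = []
    for candidate in backup_dir.iterdir():
        match = pattern.fullmatch(candidate.name)
        if match:
            versions.append((int(match.group(1)), candidate))
    versions.sort()

    # Write the new copy alongside the old ones instead of overwriting them.
    next_seq = versions[-1][0] + 1 if versions else 1
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = backup_dir / f"{source.name}.{next_seq:06d}.{stamp}"
    shutil.copy2(source, target)
    versions.append((next_seq, target))

    # Prune the oldest copies beyond the retention limit.
    for _, old in versions[:-keep]:
        old.unlink()
    return target
```

If a file gets corrupted and then backed up, the earlier good copy is still sitting there, at least until it ages out of the retention window. The catch is exactly the one described above: every extra version costs disk space.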
You need to be able to get the data back again
In the wake of Jeff and Phil’s data loss I mentioned earlier, Joel Spolsky posted about Let’s stop talking about “backups”. His point was simply that although backups are great, you need to be able to reliably get the data back again. This means you need confidence in the backup process, the integrity of the media it’s stored on, the ability to read the content again at a later date and of course that you’ve actually captured everything you need to restore from square one i.e. new machine with zero data. Entrusting media such as DVDs and thumb drives to reliably hold data without physical degradation for possibly many years is again, a risky proposition.
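One rough way to build that confidence is to restore into a scratch folder every so often and compare it against the originals. The sketch below does that with SHA-256 checksums; the function names and the two folder arguments are assumptions of mine, and it only checks file contents, not permissions or timestamps.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_root: Path, restored_root: Path) -> list[str]:
    """List every original file that is missing or different in the restore."""
    problems = []
    for src in sorted(original_root.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(original_root)
        restored = restored_root / relative
        if not restored.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(src) != file_digest(restored):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every original file came back byte for byte. Anything else is a file you would not have got back when it mattered.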
As far as I’m concerned, you’re really only left with one viable choice if your data is important and that’s using an online service such as Mozy. After my own data loss I spent a great deal of time researching suitable backup providers and although there were a few out there at the time, Mozy is the one that really stood out to me. The status quo may have changed by now but certainly Mozy is still doing a great job of managing my backups.
The purpose of Mozy is pretty straightforward:
Mozy is a simple and safe way to back up all the important stuff on your computer. A copy of your data is stored in a secure, remote location for safekeeping, so that in the event of disaster your data is still retrievable.
It sounds easy, and it is. Mozy provides a small client which runs on your machine and allows you to specify backup sets. By default, you get a bunch of common sets such as photos and documents.
Alternatively, you can just back up based on file system path as I’ve done (more on that later).
Finally, you decide on what sort of schedule you’d like your backups to happen on. By default it will try and do it when the PC is not in use but you can manually define a particular time of day if it suits. I found there was no noticeable performance impact on the PC when it ran (you can actually adjust the balance between backup speed and PC performance if you like) so I just ran with the defaults.
That’s it, you’re done! Any content you place in the specified backup sets or file paths will automatically be backed up offsite.
Mozy provides a few different options here, the easiest of which is that your entire backup set gets mapped as another drive you can simply navigate to in Windows Explorer. You see a list of each drive you’ve previously backed up to from which you can drill down to the appropriate path, locate the content you’re looking for then just right click and “Restore”.
That’s it; restoring lost content could not possibly be easier! Obviously there is a bandwidth consideration depending on the volume of content you’re restoring but I’ll touch on that again in a moment.
But what if the latest backup was a corrupted version? Not a problem, right click on any folder, choose “Change time” and you can restore from any point in time in the past so you can easily roll back to a known good version. You can access the same functionality by right clicking on the source file and choosing “Restore Previous Version” which then lists all the previous versions you’ve successfully backed up. Although not its primary purpose, this is also a neat little version control feature.
The other online mechanism of restoring files is via a web interface. This is pretty handy if, for whatever reason, you don’t have the Mozy client on a machine and still want to access backups.
All this works great for restoring a reasonably sized collection of files, but what if you’re talking about catastrophic loss and the need to restore tens of GB? Mozy provides a sneakernet solution involving burning to DVD and FedExing the content to you. Obviously they’re going to charge for this service but it’s US$30 plus US$0.50 per GB so even if you’re talking 100GB you’re only looking at 80 bucks to get all your data back. That’s a very small fraction of what I paid Payam to recover my data and even then, some of it wasn’t recoverable.
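For the sake of checking the maths, here’s that pricing as a tiny function. The US$30 base fee and US$0.50 per GB are the figures quoted above; the function and parameter names are just mine, and the prices may well have changed since.

```python
def dvd_restore_cost_usd(size_gb: float,
                         base_fee: float = 30.0,
                         per_gb: float = 0.50) -> float:
    """Estimate the cost of a DVD restore: flat fee plus a per-GB charge."""
    if size_gb < 0:
        raise ValueError("size_gb cannot be negative")
    return base_fee + per_gb * size_gb


# 100GB works out to 30 + 0.50 * 100 = 80 dollars
print(dvd_restore_cost_usd(100))
```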
When I talk to people about this backup strategy one of the most common questions is “is it secure?” It’s a bit of a tough one because there’s no simple yes or no answer. Is anything on the net “secure”? Banks get compromised! Mozy reports using 128 bit SSL for the transport layer and 448 bit Blowfish encryption in storage so at least on the surface of it, the encryption appears to be solid.
Just a couple of other thoughts on security though. Firstly, how secure is your data at present? I mean what is the possibility of someone walking out with your PC or your backup device under their arm and reading the data? There might be a password on the PC but I’d hazard a guess that’s about the extent of it.
The other consideration is the balance of impact of loss versus impact of disclosure. Unless you’ve got your own Paris Hilton style home video on your machine, I’d personally be far more upset about losing my data altogether than someone getting hold of my photos or even my financial statements. Passwords can be changed; family photos can never be recreated.
The other obvious question is “What’s it going to do to my bandwidth?” Obviously large backups are going to chew up a lot of upstream capacity. I just came back from holidays with over 10GB of photos and video and it took several days for Mozy to work through it all. In terms of the impact of using this bandwidth, I haven’t noticed any adverse behaviour, primarily because I have Mozy configured not to run while I’m using the PC, which means a lot of the network utilisation is happening during the night.
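If you want a feel for how long an initial upload might take, a back-of-envelope calculation is enough. The sketch below assumes decimal gigabytes and a link that runs flat out for the whole transfer, which real connections rarely do; the 1 Mbps upstream figure in the example is purely hypothetical and not a measurement of any actual connection.

```python
def upload_hours(size_gb: float, upstream_mbps: float) -> float:
    """Rough hours needed to upload `size_gb` at a sustained `upstream_mbps`."""
    if upstream_mbps <= 0:
        raise ValueError("upstream_mbps must be positive")
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / upstream_mbps / 3600


# Hypothetical example: 10GB over an assumed 1 Mbps upstream link
print(f"{upload_hours(10, 1.0):.1f} hours of continuous uploading")
```

Spread that across only the hours the PC sits idle and it’s easy to see how a big batch of photos can stretch over several days.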
The other big factor is how your ISP handles upstream data billing. Providers such as Telstra BigPond count it along with your downstream traffic but someone like TPG, who I use, don’t count any sent traffic. Even if they did, when you’re looking at less than A$70 a month for 160GB of traffic (ok, half of that is off peak), bandwidth is starting to get pretty cheap.
First, the good news: you can use up to 2GB for free, which means you can download the client, test it out and make sure you’re happy with it before making a financial commitment. After that you’re looking at US$4.95 a month for unlimited data. On my cappuccino-meter (i.e. the price of a coffee you wouldn’t think twice about spending, around A$4), that’s less than a cappuccino and a half a month, which I reckon is very, very good value.
Update, 2 April 2010: Online Backups Review currently has a 15% discount promotion running.
There’s only one gotcha which really causes me any grief and that is that the “Home” version (as opposed to the “Business” version which is significantly more expensive) must back up from a local drive. I use a D-Link DNS-323 NAS device to store absolutely everything because it’s easily accessible from any machine, has built in RAID 1 with two mirrored drives and acts as an FTP server. Unfortunately, because this drive is considered external, there was absolutely no way I could get Mozy to back up from it.
What I’ve ended up doing is installing a spare disk in my primary PC then configuring Robocopy to mirror all the content I actually want on the NAS to this drive. It runs nightly and in some ways makes things a little simpler as I just tell Mozy to back up absolutely everything on the F drive, which is the spare disk. It would be nice to be able to avoid this but I can see where they’re coming from in terms of segregating their business and personal offerings.
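As a rough illustration of that nightly mirror, here’s a small Python wrapper around Robocopy. The `/MIR` switch is Robocopy’s real mirror option, but the share path and function names are placeholders I’ve made up, and this is not the exact job described above. Be aware that `/MIR` also deletes anything in the destination that no longer exists in the source.

```python
import subprocess


def robocopy_mirror_command(source: str, destination: str) -> list[str]:
    """Build a Robocopy command line that mirrors `source` to `destination`."""
    return ["robocopy", source, destination, "/MIR"]


def run_mirror(source: str, destination: str) -> bool:
    """Run the mirror (Windows only) and report whether it succeeded.

    Robocopy exit codes below 8 mean no copy failures occurred;
    8 and above mean at least one failure.
    """
    result = subprocess.run(robocopy_mirror_command(source, destination))
    return result.returncode < 8


# Placeholder paths: a share on the NAS mirrored to the spare local disk
print(robocopy_mirror_command(r"\\nas\share", "F:\\"))
```

Scheduling something like `run_mirror` each night with Windows Task Scheduler would give roughly the same effect as a scheduled Robocopy batch file.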
Update (8 Jan 2011): A fatal flaw in this approach has just become apparent. I removed the backup disk (usually F drive) in anticipation of getting my local PC shop to investigate some odd behaviour I suspect is related to the motherboard or RAM. When I did this, one of my DVD drives was reassigned to F when the machine restarted. My nightly Robocopy wasn’t running (expected, but I wasn’t generating much new content), and Mozy obviously started backing up from the empty DVD drive. I’ve just come back from Christmas holidays with many GB of photos to back up and guess what? Total content currently backed up is 0GB. Ouch!
Mozy has a bit of a built in defence for this in that it keeps the last 30 days of file changes. Unfortunately I pulled the disk out longer ago than that so it’s clear I’ve now lost my backups. I’ve now explicitly assigned the backup drive letter to T so it won’t be automatically reassigned in the future. Now begins well over 100GB of backups all over again. Glad my ISP doesn’t count uploads in my quota!
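A cheap extra safeguard against this sort of thing is a sanity check on the backup source: drop a marker file on the real backup disk and refuse to trust the drive letter unless the marker and some actual content are there. The sketch below is only an idea of mine, with a made-up marker name; it doesn’t hook into Mozy in any way, so it’s something to run from a scheduled task that nags you when it fails.

```python
from pathlib import Path


def backup_source_is_sane(root: Path,
                          sentinel: str = ".backup-source",
                          min_files: int = 1) -> bool:
    """Return True only if `root` looks like the real backup disk.

    Checks that the folder exists, that the (hypothetical) marker file
    is present, and that at least `min_files` other files live under it.
    """
    if not root.is_dir():
        return False
    if not (root / sentinel).is_file():
        return False
    count = 0
    for path in root.rglob("*"):
        if path.is_file() and path.name != sentinel:
            count += 1
            if count >= min_files:
                return True
    return count >= min_files
```

An empty DVD drive that has inherited the drive letter fails the very first check, and a blank replacement disk fails the second.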
There are a couple of alternatives to this sort of approach. Apple has Time Machine, which I hear good things about, and Microsoft has Windows Home Server; both provide personal backup solutions. I’m not sure how the respective manufacturers recommend taking backups offsite but this to me is an essential requirement if the backups are to be seriously foolproof. In my case, neither solution was suitable (I don’t have a Mac and don’t particularly want to run a server at home), but they might be viable alternatives for some, assuming the data can be safely and reliably stored somewhere redundant.
In reading back through this post, it does appear overly favourable towards Mozy but I’m honestly just really, really impressed with the product and am not incentivised in any way to write this. I just love the fact that you install it and forget about it and the only time you’ll ever need to actually think about it is when you need it most. It sure makes me sleep a lot better at night and so long as I myself don’t suffer a catastrophic failure (incidentally, I have a backup for me as well!), I’m confident I’m never going to be experiencing that sinking feeling again.