The Challenge of Cloud Backups

May 24th, 2017

For what it's worth, I'm a Cloud guy. Senior Cloud Architect, to be official about it. But in a world where everything seems to have the word "Cloud" appended to it, the title feels a little less meaningful than it used to. I point that out because I want to explain how difficult and expensive it can be to use the Cloud for backups or archiving. Now, remember: I'm the Cloud guy. I'm the one who usually preaches about how the Cloud is full of rainbows, so if I say Cloud backups are challenging, you may want to listen.

Just like diamonds, there are three C's that determine the value of a Cloud Backup solution:
• Complexity
• Capacity
• Cost
Complexity
Getting your data to the Cloud is a challenge in itself. Your backup software may need to understand object storage, or make direct API calls. It may need WAN acceleration, or even dedicated bandwidth. Most importantly, it will certainly need to preserve data efficiencies (like compression and deduplication) to get your data offsite in a reasonable backup window.  
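
To make the "direct API calls" point concrete, here's a rough sketch of what pushing a backup file to object storage looks like when a script has to do it by hand. It's an illustration using the boto3 SDK; the bucket name, key, and file path are hypothetical, and a real tool would also handle retries, encryption, and cataloging.

    # Illustration only: pushing a (hypothetical) backup file to S3 with boto3.
    # The bucket, key, and local path are placeholders, not real resources.
    import boto3

    s3 = boto3.client("s3")

    backup_file = "/backups/fileserver-2017-05-full.tar.gz"  # hypothetical local backup
    bucket = "example-backup-bucket"                         # hypothetical bucket
    key = "monthly/2017-05/fileserver-full.tar.gz"

    # upload_file switches to multipart uploads automatically for large objects.
    # Landing the copy in Standard-IA keeps it a bit cheaper while still directly readable.
    s3.upload_file(
        backup_file,
        bucket,
        key,
        ExtraArgs={"StorageClass": "STANDARD_IA"},
    )
    print(f"Uploaded {backup_file} to s3://{bucket}/{key}")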

But getting your data back from the Cloud can be even harder. Most backup software doesn't integrate with Cloud storage yet, so it can't take advantage of cheaper forms of Cloud storage (e.g. AWS Glacier) without losing track of archived data. Often, this means the software can't search for files once they've been archived to a cheaper storage tier.

Simply put, most backup software doesn't speak Cloud. That means you'll likely end up with two independent systems maintaining your backups: one to get the data to the Cloud, and another to manage it once it's there. This is Complexity. There are a few emerging products on the market that are tackling this problem, but they're young and they're expensive.
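
To show what that second system looks like in practice, tiering data down to Glacier is usually handled by a storage-side lifecycle rule that the backup software never sees. Here's a minimal sketch with boto3; the lifecycle API is real S3 functionality, but the bucket name, prefix, and 30-day threshold are placeholders picked for the example.

    # Sketch: a storage-side rule that moves monthly backups to Glacier after
    # 30 days. This transition happens outside the backup software's catalog.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-backup-bucket",          # hypothetical bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-monthlies-to-glacier",
                    "Filter": {"Prefix": "monthly/"},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 30, "StorageClass": "GLACIER"}
                    ],
                }
            ]
        },
    )

Once that rule fires, the object still shows up in S3 listings, but reading it again means issuing a separate restore request and waiting (typically hours) for Glacier to stage it back, bookkeeping the backup software knows nothing about.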

Capacity
This refers to the amount of data you need to protect and the amount of bandwidth you have to send it offsite. Consider this: if your backup software can't track block-level changes or perform synthetic full backups, you'll probably need to send full copies of your data offsite with each backup. Even at gigabit speeds, those backup and restore jobs would take days or weeks instead of minutes or hours.
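
To put a number on that, here's a quick back-of-the-envelope calculation for sending one full 20TB copy (the same figure used in the cost example below) over a gigabit link. The dedicated link and zero protocol overhead are assumptions, so treat the result as a best case.

    # Back-of-the-envelope: how long does one 20TB full backup take at 1 Gbps?
    # Assumes a fully dedicated link and no protocol overhead (best case).
    data_tb = 20
    link_gbps = 1

    data_bits = data_tb * 1_000_000_000_000 * 8        # terabytes -> bits
    seconds = data_bits / (link_gbps * 1_000_000_000)  # bits / bits per second
    print(f"{seconds / 86_400:.1f} days")              # ~1.9 days per full copy

Roughly two days per full copy under ideal conditions; on a shared or slower uplink, or with several systems to protect, "days or weeks" is no exaggeration.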

Cost
This is the big one. Everyone thinks the Cloud is cheaper, so let me try to dispel this myth with an example. You have 20TB of data, and you need to keep monthly backups for 7 years. For this example, we'll assume that your data doesn't grow at all over the 7 years just to keep the math a bit simpler. Your first month is dirt cheap.

        First Month:  20TB x (1000GB/TB) x ($0.004/GB-month for AWS Glacier) = $80.00

By the second month, the amount of data you have stored in the Cloud will have doubled. You'll have the original 20TB monthly archive plus another 20TB monthly archive for the second month.

        Second Month:  40TB x (1000GB/TB) x ($0.004/GB-month for AWS Glacier) = $160.00

If you extrapolate this steadily growing bill out to 7 years (84 months of accumulating archives), you'll have spent over $285,000 on Amazon's cheapest form of archive storage. The number is much higher if you factor in data growth and the cost of retrieving files when you need them. And if your backup software requires you to use online storage like S3 instead of Glacier, you're looking at total costs over a million dollars.
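
For anyone who wants to check the math, the bill is a simple arithmetic series: every month adds another 20TB archive, so the monthly charge grows by $80 each month. Here's the sketch behind the figures above; the Glacier price is the one used in this post, while the S3 Standard price is an assumed rough 2017 list price.

    # Sketch of the 7-year archive bill: 20TB added every month, nothing
    # deleted, no data growth. The S3 Standard price is an assumed ~2017
    # list price; the Glacier price matches the example above.
    GB_PER_TB = 1000
    monthly_archive_gb = 20 * GB_PER_TB
    months = 7 * 12                      # 84 monthly archives

    glacier_price = 0.004                # $/GB-month, AWS Glacier
    s3_price = 0.023                     # $/GB-month, assumed S3 Standard

    def total_cost(price_per_gb):
        # Month n is storing n archives, so the total is an arithmetic series.
        return sum(n * monthly_archive_gb * price_per_gb
                   for n in range(1, months + 1))

    print(f"Glacier over 7 years: ${total_cost(glacier_price):,.0f}")    # ~$285,600
    print(f"S3 Standard over 7 years: ${total_cost(s3_price):,.0f}")     # ~$1,642,200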

Cloud backup and archiving is in its infancy. Over time, software vendors will figure out how to minimize complexity, capacity, and cost, but for now, you may want to keep that tape library spinning.