Not long ago, 2nd Watch published an article on Amazon Glacier. In it Caleb provides a great primer on the capabilities of Glacier and the cost benefits. Now that he’s taken the time to explain what it is, let’s talk about possible use cases for Glacier and how to avoid some of the pitfalls. As Amazon says, “Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable.” What immediately comes to mind are backups, but most AWS customers do this through EBS snapshots, which can restore in minutes, while a Glacier recall can take hours. Rather than looking at the obvious, consider these use cases for Glacier Archival storage: compliance (regulatory or internal process), conversion of paper archives, and application retirement.
Compliance often forces organizations to retain records and backups for years, customers often mention a seven year retention policy based on regulatory compliance. In seven years, a traditional (on premise) server can be replaced at least once, operating systems are upgraded several times, applications have been upgraded or modified, and backup hardware/software has been changed. Add to that all the media that would need to be replaced/upgraded and you have every IT department’s nightmare – needing to either maintain old tape hardware or convert all the old backup tapes to the new hardware format (and hope too many haven’t degraded over the years). Glacier removes the need to worry about the hardware, the media, and the storage fees (currently 1¢ per GB/month in US-East) are tiny compared to the cost of media and storage on premise. Upload your backup file(s) to S3, setup a lifecycle policy, and you have greatly simplified your archival process while keeping regulatory compliance.
So how do customers create these lifecycle policies so their data automatically moves to Glacier? From the AWS Management Console, once you have an S3 bucket there is a Property called ‘Lifecycle’ that can manage the migration to Glacier (and possible deletion as well). Add a rule (or rules) to the S3 bucket that can migrate files based on a filename prefix, how long since their creation date, or how long from an effective date (perhaps 1 day from the current date for things you want to move directly to Glacier). For the example above, perhaps customers take backup files, move them to S3, then have them move to Glacier after 30 days and delete after 7 years.
Before we go too far and setup lifecycles, however, one major point should be highlighted: Amazon charges customers based on GB/month stored in Glacier and a one-time fee for each file moved from S3 to Glacier. Moving a terabyte of data from S3 to Glacier could cost little more than $10/month in storage fees, however, if that data is made up of 1k log files, the one-time fee for that migration can be more than $50,000! While this is an extreme example, consider data management before archiving. If at all possible, compress the files into a single file (zip/tar/rar), upload those compressed files to S3 and then archive to Glacier.
-Keith Homewood, Cloud Architect