Having been in the IT industry since the 90s, I’ve seen many iterations of disaster recovery principles and methodologies. The concept of DR (disaster recovery) of course predates my tenure in the field: the idea started coming about in the 1970s, as businesses began to realize how dependent they were on information systems and how critical those services had become.
Over the past decade or so, running a DR site at a colo facility (either leased or owned) has become a popular way for organizations to have a rapidly available disaster recovery option. The problem with a colo facility is that it is EXPENSIVE! In addition to potentially huge CapEx (if you are buying your own infrastructure), you have the facility and infrastructure OpEx, plus all the overhead of managing those systems and everything that comes along with them. In steps the cloud… AWS and the other players in the public cloud arena let you run a DR site with little to no CapEx. Now you are paying only for the virtual infrastructure you actually use, as an operational cost.
An intelligently designed disaster recovery solution could leverage something like the pilot light approach described by AWS to keep your costs down: you run only the absolute minimum core infrastructure needed to keep the DR site ready to scale up to full production. That is a big improvement over purchasing millions of dollars of hardware and carrying thousands and thousands of dollars in OpEx and overhead costs every month.
Even still… there is a better way. If you architect your infrastructure and applications following the AWS best practices, then in a perfect world there is really no reason to have DR at all. By architecting your systems to balance across multiple AWS regions and availability zones, designing your architecture and applications to handle unpredictable and cascading failures, and scaling automatically and elastically to meet increases and decreases in demand, you can effectively eliminate the need for DR.
Your data and infrastructure are distributed in a way that is highly available and impervious to failure or spikes/drops in demand. So in addition to inherent DR, you are getting HA and true capacity-on-demand. The whole concept of a disaster taking down a data center, and the subsequent effects on your systems, applications, and users, becomes irrelevant. It may take a bit of work to design (or redesign) an application for this new geo-distributed cloud model, but I assure you that in terms of business continuity, reduced TCO, scalability, and uptime, it will pay off in spades.
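To make the multi-region, multi-AZ idea a little more concrete, here is a minimal sketch in Python. It is a toy simulation, not AWS code: the `Zone` and `MultiRegionRouter` names are hypothetical, the region and zone names are only illustrative, and in a real deployment health checking and routing would be handled by managed load balancing and DNS services rather than a hand-rolled class. It only shows the principle that traffic keeps flowing to whatever is still healthy.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """One availability zone in one region (names are illustrative)."""
    region: str
    name: str
    healthy: bool = True


class MultiRegionRouter:
    """Hypothetical round-robin router that skips unhealthy zones."""

    def __init__(self, zones):
        self.zones = list(zones)
        self._next = 0  # index where the next round-robin scan starts

    def route(self):
        """Return the next healthy zone, scanning in round-robin order."""
        n = len(self.zones)
        for offset in range(n):
            zone = self.zones[(self._next + offset) % n]
            if zone.healthy:
                self._next = (self._next + offset + 1) % n
                return zone
        raise RuntimeError("no healthy zones available")

    def fail_region(self, region):
        """Simulate a disaster that takes out every zone in a region."""
        for zone in self.zones:
            if zone.region == region:
                zone.healthy = False


router = MultiRegionRouter([
    Zone("us-east-1", "us-east-1a"),
    Zone("us-east-1", "us-east-1b"),
    Zone("us-west-2", "us-west-2a"),
    Zone("us-west-2", "us-west-2b"),
])

# Normal operation: requests are spread across every zone in both regions.
before = [router.route().name for _ in range(4)]

# A whole region goes dark; requests simply land in the surviving region.
router.fail_region("us-east-1")
after = [router.route().name for _ in range(4)]
```

In this sketch there is no separate "failover to DR" step. Losing a region just shrinks the pool of healthy targets, which is the point of the argument: the surviving zones still need enough capacity (or the ability to scale out) to absorb the load.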
That ought to put the proverbial nail in the coffin. RIP.
-Ryan Kennedy, Senior Cloud Engineer




