In the world of IT, disasters come in all shapes and sizes, from infrastructure and application outages to human error, data corruption, ransomware, malicious attacks, and other unplanned events. Other than perhaps a hurricane or blizzard, we often don’t have visibility into when a disaster will occur. After the immediate impact of the disaster subsides, the focus rapidly shifts to the recovery.
At the core of disaster recovery is a focus on how quickly applications and data can be restored so you can resume servicing your customers. Downtime means a loss of productivity, revenue, or even profit from credits being paid out to your customers for failure to maintain service.
But disaster recovery goes well beyond the post-crisis events, and its success hinges on the preparation done well in advance of any disaster occurring. Now, a disaster recovery strategy should not be confused with a business continuity plan. A business continuity plan is far greater in scope, covering not only recovering your IT systems, data, and applications to service customers again, but how to continue running your business even beyond IT system disruptions. For example, a business continuity plan will outline what steps to take when the physical building becomes unavailable and your employees can’t come into the office; how to handle supply chain disruptions, etc.
When discussing disaster recovery strategies, back-up and disaster recovery are often used synonymously. Back-up should factor into your business continuity planning, and in some cases a back-up may be sufficient to restore your systems and meet compliance requirements. However, back-ups are a point-in-time solution and can take significant time to restore, delaying your recovery. Compounding this dilemma, back-ups are only as up to date as the last snapshot taken, which, for many, could mean losing a complete day’s worth of sales. A solid disaster recovery strategy should not only focus on recovering your systems but do it in a manner that exceeds the business requirements and minimizes the disruption to your customers.
Traditional disaster recovery solutions have required significant investment, both financially and in human resources. It’s not unusual for enterprises to be required to purchase fully redundant hardware and duplicative software licenses, locate that hardware in geographically dispersed colo facilities, set up connectivity and replication between the two sites, and have IT admins maintain the second site, which is commonly under-utilized.
Cloud-based disaster recovery has solved many of these problems and can do it for a fraction of the price. To help bring this solution to our customers, 2nd Watch has partnered with CloudEndure, an AWS Company, to help enterprises accelerate their adoption of Cloud Disaster Recovery.
The CloudEndure Disaster Recovery solution replicates everything in real time, meaning everything is always up to date, down to the second, allowing you to achieve your Recovery Point Objectives (RPOs). CloudEndure provisions a very low-cost staging area in AWS, eliminating the need for duplicate resource provisioning. Should a disaster occur, automated orchestration combined with machine conversion enables you to achieve Recovery Time Objectives (RTOs) of minutes and only pay for the cloud instances when actually needed.
Our Cloud Disaster Recovery service provides you a disaster recovery proof of concept for 100 machines in less than 30 days, while allowing you to continue to leverage your entire existing infrastructure. We apply our proven methodology to ensure your organization is getting optimal value from your existing infrastructure while allowing fast, easy, and cost-effective recovery in the AWS cloud.
Business resumption, also known as disaster recovery, has always been a challenge for organizations. Aside from those in the banking and investment industry, many businesses don’t take business resumption as seriously as they should.
I formerly worked at a financial institution that would send their teams to another city in another state where production data was backed up and could be restored in the event of a disaster. Employees would go to this location and use the systems in production to complete their daily workloads. This would test the redundancy of a single site, but what if you could have many redundant sites? What if you could have a global backup option and have redundancy not only when you need it, but as a daily part of your business strategy?
To achieve true redundancy, I recommend understanding your service provider’s offerings. Each service provider has different facilities located in different regions that are spread between different telecom service providers.
From a customer’s perspective, this creates a good opportunity to build out an infrastructure that has fully redundant load balancers, giving your business a regional presence in almost every part of the world. In addition, you are able to deliver application speed and efficiency to your regional consumers.
Look closely at your provider’s services like hardware health monitoring, log management, security monitoring and all the management services that accompany those solutions. If you need to conform to certain compliance regulations, you also need to make sure the services and technologies meet each regulation.
Organize your vendors and managed service providers so that you can get your data centralized based on service across all providers and all layers of the stack. This is when you need to make sure that your partners share data, have the ability to ingest logs, and exchange APIs with each other to effectively secure your environment.
Additionally, centralize the notification process so you are getting one call per incident versus multiple calls across providers. This means that API connectivity or log collection needs to happen between the technologies that correlate triggered events across multiple platforms. Centralizing notification this way increases efficiency and decreases detection time, which helps mitigate risks introduced into your environment by outside and inside influences.
Lastly, to find incidents as quickly as possible, you need to find a managed services provider that will be able to ingest and correlate all events and logs across all infrastructures. There are also cloud migration services that will help you with all these decisions as they help move you to the cloud.
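To make the "one call per incident" idea concrete, here is a minimal sketch of event correlation. It assumes each provider's alert has already been normalized into a dictionary with `provider`, `resource`, and `timestamp` keys. Those field names, the provider labels, and the five-minute window are illustrative assumptions, not the format of any particular monitoring product.

```python
from collections import defaultdict


def correlate(events, window_seconds=300):
    """Group events for the same resource that occur close together in time.

    Each event is a dict with 'provider', 'resource', and 'timestamp'
    (seconds since epoch). Returns a list of incidents, each a list of events.
    """
    by_resource = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        by_resource[event["resource"]].append(event)

    incidents = []
    for resource_events in by_resource.values():
        current = [resource_events[0]]
        for event in resource_events[1:]:
            # Extend the open incident while events keep arriving within the window.
            if event["timestamp"] - current[-1]["timestamp"] <= window_seconds:
                current.append(event)
            else:
                incidents.append(current)
                current = [event]
        incidents.append(current)
    return incidents


# Hypothetical alerts from three different monitoring services.
events = [
    {"provider": "log-management", "resource": "web-01", "timestamp": 1000},
    {"provider": "security-monitoring", "resource": "web-01", "timestamp": 1060},
    {"provider": "hardware-health", "resource": "db-01", "timestamp": 1100},
    {"provider": "log-management", "resource": "web-01", "timestamp": 5000},
]
incidents = correlate(events)
print(len(incidents), "incidents from", len(events), "alerts")
```

Here the two alerts for the same server that arrive a minute apart become a single incident, so the on-call engineer would get one notification for them instead of two.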
Learn more about 2W Managed Cloud Security and how our partnership with Alert Logic can ensure your environment’s security
IT infrastructure is the hardware, network, services and software required for enterprise IT. It is the foundation that enables organizations to deliver IT services to their users. Disaster recovery (DR) is preparing for and recovering from natural and people-related disasters that impact IT infrastructure for critical business functions. Natural disasters include earthquakes, fires, etc. People-related disasters include human error, terrorism, etc. Business continuity differs from DR as it involves keeping all aspects of the organization functioning, not just IT infrastructure.
When planning for DR, companies must establish a recovery time objective (RTO) and recovery point objective (RPO) for each critical IT service. RTO is the acceptable amount of time in which an IT service must be restored. RPO is the acceptable amount of data loss measured in time. Companies establish both RTOs and RPOs to mitigate financial and other types of loss to the business. Companies then design and implement DR plans to effectively and efficiently recover the IT infrastructure necessary to run critical business functions.
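As a rough illustration of how RTO and RPO can be checked against a given recovery approach, the sketch below compares a measured restore time and a backup interval with the objectives for one service. The service name and all of the durations are made-up example values.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ServiceObjective:
    name: str
    rto: timedelta  # maximum acceptable time to restore the service
    rpo: timedelta  # maximum acceptable data loss, measured in time


def meets_objectives(objective, restore_time, backup_interval):
    """Return (rto_ok, rpo_ok) for a restore time and a backup interval.

    Worst-case data loss equals the interval between copies: a failure just
    before the next copy loses everything written since the previous one.
    """
    rto_ok = restore_time <= objective.rto
    rpo_ok = backup_interval <= objective.rpo
    return rto_ok, rpo_ok


# Hypothetical objectives for an example service.
orders = ServiceObjective("order-processing", rto=timedelta(hours=4), rpo=timedelta(minutes=15))

# A nightly backup that takes six hours to restore misses both objectives.
nightly = meets_objectives(orders, restore_time=timedelta(hours=6), backup_interval=timedelta(hours=24))
print(nightly)
```

Running the same check for each critical service makes it easier to see which services a simple backup can cover and which need a faster recovery design.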
For companies with corporate datacenters, the traditional approach to DR involves duplicating IT infrastructure at a secondary location to ensure available capacity in a disaster. The key downside is IT infrastructure must be bought, installed and maintained in advance to address anticipated capacity requirements. This often causes IT infrastructure in the secondary location to be over-procured and under-utilized. In contrast, Amazon Web Services (AWS) provides companies with access to enterprise-grade IT infrastructure that can be scaled up or down for DR as needed.
The four most common DR architectures on AWS are:
Backup and Restore ($) – Companies can use their current backup software to replicate data into AWS. Companies use Amazon S3 for short-term archiving and Amazon Glacier for long-term archiving. In the event of a disaster, data can be made available on AWS infrastructure or restored from the cloud back onto an on-premise server.
Pilot Light ($$) – While backup and restore are focused on data, pilot light includes applications. Companies only provision core infrastructure needed for critical applications. When disaster strikes, Amazon Machine Images (AMIs) and other automation services are used to quickly provision the remaining environment for production.
Warm Standby ($$$) – Taking the Pilot Light model one step further, warm standby creates an active/passive cluster. The minimum amount of capacity is provisioned in AWS. When needed, the environment rapidly scales up to meet full production demands. Companies receive (near) 100% uptime and (near) no downtime.
Hot Standby ($$$$) – Hot standby is an active/active cluster with both cloud and on-premise components to it. Using weighted DNS load-balancing, IT determines how much application traffic to process in-house and on AWS. If a disaster or spike in load occurs, more or all of it can be routed to AWS with auto-scaling.
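The weighted routing described for hot standby can be simulated in a few lines. This is only a sketch of the idea of weighted traffic distribution and failover. It does not call any DNS service, and the site names and the 80/20 split are assumptions chosen for the example.

```python
import random


def pick_site(weights, rng=random):
    """Pick a site name with probability proportional to its weight."""
    sites = [site for site, weight in weights.items() if weight > 0]
    return rng.choices(sites, weights=[weights[s] for s in sites], k=1)[0]


def fail_over(weights, failed_site):
    """Return new weights with the failed site taken out of rotation."""
    return {site: (0 if site == failed_site else w) for site, w in weights.items()}


# Normal operation: most traffic stays in-house, some is served from AWS.
normal = {"on_premises": 80, "aws": 20}

# Disaster: the on-premises site gets weight 0, so all traffic goes to AWS.
disaster = fail_over(normal, "on_premises")
print(disaster, pick_site(disaster))
```

In a real deployment the weights would live in the DNS provider's records, and auto-scaling on the AWS side would absorb the extra load once traffic shifts.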
In a non-disaster environment, warm standby DR is not scaled for full production, but is fully functional. To help absorb/justify cost, companies can use the DR site for non-production work, such as quality assurance, testing, etc. For hot standby DR, cost is determined by how much production traffic is handled by AWS in normal operation. In the recovery phase, companies only pay for the additional capacity they use, and only for the duration the DR site is at full scale. In hot standby, companies can further reduce the costs of their “always on” AWS servers with Reserved Instances (RIs).
Smart companies know disaster is not a matter of if, but when. According to a study done by the University of Oregon, every dollar spent on hazard mitigation, including DR, saves companies four dollars in recovery and response costs. In addition to cost savings, smart companies also view DR as critical to their survival. For example, 51% of companies that experienced a major data loss closed within two years (Source: Gartner), and 44% of companies that experienced a major fire never re-opened (Source: EBM). Again, disaster is not a matter of if, but when. Be ready.
The jump to the cloud can be a scary proposition. For an enterprise with systems deeply embedded in traditional infrastructure like back office computer rooms and datacenters, the move to the cloud can be daunting. The thought of having all of your data in someone else’s hands can make some IT admins cringe. However, once you start looking into cloud technologies, you start seeing some of the great benefits, especially with providers like Amazon Web Services (AWS). The cloud can be cost-effective, elastic and scalable, flexible, and secure. That same IT admin cringing at the thought of their data in someone else’s hands may finally realize that AWS is a bit more secure than a computer rack sitting under an employee’s desk in a remote office. Once the decision is finally made to “try out” the cloud, the planning phase can begin.
Most of the time the biggest question is, “How do we start with the cloud?” The answer is to use a phased approach. By picking applications and workloads that are less mission critical, you can try the newest cloud technologies with less risk. When deciding which workloads to move, you should ask yourself the following questions: Is there a business need for moving this workload to the cloud? Is the technology a natural fit for the cloud? What impact will this have on the business? If all those questions are suitably answered, your workloads are far more likely to be successful in the cloud.
One great place to start is with archiving and backups. These types of workloads are important, but the data you’re dealing with is likely just a copy of data you already have, so it is considerably less risky. The easiest way to start with archives and backups is to try out S3 and Glacier. Many of today’s backup utilities you may already be using, like Symantec Netbackup and Veeam Backup & Replication, have cloud versions that can back up directly to AWS. This allows you to start using the cloud without changing much of your embedded backup processes. By moving less critical workloads you are taking the first steps in increasing your cloud footprint.
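As one way to picture the S3-then-Glacier pattern, the sketch below builds a lifecycle configuration that moves backup objects to Glacier after a set number of days and expires them later. The prefix and day counts are assumptions for illustration. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument, but it should be checked against the current AWS documentation before use.

```python
def backup_lifecycle(prefix, days_to_glacier, days_to_expire):
    """Build an S3 lifecycle configuration for backup objects under a prefix."""
    if days_to_expire <= days_to_glacier:
        raise ValueError("objects must expire after they transition to Glacier")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Short-term copies stay in S3, then move to Glacier.
                "Transitions": [{"Days": days_to_glacier, "StorageClass": "GLACIER"}],
                # Remove archives once the retention period has passed.
                "Expiration": {"Days": days_to_expire},
            }
        ]
    }


# Hypothetical policy: 30 days in S3, then Glacier, deleted after one year.
config = backup_lifecycle("backups/", days_to_glacier=30, days_to_expire=365)
print(config["Rules"][0]["ID"])
```

The function only builds the configuration. Applying it to a bucket would be a separate step with an AWS SDK or the console.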
Now that you have moved your backups to AWS using S3 and Glacier, what’s next? The next logical step would be to try some of the other services AWS offers. Another workload that can often be moved to the cloud is Disaster Recovery. DR is an area that will allow you to test more AWS services like VPC, EC2, EBS, RDS, Route53 and ELBs. DR is a perfect way to increase your cloud footprint because it will allow you to construct your current environment, which you should already be very familiar with, in the cloud. A Pilot Light DR solution is one type of DR solution commonly seen in AWS. In the Pilot Light scenario the DR site has minimal systems and resources with the core elements already configured to enable rapid recovery once a disaster happens. To build a Pilot Light DR solution you would create the AWS network infrastructure (VPC), deploy the core AWS building blocks needed for the minimal Pilot Light configuration (EC2, EBS, RDS, and ELBs), and determine the process for recovery (Route53). When it is time for recovery all the other components can be quickly provisioned to give you a fully working environment. By moving DR to the cloud you’ve increased your cloud footprint even more and are on your way to cloud domination!
The next logical step is to move Test and Dev environments into the cloud. Here you can get creative with the way you use the AWS technologies. When building systems on AWS, make sure to follow the Architecting Best Practices: design for failure and nothing will fail, decouple your components, take advantage of elasticity, build security into every layer, think parallel, and don’t fear constraints! Start with a proof of concept (POC) to test the development environment, and use AWS reference architectures to aid in the learning and planning process. Next, test your legacy application in the new environment and migrate data. The POC is not complete until you validate that it works and that performance meets your expectations. Once you get to this point, you can reevaluate the build and optimize it to the exact specifications needed. Finally, you’re one step closer to deploying actual production workloads to the cloud!
Production workloads are obviously the most important, but with the phased approach you’ve taken to increase your cloud footprint, it’s not that far of a jump from the other workloads you now have running in AWS. Some of the important things to remember to be successful with AWS include being aware of the rapid pace of the technology (this includes improved services and price drops), that security is your responsibility as well as Amazon’s, and that there isn’t a one-size-fits-all solution. Lastly, all workloads you implement in the cloud should still have stringent security and comprehensive monitoring as you would on any of your on-premises systems.
Overall, a phased approach is a great way to start using AWS. Start with simple services and traditional workloads that have a natural fit for AWS (e.g. backups and archiving). Next, start to explore other AWS services by building out environments that are familiar to you (e.g. DR). Finally, experiment with POCs and the entire gamut of AWS services to benefit from more efficient production operations. Like many new technologies, it takes time for adoption. By increasing your cloud footprint over time you can set expectations for cloud technologies in your enterprise and make it a more comfortable proposition for all.
The pervasive technology industry has created the cloud and all the acronyms that go with it. Growth is fun, and the cloud is the talk of the town. From the California Sun to the Kentucky coal mines we are going to the cloud, although Janis Joplin may have been there before her time. Focus and clarity will come later.
There is so much data being stored today that the biggest challenge is going to be how to quantify it, store it, access it and recover it. Cloud-based disaster recovery has broad-based appeal across industry and segment size. Using a service from the AWS cloud enables more efficient disaster recovery of mission critical applications without any upfront cost or commitment. AWS allows customers to provision virtual private clouds using its infrastructure, which offers complete network isolation and security. The cloud can be used to configure a “pilot-light” architecture, which dramatically reduces cost over traditional data centers where the concept of “pilot” or “warm” is not an option – you pay for continual use of your infrastructure whether it’s used or not. With AWS, you only pay for what you use, and you have complete control of your data and its security.
Backing data up is relatively simple: select an object to be backed up and click a button. More often than not, the encrypted data reaches its destination, whether that is a local storage device or an S3 bucket in an AWS region in Ireland. Restoring the data has always been the challenge. What the cloud does is make testing of the backup capabilities more flexible and more cost effective. As the cost of cloud-based testing falls rapidly, from thousands of dollars or dinars to hundreds, it results in more testing and, therefore, more success after a failure, whether it’s from a superstore or superstorm, or even a supermodel one.
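One simple form of restore testing is to compare a checksum of the restored file with a checksum of the original. The sketch below does this with SHA-256 on temporary files. The file names and contents are placeholders, and a real test would restore from the actual backup target rather than copy bytes locally.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original, restored):
    """Return True when the restored file is byte-for-byte identical."""
    return sha256_of(original) == sha256_of(restored)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "orders.db"
    good = Path(tmp) / "orders.restored.db"
    bad = Path(tmp) / "orders.truncated.db"
    source.write_bytes(b"order data" * 1000)
    good.write_bytes(source.read_bytes())       # simulated successful restore
    bad.write_bytes(source.read_bytes()[:-1])   # simulated truncated restore
    good_ok = restore_matches(source, good)
    bad_ok = restore_matches(source, bad)

print(good_ok, bad_ok)
```

A check like this is cheap enough to run after every scheduled restore test, which is the kind of frequent testing that lower cloud costs make practical.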