There have been numerous articles, blogs, and whitepapers about the security of the Cloud as a business solution. Amazon Web Services has a site devoted to extolling their security virtues and there are several sites that devote themselves entirely to the ins and outs of AWS security. So rather than try to tell you about each and every security feature of AWS and try to convince you how secure the environment can be, my goal is to share a real world example of security that can be improved by moving from on premise datacenters to AWS.
Many AWS implementations are used for hosting web applications, most of which are Internet accessible. Obviously, if your environment is for internal use only you can lock down security even further, but for the interest of this exercise, we’re assuming Internet facing web applications. The inherent risk, of course, with any Internet accessible application is that accessibility to the Internet provides hackers and malicious users access to your environment as well as honest yet malware/virus/Trojan infected users.
As with on premise and colocation based web farms, AWS follows the standard security practice of isolating customers from one another so that if one customer experiences a security breach, all other customers remain secure. And of course, AWS Security Groups function like traditional firewalls, allowing traffic only through allowed ports to/from specific destinations/sources. AWS moves ahead of traditional datacenters starting with Security Groups and Network ACLs by offering more flexibility to respond to attacks. Consider the case of a web farm that has components suspected of being compromised; AWS Security Groups can be created in seconds to isolate the suspected components from the rest of the network. In a traditional datacenter environment, isolating those components may require complex network changes to move them to isolated networks in order to prevent an infection from spreading over the network, something AWS blocks by default.
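To make the isolation idea concrete, here is a minimal sketch of how a suspected instance might be moved into an empty Security Group. It assumes the instance lives in a VPC and that `ec2` behaves like a boto3 EC2 client; the function name, the group name "quarantine", and the description text are made up for illustration.

```python
def quarantine_instance(ec2, instance_id, vpc_id, group_name="quarantine"):
    """Move an instance into a fresh security group that allows no traffic.

    `ec2` is expected to behave like boto3.client("ec2"). It is passed in
    rather than created here so the sketch can be exercised with a stub.
    """
    # A newly created VPC security group has no inbound rules.
    created = ec2.create_security_group(
        GroupName=group_name,
        Description="Isolation group for a suspected compromise",
        VpcId=vpc_id,
    )
    group_id = created["GroupId"]

    # New groups start with an allow-all outbound rule, so remove it too.
    ec2.revoke_security_group_egress(
        GroupId=group_id,
        IpPermissions=[
            {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
        ],
    )

    # Replace every group currently on the instance with the isolation group.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[group_id])
    return group_id
```

With real credentials this would be called with a boto3 EC2 client plus your own instance and VPC IDs. Treat it as a starting point rather than a complete incident response procedure.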
AWS often talks about scalability – the ability to grow and shrink the environment to meet demand. That capability extends to security features as well! Need another firewall? Just add another Security Group; there is no need to install another device. Adding another subnet, VPN, or firewall can be done in minutes with no action required from on premise staff. No more waiting while network cables are moved, hardware is installed, or devices are physically reconfigured when you need security updates.
Finally, no matter how secure an environment, no security plan is complete without a remediation plan. AWS has tools that provide remediation with little to no downtime. Part of standard practices for AWS environments is to take regular snapshots of EC2 instances (servers). These snapshots can be used to re-create a compromised or non-functional component in minutes rather than the lengthy restore process for a traditional server. Additionally, 2nd Watch recommends taking an initial image of each component so that in the event of a failure, there is a fall back point to a known good configuration.
So how secure is secure? With the ability to respond faster, scale as necessary, and recover in minutes, the Amazon Cloud is pretty darn secure! And of course, this is only the tip of the iceberg for AWS Cloud Security. More will follow through the rest of December here on our blog, and please check out the official link above for Amazon’s Security Center and Whitepapers.
-Keith Homewood, Cloud Architect
Amazon Web Services™ (AWS) released a new service at re:Invent a few weeks ago that will have operations and security managers smiling. CloudTrail is a web service that records AWS API calls and stores the logs in S3. This gives organizations the visibility they need into their AWS infrastructure to maintain proper governance of changes to their environment.
2nd Watch was pleased to announce support for CloudTrail in our launch of our 2W Atlas product. 2W Atlas is a product that organizes and visualizes AWS resources and output data. Enterprise organizations need tools and services built for the cloud to properly manage these new architectures. 2W Atlas provides organizations with a tool that enables their divisions and business units to organize and manage the CloudTrail data for their individual group.
2nd Watch is committed to assisting enterprise organizations with the expertise and tools to make the cloud work for them. The tight integration 2nd Watch has developed with CloudTrail and Atlas is further proof of our expertise in bringing enterprise solutions that our customers demand.
To learn more about 2W Atlas or CloudTrail, Contact Us and let us know how we can help.
-Matt Whitney, Sales Executive
Cloud Services are becoming more commonplace in the Enterprise. As a result, our ability to learn from past successes and failures becomes vital for the effective launch of new IT strategies. One way to ensure this is applied in practice is through the development of an Operational Framework. An Operational Framework is a key element of cloud strategy that needs to be developed, reviewed, and executed to ensure that organizational goals are achieved and lessons applied.
The items outlined in an Operational Framework provide guidance to Organizations and help them to create, operate, and support IT Services while ensuring that their investments in IT deliver business value at acceptable levels of risk*.
Operational Frameworks can be broken down into two separate categories: IT Operations and Business Operations. IT Operations consist of Security, Fault Tolerance, and Performance. As you develop your operational user guide within your organization, you want to think about these things in parallel.
Security Operations are about taking the time to do things right the first time. Security Groups need to be addressed so that ports are not left open. CIDR configurations need to be examined along with IAM use for the account. S3 Bucket Permissions need to be reviewed so that you don’t have potential data loss. IAM password policies need to be implemented and RDS Security Group access restricted to guard against potential vulnerabilities. You can eliminate many potential headaches by examining these aspects early in your strategy.
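As an illustration of the "ports left open" review, the sketch below scans Security Group data for inbound rules open to the whole Internet. It assumes the input has the same shape as the `SecurityGroups` list returned by boto3's `describe_security_groups`; the function name and the default list of acceptable public ports (80 and 443) are assumptions for the example.

```python
def find_open_ingress(security_groups, allowed_ports=(80, 443)):
    """Return (group_id, from_port, to_port) for world-open inbound rules.

    A rule is reported when it allows 0.0.0.0/0 and is not a single-port
    rule for one of `allowed_ports`. Port ranges are always reported.
    """
    findings = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            open_to_world = any(
                ip_range.get("CidrIp") == "0.0.0.0/0"
                for ip_range in rule.get("IpRanges", [])
            )
            if not open_to_world:
                continue
            from_port = rule.get("FromPort")
            to_port = rule.get("ToPort")
            # "All traffic" rules (IpProtocol "-1") carry no port range.
            if from_port is None or to_port is None:
                findings.append((group["GroupId"], None, None))
                continue
            if from_port == to_port and from_port in allowed_ports:
                continue
            findings.append((group["GroupId"], from_port, to_port))
    return findings
```

A check like this only covers IPv4 CIDR rules; it is meant to show the shape of an audit, not to replace a full review.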
Similarly, Fault Tolerance reviews can help you increase availability and redundancy within your environments. By taking advantage of health checks, you can increase your uptime and take advantage of the full benefits of cloud technology. Within this part of your Operational Framework, you should review that you are:
- Snapshotting your EBS Volumes frequently enough
- Architecting your environment to take advantage of Multiple Availability Zones or even multiple regions.
- Taking advantage of Elastic Load Balancers and Auto-Scaling Groups and that they are configured in an optimal way that allows for peak traffic flow and performance.
- Reviewing your VPN tunnel redundancy so that it is configured ideally.
Fault tolerance is vital to an organization’s IT Operations and should be reviewed often and in detail.
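The "frequently enough" check on EBS snapshots from the list above could be sketched roughly as follows. It assumes snapshot records shaped like the `Snapshots` entries from boto3's `describe_snapshots` (with `VolumeId` and a timezone-aware `StartTime`); the function name and the idea of a single maximum age for all volumes are assumptions.

```python
from datetime import datetime, timedelta, timezone


def volumes_needing_snapshot(volume_ids, snapshots, max_age, now=None):
    """Return the volumes whose newest snapshot is missing or too old."""
    now = now or datetime.now(timezone.utc)

    # Find the most recent snapshot start time for each volume.
    newest = {}
    for snap in snapshots:
        volume = snap["VolumeId"]
        started = snap["StartTime"]
        if volume not in newest or started > newest[volume]:
            newest[volume] = started

    stale = []
    for volume in volume_ids:
        last = newest.get(volume)
        if last is None or now - last > max_age:
            stale.append(volume)
    return stale
```

Run on a schedule, a report like this makes it easier to notice when a volume has quietly dropped out of the snapshot routine.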
Lastly, within IT Operations, review your performance metrics within your cloud environment. Cloud deployments offer you the ability to take advantage of a suite of powerful services, but often we see that customers will unintentionally over or under provision their environment, leading to waste. By improving the performance of your services and using just what you need, you can get the most out of your operational budget.
For Performance Operations, you should:
- Review your EC2 Instances, making sure you are not over/under provisioned.
- Review your service limits, so that when your auto scaling groups do kick in they can scale to meet your demand.
- Review your EBS configurations and make sure that you are utilizing your PIOPS accordingly. Provisioned IOPS are commonly misunderstood and overestimated.
Other things to consider within this group are:
- Your DNS provider
- Using Glacier for Archiving
- Utilization of yearly 3rd party audits. Having a second set of eyes audit your environment can usually pay for itself within a few months.
Business Operations and Corporate Governance are a bit easier because they focus strictly on utilization. Most importantly, you want to make sure that you optimize your use of Reserved Instances. By developing a proper strategy for reservations, you will not only save money but will guarantee that the resources your environment needs are available, even in the event of an outage. Over and under utilization are equally detrimental to your bottom line. Plan to review your usage quarterly and take advantage of billing software as needed to help tighten your understanding of your environment. ELBs, EBS volumes, unused elastic IPs, and idle RDS instances should also be examined, as waste can occur easily with these services as well. Within your business operations framework, communication should flow freely between your IT Department, User Groups, and the Finance Department. The free flow of information will allow for future innovation, increased budget parameters, and a unified corporate direction that everyone can agree with.
By taking a few of these simple steps, you are setting yourself up for a successful cloud strategy and implementation for years to come.
*Source: paraphrased internet website
-Blake Diers, Sales Executive
Not long ago, 2nd Watch published an article on Amazon Glacier. In it Caleb provides a great primer on the capabilities of Glacier and the cost benefits. Now that he’s taken the time to explain what it is, let’s talk about possible use cases for Glacier and how to avoid some of the pitfalls. As Amazon says, “Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable.” What immediately comes to mind are backups, but most AWS customers do this through EBS snapshots, which can restore in minutes, while a Glacier recall can take hours. Rather than looking at the obvious, consider these use cases for Glacier Archival storage: compliance (regulatory or internal process), conversion of paper archives, and application retirement.
Compliance often forces organizations to retain records and backups for years; customers often mention a seven year retention policy based on regulatory compliance. In seven years, a traditional (on premise) server can be replaced at least once, operating systems are upgraded several times, applications have been upgraded or modified, and backup hardware/software has been changed. Add to that all the media that would need to be replaced/upgraded and you have every IT department’s nightmare – needing to either maintain old tape hardware or convert all the old backup tapes to the new hardware format (and hope too many haven’t degraded over the years). Glacier removes the need to worry about the hardware and the media, and the storage fees (currently 1¢ per GB/month in US-East) are tiny compared to the cost of media and storage on premise. Upload your backup file(s) to S3, set up a lifecycle policy, and you have greatly simplified your archival process while keeping regulatory compliance.
So how do customers create these lifecycle policies so their data automatically moves to Glacier? From the AWS Management Console, once you have an S3 bucket there is a Property called ‘Lifecycle’ that can manage the migration to Glacier (and possible deletion as well). Add a rule (or rules) to the S3 bucket that can migrate files based on a filename prefix, how long since their creation date, or how long from an effective date (perhaps 1 day from the current date for things you want to move directly to Glacier). For the example above, perhaps customers take backup files, move them to S3, then have them move to Glacier after 30 days and delete after 7 years.
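For readers who prefer to script this rather than click through the console, the 30 day / 7 year example could be expressed as a lifecycle configuration along these lines. This is a sketch: the function name and rule ID format are invented, the "backups/" prefix is only an example, and 7 years is approximated as 7 × 365 days. The dictionary follows the structure boto3's `put_bucket_lifecycle_configuration` expects for its `LifecycleConfiguration` argument.

```python
def glacier_lifecycle(prefix, archive_after_days=30, delete_after_days=7 * 365):
    """Build a lifecycle configuration: move to Glacier, then expire."""
    if delete_after_days <= archive_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "Rules": [
            {
                # Hypothetical naming scheme for the rule ID.
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                # Only objects whose key starts with this prefix are affected.
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }
```

The resulting dictionary could then be passed, together with your bucket name, to a boto3 S3 client's `put_bucket_lifecycle_configuration` call.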
Before we go too far and set up lifecycles, however, one major point should be highlighted: Amazon charges customers based on GB/month stored in Glacier, plus a one-time fee for each file moved from S3 to Glacier. Moving a terabyte of data from S3 to Glacier could cost little more than $10/month in storage fees. However, if that data is made up of 1 KB log files, the one-time fee for that migration can be more than $50,000! While this is an extreme example, consider data management before archiving. If at all possible, compress the files into a single file (zip/tar/rar), upload those compressed files to S3, and then archive to Glacier.
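The arithmetic behind that warning can be worked through in a few lines. The storage price is the 1¢ per GB/month quoted earlier; the per-request fee of $0.05 per 1,000 objects is an assumption chosen to be consistent with the $50,000 figure, so check current pricing before relying on it. The function and constant names are made up for the example.

```python
STORAGE_PER_GB_MONTH = 0.01  # dollars, the 1 cent per GB/month quoted above
REQUEST_FEE_PER_1000 = 0.05  # dollars, ASSUMED one-time fee per 1,000 objects


def glacier_costs(total_bytes, object_size_bytes):
    """Return (monthly storage cost, one-time transition cost) in dollars."""
    gigabytes = total_bytes / 1024**3
    # Ceiling division: a partial final object still counts as one object.
    objects = -(-total_bytes // object_size_bytes)
    monthly_storage = gigabytes * STORAGE_PER_GB_MONTH
    one_time_requests = objects / 1000 * REQUEST_FEE_PER_1000
    return monthly_storage, one_time_requests


one_terabyte = 1024**4
small_files = glacier_costs(one_terabyte, 1024)      # 1 KB log files
big_archives = glacier_costs(one_terabyte, 1024**3)  # 1 GB tar files
```

Under these assumptions, `small_files` works out to about $10.24 per month in storage and roughly $53,687 in one-time fees, while bundling the same data into 1 GB archives drops the one-time fee to about five cents.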
-Keith Homewood, Cloud Architect
A lot of companies have been dipping their toe in the proverbial Cloud waters for some time, looking at ways to help their businesses be more efficient, agile and innovative. There have been a lot of articles published recently about cloud being overhyped, cloud being the new buzzword for everything IT, or about cloud being just a fad. The bottom line is that cloud is enabling a completely new way to conduct business, one that’s not constrained but driven through a completely new business paradigm and should not be overlooked but leveraged, and leveraged immediately.
- Cyclical Business Demand – We’ve been helping customers architect, build, deploy and manage environments for unpredictable or spiky demand. This has become more prevalent with the proliferation of mobile devices and social media outlets where you never know when the next surge in traffic will come.
- Datacenter Transformation – Helping customers figure out what can move to the public cloud and what should stay on premise is a typical engagement for us. As the continued migration from on premise technology to cloud computing accelerates, these approaches and best practices are helping customers not just optimize what they have today but also ease the burden of trying to make an all or nothing decision.
- Financial Optimization – Designing a way to help customers understand their cloud finances and then giving them the ability to create financial models for internal chargebacks and billing can sometimes be overlooked upfront. We’ve developed solutions to help customers do both where customers are seeing significant cost savings.
For quite some time I’ve been meaning to tinker around with using Amazon S3 as a backup tool. Sure, I’ve been using S3 backed Dropbox for years now and love it, and there are a multitude of other desktop client apps out there that do the same sort of thing with varying price points and feature sets (including Amazon’s own cloud drive). The primary reason I wanted to look into using something specific to S3 is that it is economical, very highly available, and secure, and it also scales well in a more enterprise setting. It is just a logical and compelling choice if you are already running IaaS in AWS.
If you’re unfamiliar with rsync, it is a UNIX tool for copying files or sets of files with many cool features. Probably the most distinctive feature is that it does differential copying, which means that it will only copy files that have changed on the source. This means if you have a file set containing thousands of files that you want to sync between the source and the destination it will only have to copy the files that have changed since the last copy/sync.
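To illustrate the "only copy what changed" idea, here is a rough sketch of a size-and-modification-time comparison, which is similar in spirit to rsync's default quick check. It is not rsync's implementation (rsync can also transfer only the changed parts within a file), and the function names are invented for the example.

```python
import os


def snapshot(root):
    """Map each file's path relative to `root` to its (size, mtime)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            manifest[os.path.relpath(path, root)] = (
                info.st_size,
                int(info.st_mtime),
            )
    return manifest


def files_to_copy(source, destination):
    """Paths that are new on the source or differ in size or mtime."""
    return sorted(
        path for path, meta in source.items() if destination.get(path) != meta
    )
```

Comparing `snapshot(source_dir)` against `snapshot(destination_dir)` with `files_to_copy` gives the short list of files worth transferring, which is why repeated syncs of a large file set are so much cheaper than the first one.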
Being an engineer, my initial thought was, “Hey, why not just write a little python program using the boto AWS API libs and librsync to do it?”, but I am also kind of lazy, and I know I’m not that forward-thinking, so I figured someone has probably already done this. I consulted the Google machine and sure enough… 20 seconds later I had discovered Duplicity (http://duplicity.nongnu.org/). Duplicity is an open source GPL python based application that does exactly what I was aiming for – it allows you to rsync files to an S3 bucket. In fact, it even has some additional functionality like encrypting and password protecting the data.
A little background info on AWS storage/backups
Tying in to my earlier point about wanting to use S3 for EC2 Linux instances, traditional Linux AWS EC2 instance backups are achieved using EBS snapshots. This can work fairly well but has a number of limitations and potential pitfalls/shortcomings.
Here is a list of advantages and disadvantages of using EBS snapshots for Linux EC2 instance backup purposes. In no way are these lists fully comprehensive:
Advantages:
- Easily scriptable using API tools
- Pre-baked functionality built into the AWS APIs and Management Console
Disadvantages:
- Non-selective (requires backing up an entire EBS volume)
- More expensive
  - EBS is more expensive than S3
  - Backing up an entire EBS volume can be overkill for what you actually need backed up and result in a lot of extra cost for backing up non-essential data
- Pitfalls with multiple EBS volume software RAID or LVM sets
  - Multiple EBS volume sets are difficult to snapshot synchronously
  - Using the snapshots for recovery requires significant work to reconstruct volume sets
- No ability to capture only files that have changed since previous backup (i.e. rsync style backups)
- Only works on EBS backed instances
Compare that to a list of advantages/disadvantages of using the S3/Duplicity solution:
Advantages:
- Inexpensive (S3 is cheap)
- Data security (redundancy and geographically distributed)
- Works on any Linux system that has connectivity to S3
- Should work on any UNIX style OS (includes Mac OSX) as well
- Only copies the deltas in the files and not the entire file or file-set
- Supports “Full” and “Incremental” backups
- Data is compressed with gzip
- FOSS (Free and Open Source Software)
- Works independently of underlying storage type (SAN, Linux MD, LVM, NFS, etc.) or server type (EC2, Physical hardware, VMWare, etc.)
- Relatively easy to set up and configure
- Uses syntax that is congruent with rsync (e.g. --include, --exclude)
- Can be restored anywhere, anytime, and on any system with S3 access and Duplicity installed
Disadvantages:
- Slower than a snapshot, which is virtually instantaneous
- Not ideal for backing up data sets with large deltas between backups
- No out-of-the-box type of AWS API or Management Console integration (though this is not really necessary)
- No “commercial” support
On to the important stuff! How to actually get this thing up and running
Things you’ll need:
- The Duplicity application (should be installable via either yum, apt, or other pkg manager). Duplicity itself has numerous dependencies but the package management utility should handle all of that.
- An Amazon AWS account
- Your Amazon S3 Access Key ID
- Your Amazon S3 Secret Access Key
- A list of files/directories you want to back up
- A globally unique name for an Amazon S3 bucket (the bucket will be created if it doesn’t yet exist)
- If you want to encrypt the data:
  - A GPG key
  - The corresponding GPG key passphrase
- Obtain/install the application (and its pre-requisites):
If you’re running a standard Linux distro you can most likely install it from a ‘yum’ or ‘apt’ repository (depending on distribution). Try something like “sudo yum install duplicity” or “sudo apt-get install duplicity”. If all else fails (perhaps you are running some esoteric Linux distro like Gentoo?), you can always do it the old-fashioned way and download the tarball from the website and compile it (that is outside of the scope of this blog). “Use the source Luke.” If you are a Mac user you can also compile it and run it on Mac OSX (http://blog.oak-tree.us/index.php/2009/10/07/duplicity-mac), though I have not tested/verified that this actually works.
- NOTE: On Fedora Core 18, Duplicity was already installed and worked right out of the box. On a Debian Wheezy box I had to apt-get install duplicity and python-boto. YMMV
- Generate a GPG key if you don’t already have one:
- If you need to create a GPG key use ‘gpg --gen-key’ to create a key with a passphrase. The default values supplied by ‘gpg’ are fine.
- NOTE: record the GPG Key value that it generates because you will need that!
- NOTE: keep a backup copy of your GPG key somewhere safe. Without it you won’t be able to decrypt your backups, and that could make restoration a bit difficult.
- Run Duplicity to back up whatever files/directories you want saved in the cloud. I’d recommend reading the man page for a full rundown on all the options and syntax.
I used something like this:
$ export AWS_ACCESS_KEY_ID='AKBLAHBLAHBLAHMYACCESSKEY'
$ export AWS_SECRET_ACCESS_KEY='99BIGLONGSECRETKEYGOESHEREBLAHBLAH99'
$ export PASSPHRASE='mygpgpassphrase'
$ duplicity incremental --full-if-older-than 1W --s3-use-new-style --encrypt-key=MY_GPG_KEY --sign-key=MY_GPG_KEY --volsize=10 --include=/home/rkennedy/bin --include=/home/rkennedy/code --include=/home/rkennedy/Documents --exclude='**' /home/rkennedy s3+http://myS3backup-bucket
- Since we are talking about backups and rsync this is probably something that you will want to run more than once. Writing a bash script or something along those lines and kicking it off automatically with cron seems like something a person may want to do. Here is a pretty nice example of how you could script this – http://www.cenolan.com/2008/12/how-to-incremental-daily-backups-amazon-s3-duplicity/
- Recovery is also pretty straightforward:
$ duplicity --encrypt-key=MY_GPG_KEY --sign-key=MY_GPG_KEY --file-to-restore Documents/secret_to_life.docx --time 05-25-2013 s3+http://myS3backup-bucket /home/rkennedy/restore
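If you would rather drive the recurring backup from Python than from a bash script, a wrapper along these lines is one option. It is only a sketch: the function names are invented, it reuses exactly the flags from the backup command shown earlier, and it assumes AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and PASSPHRASE are already set in the environment that cron provides.

```python
import subprocess


def duplicity_backup_command(source_dir, bucket_url, gpg_key, includes,
                             full_every="1W", volsize_mb=10):
    """Build the argument list for an incremental Duplicity backup."""
    cmd = [
        "duplicity", "incremental",
        "--full-if-older-than", full_every,
        "--s3-use-new-style",
        f"--encrypt-key={gpg_key}",
        f"--sign-key={gpg_key}",
        f"--volsize={volsize_mb}",
    ]
    cmd += [f"--include={path}" for path in includes]
    # Exclude everything under source_dir that was not included above.
    cmd += ["--exclude=**", source_dir, bucket_url]
    return cmd


def run_backup(cmd):
    """Run the backup; raises CalledProcessError if Duplicity fails."""
    # Passing a list (no shell) means "**" reaches Duplicity unexpanded.
    return subprocess.run(cmd, check=True)
```

A cron entry can then call a small script that builds the command with your own directories and bucket and hands it to `run_backup`, so a failed backup exits non-zero and shows up in cron's mail or logs.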
Overwhelmed or confused by all of this command line stuff? If so, Deja-dup might be helpful. It is a Gnome based GUI application that can perform the same functionality as Duplicity (turns out the two projects actually share a lot of code and are worked on by some of the same developers). Here is a handy guide on using Deja-dup for making Linux backups: (http://www.makeuseof.com/tag/dj-dup-perfect-linux-backup-software/)
This is pretty useful, and for $4 a month, or about the average price of a latte, you can store nearly 50GB of compressed, de-duped backups in S3 (standard tier). For just a nickel you can get at least 526MB of S3 backup for a month. Well, that and the 5GB of S3 you get for free.
-Ryan Kennedy, Senior Cloud Engineer