A common misconception among individuals and organizations is that AWS is great for web services, big data processing, DR, and all of the other “Internet-facing” applications, but not for running your internal business applications. While AWS is absolutely an excellent fit for the aforementioned purposes, it is also an excellent choice for running the vast majority of business applications. Everything from email services, to BI applications, to ERP, and even your own internally built applications can be run in AWS with ease while virtually eliminating future IT capex spending.
Laying the foundation
One of the most foundational pieces of architecture for most businesses is the network that applications and services ride upon. In a traditional model, this generally looks like a varying number of switches in the datacenter interconnected with a core switch (e.g. a pair of Cisco Nexus 7000s), plus a number of routers and VPN devices (e.g. Cisco ASA 55XX) that interconnect the core datacenter with secondary datacenters and office sites. This is a gross oversimplification of what really happens on the business’s underlying network (and neglects technologies like Fibre Channel and InfiniBand), but that only further drives home the point: migrating to AWS can greatly reduce the complexity and cost of managing a traditional RYO (run your own) datacenter.
Anyone familiar with IT budgeting is more than aware of the massive capex costs associated with continually purchasing new hardware as well as the operational costs associated with managing it – maintenance agreements, salaries of highly skilled engineers, power, leased datacenter and network space, and so forth. Some of these costs can be mitigated by going to a “hosted” model where you are leasing rack space in someone else’s datacenter, but you are still going to be forking out a wad of cash on a regular basis to support the hosted model.
The AWS VPC (Virtual Private Cloud) is a completely virtual network that allows businesses to create private network spaces within AWS to run all of their applications on, including internal business applications. Through the VGW (Virtual Private Gateway), the VPC inherently provides a pathway for businesses to interconnect their off-cloud networks with AWS. This can be done through traditional VPNs or by using Direct Connect, which provides a dedicated private connection from AWS to your off-cloud locations (e.g. on-prem, remote offices, colocation). The VPC is also flexible enough to let you run your own VPN gateways on EC2 instances if that is a desired approach. In addition, interconnecting with most MPLS providers is supported, as long as the MPLS provider hands off VLAN IDs.
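As a rough sketch of the VPN path, the gateway pieces can all be created with the AWS CLI; every ID and IP address below is a placeholder:

```bash
# Create a VGW, attach it to the VPC, describe the on-prem side as a
# customer gateway, then build the VPN connection between the two.
# All IDs and the public IP are placeholders.
aws ec2 create-vpn-gateway --type ipsec.1
aws ec2 attach-vpn-gateway --vpn-gateway-id vgw-12345678 --vpc-id vpc-12345678
aws ec2 create-customer-gateway --type ipsec.1 \
    --public-ip 203.0.113.10 --bgp-asn 65000
aws ec2 create-vpn-connection --type ipsec.1 \
    --vpn-gateway-id vgw-12345678 --customer-gateway-id cgw-12345678
```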
Moving up the stack
The prior section showed how the VPC is a low-cost, simplified approach to managing network infrastructure. We can proceed up the stack to the server, storage, and application layers. Another piece of the network layer that is generally heavily intertwined with the application architecture and the servers hosting it is load balancing. At a minimum, load balancing enables the application to run in a highly available and scalable manner while providing a single namespace/endpoint for the application client to connect to. Amazon’s ELB (Elastic Load Balancer) is a very cost-effective, powerful, and easy-to-use load balancing solution in AWS. A lot of businesses have existing load balancing appliances, like F5 BIG-IP, Citrix NetScaler, or A10, that they use to manage their applications. Many have also written a plethora of custom rules and configs, like F5 iRules, to do some layer 7 processing and logic on the application. All of the previously mentioned load balancing providers, and quite a few more, have AWS-hosted options available, so there is an easy migration path if they decide the ELB is not a good fit for their needs. That said, I have personally written migration tools for our customers to convert well over a thousand F5 Virtual IPs and pools (dumped to a CSV) into ELBs. It allowed for a quick and scripted migration of the entire infrastructure with an enormous cost savings to the customer. In addition to off-the-shelf appliances for load balancing, you can also roll your own with tools like HAProxy and Nginx, but we find that for most people the ELB is an excellent solution for meeting their load balancing needs.
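As an illustration of that kind of scripted conversion (a sketch, not the actual tool we wrote, assuming a hypothetical CSV of name, port, and space-separated instance IDs dumped from the F5, plus a placeholder subnet):

```bash
# For each VIP row in the CSV, create an ELB listening on the VIP's
# port and register the pool members behind it.
while IFS=, read -r name port instances; do
  aws elb create-load-balancer --load-balancer-name "$name" \
      --listeners "Protocol=HTTP,LoadBalancerPort=$port,InstancePort=$port" \
      --subnets subnet-12345678
  aws elb register-instances-with-load-balancer \
      --load-balancer-name "$name" --instances $instances
done < f5_vips.csv
```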
Now we have laid the network foundation to run our servers and applications on. AWS provides several services for this. If you need, or desire, to manage your own servers and underlying operating system, EC2 (Elastic Compute Cloud) provides the foundational building blocks for spinning up virtual servers you can tailor to whatever need you have. A multitude of Linux- and Windows-based operating systems are supported. If your application supports it, there are services like Elastic Beanstalk, OpsWorks, or Lambda, to name a few, that will manage the underlying compute resources for you and simply allow you to “deploy code” on completely managed compute resources in the VPC.
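For example, spinning up a virtual server in your VPC is a single CLI call; a minimal sketch, with placeholder IDs:

```bash
# Launch one small instance into a VPC subnet. The AMI, key pair,
# subnet, and security group IDs are all placeholders.
aws ec2 run-instances --image-id ami-12345678 --instance-type t2.micro \
    --key-name my-key --subnet-id subnet-12345678 \
    --security-group-ids sg-12345678 --count 1
```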
What about my databases?
There are countless examples of people running internal business application databases in AWS. RDS (Relational Database Service) provides a comprehensive, robust, and HA-capable hosted solution for MySQL, PostgreSQL, Microsoft SQL Server, and Oracle. If your database platform isn’t supported by RDS, you can always run your own DB servers on EC2 instances.
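For example, standing up an HA-capable MySQL database is one CLI call; a sketch, with placeholder identifiers and credentials:

```bash
# Create a Multi-AZ MySQL instance on RDS. The identifier, size,
# and credentials are placeholders.
aws rds create-db-instance --db-instance-identifier erp-db \
    --db-instance-class db.m3.medium --engine mysql \
    --allocated-storage 100 --multi-az \
    --master-username admin --master-user-password 'ChangeMe123!'
```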
NAS would be nice
AWS has always recommended a very ephemeral approach to application architectures and discouraged storing data directly on an instance. Sometimes, though, there is no getting away from needing shared storage across multiple instances. Amazon S3 is a potential solution, but it is not intended to be used as attached storage, so the application must be capable of addressing and utilizing S3’s endpoints if it is to be a solution. A great many applications aren’t compatible with that model.
Until recently, your options were pretty limited for providing a NAS type of shared storage to Amazon EC2 instances. You could create a GlusterFS (a.k.a. Red Hat Storage Server) or Ceph cluster out of EC2 instances spanned across multiple availability zones, but that is fairly expensive and has several client mounting issues. The Gluster client, for example, is a FUSE (filesystem in userspace) client and has sub-optimal performance. Linus Torvalds has a famous and slightly amusing – depending upon the audience – rant about userspace filesystems (see: https://lkml.org/lkml/2011/6/9/462). To get around the FUSE problem you could always enable NFS server mode, but that breaks the ability of the client to dynamically connect to another GlusterFS server node if one fails, thus introducing a single point of failure. You could conceivably set up some sort of NFS server HA cluster using Linux Heartbeat, but that is tedious, error prone, and places the burden of supporting the storage ecosystem on the IT organization, which most would rather avoid. Not to mention that Heartbeat requires a shared static IP address, which could be jury-rigged in a VPC, but you absolutely cannot share the same IP address across multiple Availability Zones, so you would lose multi-AZ protection.
Yes, there were “solutions,” but nothing as easy and slick as most everything else in AWS, nor anything ready for primetime. Then, on April 9th, 2015, Amazon introduced us to EFS (Elastic File System). The majority of corporate IT AWS users have been clamoring for a shared file system solution in AWS for quite some time, and EFS is set to fill that need. EFS is a low-latency shared storage solution available to multiple EC2 instances simultaneously via NFSv4. It is currently in preview mode but should be released to GA in the near future. See more at https://aws.amazon.com/efs/.
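Since EFS is exposed over standard NFSv4, consuming it from an instance should look like any other NFS mount; a sketch, assuming a placeholder file system DNS name (the exact naming may change once the service reaches GA):

```bash
# Mount an EFS file system over NFSv4. The file system DNS name
# below is a placeholder.
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 fs-12345678.efs.us-west-2.amazonaws.com:/ /mnt/efs
```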
Thinking outside the box
In addition to the AWS tools that are analogs of traditional IT infrastructure (e.g. VPC ≈ network layer, EC2 ≈ physical server or VM), there are a large number of tools and SaaS offerings that add value above and beyond. Tools like SQS, SWF, SES, RDS (for hosted/managed RDBMS platforms), CloudTrail, CloudWatch, DynamoDB, Directory Service, WorkDocs, WorkSpaces, and many more make transitioning traditional business applications into the cloud easy, all the while eliminating capex costs, reducing operating costs, and increasing stability and reliability.
A word on architectural best practices
Wherever possible, there are some guiding principles and best practices that should be followed when designing and implementing solutions in AWS. First and foremost, design for failure. The new paradigm in virtualized and cloud computing is that no individual system is sacred and nothing is impervious to potential failure. Having worked in a wide variety of high-tech and IT organizations over the past 20 years, I can say this should really come as no surprise: even when everything is running on highly redundant hardware and networks, equipment and software failures have ALWAYS been prevalent. IT and software design as a culture would have been much better off adopting this mantra years and years ago. However, overcoming some of the hurdles that designing for failure creates wasn’t fully a reality until virtualization and the Cloud were available.
AWS is by far the front-runner in providing services and technologies that allow organizations to decouple the application architecture from the underlying infrastructure. Tools like Route53, Auto Scaling, CloudWatch, SNS, EC2, and configuration management allow you to design a high level of redundancy and automatic recovery into your infrastructure and application architecture. In addition to designing for failure, you should strive to decouple the application state from the architecture as a whole. The application state should not be stored on any individual component in the stack, nor should it be passed around between the layers. This way, the loss of a single component in the chain will not destroy the state of the application. Having the state of the application stored in its own autonomous location, like a distributed NoSQL DB cluster, will allow the application to function without skipping a beat in the event of a component failure.
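As a toy illustration of externalized state (the table and attribute names are hypothetical), session state written to DynamoDB survives the loss of any individual web server, because any other instance can read it back:

```bash
# Write a session record to a hypothetical "app-sessions" table...
aws dynamodb put-item --table-name app-sessions \
    --item '{"SessionId": {"S": "abc123"}, "CartTotal": {"N": "42"}}'
# ...and any other instance can pick up the same session.
aws dynamodb get-item --table-name app-sessions \
    --key '{"SessionId": {"S": "abc123"}}'
```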
Finally, a DevOps, Continuous Integration, or Continuous Delivery methodology should be adopted for application development. This allows changes to be tested automatically before being pushed into production and also provides a high level of business agility, the same kind of agility that running in the Cloud is meant to provide.
One of the things “everyone knows” about migrating to the Cloud is that it saves companies money. You no longer need all those expensive datacenters and the very physical costs associated with them. So companies migrate to the Cloud, sure they will see their costs plummet… then they get their bill for Cloud usage and experience sticker shock. Typically, this is when our customers reengage 2nd Watch. They ask us why it costs so much, what they can do to decrease their costs, and of course everyone’s favorite: why didn’t you tell me it would be so much?
First, in order to know why you are spending so much you need to analyze your environment. I’m not going to go into how Amazon bills and walk you through your entire bill in this blog post. That’s something for another day perhaps. What I do want to look into is how to enable you to see what you have in your Cloud.
Step one: tag it! Amazon gives you the ability to tag almost everything in your environment, including ELBs, which were added most recently. I always highly recommend that my customers make use of this feature. Personally, whenever I create something manually or programmatically, I add tags to identify what it is, why it’s there, and of course who is paying for it. Even in my sandbox environment, it’s a way to tell colleagues “Don’t delete my stuff!” Programmatically, tags can be added through CloudFormation, Elastic Beanstalk, auto scaling, and the CLI, as well as through third-party tools like Puppet and Chef. From a feature perspective, there are very few AWS components that don’t support tags, and more are constantly being added.
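For example, tagging an instance from the CLI might look like the following; the instance ID and tag values are placeholders:

```bash
# Stamp an instance with what it is, who owns it, and who pays for it.
aws ec2 create-tags --resources i-0abc1234 \
    --tags Key=Project,Value=ProjectX Key=Environment,Value=Development \
           Key=Owner,Value=jsmith Key=CostCenter,Value=4242
```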
That’s all well and good, but how does this help analytics? Tagging is actually the basis for pretty much all analytics, and without it you have to work much harder for far less information. For example, I can tag EC2 instances to indicate applications, projects, or environments. I can then run reports that look for specific tags (how many EC2 instances are associated with Project X, and what are the instance types? What business applications are using my various RDS instances?), and suddenly, when you get your bill, you have the ability to determine who is spending money in your organization and work with them on spending it smartly.
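For example, a quick tag-driven report with the AWS CLI (the tag value is a placeholder):

```bash
# List every instance tagged for Project X along with its instance type.
aws ec2 describe-instances \
    --filters "Name=tag:Project,Values=ProjectX" \
    --query 'Reservations[].Instances[].[InstanceId,InstanceType]' \
    --output table
```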
Let’s take it a step further and talk about automation and intelligent Cloud management. If I tag instances properly, I can automate tasks to control my Cloud based on those tags. For example, maybe I’m a nice guy and don’t make my development team work weekends. I can set up a task to shut down any instance with the “Environment = Development” tag every Friday evening and start it again Monday morning. Maybe I want to have an application online only at month end. I can set up another task to schedule when it is online and offline. Tags give us the ability to see what we are paying for and the hooks to control that cost with automation.
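A minimal sketch of that Friday-evening task, assuming the Environment tag from the example above and a cron entry to drive it:

```bash
# Run from cron, e.g.: 0 19 * * 5  (every Friday at 7 p.m.)
# Find all running instances tagged Environment=Development and stop them.
aws ec2 describe-instances \
    --filters "Name=tag:Environment,Values=Development" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' --output text |
  xargs -r aws ec2 stop-instances --instance-ids
```

A matching job on Monday morning would call start-instances with the same filter.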
I would be remiss if I didn’t point out that tags are an important part of using some great 2nd Watch offerings that help manage your AWS spend. Please check out 2W Insight for more information and how to gain control over and visibility into your cloud spend.
The jump to the cloud can be a scary proposition. For an enterprise with systems deeply embedded in traditional infrastructure like back-office computer rooms and datacenters, the move to the cloud can be daunting. The thought of having all of your data in someone else’s hands can make some IT admins cringe. However, once you start looking into cloud technologies, you start seeing some of the great benefits, especially with providers like Amazon Web Services (AWS). The cloud can be cost-effective, elastic and scalable, flexible, and secure. That same IT admin cringing at the thought of their data in someone else’s hands may finally realize that AWS is a bit more secure than a computer rack sitting under an employee’s desk in a remote office. Once the decision is finally made to “try out” the cloud, the planning phase can begin.
Most of the time the biggest question is, “How do we start with the cloud?” The answer is to use a phased approach. By picking applications and workloads that are less mission critical, you can try the newest cloud technologies with less risk. When deciding which workloads to move, ask yourself the following questions: Is there a business need for moving this workload to the cloud? Is the technology a natural fit for the cloud? What impact will this have on the business? If all of those questions are suitably answered, your workloads will be successful in the cloud.
One great place to start is with archiving and backups. These types of workloads are important, but the data you’re dealing with is likely just a copy of data you already have, so it is considerably less risky. The easiest way to start with archives and backups is to try out S3 and Glacier. Many of the backup utilities you may already be using, like Symantec NetBackup and Veeam Backup & Replication, have cloud versions that can back up directly to AWS. This allows you to start using the cloud without changing much of your embedded backup processes. By moving less critical workloads, you are taking the first steps in increasing your cloud footprint.
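For example, a lifecycle rule can age backups from S3 into Glacier automatically; a minimal sketch, with a placeholder bucket and prefix:

```bash
# Transition anything under backups/ to Glacier after 30 days.
# The bucket name and prefix are placeholders.
aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket \
    --lifecycle-configuration '{
      "Rules": [{
        "ID": "archive-to-glacier",
        "Prefix": "backups/",
        "Status": "Enabled",
        "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
      }]
    }'
```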
Now that you have moved your backups to AWS using S3 and Glacier, what’s next? The next logical step would be to try some of the other services AWS offers. Another workload that can often be moved to the cloud is Disaster Recovery. DR is an area that will allow you to use more AWS services, like VPC, EC2, EBS, RDS, Route53, and ELBs. DR is a perfect way to increase your cloud footprint because it allows you to reconstruct your current environment, which you should already be very familiar with, in the cloud. A Pilot Light DR solution is one type of DR solution commonly seen in AWS. In the Pilot Light scenario, the DR site has minimal systems and resources, with the core elements already configured to enable rapid recovery once a disaster happens. To build a Pilot Light DR solution, you would create the AWS network infrastructure (VPC), deploy the core AWS building blocks needed for the minimal Pilot Light configuration (EC2, EBS, RDS, and ELBs), and determine the process for recovery (Route53). When it is time for recovery, all of the other components can be quickly provisioned to give you a fully working environment. By moving DR to the cloud you’ve increased your cloud footprint even more and are on your way to cloud domination!
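When the time comes, the Route53 piece of the recovery process can be as simple as repointing DNS at the DR environment; a hedged sketch, with placeholder zone, record, and ELB names:

```bash
# Flip the application record to the DR site's ELB. The hosted zone
# ID, record name, and ELB DNS name are all placeholders.
aws route53 change-resource-record-sets --hosted-zone-id Z1EXAMPLE \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "app.example.com",
          "Type": "CNAME",
          "TTL": 60,
          "ResourceRecords": [{"Value": "dr-elb-123.us-west-2.elb.amazonaws.com"}]
        }
      }]
    }'
```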
The next logical step is to move Test and Dev environments into the cloud. Here you can get creative with the way you use the AWS technologies. When building systems on AWS, make sure to follow the architecting best practices: designing for failure means nothing will fail, decouple your components, take advantage of elasticity, build security into every layer, think parallel, and don’t fear constraints! Start with a proof of concept (POC) in the development environment, and use AWS reference architectures to aid in the learning and planning process. Next, test your legacy application in the new environment and migrate data. The POC is not complete until you validate that it works and performs to your expectations. Once you get to this point, you can reevaluate the build and optimize it to the exact specifications needed. Finally, you’re one step closer to deploying actual production workloads to the cloud!
Production workloads are obviously the most important, but with the phased approach you’ve taken to increase your cloud footprint, it’s not that far of a jump from the other workloads you now have running in AWS. To be successful with AWS, remember that the technology moves at a rapid pace (including improved services and price drops), that security is your responsibility as well as Amazon’s, and that there isn’t a one-size-fits-all solution. Lastly, all workloads you implement in the cloud should still have the stringent security and comprehensive monitoring you would apply to any of your on-premises systems.
Overall, a phased approach is a great way to start using AWS. Start with simple services and traditional workloads that are a natural fit for AWS (e.g. backups and archiving). Next, start to explore other AWS services by building out environments that are familiar to you (e.g. DR). Finally, experiment with POCs and the entire gamut of AWS to benefit from more efficient production operations. Like many new technologies, adoption takes time. By increasing your cloud footprint over time, you can set expectations for cloud technologies in your enterprise and make the cloud a more comfortable proposition for all.
Amazon Web Services best practices tell us to build stateless systems; in a perfect world, any server can serve any function with absolutely no impact on customers. Sounds great, but unfortunately reality interjects into our perfect world, and we find many websites and applications are not so perfectly stateless. So how can we make use of the strengths of AWS in areas like elasticity and auto scaling without completely rewriting applications to conform? After all, one of the key benefits of moving into the Cloud is cost savings, which get eaten away when development resources are spent rewriting code.
The solution is thankfully built into Amazon’s Elastic Load Balancer (ELB): those that require sessions to remain open for a customer can enable the “sticky” option. This keeps transactions processing and real-time communication alive, and it keeps businesses from needing to redesign such code or give up auto scaling. So how does it work?
The first option is to create duration-based session stickiness. This is enabled at the ELB under the port configuration. From there, the “stickiness” option can be enabled, and the ELB will generate a session cookie with a limited duration (the default is 60 seconds). So long as the client checks in with the ELB before the cookie expires, the session is held on that instance, and that instance will not be terminated by auto scaling. The second option is to enable application-controlled stickiness. This requires more development effort unless the existing platform already makes use of custom cookies; however, it gives application developers far more control than a basic number of seconds before timeout. By using application control, a web developer can keep a client connection directed to a specific instance through the ELB with no fear that a required instance will be terminated prematurely.
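On a classic ELB, both flavors can be configured from the CLI; a sketch, with placeholder load balancer, policy, and cookie names:

```bash
# Duration-based stickiness: the ELB issues its own cookie that
# expires after 60 seconds.
aws elb create-lb-cookie-stickiness-policy --load-balancer-name my-elb \
    --policy-name duration-sticky --cookie-expiration-period 60
# Application-controlled stickiness: the ELB follows a cookie the
# application itself issues (here, a placeholder JSESSIONID).
aws elb create-app-cookie-stickiness-policy --load-balancer-name my-elb \
    --policy-name app-sticky --cookie-name JSESSIONID
# Attach whichever policy you chose to the listener.
aws elb set-load-balancer-policies-of-listener --load-balancer-name my-elb \
    --load-balancer-port 80 --policy-names duration-sticky
```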
To leverage the full benefits of Amazon Web Services (AWS) and features such as instant elasticity and scalability, every AWS architect eventually considers Elastic Load Balancing and Auto Scaling. These features make it possible to instantly scale an environment in or out based on the flow of internet traffic.
Once implemented, how do you test the configuration and application to make sure they’re scaling with the parameters you’ve set? You could always trust the design and logic, then wait for the environment to scale naturally with organic traffic. However, in most production environments this is not an option. You want to make sure the environment operates adequately under load. One cool way to do this is by generating a distributed traffic load through a program called Bees with Machine Guns.
The author describes Bees with Machine Guns as “a utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).” This is a perfect solution for testing the performance and functionality of an AWS environment because it allows you to use one master controller to call many bees for a distributed attack on an application. A distributed attack from several bees gives a more realistic attack profile that you can’t get from a single node, and Bees with Machine Guns enables you to mount an attack with one or several bees with the same amount of effort.
Bees with Machine Guns isn’t just a randomly found open source tool. AWS endorses the project in several places on their website and recommends Bees with Machine Guns for distributed testing in the article “Best Practices in Evaluating Elastic Load Balancing.” The author says, “…you could consider tools that help you distribute tests, such as the open source Fabric framework combined with an interesting approach called Bees with Machine Guns, which uses the Amazon EC2 environment for launching clients that execute tests and report the results back to a controller.” AWS also provides a CloudFormation template for deploying Bees with Machine Guns on their AWS CloudFormation Sample Templates page.
To install Bees with Machine Guns, you can either use the template provided on the AWS CloudFormation Sample Templates page, called bees-with-machineguns.template, or follow the install instructions from the GitHub project page. (Please be aware the template also deploys a scalable spot-instance auto scaling group behind an elastic load balancer, all of which you are responsible for paying for.)
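If you go the GitHub route, the project installs as a standard Python package (assuming pip is available on your controller machine):

```bash
pip install beeswithmachineguns
```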
Once the Bees with Machine Guns source is installed, you can run a handful of commands to control your swarm.
The first command starts up five bees that we will have control over for testing. The -s option specifies the number of bees we want to spin up. The -k option is the SSH key pair name used to connect to the new servers. The -i option is the name of the AMI used for each bee. The -g option is the security group in which the bees will be launched. If the key pair, security group, and AMI already exist in the region where you’re launching the bees, there is less chance you will see errors when running the command.
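For example (the key pair, AMI, and security group names below are placeholders):

```bash
bees up -s 5 -k my-keypair -i ami-12345678 -g bees-sg
```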
Once launched, you can see the bees that were instantiated and are under the control of the Bees with Machine Guns controller with the command:
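```bash
bees report
```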
To make our bees attack, we use the command “bees attack.” The -u option is the URL of the target to attack. Make sure to use the trailing slash in your URL, or the command will error out. The -n option is the total number of connections to make to the target. The -c option is used for the number of concurrent connections made to the target. Here is an example run of an attack:
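```bash
# 100 total requests, 2 at a time, against a placeholder target URL
# (note the trailing slash).
bees attack -n 100 -c 2 -u http://my-elb-1234567890.us-west-2.elb.amazonaws.com/
```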
Notice that the attack was distributed among the bees in the following manner: “Each of 5 bees will fire 20 rounds, 2 at a time.” Since we had our total number of connections set to 100, each bee received an equal share of the requests. Depending on your choices for the -n and -c options, you can configure a different type of attack profile. For example, if you wanted to increase the duration of an attack, you would increase the total number of connections, and the bees would take longer to complete the attack. This comes in useful when testing an auto scaling group in AWS because you can configure an attack that will trigger one of your CloudWatch alarms, which will in turn activate a scaling action. Another trick is to use the Linux “time” command before your “bees attack” command; once the attack completes, you can see the total duration of the attack.
Once the command completes, you get output for the number of requests that actually completed, the requests made per second, the time per request, and a “Mission Assessment,” in this case “Target crushed bee offensive.”
To spin down your fleet of bees you run the command:
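```bash
bees down
```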
This is a quick intro on how to use Bees with Machine Guns for distributed testing within AWS. The one big caution in using Bees with Machine Guns, as explained by the author, is that “they are, more-or-less, a distributed denial-of-service attack in a fancy package,” which means you should only use them against resources that you own, and you will be liable for any unauthorized use.
As you can see, Bees with Machine Guns can be a powerful tool for distributed load testing. It’s extremely easy to set up and tremendously easy to use. It is a great way to artificially create a production load to test the elasticity and scalability of your AWS environment.