If the pandemic and our business applications have one thing in common, it’s the difficulty of preparing for the future. Just as we could not foresee the onset of the virus, we cannot always precisely determine the capacity required to run our applications effectively, no matter how much we plan.
When demand exceeds your application’s capacity to run efficiently, it’s time to scale.
What is scalability?
Scalability is an application’s ability to increase or decrease overall support and performance in response to changes in demand. For example, how your company’s website responds to an increase in visitors depends on your application’s scalability. When met with this demand, you want to make sure your application can handle the increase so that it continues to function properly. Scalability has its limits, and scaling is how you raise those limits.
The question is: is scaling up or scaling out the right choice for your business?
What is vertical vs. horizontal scaling?
There are two different ways to scale: vertical scaling and horizontal scaling. Vertical scaling, also known as scaling up, is adding more power, or increasing the capacity of a single machine or server for better performance.
For example, you can scale up by adding resources such as CPU, RAM, or disk capacity to add more processing power to your existing machine. In cloud terms, this translates into increasing the instance type for your application. In the short term, vertical scaling creates a bigger, better machine for an application to run on. Additionally, vertical scaling keeps data consistent, as your data is stored on a single node/instance.
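In AWS, for instance, increasing the instance type of an EBS-backed EC2 instance is a stop, modify, start cycle. A minimal sketch with the AWS CLI (the instance ID and instance types here are hypothetical):

```shell
# Hypothetical instance ID; resizing requires a stop/start cycle on
# EBS-backed instances, so expect a brief outage.
INSTANCE_ID="i-0123456789abcdef0"

# Stop the instance and wait until it has fully stopped.
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"

# Scale up: change the instance type, e.g. from m5.large to m5.2xlarge.
aws ec2 modify-instance-attribute \
    --instance-id "$INSTANCE_ID" \
    --instance-type '{"Value": "m5.2xlarge"}'

# Start the instance back up on the bigger hardware.
aws ec2 start-instances --instance-ids "$INSTANCE_ID"
```

Note that the required stop/start is exactly where vertical scaling’s downtime comes from.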
One caveat to scaling up, however, is that there are limits to the amount of hardware that can be added to a single machine. Vertical scaling also leaves you exposed to hardware failures. Vertical scaling is easy in the sense that nothing about your architecture changes, since additions are made only to the machine itself, but is easier better? Not necessarily.
Horizontal scaling, or scaling out, is when you add more machines or servers to your existing pool of resources. In cloud terms, this is referred to as Auto Scaling, where capacity is adjusted automatically to match demand. Rather than adding to a single machine as in scaling up, scaling out duplicates your current setup and breaks it into separate resources.
Instead of changing the capacity of your existing server, you decrease its load through additional, duplicate servers. Managing more resources might appear more complex for your business, but scaling out pays off in the long run, especially for larger enterprises. Instead of worrying about upgrading hardware as with vertical scaling, horizontal scaling provides a more continuous and seamless upgrade process.
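On AWS, horizontal scaling is typically implemented with an Auto Scaling group. A minimal sketch with the AWS CLI (all names and subnet IDs are hypothetical, and a launch configuration named web-lc is assumed to already exist):

```shell
# Hypothetical names/IDs; assumes a launch configuration "web-lc" exists.
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name web-asg \
    --launch-configuration-name web-lc \
    --min-size 2 \
    --max-size 10 \
    --desired-capacity 2 \
    --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222"
```

AWS then adds or removes duplicate servers between the min and max bounds as demand changes.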
Horizontal vs Vertical Scaling Pros and Cons
Which type of scaling is right for your business?
There are pros and cons to both horizontal and vertical scaling; however, horizontal scaling is currently trending due to its reliability and efficiency. Vertical scaling is simpler, while horizontal scaling may prove to optimize your business operations in the long run. Most commonly, businesses choose to scale out. Regardless of the environment a business operates in, scaling up requires downtime, which can be inefficient for a business’s operations.
There are several factors to consider when determining the scaling method that is right for you:
Flexibility: Horizontal scaling allows for flexibility because you can determine the configuration for your setup that optimizes cost and performance for your business needs. Costs are not optimized when scaling up, as you pay for the set price of the hardware.
Upgrades: With vertical scaling, hardware additions can only be upgraded to a limited extent. Horizontal scaling allows for continuous upgrades since you are not dependent on a single piece of equipment.
Redundancy: Another benefit of horizontal scaling is that there is no single point of failure, since the load is distributed across multiple servers in a cloud environment. If one of your servers fails, the load balancer redirects requests to a different server. Vertical scaling, on the other hand, has a single point of failure, meaning if the machine goes down, the application goes down with it. Transitioning to the cloud through horizontal scaling eliminates the potential for this problem.
Cost: While vertical scaling may come with a lower upfront cost compared to horizontal scaling, horizontal scaling optimizes cost over time.
Choosing a scaling method that meets your business needs may seem like a complicated choice, but it does not have to be. 2nd Watch is an AWS Premier Partner, a Microsoft Azure Gold Partner, and a Google Cloud Partner providing professional and managed cloud services to enterprises. Contact Us to take the next step in your cloud journey.
When migrating customers to AWS, one of the consistent questions we are asked is, “How do we extend our backup services into the Cloud?” My answer? You don’t. This is often met with incredulous stares where the customer is wondering if I’m joking, crazy, or I just don’t understand IT. After all, backups are fundamental to data centers and IT systems in general, so why on Earth would I tell someone not to back up their systems?
The short answer is, honestly, just don’t do it. The more in-depth answer is, of course, more complicated than that. To be clear, I am talking about system backups, the backups typically used for bare-metal restores. Backups of databases and file services we’ll tackle separately. For the bulk of systems, however, we’ll leave backups as a relic of on-premises data centers.
How? Why? Consider a typical three tiered architecture: web servers, application servers, and database servers. In AWS, ideally your application and web servers are stateless, auto scaled systems. With that in mind, why would you ever want to spend time, money, or resources on backing up and restoring one of these systems? The design should be set so if and when a system fails, the health check/monitoring automatically terminates the instance, which in turn automatically creates an auto scale event to launch a new instance in its place. No painfully long hours working through a restore process.
Similarly, your database systems can work without large scale backup systems. Yes, by all means run database backups! Database backups are not for server instance failures but for database application corruption or updates/upgrade rollbacks. Unfortunately, the Cloud doesn’t magically make your databases any more immune to human error. For the database servers (assuming non-RDS), however, maintaining a snapshot of the server instance is likely good enough for backups. If and when the database server fails, the instance can be terminated and the standby system can become the live system to maintain system integrity. Launch a new database server based on the snapshot, restore the database and/or configure replication from the live system, depending on database technology, and you’re live.
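The snapshot-based approach described above can be sketched with the AWS CLI as follows (all volume, snapshot, and instance IDs here are hypothetical):

```shell
# Hypothetical IDs. Take a point-in-time snapshot of the database
# server's EBS data volume on a schedule.
aws ec2 create-snapshot \
    --volume-id vol-0123456789abcdef0 \
    --description "nightly db server snapshot"

# After a failure, create a fresh volume from the latest snapshot...
aws ec2 create-volume \
    --snapshot-id snap-0123456789abcdef0 \
    --availability-zone us-east-1a

# ...and attach it to the replacement database instance, then restore
# or re-replicate the database from the live system.
aws ec2 attach-volume \
    --volume-id vol-0fedcba9876543210 \
    --instance-id i-0123456789abcdef0 \
    --device /dev/sdf
```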
So yes, in a properly configured AWS environment, the backup and restore you love to loathe from your on-premises environment is a thing of the past.
One of the main differentiators between traditional on-premises data centers and Cloud Computing through AWS is the speed at which businesses can scale their environment. So often in enterprise environments, IT and the business struggle to have adequate capacity when they need it. Facilities run out of power and cooling, vendors cannot provide systems fast enough or the same type of system is not available, and business needs sometimes come without warning. AWS scales out to meet these demands in every area.
Compute capacity is expanded, often automatically with auto scaling groups, which add additional server instances as demands dictate. With auto scaling groups, demands on the environment cause more systems to come online. Even without auto scaling, systems can be cloned with Amazon Machine Images (AMIs) and started to meet capacity, expand to a new region/geography, or even be shared with a business partner to move collaboration forward.
Beyond compute capacity, storage capacity is a few mouse clicks (or less) away from business needs as well. Using Amazon S3, storage capacity is simply allocated dynamically as it is used. Customers need only add content, which is far easier than adding disk arrays! With Elastic Block Storage (EBS), volumes are added as quickly as compute instances are. Storage can be added and attached to live instances or replicated across an environment as capacity is demanded.
Growth is great, and we’ve written a great deal about how to take advantage of the elastic nature of AWS before, but what about the second part of the title? Price! It’s no secret that as customers use more AWS resources, the price increases. The more you use, the more you pay; simple. The differentiators come into play with that same elastic nature; when demand drops, resources can be released and costs saved. Auto scaling can retire instances as easily as it adds them, storage can be removed when no longer needed, and with usage of resources, bills can actually shrink as you become more proficient in AWS. (Of course, 2ndWatch Managed Services can also help with that proficiency!) With traditional data centers, once resources are purchased, you pay the price (often a large one). With the Cloud, resources can be purchased as needed, at just a fraction of the price.
IT wins and business wins – enterprise level computing at its best!
Auto-Scaling gives the ability to scale your EC2 instances up or down according to demand to handle the load on the service. With auto-scaling you don’t have to worry about whether or not the number of instances you’re using will be able to handle a demand spike or if you’re overspending during a slower period. Auto-scaling automatically scales for you for seamless performance.
For instance, if there are currently 3 m1.xlarge instances handling the service, and they spend a large portion of their time only 20% loaded with a smaller portion of their time heavily loaded, they can be vertically scaled down to smaller instance sizes and horizontally scaled out/in to more or less instances to automatically accommodate whatever load they have at that time.
This can also save money by only paying for the smaller instance size. More savings can be attained by using reserved instance billing for the minimum number of instances defined by the Auto-Scaling configuration and letting the scaled-out instances pay the on-demand rate while running. This is a little tricky, though, because an instance’s billing cannot be changed while the instance is running. When scaling down, make sure to terminate the newest instances, since they are running at the on-demand billing rate.
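To make the savings concrete, here is a back-of-the-envelope sketch with made-up prices (real rates vary by region, instance type, and reservation term):

```shell
#!/bin/sh
# Illustrative (made-up) hourly prices; real prices vary by region,
# instance type, and reservation term.
# Baseline of 3 reserved instances runs 24x7 (~730 hours/month);
# bursts of 2 extra on-demand instances run roughly 4 hours per day.
RESERVED_HOURLY=0.05    # effective hourly rate with reserved billing
ONDEMAND_HOURLY=0.10    # on-demand hourly rate for the same size

baseline=$(awk "BEGIN { printf \"%.2f\", 3 * 730 * $RESERVED_HOURLY }")
burst=$(awk "BEGIN { printf \"%.2f\", 2 * 4 * 30 * $ONDEMAND_HOURLY }")
total=$(awk "BEGIN { printf \"%.2f\", $baseline + $burst }")

echo "baseline=$baseline burst=$burst total=$total"
```

With these illustrative numbers, the reserved baseline plus on-demand burst comes to 133.50 a month, versus 219.00 if the three baseline instances all ran at the on-demand rate.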
When traffic on an AWS service increases or decreases, predictably or not, Auto-Scaling can keep customers happy with the service because response times stay more consistent and High Availability is more reliable.
Auto-Scaling to Improve HA
If there is only one server instance, Auto-scaling can be used to put a new server in place, in a few minutes, when the running one fails. Just set both Min and Max number of instances to 1.
Auto-Scaling to Improve Response Time Consistency
If there are multiple servers and the load on them becomes so heavy that the response time slows, expand horizontally only for the time necessary to cover the extra load, and keep the response time low.
AWS Auto-Scaling Options to Set
When Auto-Scaling up or down, there are a lot of things to think about:
Evaluation Period is the time, in seconds, between checks of the load on the Scaling Group.
Cool Down is the time, in seconds, after a scaling operation before a new scaling operation can be performed. When scaling out, this time should be fairly short in case the load is too heavy for one Scale-Out operation. When scaling in, this time should be at least twice that of the Scale-Out operation.
With Scale-Out, make sure it scales fast enough to quickly handle a load heavier than one expansion. 300 seconds is a good starting point.
With Scale-In, make sure it scales in slowly enough that the group does not repeatedly scale out and back in. We call this “Flapping”. Some call it “Thrashing”.
When the Auto-Scale Group includes multiple AZs, scaling out and in should be incremented by the number of AZs involved. If only one AZ is scaled up and something happens to that AZ, the impact on your users is magnified.
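The cooldown guidance above can be expressed as a pair of simple scaling policies. A sketch with the AWS CLI (the group and policy names are hypothetical):

```shell
# Scale out by 2 (one per AZ in a two-AZ group) with a short cooldown.
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name web-asg \
    --policy-name scale-out \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment 2 \
    --cooldown 300

# Scale in by 2 with a cooldown at least twice as long, to avoid flapping.
# (The "=" form keeps the CLI from parsing the negative number as a flag.)
aws autoscaling put-scaling-policy \
    --auto-scaling-group-name web-asg \
    --policy-name scale-in \
    --adjustment-type ChangeInCapacity \
    --scaling-adjustment=-2 \
    --cooldown 600
```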
Scale-In can be accomplished by different rules:
Terminate Oldest Instance
Terminate Newest Instance
Terminate Instance Closest to the next Instance Hour (Best Cost Savings)
Terminate Oldest Launch Configuration (default)
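These rules map to the Auto Scaling termination policies. For example, to prefer the best cost savings and fall back to retiring the oldest launch configuration (group name hypothetical):

```shell
# Terminate the instance closest to its next billing hour first; on a
# tie, fall back to the instance with the oldest launch configuration.
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name web-asg \
    --termination-policies "ClosestToNextInstanceHour" "OldestLaunchConfiguration"
```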
Auto-Scaling is a two-stage process, and here is the rub: the AWS Management Console does not do Auto-Scaling, so it has to be done through the AWS APIs.
Set up the Launch Configuration and assign it to the group of instances you want to control. If there is no user_data file, that argument can be left out. The block-device-mapping argument can be found in the details for the ami_id.
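A sketch of the two stages with the modern AWS CLI (every name and ID here is hypothetical):

```shell
# Stage 1: create the launch configuration the group will launch from.
aws autoscaling create-launch-configuration \
    --launch-configuration-name web-lc-v1 \
    --image-id ami-0123456789abcdef0 \
    --instance-type m1.xlarge \
    --key-name my-keypair \
    --security-groups web-sg \
    --user-data file://user_data.sh \
    --block-device-mappings '[{"DeviceName":"/dev/sdf","Ebs":{"VolumeSize":100}}]'

# Stage 2: attach it to the Auto-Scaling Group.
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name web-asg \
    --launch-configuration-name web-lc-v1 \
    --min-size 1 --max-size 5 --desired-capacity 2 \
    --availability-zones us-east-1a us-east-1b
```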
The instance configuration could change for any number of reasons:
New Features added
Removal of un-used Old Features
Whenever the AMI specified in the Auto-Scaling definition is changed, the Auto-Scaling Group needs to be updated. The update requires creating a new Scaling Launch Config with the new AMI ID, updating the Auto-Scaling Group, then deleting the old Scaling Launch Config. Without this update, the Scale-Out operation will use the old AMI.
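The rotation described above amounts to three CLI calls (names and AMI IDs hypothetical):

```shell
# 1. Create a new launch configuration pointing at the new AMI.
aws autoscaling create-launch-configuration \
    --launch-configuration-name web-lc-v2 \
    --image-id ami-0fedcba9876543210 \
    --instance-type m1.xlarge

# 2. Point the group at the new launch configuration.
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name web-asg \
    --launch-configuration-name web-lc-v2

# 3. Delete the old launch configuration.
aws autoscaling delete-launch-configuration \
    --launch-configuration-name web-lc-v1
```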
To leverage the full benefits of Amazon Web Services (AWS) and features such as instant elasticity and scalability, every AWS architect eventually considers Elastic Load Balancing and Auto Scaling. These features enable the ability to instantly scale-in or scale-out an environment based on the flow of internet traffic.
Once implemented, how do you test the configuration and application to make sure they’re scaling with the parameters you’ve set? You could always trust the design and logic, then wait for the environment to scale naturally with organic traffic. However, in most production environments this is not an option. You want to make sure the environment operates adequately under load. One cool way to do this is by generating a distributed traffic load through a program called Bees with Machine Guns.
The author describes Bees with Machine Guns as “A utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).” This is a perfect solution for testing the performance and functionality of an AWS environment because it allows you to use one master controller to call many bees for a distributed attack on an application. Using a distributed attack from several bees gives a more realistic attack profile that you can’t get from a single node. Bees with Machine Guns enables you to mount an attack with one or several bees with the same amount of effort.
Bees with Machine Guns isn’t just a randomly found open source tool. AWS endorses the project in several places on their website. AWS recommends Bees with Machine Guns for distributed testing in their article “Best Practices in Evaluating Elastic Load Balancing”. The author says “…you could consider tools that help you distribute tests, such as the open source Fabric framework combined with an interesting approach called Bees with Machine Guns, which uses the Amazon EC2 environment for launching clients that execute tests and report the results back to a controller.” AWS also provides a CloudFormation template for deploying Bees with Machine Guns on their AWS CloudFormation Sample Templates page.
To install Bees with Machine Guns you can either use the template provided on the AWS CloudFormation Sample Templates page called bees-with-machineguns.template or follow the install instructions from the GitHub project page. (Please be aware the template also deploys a scalable spot instance auto scale group behind an elastic load balancer, all of which you are responsible to pay for.)
Once Bees with Machine Guns is installed, you have the ability to run the following commands:
The first command we run will start up five bees that we will have control over for testing. We use the -s option to specify the number of bees we want to spin up. The -k option is the SSH key pair name used to connect to the new servers. The -i option is the name of the AMI used for each bee. The -g option is the security group in which the bees will be launched. If the key pair, security group, and AMI already exist in the region where you’re launching the bees, there is less chance you will see errors when running the command.
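A sketch of that command, with hypothetical key pair, AMI, and security group names:

```shell
# Spin up 5 bees; key pair, AMI, and security group are hypothetical.
bees up -s 5 -k my-keypair -i ami-0123456789abcdef0 -g public
```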
Once launched, you can see the bees that were instantiated and under control of the Bees with Machine Guns controller with the command:
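Per the project’s documentation, that is the report subcommand:

```shell
# List the bees currently under the controller's command.
bees report
```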
To make our bees attack, we use the command “bees attack”. The -u option is the URL of the target to attack. Make sure to use the trailing slash in your URL or the command will error out. The -n option is the total number of connections to make to the target. The -c option is the number of concurrent connections made to the target. Here is an example run of an attack:
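A sketch of such an attack against a hypothetical target URL; with 5 bees running, these -n and -c values split into 20 rounds per bee, 2 at a time:

```shell
# 100 total requests, 10 concurrent, spread across all running bees.
# Note the trailing slash on the target URL.
bees attack -n 100 -c 10 -u http://example.com/
```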
Notice that the attack was distributed among the bees in the following manner: “Each of 5 bees will fire 20 rounds, 2 at a time.” Since we had our total number of connections set to 100, each bee received an equal share of the requests. Depending on your choices for the -n and -c options, you can configure a different type of attack profile. For example, if you wanted to increase the duration of an attack, you would increase the total number of connections, and the bees would take longer to complete the attack. This comes in useful when testing an Auto Scaling group in AWS because you can configure an attack that will trigger one of your CloudWatch alarms, which will in turn activate a scaling action. Another trick is to use the Linux “time” command before your “bees attack” command; once the attack completes, you can see the total duration of the attack.
Once the command completes you get output for the number of requests that actually completed, the requests that were made per second, the time per request, and a “Mission Assessment,” in this case the “Target crushed bee offensive”.
To spin down your fleet of bees you run the command:
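Per the project’s documentation, the fleet is terminated with:

```shell
# Terminate all bee instances so they stop accruing EC2 charges.
bees down
```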
This is a quick intro on how to use Bees with Machine Guns for distributed testing within AWS. The one big caution in using Bees with Machine Guns, as explained by the author: “they are, more-or-less, a distributed denial-of-service attack in a fancy package,” which means you should only use it against resources that you own, and you will be liable for any unauthorized use.
As you can see, Bees with Machine Guns can be a powerful tool for distributed load tests. It’s extremely easy to set up and tremendously easy to use. It is a great way to artificially create a production load to test the elasticity and scalability of your AWS environment.