Let me start by painting the picture: You’re the CFO. Or the manager of a department, group, or team, and you’re ultimately responsible for any and all financial costs incurred by your team, group, or department. Or maybe you’re in IT and you’ve been told to keep a handle on the costs generated by application use and code development resources. Your company has moved some or all of its projects and apps to the public cloud, and since things seem to be running pretty smoothly from a production standpoint, most of the company is feeling pretty good about the transition.
Yet the promised cost savings of moving to the cloud haven’t materialized, and trying to decipher the monthly bill from your cloud provider has you shaking your head.
Source: Amazon Web Services (AWS). “Understanding Consolidated Bills – AWS Billing and Cost Management”. (2017). Retrieved from https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/con-bill-blended-rates.html
From Reserved Instances and on-demand costs to “unblended” and “blended” rates, trying to make sense of the bill leaves you no closer to understanding where you can optimize your spend.
It’s not just the pricing structure that seems to require an entire department of accountants to decipher; the breakdown of the services themselves is just as mind-boggling. In fact, there are at least 500,000 SKU and price combinations in AWS alone! On top of that, your team likely has no restrictions on who can spin up which resources, or when, which compounds the problem, especially when staff leave those resources running and the proverbial meter racks up the $$ in the background.
Addressing this complex and ever-moving problem is not simple. It requires a comprehensive, detailed approach that starts with understanding the variety of opportunities available for cost and performance optimization. This is where 2nd Watch and our Six Pillars of Cloud Optimization come in.
The Six Pillars of Cloud Cost Optimization
- Reserved Instances (RIs)
AWS Reserved Instances, Azure Reserved VM Instances, and Google Cloud Committed Use Discounts take the ephemeral out of cloud resources by asking you to estimate, and commit to, what you’re going to use up front. In return for that pre-planning, you receive steep discounts, which makes for a strong financial incentive.
Most cloud cost optimization efforts mistakenly begin and end here, leaving you and your organization with a less than optimal solution. Resources for estimating RI purchases are available directly from cloud providers and from third-party optimization tools. For example, CloudHealth by VMware provides a clear picture of where to purchase RIs based on your cloud usage over a number of months, and helps you manage your RI lifecycle over time.
Two of the major factors to consider with RIs are Risk Tolerance and Centralized RI Management portfolios.
- Risk Tolerance refers to how much you’re willing to commit up front in exchange for greater future savings. For example, can your organization take the risk of covering 70% of your workloads with RIs? Or are you uncertain about future consumption, and therefore want to limit coverage to around 20-30%? Also, how many years ahead can you reliably project? A one-year term is the least risky, but a three-year term, while a larger financial commitment, comes with larger cost savings.
- Centralized RI Management portfolios allow for deeper RI coverage across organizational units, resulting in even greater savings opportunities. For instance, a single application team might have only a limited pool of cash with which to purchase RIs. A centralized, organization-wide approach, by contrast, would cover all departments and teams for all workloads, based on corporate goals. Of course, this approach also requires ongoing communication with the separate groups to understand the current and future resources they need, so that you can create and execute a successful RI management program.
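To make the risk-tolerance trade-off concrete, here is a minimal Python sketch of annual compute cost for a fleet of always-on instances with some of them covered by RIs. The on-demand price (`ON_DEMAND_HOURLY`) and the 1- and 3-year discounts (`RI_DISCOUNT`) are hypothetical round numbers, not any provider’s actual rates, and the model simply assumes each reservation is billed every hour of the year whether or not matching instances are running.

```python
HOURS_PER_YEAR = 8760

# Hypothetical rates for illustration only; real prices and discounts vary by
# provider, instance type, region, term, and payment option.
ON_DEMAND_HOURLY = 0.10            # assumed on-demand price per instance-hour
RI_DISCOUNT = {1: 0.30, 3: 0.50}   # assumed RI discount by term length (years)


def annual_cost(running: int, reserved: int, term_years: int) -> float:
    """Yearly cost when `running` instances run around the clock and
    `reserved` instance-equivalents are committed through RIs.

    Reservations are paid for every hour whether or not they are used;
    any usage above the reservation is paid at the on-demand rate.
    """
    ri_hourly = ON_DEMAND_HOURLY * (1 - RI_DISCOUNT[term_years])
    on_demand_instances = max(running - reserved, 0)
    hourly = reserved * ri_hourly + on_demand_instances * ON_DEMAND_HOURLY
    return HOURS_PER_YEAR * hourly


if __name__ == "__main__":
    baseline = annual_cost(running=100, reserved=0, term_years=1)
    for covered, term in [(25, 1), (70, 1), (70, 3)]:
        cost = annual_cost(running=100, reserved=covered, term_years=term)
        print(f"{covered}% coverage, {term}-year term: ${cost:,.0f} (vs ${baseline:,.0f})")
    # The risk side: commit to 70 one-year RIs, then usage drops to 30 instances.
    print(f"over-committed: ${annual_cost(30, 70, 1):,.0f} "
          f"vs ${annual_cost(30, 0, 1):,.0f} on demand")
```

With these assumed numbers, 70% coverage on a three-year term cuts the yearly bill by roughly a third, but if usage later falls to 30 instances, the 70 one-year reservations cost more than paying on demand would have. That gap is what your risk tolerance has to absorb.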
Once you identify your risk tolerance and centralize your approach to RIs, you can take full advantage of this optimization option. However, an RI-only optimization strategy is short-sighted: it only lets you take advantage of the pricing options your cloud vendor offers. To achieve the most effective cloud cost optimization, overlay RI purchases with the five other optimization pillars.
- Auto-Parking
One of the benefits of the cloud is the ability to spin resources up (and down) as you need them. The downside of this instant availability, however, is that individual team members have very little incentive to shut down resources once they’re finished with them. Auto-Parking refers to scheduling resources to shut down during off hours, which is especially useful for development and test environments. The first step is identifying your idle resources through a robust tagging strategy, which lets you pinpoint the resources that can be parked. The second step is automating the spin-up/spin-down process. Tools like ParkMyCloud, AWS Instance Scheduler, Azure Automation, and Google Cloud Scheduler can help you manage the entire auto-parking process.
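A minimal sketch of the decision half of auto-parking, assuming a hypothetical tag convention (`parking=office-hours`) and a schedule of 08:00 to 20:00 on weekdays in the team’s local time. Actually stopping and starting resources is left to a scheduler such as the tools named above; this only decides whether a tagged resource should be running at a given moment.

```python
from datetime import datetime

# Hypothetical tag convention (not a feature of any specific tool): resources
# tagged parking=office-hours may run only 08:00-19:59, Monday to Friday.
PARK_TAG, PARK_VALUE = "parking", "office-hours"
RUN_HOURS = range(8, 20)   # hours of the day when parked resources may run
RUN_DAYS = range(0, 5)     # datetime.weekday(): Monday=0 ... Friday=4


def should_run(tags: dict[str, str], now: datetime) -> bool:
    """Return True if a resource with these tags should be running at `now`."""
    if tags.get(PARK_TAG) != PARK_VALUE:
        return True  # untagged or opted out: never parked automatically
    return now.weekday() in RUN_DAYS and now.hour in RUN_HOURS


def weekly_hours_saved() -> int:
    """Hours per week a parked resource spends stopped instead of running."""
    return 7 * 24 - len(RUN_DAYS) * len(RUN_HOURS)


if __name__ == "__main__":
    dev_box = {"parking": "office-hours", "team": "web"}
    print(should_run(dev_box, datetime(2024, 1, 3, 10)))  # a Wednesday, 10:00
    print(should_run(dev_box, datetime(2024, 1, 6, 10)))  # a Saturday, 10:00
    print(f"{weekly_hours_saved()} of 168 hours parked each week")
```

Keeping untagged resources running by default is a deliberately conservative choice: parking only what has been explicitly tagged avoids shutting down something a team still depends on.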
- Right-Sizing
Ah, right-sizing: the best way to ensure you’re using exactly what you need, no more and no less. It seems like a no-brainer to just “enable right-sizing” as soon as you start using a cloud environment. However, without the ability to analyze resource consumption or enable chargebacks, right-sizing becomes a meaningless concept. Performance and capacity requirements for cloud applications often change over time, and this inevitably results in underused and idle resources.
Many cloud providers share right-sizing best practices, though they spend more time explaining the right-sizing options available before a cloud migration. This is unfortunate, because to be truly effective, right-sizing must be an ongoing activity: implementing policies and guardrails to reduce overprovisioning, tagging resources to enable department-level chargebacks, and properly monitoring CPU, memory, and I/O.
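To show what ongoing right-sizing might look like, here is a sketch that turns utilization history into a recommendation. It assumes you already collect CPU, memory, and I/O samples (as percentages of what is provisioned) from your monitoring; the thresholds in `DOWNSIZE_BELOW` and `UPSIZE_ABOVE` are hypothetical. It uses the 95th percentile rather than the average so that a resource that is usually quiet but regularly spikes is not flagged for downsizing.

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds on 95th-percentile utilization (percent of capacity).
DOWNSIZE_BELOW = {"cpu": 20.0, "memory": 40.0, "io": 30.0}
UPSIZE_ABOVE = {"cpu": 80.0, "memory": 85.0, "io": 80.0}


def p95(samples: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


@dataclass
class UtilizationHistory:
    cpu: list[float]     # CPU samples, percent of allocated vCPU
    memory: list[float]  # memory samples, percent of allocated RAM
    io: list[float]      # I/O samples, percent of provisioned throughput


def recommend(history: UtilizationHistory) -> str:
    """Return "upsize", "downsize", or "keep" for one resource."""
    peaks = {m: p95(getattr(history, m)) for m in ("cpu", "memory", "io")}
    if any(peaks[m] > UPSIZE_ABOVE[m] for m in peaks):
        return "upsize"    # any metric near capacity
    if all(peaks[m] < DOWNSIZE_BELOW[m] for m in peaks):
        return "downsize"  # every metric comfortably low
    return "keep"
```

Run on a schedule against fresh samples, a check like this turns right-sizing into the recurring activity described above rather than a one-time migration exercise.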
Right-sizing must also take into account auto-parked resources and RIs available. Do you see a trend here with the optimization pillars?
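One concrete overlap between the pillars: an RI is billed for every hour of its term, while an auto-parked instance accrues on-demand charges only while it runs. The sketch below finds the weekly running hours above which reserving beats paying on demand. It assumes a hypothetical discount and treats the reservation as dedicated to a single instance, which is a simplification, since a reservation’s discount can apply to any matching usage in a given hour.

```python
HOURS_PER_WEEK = 7 * 24


def breakeven_hours(ri_discount: float) -> float:
    """Weekly running hours above which a reservation beats on demand.

    On demand costs p * h for h running hours; a reservation costs
    p * (1 - discount) for every one of the 168 hours, so the two are
    equal when h = 168 * (1 - discount).
    """
    if not 0.0 < ri_discount < 1.0:
        raise ValueError("discount must be between 0 and 1")
    return HOURS_PER_WEEK * (1 - ri_discount)


def cheaper_option(running_hours_per_week: float, ri_discount: float) -> str:
    """Pick the cheaper pricing model for one instance."""
    if running_hours_per_week > breakeven_hours(ri_discount):
        return "reserved"
    return "on-demand"


if __name__ == "__main__":
    # Assumed 30% discount: a 60-hour-a-week parked dev box vs. an always-on server.
    print(breakeven_hours(0.30))
    print(cheaper_option(60, 0.30), cheaper_option(168, 0.30))
```

Under these assumptions, a 30% discount breaks even at about 118 running hours a week, so a resource parked down to 60 hours is cheaper on demand, and reserving it would erase the auto-parking savings.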
- Family Refresh
Instance types, VM series, and “Instance Families” all describe the ways cloud providers package their instances according to the underlying hardware. Each instance type, series, or family offers a different mix of compute, memory, and storage. Instance types within a grouping are often retired together when the hardware that runs them is replaced by newer technology, and pricing shifts as the newer systems replace the old. This changing of the guard is called Family Refresh.
Up-to-date knowledge of the instance types and families in use within your organization is vital to estimating when your costs will fluctuate. Truth be told, though, with over 500,000 SKU and price combinations for a single cloud provider, that task seems downright impossible.
Some tools can help monitor and estimate Family Refresh, though they often don’t account for the overlap with RIs or with any of the other optimization pillars. As a result, for many organizations, Family Refresh is the manual, laborious task it sounds like. Thankfully, we’ve found ways to automate these suggestions through our optimization service offering.
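As a rough illustration of automating refresh suggestions, the sketch below flags previous-generation EC2 instance types and proposes a same-size successor. The `SUCCESSOR` mapping is a small, deliberately incomplete example (pairs such as m4 to m5 are real generational successors, but this is not an authoritative list), and not every size exists in the successor family, so each suggestion still needs checking. To reflect the RI overlap noted above, families still covered by active RIs are marked to revisit when the reservation expires, since moving those instances early would leave the reservation unused.

```python
# Illustrative, deliberately incomplete mapping from older EC2 families to a
# newer-generation successor; not an authoritative or current list.
SUCCESSOR = {"m4": "m5", "c4": "c5", "r4": "r5", "t2": "t3"}


def refresh_suggestions(instance_types: list[str],
                        reserved_families: set[str]) -> list[tuple[str, str, str]]:
    """Return (current type, suggested type, timing) for refresh candidates.

    `reserved_families` lists families still covered by active RIs.
    """
    suggestions = []
    for itype in instance_types:
        family, _, size = itype.partition(".")
        successor = SUCCESSOR.get(family)
        if successor is None:
            continue  # already current, or not covered by this mapping
        timing = "when RI expires" if family in reserved_families else "now"
        suggestions.append((itype, f"{successor}.{size}", timing))
    return suggestions


if __name__ == "__main__":
    fleet = ["m4.large", "m5.xlarge", "t2.micro", "c4.2xlarge"]
    for current, suggested, timing in refresh_suggestions(fleet, {"t2"}):
        print(f"{current} -> {suggested} ({timing})")
```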
- Waste
Related to the issue of instances running long past their usefulness, waste is prevalent in the cloud. Waste may seem like an abstract concept when it comes to virtual resources, but every wasted unit means $$ spent for no purpose. And when there is no limit to the resources you can use, individual users have no incentive to self-regulate their unused or underutilized instances. Some examples of waste in the cloud include:
- AWS RDS instances or Azure SQL databases without a connection
- Unused AWS EC2 instances
- Azure VMs that were spun up for training or testing
- Outdated snapshots occupying storage that will never be used
- Idle load balancers
- Unattached volumes
Identifying waste takes time and accurate reporting. It is also a great reason to invest the time and energy in developing a proper tagging strategy, since tagged waste is instantly traceable to the organizational unit that incurred it and can therefore be easily marked for review or removal. We’ve often seen companies buy RIs before eliminating waste, which, without fail, causes them to overspend in the cloud for at least a year.
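A minimal sketch of rule-based waste detection over a resource inventory, attributing each wasteful resource’s monthly cost to the team in its `cost-center` tag. The inventory record shape, the tag key, and every threshold (180-day snapshot age, 2% average CPU) are hypothetical; in practice these fields would come from your provider’s inventory and monitoring data or a third-party tool. Anything without the tag lands in an `untagged` bucket, which is itself a useful signal for your tagging strategy.

```python
from collections import defaultdict

# Hypothetical thresholds; tune to your own retention and usage policies.
SNAPSHOT_MAX_AGE_DAYS = 180
IDLE_CPU_PCT = 2.0

# One rule per (hypothetical) resource kind, mirroring the waste examples above.
RULES = {
    "database": lambda r: r["connections_30d"] == 0,
    "instance": lambda r: r["avg_cpu_30d"] < IDLE_CPU_PCT,
    "snapshot": lambda r: r["age_days"] > SNAPSHOT_MAX_AGE_DAYS,
    "load_balancer": lambda r: r["requests_30d"] == 0,
    "volume": lambda r: not r["attached"],
}


def waste_by_owner(inventory: list[dict], tag_key: str = "cost-center") -> dict[str, float]:
    """Sum the monthly cost of wasteful resources per owning cost center."""
    totals: dict[str, float] = defaultdict(float)
    for resource in inventory:
        rule = RULES.get(resource["kind"])
        if rule is not None and rule(resource):
            owner = resource.get("tags", {}).get(tag_key, "untagged")
            totals[owner] += resource["monthly_cost"]
    return dict(totals)
```

Running a report like this before any RI purchase avoids committing to capacity that should have been deleted.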
- Storage
Storage in the cloud is a great way to reduce on-premises hardware spend. However, because it is so effortless to use, cloud storage can expand exponentially in a very short time, making accurate cloud spend nearly impossible to predict. Cloud storage is usually charged based on four characteristics:
- Size – How much storage do you need?
- Data Transfer (bandwidth) – How often does your data need to move from one location to another?
- Retrieval Time – How quickly do you need to access your data?
- Retrieval Requests – How often do you need to access your data?
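These four characteristics combine into a monthly estimate roughly as follows. Both tiers and all prices in `TIERS` and `TRANSFER_PER_GB` are hypothetical round numbers, not any provider’s published rates. Retrieval time is represented by the tier choice, since colder tiers typically trade a lower storage price for slower, costlier retrieval.

```python
# Hypothetical prices in USD, for illustration only.
TIERS = {
    # per GB-month stored, per GB retrieved, per 1,000 requests
    "hot":     {"storage_gb": 0.020, "retrieval_gb": 0.00, "per_1k_requests": 0.0005},
    "archive": {"storage_gb": 0.004, "retrieval_gb": 0.02, "per_1k_requests": 0.05},
}
TRANSFER_PER_GB = 0.09  # assumed price per GB of data transferred out


def monthly_storage_cost(tier: str, stored_gb: float, transferred_gb: float,
                         retrieved_gb: float, requests: int) -> float:
    """Estimate one month's storage bill from the four cost characteristics."""
    p = TIERS[tier]
    return (stored_gb * p["storage_gb"]                # size
            + transferred_gb * TRANSFER_PER_GB         # data transfer
            + retrieved_gb * p["retrieval_gb"]         # retrieval (tier/speed)
            + requests / 1000 * p["per_1k_requests"])  # retrieval requests
```

With these assumed prices, 1,000 GB that is rarely read (100,000 requests a month) is cheaper in the archive tier, but at 2,000,000 requests a month the request charges flip the comparison back to the hot tier.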
There are a variety of options for different use cases, including file storage, databases, data backup, and data archives. A solid data lifecycle policy will help you estimate these numbers and ensure that you are right-sizing your storage and using its capacity and bandwidth to their greatest potential at all times.
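As one example of encoding a data lifecycle policy, the sketch below builds an Amazon S3 lifecycle configuration (the dictionary shape passed as `LifecycleConfiguration` to boto3’s `put_bucket_lifecycle_configuration`) that moves objects under a prefix to cheaper storage classes as they age and eventually expires them. The prefix and day counts are hypothetical defaults, and the validation checks reflect S3’s minimum-age rules as we understand them, so confirm them against the current S3 documentation. Note that S3 lifecycle transitions are based on object age, not on when the data was last accessed.

```python
def lifecycle_config(prefix: str, ia_days: int = 30, archive_days: int = 90,
                     expire_days: int = 3 * 365) -> dict:
    """Build an S3 lifecycle configuration for objects under `prefix`.

    Day counts are hypothetical defaults; objects move to STANDARD_IA,
    then GLACIER, then expire.
    """
    # S3 expects objects to stay in STANDARD at least 30 days before STANDARD_IA.
    if ia_days < 30:
        raise ValueError("ia_days must be at least 30")
    # Assumed rule: allow at least 30 days in STANDARD_IA before archiving.
    if archive_days < ia_days + 30:
        raise ValueError("archive_days must be at least ia_days + 30")
    if expire_days <= archive_days:
        raise ValueError("expiration must come after the last transition")
    return {
        "Rules": [{
            "ID": f"tier-then-expire:{prefix}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [
                {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                {"Days": archive_days, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": expire_days},
        }]
    }
```

Applying it would look something like `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=lifecycle_config("logs/"))`, where the bucket name and prefix are placeholders.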
So, as you can see, each of these six pillars of cloud cost optimization has many moving parts, and with public cloud providers constantly modifying their service offerings and pricing, reining in your wayward cloud can seem unlikely. Plus, optimizing only one of the pillars without considering the others offers little to no improvement and can, in fact, unintentionally cost you more money over time. An effective optimization process must account for all of the pillars and the ways they overlap, institute the right policies and guardrails to keep cloud sprawl in check, and implement the right tools so your team can regularly make informed decisions.
The good news is that the future is bright! Once you have fully assessed your current environment, taken the pillars into account, made the changes required to optimize your cloud, and found a way to make this process continuous, you can explore further optimization through application refactoring, ephemeral instances, spot instances, and serverless architecture.
The promised cost savings of the public cloud are within reach, if only you know where to look.
2nd Watch offers a Cloud Cost Optimization service that can help guide you through this process. Our Cloud Cost Optimization service can reduce your current cloud computing costs by as much as 25% to 40%, increasing efficiency and performance. Our proven methodology empowers you to make data-driven decisions in context, rather than relying on tools alone. Cloud cost optimization doesn’t have to be time-consuming and challenging. Start your cloud cost optimization plan with our proven method for success at https://offers.2ndwatch.com/optimization.
-Stefana Muller, Sr. Product Manager