The term “Great Resignation” was coined during the pandemic to describe the trend of people choosing to leave their jobs in search of better opportunities. Also referred to as the Big Quit and the Great Reshuffle, the movement saw 47 million Americans leave their jobs in 2021, the most resignations on record. As of 2022, these rates continue to set records and show no signs of slowing down. For the IT industry, the Great Resignation is compounded by a much longer-standing talent gap that has increased wage expectations in a small pool of qualified candidates.
Every time you open LinkedIn, you confront messages from recruiters trying to lure you away from your current employer, or who have, at the very least, viewed your profile on multiple occasions. While there is absolutely nothing wrong with employees advancing their careers or finding new and better opportunities, employers are struggling to maintain business operations, budgets, and long-term growth goals. The fear of losing skilled engineering team members is something that keeps CTOs up at night.
Of course, every business would love a bottomless budget to entice capable technicians to both join the company and stay there, but that’s not typically the case. In this article, I’m introducing the alternative solution of Managed Cloud Services (MCS). By partnering with trusted and experienced cloud experts, organizations can utilize their skills and services a la carte, without facing the costs of a full-time employee. Keep reading to see how MCS can alleviate long-term shortages and help companies grow from the inside out.
Keeping Up with the Digital Transformation
Public cloud vendors seem to come up with a new service, feature, or enhancement every 16.47 seconds, and maintaining a team to keep up with everything is an ongoing struggle by itself. Now factor in employee turnover. Just when you have fully trained your team and they are humming along, operating like a well-oiled machine, a few decide to leave, and your productivity comes to a screeching halt. Immediately, you must create job requisitions, schedule interviews, and negotiate salaries. That alone can be quite a feat with the talent gap in IT creating high competition for qualified candidates.
At the end of 2021, Gartner named talent shortages as the biggest barrier to the adoption of emerging technologies. So not only is it hard to find good team members, but if you’re unable to attract the professionals required for business achievement, it could cost you in the market. As your competitors move ahead at an accelerated speed thanks to the automation and innovation of today’s technology, your business could be on a downward trajectory in comparison.
Hiring is One Thing; Maintaining Employees is Another
Once a company hires the right employee, it must provide training and share knowledge to bring them up to speed to fulfill their responsibilities successfully. This process requires existing team members to stop what they’re doing and provide the training required. That slows productivity, costs resources, and distracts techs from larger business goals. Unfortunately, what happens all too often is that as soon as the team is back up to full productivity, someone else decides to leave, and the process repeats itself in a never-ending rinse-and-repeat cycle.
When a resignation occurs, not only is there a significant loss of knowledge, especially if the employee has been with the company for a substantial amount of time, but the cost structure can also be severely affected. Turnover is unpredictable and sometimes inevitable if the business is unable or unwilling to provide the salary, benefits, environment, etc. desired by its talent pool. Without the ability to properly forecast the costs of hiring, turnover can compromise your business objectives quickly.
The highly skilled cloud engineers businesses want and need are hard to come by in general, and maintaining a team of them only compounds the issue. In addition to labor costs, businesses must also consider tooling expenses for things like monitoring, patching, ticketing, alerting, code repositories, security, databases, and so on. If you’re feeling inundated with these concerns, know that you’re not alone, and there is a way off the hamster wheel.
Do These Issues Sound Familiar?
We’re paying a fortune for cloud engineers and developers who are constantly troubleshooting and fixing issues with the cloud itself. We need them to be able to focus on developing applications that will drive our business forward.
We had a middle of the night outage that went unnoticed, shut our business down for several hours, and we lost hundreds of thousands of dollars – all because we didn’t realize we were down until the next morning.
Our only database administrator just left for another opportunity, and we have no one to administer our database.
We’ve been modernizing our cloud deployment and greatly increasing our application performance by moving into containers, serverless, and CI/CD pipeline deployment. Our main DevOps engineer just left, and our Windows/Linux admins don’t have the right skill set to operate our constant code deployment.
A malicious hacker was able to breach our security with a DDoS attack that brought down our main revenue-generating website without us knowing.
My lead cloud engineer just left for another company, and I have no idea what Amazon EC2 and Amazon S3 even mean, let alone how to create an AWS account. What am I going to do?
Managed Cloud Services: The “Employee” You’ve Been Searching For
Managed Cloud Services (MCS) can alleviate many of these concerns and allow your team to focus on what is most important – running your business. No matter where you are in your cloud journey, an MCS provider monitors and maintains your environment to relieve IT teams from day-to-day cloud operations. If you are utilizing more traditional services, like infrastructure as a service (IaaS), experienced MCS providers can take over the burden of “day 2” operations – including monitoring, patching, security, database administration, reporting, monolithic applications, and remediation. If your environment utilizes advanced features, such as serverless, containers, and infrastructure as code, and operates in more of a DevOps model, MCS can cover that as well.
Not only are these daily tasks taken off your plate, but by outsourcing your cloud management to an MCS provider, they take on the responsibility of ensuring resource availability. Now, instead of your company carrying the cost of finding, attracting, negotiating, hiring, training, and maintaining all of these employees, much of it is done for you. Learn more about how MCS can help your organization stay competitive in Managed Cloud Services: Optimize, Reduce Costs, and Efficiently Achieve Your Business Goals.
Day 2 IT Operations and Beyond with 2nd Watch
2nd Watch provides a variety of MCS with dedicated and designated resources that partner directly with businesses to understand not only your cloud environment but your business requirements as well. We operate your managed services using a holistic approach to cloud management, leveraging both cloud-native technology and architectures and best-in-breed customized management tools. 2nd Watch partners get a 24/7, year-round service delivery manager and a committed engineering team that work collaboratively toward goal achievement. We have found this comprehensive method, combined with intense knowledge sharing during employee onboarding, provides a seamless experience for our managed clients. If resignation and turnover are obstacles to achieving your business outcomes, see how 2nd Watch service offerings can help. Contact us to take the next step in your cloud journey.
-By Jeff Collins | Solutions Management, Managed Cloud Services
Cloud adoption is becoming more popular across all industries, as the cloud has proven to be a reliable, efficient, and secure way to deliver software services. As cloud adoption increases, companies are faced with the issue of managing these new environments and their operations, which ultimately impacts day-to-day business. Not only are IT professionals faced with the challenge of juggling their everyday work activities with managing their company’s cloud platforms, but they must do so in a timely, cost-efficient manner. Often, this requires hiring and training additional IT people, resources that are getting more and more difficult to find.
Managing your cloud operations on your own can seem like a daunting, tedious task that distracts from strategic business goals. A cloud managed service provider (MSP) monitors and maintains your cloud environments, relieving IT from the day-to-day cloud operations and ensuring your business operates efficiently. This is not to say IT professionals are incapable of performing these responsibilities; rather, outsourcing allows the IT professionals within your company to concentrate on the strategic operations of the business. In other words, you do what you do best, and the service provider takes care of the rest.
The alternative to an MSP is hiring and developing in-house the expertise necessary to keep up with the rapidly evolving cloud environment and cloud-native technologies. Doing it yourself means factoring in a hiring process, training, and payroll costs.
While possible, maintaining your cloud environments internally might not be the most feasible option in the long run. Additionally, a private cloud environment can be costly and requires that your applications be handled internally. Migrating to the public cloud or adopting a hybrid cloud model gives companies flexibility, as either allows a service provider partial or full control of their network infrastructure.
What are Managed Cloud Services?
Managed cloud services are the IT functions you hand off to your service provider while still handling the functions you want to keep in-house. Some examples of the management that service providers offer include:
Managed cloud database: A managed database puts some of your company’s most valuable assets and information into the hands of a complete team of experienced Database Administrators (DBAs). DBAs are available 24/7/365 to perform tasks such as database health monitoring, database user management, capacity planning and management, etc.
Managed cloud security services: The public cloud has many benefits, but with it also come security risks. Security management is another important MSP service to consider for your business. A cloud managed service provider can detect and prevent security threats before they cause damage, while fully optimizing the benefits provided by a cloud environment.
Managed cloud optimization: The cloud can be costly, but only as costly as you allow it to be. An MSP can optimize cloud spend through consulting, implementation, tools, reporting services, and remediation.
Managed governance & compliance: Without proper governance, your organization can be exposed to security vulnerabilities. Should a disaster occur within your business, such as a cyberattack on a data center, MSPs offer disaster recovery services to minimize recovery downtime and data loss. A managed governance and compliance service with 2nd Watch helps your Chief Security and Compliance Officers maintain visibility and control over your public cloud environment to help achieve on-going, continuous compliance.
At 2nd Watch, our foundational services include a fully managed cloud environment with 24/7/365 support and industry-leading SLAs. Our foundational services address the key needs to better manage spend, utilization, and operations.
What are the Benefits of a Cloud Managed Service Provider?
Using a Cloud Managed Service Provider comes with many benefits if you choose the right one.
Some of these benefits include, but are not limited to:
Cost savings: MSPs have experts that know how to efficiently utilize the cloud, so you get the most out of your resources while reducing cloud computing costs.
Increased data security: MSPs ensure proper safeguards are utilized while proactively monitoring and preventing potential threats to your security.
Increased employee production: With less time spent managing the cloud, your IT managers can focus on the strategic business operations.
24/7/365 management: Not only do MSPs take care of cloud management for you, but they do so 100% of the time.
Overall business improvement: When your cloud infrastructure is managed by a trusted cloud advisor, they can optimize your environments while simultaneously allowing time for you to focus on core business operations. They can also recommend cloud native solutions to further support the business agility required to compete.
Why Our Cloud Management Platform?
With cloud adoption increasing in popularity, choosing a managed cloud service provider to help with this process can be overwhelming. While there are many options, choosing one you can trust is important to the success of your business. 2nd Watch provides multi-cloud management across AWS, Azure, and GCP, and places special emphasis on putting our customers before the cloud. Additionally, we use industry-standard, cloud-native tooling to prevent platform lock-in.
The solutions we create at 2nd Watch are tailored to your business needs, creating a large and lasting impact on our clients. For example:
On average, 2nd Watch saves customers 41% more than if they managed the cloud themselves (based on customer data)
Customers experience increased efficiency in launching applications, adding an average of 240 hours of productivity per year for your business
On average, we save customers 21% more than our competitors
Next Steps
2nd Watch helps customers at every step in their cloud journey, whether that’s cloud adoption or optimizing your current cloud environment to reduce costs. We can effectively manage your cloud, so you don’t have to. Contact us to get the most out of your cloud environment with a managed cloud service provider you can trust.
Dealing with Windows patching can be a royal pain, as you may know. At least once a month, Windows machines are subject to system security and stability patches, thanks to Microsoft’s Patch Tuesday. With Windows 10 (and its derivatives), Microsoft has shifted towards more of a Continuous Delivery model in how it manages system patching. It is a welcome change; however, it still doesn’t guarantee that Windows patching won’t require a system reboot.
Rebooting an EC2 instance that is a member of an Auto Scaling Group (depending upon how you have your Auto Scaling health-check configured) is something that will typically cause an Elastic Load Balancing (ELB) HealthCheck failure and result in instance termination (this occurs when Auto Scaling notices that the instance is no longer reporting “in service” with the load balancer). Auto Scaling will of course replace the terminated instance with a new one, but the new instance will be launched using an image that is presumably unpatched, thus leaving your Windows servers vulnerable.
The next patch cycle will once again trigger a reboot and the vicious cycle continues. Furthermore, if the patching and reboots aren’t carefully coordinated, it could severely impact your application performance and availability (think multiple Auto Scaling Group members rebooting simultaneously). If you are running an earlier version of Windows OS (e.g. Windows Server 2012r2), rebooting at least once a month on Patch Tuesday is an almost certainty.
Another major problem with utilizing the AWS stock Windows AMIs with Auto Scaling is that AWS makes those AMIs unavailable after just a few months. This means that unless you update your Auto Scaling Launch Configuration to use the newer AMI IDs on a continual basis, future Auto Scaling instance launches will fail as they try to access an AMI that is no longer accessible. Anguish.
Automatically and Reliably Patch your Auto-Scaled Windows instances
Given the aforementioned scenario, how on earth are you supposed to automatically and reliably patch your Auto-Scaled Windows instances?!
One approach would be to write some sort of orchestration layer that detects when Auto Scaling members have been patched and are awaiting their obligatory reboot, suspends the Auto Scaling processes that would otherwise detect and replace perceived failed instances (e.g. HealthCheck), and then reboots the instances one by one. This would be rather painful to orchestrate and has a potentially severe drawback: cluster capacity is reduced to N-1 during the rebooting (maybe more if you don’t take into account service availability between reboots).
Reducing capacity to N-1 might not be a big deal if you have a cluster of 20 instances but if you are running a smaller cluster of something— say 4, 3, or 2 instances—then that has a significant impact to your overall cluster capacity. And, if you are running on an Auto Scaling group with a single instance (not as uncommon as you might think) then your application is completely down during the reboot of that single member. This of course doesn’t solve the issue of expired stock AWS AMIs.
Another approach is to maintain and patch a “golden image” that the Auto Scaling Launch Configuration uses to create new instances from. If you are unfamiliar with the term, a golden-image is an operating system image that has everything pre-installed, configured, and saved in a pre-baked image file (an AMI in the case of Amazon EC2). This approach requires a significant amount of work to make this happen in a reasonably automated fashion and has numerous potential pitfalls.
While it avoids the outage caused by an unavailable public AMI (your golden image replaces the stock AMI and lives in your own account), you still need a way to reliably and automatically handle this process. Using a tool like HashiCorp’s Packer can get you partially there, but you would still have to write a number of provisioners to handle the installation of Windows Updates and anything else you need to do in order to prep the system for imaging. In the end, you would still have to develop or employ a fair number of tools and processes to completely automate the entire process of detecting new Windows Updates, creating a patched AMI with those updates, and orchestrating the update of your Auto Scaling Groups.
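For readers unfamiliar with Packer, a minimal template for this kind of golden-image build might look like the sketch below. The region, AMI name filter, and script file names are illustrative assumptions, not values from our actual solution:

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-west-2",
      "instance_type": "t2.medium",
      "communicator": "winrm",
      "winrm_username": "Administrator",
      "user_data_file": "enable_winrm.ps1",
      "source_ami_filter": {
        "filters": {
          "name": "Windows_Server-2016-English-Full-Base-*"
        },
        "owners": ["amazon"],
        "most_recent": true
      },
      "ami_name": "windows-patched-{{timestamp}}"
    }
  ],
  "provisioners": [
    {
      "type": "powershell",
      "script": "install_windows_updates.ps1"
    }
  ]
}
```

The source_ami_filter keeps the template pointed at the newest stock AMI without hardcoding an AMI ID that AWS will eventually retire.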
A Cloud-Minded Approach
I believe that Auto Scaling Windows servers intelligently requires a paradigm shift. One assumption we have to make is that some form of configuration management (e.g. Puppet, Chef)—or at least a basic bootstrap script executed via cfn-init/UserData—is automating the configuration of the operating system, applications, and services upon instance launch. If configuration management or bootstrap scripts are not in play, then it is likely that a golden-image is being utilized. Without one of these two approaches, you don’t have true Auto Scaling because it would require some kind of human interaction to configure a server (ergo, not “auto”) every time a new instance was created.
Both approaches (launch-time configuration vs. golden-image) have their pros and cons. I generally prefer launch-time configuration as it allows for more flexibility, provides for better governance/compliance, and enables pushing changes dynamically. But…(and this is especially true of Windows servers) sometimes launch-time configuration simply takes longer to happen than is acceptable, and the golden-image approach must be used to allow for a more rapid deployment of new Auto Scaling group instances.
Either approach can be easily automated using a solution like the one I am about to outline, and thankfully AWS publishes new stock Windows Server AMIs immediately following every Patch Tuesday. This means that if you aren’t using a golden image, patching your instances is as simple as updating your Auto Scaling Launch Configuration to use the new AMI(s) and performing a rolling replacement of the instances. Even if you are using a golden image or applying some level of customization to the stock AMI, you can easily integrate Packer into the process to create a new patched image that includes your customizations.
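To make the “find the new AMI” step concrete, the selection logic reduces to picking the most recent CreationDate from a describe_images result. This is a minimal sketch; the name filter and the commented-out boto3 call show one plausible wiring, not the exact code from our solution:

```python
from datetime import datetime

def newest_image(images):
    """Return the image dict with the most recent CreationDate.

    `images` is shaped like the `Images` list returned by EC2
    describe_images (each entry has an ISO-8601 CreationDate).
    """
    return max(images, key=lambda i: datetime.strptime(
        i["CreationDate"], "%Y-%m-%dT%H:%M:%S.%fZ"))

# With boto3 (not executed here), the candidate list could come from:
# ec2 = boto3.client("ec2")
# images = ec2.describe_images(
#     Owners=["amazon"],
#     Filters=[{"Name": "name",
#               "Values": ["Windows_Server-2016-English-Full-Base-*"]}],
# )["Images"]

if __name__ == "__main__":
    images = [
        {"ImageId": "ami-0aaa", "CreationDate": "2018-01-09T18:20:44.000Z"},
        {"ImageId": "ami-0bbb", "CreationDate": "2018-02-13T19:02:11.000Z"},
    ]
    print(newest_image(images)["ImageId"])  # prints ami-0bbb
```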
The Solution
At a high level, the solution can be summarized as:
An Orchestration Layer (e.g. AWS SNS and Lambda, Jenkins, AWS Step Functions) that detects and responds when new patched stock Windows AMIs have been released by Amazon.
A Packer Launcher process that manages launching Packer jobs in order to create custom AMIs. Note: This step is only required if you want to copy AWS stock AMIs into your own AWS account or apply customizations to the stock AMI; either use case requires that the custom images remain available indefinitely. We solved this problem by creating an EC2 instance with a Python UserData script that launches Packer jobs (in parallel) to copy the new stock AMIs into our AWS account. If you are using something like Jenkins, this could instead be handled by having Jenkins launch a local script or even a Docker container to manage the Packer jobs.
A New AMI Messaging Layer (e.g. Amazon SNS) to publish notifications when new/patched AMIs have been created
Some form of an Auto Scaling Group Rolling Updater will be required to replace existing Auto Scaling Group instances with new ones based on the patched AMI.
Great news for anyone using AWS CloudFormation: it inherently supports Rolling Updates for Auto Scaling Groups! Utilizing it requires attaching an UpdatePolicy and adding a UserData or cfn-init script to notify CloudFormation when the instance has finished its configuration and is reporting as healthy (e.g. InService on the ELB). There are some pretty good examples of how to accomplish this using CloudFormation out there, including one that AWS provides.
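As a sketch of what that looks like in a template, the fragment below attaches an AutoScalingRollingUpdate policy to a group. Resource names and sizes are placeholders; the UserData (not shown) would run cfn-signal once the instance is configured:

```yaml
WebServerGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    LaunchConfigurationName: !Ref WebServerLaunchConfig
    AvailabilityZones: !GetAZs ""
    MinSize: "2"
    MaxSize: "6"
    DesiredCapacity: "2"
  UpdatePolicy:
    AutoScalingRollingUpdate:
      MinInstancesInService: "1"   # keep capacity during the roll
      MaxBatchSize: "1"            # replace one instance at a time
      PauseTime: PT15M             # max wait per batch
      WaitOnResourceSignals: true  # wait for cfn-signal before proceeding
```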
If you aren’t using CloudFormation, all hope is not lost. Despite HashiCorp Terraform’s ever-increasing popularity for deploying and managing AWS infrastructure as code, it has yet to implement a rolling-update feature for AWS Auto Scaling Groups. There is a Terraform feature request from a few years ago for this exact capability, but as of today it is not available, nor do the Terraform developers have any short-term plans to implement it. However, several people (including HashiCorp’s own engineers) have developed a number of ways to work around the lack of an integrated Auto Scaling Group rolling updater in Terraform.
Of course, you can always roll your own solution using a combination of AWS services (e.g. SNS, Lambda, Step Functions), or whatever tooling best fits your needs. Creating your own solution will give you added flexibility if you have additional requirements that can’t be met by CloudFormation, Terraform, or another orchestration tool.
The following is an example framework for performing automated Rolling Updates to Auto Scaling Groups utilizing AWS SNS and AWS Lambda:
a. An Auto Scaling Launch Config Modifier worker that subscribes to the New AMI messaging layer and updates the Auto Scaling Launch Configuration(s) when a new AMI is released. In this use case, we are using an AWS Lambda function subscribed to an SNS topic. Upon notification of new AMIs, the worker must update the predefined (or programmatically derived) Auto Scaling Launch Configurations to use the new AMI. This is best handled with infrastructure templating tools like CloudFormation or Terraform, which make updating the Auto Scaling Launch Configuration ImageId as simple as updating a parameter/variable in the template and performing an update/apply operation.
b. An Auto Scaling Group Instance Cycler messaging layer (again, an Amazon SNS topic) to be notified when an Auto Scaling Launch Configuration ImageId has been updated by the worker.
c. An Auto Scaling Group Instance Cycler worker that replaces the Auto Scaling Group instances in a safe, reliable, and automated fashion. For example, another AWS Lambda function that subscribes to the SNS topic and triggers new instances by increasing the Auto Scaling desired instance count to twice the current number of ASG instances.
d. Once the scale-up event generated by the Auto Scaling Group Instance Cycler worker has completed and the new instances are reporting as healthy, another message will be published to the Auto Scaling Group Instance Cycler SNS topic indicating scale-up has completed.
e. The Auto Scaling Group Instance Cycler worker will respond to the prior event and return the Auto Scaling group back to its original size which will terminate the older instances leaving the Auto Scaling Group with only the patched instances launched from the updated AMI. This assumes that we are utilizing the default AWS Auto Scaling Termination Policy which ensures that instances launched from the oldest Launch Configurations are terminated first.
NOTE: The AWS Auto Scaling default termination policy will not guarantee that the older instances are terminated first! If the Auto Scaling Group spans multiple Availability Zones (AZs) and there is an imbalance in the number of instances per AZ, it will terminate the extra instance(s) in the overloaded AZ before terminating based on the oldest Launch Configuration. Within an AZ, terminating on Launch Configuration age will ensure that the oldest instances are replaced first, but my recommendation is to use the OldestInstance termination policy to make absolutely certain that the oldest (i.e. unpatched) instances are terminated during the Instance Cycler scale-down process. Consult the AWS documentation on Auto Scaling termination policies for more on this topic.
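To make the scale-up/scale-down arithmetic of steps c–e concrete, here is a minimal sketch of the Instance Cycler’s capacity logic. Capping at MaxSize is my own practical addition (a doubled desired count can exceed the group’s limit); the group name and the boto3 wiring in the comments are illustrative assumptions:

```python
def scale_up_target(desired, max_size):
    """Cycle-up phase: double the current desired capacity so the
    patched replacements launch alongside the old instances, capped
    at the group's MaxSize."""
    return min(desired * 2, max_size)

def scale_down_target(original_desired):
    """Cycle-down phase: return to the original size; with an
    age-based termination policy, the unpatched instances go first."""
    return original_desired

# Sketch of the Lambda wiring (not executed here):
# import boto3
# asg = boto3.client("autoscaling")
# group = asg.describe_auto_scaling_groups(
#     AutoScalingGroupNames=["my-web-asg"])["AutoScalingGroups"][0]
# asg.set_desired_capacity(
#     AutoScalingGroupName="my-web-asg",
#     DesiredCapacity=scale_up_target(group["DesiredCapacity"],
#                                     group["MaxSize"]))
```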
In Conclusion
Whichever solution you choose to implement for Rolling Updates to your Auto Scaling Group, the approach outlined above will provide you with a sure-fire way to ensure your Windows Auto Scaled servers are always patched automatically, and it minimizes the operational overhead of ensuring patch compliance and server security. And the good news is that the heavy lifting is already handled by AWS Auto Scaling and HashiCorp Packer. There is a bit of trickery to getting the Packer configs and provisioners working just right with the EC2Config service and Windows Sysprep, but there are a number of good examples on GitHub to get you headed in the right direction. The one I referenced in building our solution can be found here.
One final word of caution... if you do not disable the EC2Config Set Computer Name option when baking a custom AMI, your Windows hostname will ALWAYS be reset to the EC2Config default upon reboot. This is especially problematic for configuration management tools like Puppet or Chef which may use the hostname as the SSL Client Certificate subject name (default behavior), or for deriving the system role/profile/configuration.
In our solution, an ec2config.ps1 Packer provisioner script disables the Set Computer Name option before the image is captured.
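A minimal sketch of such a provisioner (a reconstruction assuming the default EC2Config install path, not the original script) might look like this:

```powershell
# Reconstruction for illustration: disable the EC2Config
# "Set Computer Name" plugin so the hostname survives reboots.
$configPath = "C:\Program Files\Amazon\Ec2ConfigService\Settings\config.xml"
[xml]$config = Get-Content $configPath

# Find the Ec2SetComputerName plugin entry and flip its state.
$plugin = $config.Ec2ConfigurationSettings.Plugins.Plugin |
    Where-Object { $_.Name -eq "Ec2SetComputerName" }
$plugin.State = "Disabled"

$config.Save($configPath)
```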
Hopefully, at this point, you have a pretty good idea of how you can leverage existing software, tools, and services—combined with a bit of scripting and automation workflow—to reliably and automatically manage the patching of your Windows Auto Scaling Group EC2 instances! If you require additional assistance, are resource-bound for getting something implemented, or you would just like the proven Cloud experts to manage Automating Windows Patching of your EC2 Autoscaling Group Instances, contact 2nd Watch today!
Disclaimer
We strongly advise that processes like the ones described in this article be performed on a test environment prior to production to properly validate that the changes have not negatively affected your application’s functionality, performance, or availability.
This is something that your orchestration layer in the first step should be able to handle. It should also integrate well with a Continuous Integration and/or Delivery workflow.
-Ryan Kennedy, Principal Cloud Automation Architect, 2nd Watch
In April 2017, we sponsored an online survey focused on cloud automation in order to understand if—and how—corporate IT departments are using automation to develop and deliver new workloads and applications. More than 1,000 IT professionals from US companies with at least 1,000 employees participated in the survey. The majority of respondents (56%) said that at least half of their deployment pipelines are now automated, and 63% said they can deploy new applications in less than six weeks.
According to the results of the survey, companies that have embraced cloud automation can deploy new applications and workloads faster and more frequently, while recovering from failures with more agility than organizations that struggle to adopt automated processes, testing, and monitoring. Furthermore, per the survey results, 41% of corporate IT departments are producing more than 10 new cloud workloads every year, and 56% have automated at least half of all their artifact creation and deployment pipelines. Another 66% said that at least half of all their quality assessments (lint, unit tests, etc.) are automated.
“The survey results reiterate what we’re hearing from clients and prospects: automation, driven by cloud technologies, is critical to the rapid delivery of new workloads and applications,” says Jeff Aden, EVP of Marketing & Strategic Business Development & Co-Founder at 2nd Watch. “Companies are automating everything from artifact creation to deployment pipelines and process, which includes metrics, documentation and data. The result is faster time-to-market for new applications, and less application downtime.”
More survey results:
63% said that deploying new applications takes less than six weeks
44% said that deploying new code to production takes a day or less
54% said they are deploying new code changes at least once a week
50% said it takes a day or less to recover from application failure
55% said they are measuring application quality by testing everything
Download the infographic highlighting the results of the Cloud Automation survey here. For questions about how 2nd Watch can help you embrace cloud automation, please contact us today!