Amazon Web Services (AWS) Outage Makes the Case for Multi-Region Infrastructure

When Amazon’s cloud computing platform, Amazon Web Services (AWS), suffered an outage on December 7, 2021, the magnitude of the event was felt globally. What happened, and how can your business learn from this significant outage?

Why was there an AWS outage?

Reported issues within the AWS infrastructure began around 12:00 ET/17:00 GMT on Dec. 7, according to data from real-time outage monitoring service DownDetector.

Amazon reported that the “US-East-1” region, located in Northern Virginia, went down on Tuesday, disrupting Amazon’s own applications and the many third-party services that also rely on AWS. The root cause was an “impairment of several network devices” that produced API errors and ultimately impacted many critical AWS services.

What were the effects of the AWS outage?

The effects of the AWS outage were massive because any problem affecting Amazon impacts hundreds of millions of end users. AWS accounts for 41% of the global cloud-computing business, and many of the largest companies in the world depend on AWS’s cloud computing services. These businesses rent computing, storage, and network capabilities from AWS, which means the outage cut off end users’ access to a variety of sites and apps across the Internet.

The major websites and apps that suffered from the outage are ones we turn to on a daily basis: Xfinity, Venmo, Google, and Disney+, just to name a few.

On Tuesday morning, users were reporting that they couldn’t log on to a variety of vital accounts. Most of us were going through our normal daily routine of checking the news, our financial accounts, or our Amazon orders, only to frustratingly realize that we couldn’t do so. 

With so many large organizations relying on AWS, when the outage occurred, it felt like the entire Internet went down. 

Benefits of a High Availability Multi-Region Cloud Application Architecture

Even though the outage was a major headache, it serves as an important lesson for those who are relying on a cloud-based infrastructure. As they say, you should learn from mistakes.

So how can your business mitigate, or even avoid, the effects of a major failure within your cloud provider?

At 2nd Watch, we are in favor of a high availability multi-region cloud approach. We advise our clients to build out a multi-region application architecture not only because it will support your mission-critical services during an outage, but also because it will make your applications more resilient and improve your end-user experience by keeping latencies low for a distributed user base. Below is how we think about a multi-region cloud approach and why we believe it is a strong strategy.

1. Increase your Fault Tolerance

Fault tolerance is the ability of a system to endure some kind of failure and continue to operate properly. 

Unfortunately, things happen that are beyond our control (e.g., natural disasters), or things slip through the cracks (e.g., human error), and either can impact a data center, an availability zone, or an entire region. However, just because a failure happens doesn’t mean an outage has to happen.

By architecting a multi-region application structure, if there is a regional failure similar to AWS’s east region failure, your company can avoid a complete outage. Having a multi-region architecture grants your business the redundancy required to increase availability and resiliency, ensure business continuity, and support disaster recovery plans.
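As a concrete illustration, one common multi-region redundancy pattern is DNS failover with Amazon Route 53: a health-checked primary record pointing at one region and a secondary record pointing at another. The sketch below builds such a record set as plain Python dictionaries in the shape the Route 53 API expects; the domain names, endpoints, and health check ID are hypothetical placeholders, not values from any real deployment.

```python
def failover_record(name, endpoint, role, health_check_id=None):
    """Build one Route 53 failover record set (role: 'PRIMARY' or 'SECONDARY')."""
    record = {
        "Name": name,
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": f"{role.lower()}-{endpoint}",
        "Failover": role,
        "ResourceRecords": [{"Value": endpoint}],
    }
    if health_check_id:
        # Route 53 fails over to SECONDARY only when this check reports unhealthy.
        record["HealthCheckId"] = health_check_id
    return record

# Primary in us-east-1, standby in us-west-2 (all names are placeholders).
change_batch = {
    "Changes": [
        {"Action": "UPSERT",
         "ResourceRecordSet": failover_record(
             "app.example.com.", "app-us-east-1.example.com", "PRIMARY",
             health_check_id="hc-primary-useast1")},
        {"Action": "UPSERT",
         "ResourceRecordSet": failover_record(
             "app.example.com.", "app-us-west-2.example.com", "SECONDARY")},
    ]
}
```

In a real deployment this change batch would be submitted with boto3’s `change_resource_record_sets` against your hosted zone; the point of the sketch is simply that failover roles and health checks are declared per record, so a regional failure shifts traffic without application changes.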

2. Lower latency requirements for your worldwide customer base

The benefits of a multi-region approach go beyond disaster recovery and business continuity. By adopting a multi-region application architecture, your company can deliver low latency by keeping data closer to all of your users, even those who are across the globe.

In an increasingly impatient world, keeping latency low is vital for a good user experience, and the most reliable way to do that is to keep the data close to your users.
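One way to keep users close to their data is latency-based DNS routing, again using Route 53 as an example. The sketch below (with hypothetical names, regions, and endpoints) builds one record per region; Route 53 then answers each user’s query with the region that has the lowest measured latency for that user.

```python
def latency_record(name, region, endpoint):
    # "Region" makes this a latency-based routing record: Route 53 answers
    # each DNS query with the lowest-latency region for that user.
    return {
        "Name": name,
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": f"latency-{region}",
        "Region": region,
        "ResourceRecords": [{"Value": endpoint}],
    }

# One record per deployed region (placeholder endpoints).
records = [latency_record("app.example.com.", region, f"app-{region}.example.com")
           for region in ("us-east-1", "eu-west-1", "ap-southeast-1")]
```

Each record would be upserted to the hosted zone just like the failover example; the design choice here is that adding a region is purely additive, so you can expand coverage without touching existing records.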

3. Comply with Data Privacy Laws & Regulations

“Are you GDPR compliant?” is a question you probably hear frequently. Hopefully, your business is, and you want to remain that way. With a multi-region architecture, you can ensure that you are storing data within the legal boundaries. Also, with signs that there will be more regulations each year, you will stay a step ahead with data compliance if you utilize a multi-region approach.

How Can I Implement a Multi-Region Infrastructure Deployment Solution?

A multi-region cloud approach is a proactive way to alleviate potential headaches and grow your business, but without guidance, it can seem daunting in terms of adoption strategy, platform selection, and cost modeling. 

2nd Watch helps you mitigate the risks of potential public cloud outages and deploy a multi-region cloud infrastructure. Through our Cloud Advisory Services, we serve as your trusted advisor for answering key questions, defining strategy, managing change, and providing impartial advice for a wide range of organizational, process, and technical issues critical to successful cloud modernization.

Contact us today to discuss a multi-region application architecture for your business needs!


The Cloudcast Podcast with Jeff Aden, Co-Founder and EVP at 2nd Watch

The Cloudcast’s Aaron and Brian talk with Jeff Aden, Co-Founder and EVP at 2nd Watch, about the evolution of 2nd Watch as a Cloud Integrator as AWS has grown and shifted its focus from startups to enterprise customers. Listen to the podcast at http://www.thecloudcast.net/2019/02/evolution-of-public-cloud-integrator.html.

Topic 1 – Welcome to the show Jeff. Tell us about your background, the founding of 2nd Watch, and how the company has evolved over the last few years.

Topic 2 – We got to know 2nd Watch at one of the first AWS re:Invent shows, as they had one of the largest booths on the floor. At the time, they were listed as one of AWS’s best partners. Today, 2nd Watch provides management tools, migration tools, and systems-integration capabilities. How does 2nd Watch think of themselves?

Topic 3 –  What are the concerns of your customers today, and how does 2nd Watch think about matching customer demands and the types of tools/services/capabilities that you provide today?

Topic 4 – We’d like to pick your brain about the usage and insights you’re seeing from your customers’ usage of AWS. It’s mentioned that 100% are using DynamoDB, 53% are using Elastic Kubernetes Service, and a fast-growing section is using things like Athena, Glue, and SageMaker. What are some of the types of applications that you’re seeing customers build that leverage these new models?

Topic 5 – With technologies like Outpost being announced, after so many years of AWS saying “Cloud or legacy Data Center,” how do you see this impacting the thought process of customers or potential customers?


The Top 7 Things to Avoid at AWS re:Invent 2017

The sixth annual AWS re:Invent is less than a week away, taking place November 27-December 1 in Las Vegas, Nevada.  Designed for AWS customers, enthusiasts, and even cloud computing newcomers, the nearly week-long conference is a great source of information and education for attendees of all skill levels. AWS re:Invent is THE place to connect, engage, and discuss current AWS products and services via breakout sessions ranging from introductory and advanced to expert, as well as to hear the news and announcements from key AWS executives, partners, and customers. This year’s agenda offers a full additional day of content, boot camps, hands-on labs, workshops, new Alexa Hack Your Office and Smart Cities hackathons, a Robocar Rally, and the first ever Deep Learning Summit.  Designed for developers who want to learn about the latest in deep learning research and emerging trends, the Deep Learning Summit will feature members of the academic and venture capital communities sharing their perspectives in a series of thirty-minute lightning talks. To offer all of its great educational content, networking opportunities and recreational activities, AWS is practically taking over the Las Vegas strip, offering an expanded campus with a larger re:Invent footprint and more venues (not to mention a shuttle service!).

2nd Watch is proud to be a 2017 Platinum Sponsor and attending AWS re:Invent for the sixth consecutive year. With every re:Invent conference we attend, we continue to gain unique insight into what attendees can expect.  Similar to last year, our seasoned re:Invent alumni have compiled a list of The Top 7 Things to Avoid at re:Invent 2017 and we hope you find the following information useful as you prepare to attend AWS re:Invent next week.

1.  Avoid the long lines at Registration (and at the Swag Counter!)

The re:Invent Registration Desk will open early again this year starting Sunday afternoon from 1pm-10pm, giving attendees a few extra hours to check in and secure their conference badges.  Registration Desks are located in four locations this year—Aria, MGM Grand, Mirage, and The Venetian—so no matter where your hotel room is along the strip, you’re sure to find a Registration Desk close by.  This is particularly helpful so that you don’t have to schlepp around all that conference swag you will receive upon check in.  As always, you can’t attend any part of re:Invent until you have your conference badge so be sure you check into Registration as early as possible.  This will also ensure that you get the size shirt you want from the Swag Counter!

Expert Tip:  Like last year, AWS has added an additional level of security and will be printing each attendee’s photograph onto their badge.  Avoid creating a backlog at the registration line because you have to have your photo taken on site.  Take a few minutes to upload your photo prior to re:Invent here.  BONUS: By uploading your own photo, you make sure to put your best face forward for the week.

2.  Avoid Arriving Without a Plan:

The worst thing you can do at re:Invent is show up without a plan for how you will spend your week in Vegas—that includes the breakout sessions you want to attend.  With expanded venues and a total of over 1,000 sessions (twice as many as 2016), more hands-on labs, boot camps and one-on-one engagement opportunities, AWS re:Invent 2017 offers more breadth and depth and more chances to learn from the experts than ever before.

If you haven’t already done so, be sure to check out the AWS Event Catalogue and start selecting the sessions that matter most to you.  While you’re building your session schedule, might I recommend adding 2nd Watch’s breakout session—Continuous Compliance on AWS at Scale—to your list of must attend sessions? This session will be led by cloud security experts Peter Meister and Lars Cromley and will focus on the need for continuous security and compliance in cloud migrations. Attendees will learn how a managed cloud provider can use automation and cloud expertise to successfully control these issues at scale in a constantly changing cloud environment.  Find it in the Event Catalog by searching for SID313 and then add it to your session agenda.  Or, click here to skip the search and go directly to the session page.

Expert Tip: Be sure to download the AWS re:Invent Mobile App. Leveraging the mobile app is like having your own, personal re:Invent assistant for the week and will hold your re:Invent schedule, maps from venue to venue, all other activities and reminders, providing a super helpful resource as you navigate the conference.  Android users click here to download. Apple users click here to download.

3.  Avoid Avoiding the Waitlist

AWS re:Invent 2017 is SOLD OUT and we anticipate nearly 50,000 people to be in attendance this year.  That means, if you haven’t already built your session agenda for the week, you’re likely to find that the *ONE SESSION* you needed to attend is already at capacity.  Avoid missing out on sessions by adding yourself to the waitlist for any sessions that you really want to attend.  You will be surprised by the number of people that “no-show” to sessions that they have registered for so don’t be afraid to stand in line for that all-too-important session.

4.  Avoid Not Knowing Where to Go

As mentioned previously, the re:Invent campus has expanded, yet again, this year and there are a few more venues to note when preparing your event schedule.  Spanning the length of the Las Vegas strip, events will occur at the MGM Grand, Aria, Mirage, Venetian, Palazzo, Sands Expo Hall, the Linq Parking Lot, and the Encore.  Each venue will host tracks devoted to specific topics so to help you get organized—and map out your week, literally—here’s what you can expect to find at each venue:

MGM Grand: Business Apps, Enterprise, Security, Compliance, Identity, and Windows.
Aria: Analytics & Big Data, Alexa, Container, IoT, AI & Machine Learning, and Serverless.
Mirage: Bootcamps, Certifications, and Certification Exams.
Venetian / Palazzo / Sands Expo Hall: Architecture, AWS Marketplace & Service Catalog, Compute, Content Delivery, Database, DevOps, Mobile, Networking, and Storage.
Linq Lot: Alexa Hackathons, Gameday, Jam Sessions, re:Play Party, and Speaker Meet & Greets.
Encore: Bookable meeting space.

Once you’ve nailed down where you need to be, be sure to allow enough time to get from session to session.  While there are breaks between sessions, some venues can be a bit of a hike from others so be sure to plan accordingly.  You’ll want to factor in the time it takes to walk between venues as well as the number of people that will be doing the same.  As re:Invent continues to grow in size, you can certainly expect that escalators, elevators, hallways, sidewalks and lengthy shuttle lines are going to be difficult to navigate. To help you get a sense of standard walking times between venues, AWS has put together a nifty chart that details all the travel information you might need (minus any stops on casino floors or crowds of folks clogging your path).

This year, AWS is offering a shuttle service between venues if you don’t want to walk or need to get to your next destination quickly.

AWS recommends allowing yourself 30 minutes to travel between venues and is providing the following shuttle schedule to help you get from Point A to Point B:

Sunday, November 26: 12PM-1:30AM
Monday, November 27: 6AM-12:30AM
Tuesday, November 28: 6AM-10PM
Wednesday, November 29: 6AM-12:30AM
Thursday, November 30: 6AM-12:30AM
Friday, December 1: 6AM-3PM

Note: Bellagio shuttles run only during AM and PM peak hours (Sunday 10PM-1:30AM; Monday-Thursday 6AM-10AM & 4PM-7:30PM; Friday 6AM-10AM).

Expert Tip: If you need to get from The Palazzo to The Venetian and want to avoid navigating the casino floors, restaurant row and the crowds around the entrance to The Sands Convention Center, head to the Canyon Ranch Spa from either hotel. From the Palazzo, the spa is located on the 4th floor and from the Venetian it is located on the 3rd floor.  The spa connects both venues through a series of long, colorful and rarely traveled corridors making the trip quick and easy for those who don’t mind taking the road less traveled.  Not to mention, this route can also offer a moment of peaceful sanity!

5.  Avoid Sleeping In, Being Late, or Skipping Out Entirely

With so many learning and networking opportunities, it’s easy to get caught up in exciting—yet exhaustive—days full of breakout sessions, hands-on labs, training sessions, and of course, after-hours activities and parties.  Only you know how to make the most of your time at re:Invent, but if we can offer some advice…be sure to get plenty of sleep and avoid sleeping in, getting to sessions late or worse…skipping out on morning sessions entirely.  Especially when it comes to the keynote sessions on Wednesday and Thursday morning!

AWS CEO Andy Jassy will present the Wednesday morning keynote, while Amazon CTO Werner Vogels will present on Thursday morning.  Both keynotes will be full of exciting product announcements, enhancements, and feature additions as well as cool technical content and enterprise customer success stories.  Don’t be the last to know because you inadvertently overslept and/or partied a little too hard the night before!

Customers don’t need to reserve a seat in either of the keynotes, however, there is a cap on the number of attendees who can watch the keynote in the keynote hall. Keynotes are offered on a first come, first served basis, so be sure to get there early.

Expert Tip: If you don’t want to wait in line to sit in the keynote hall, AWS will have many options for watching the keynote in overflow rooms. If you’re still groggy from the previous night’s events, the overflow rooms are an ideal place where you can watch the keynote with a bloody mary, mimosa, or coffee.

6.  Avoid Being Anti-Social

AWS re:Invent is one of the best locations to network and connect with like-minded peers and cloud experts, discover new partner offerings and, of course, let loose at the quirky after-hours experiences, attendee parties, and partner-sponsored events.

Avoid being anti-social by taking advantage of the many opportunities to network with others and meet new people. AWS has some great activities planned for conference goers.  To help you play hard while working harder, here is a list of all the fun activities that are planned for re:Invent 2017:

Harley Ride
When: Sunday, November 26, 12PM-6PM
Where: The Venetian

Robocar Rally Mixer
When: Sunday, November 26, 6PM-10PM
Where: Aria

Non-Profit Hackathon Mixer
When: Sunday, November 26, 7PM-9PM
Where: The Venetian

Midnight Madness
When: Sunday, November 26, 10:30PM-1AM
Where: The Venetian

AWS re:Invent 4K
When: Tuesday, November 28, 6AM-8AM
Where: The Mirage

Welcome Reception
When: Tuesday, November 28, 5PM-7PM
Where: The Venetian & The Linq Lot

Boomball
When: Tuesday, November 28, 5PM-7PM
Where: The Linq Lot

Cyclenation Spin Challenge
When: Wednesday, November 29, (three timeslots) 7AM, 8AM, 5PM
Where: The Mirage

Pub Crawl
When: Wednesday, November 29, 5:30PM-7:30PM
Where: MGM Grand & The Venetian

Tatonka Challenge
When: Wednesday, November 29, 5:30PM-7:30PM
Where: MGM Grand & The Venetian

2nd Watch After Party
When: Wednesday, November 29, 9PM-12AM
Where: Rockhouse at the Palazzo
Click here to be added to the 2nd Watch After Party Waitlist (see “What to Avoid #3 above if you’re hesitant to be waitlisted)!

Fitness Bootcamp
When: Thursday, November 30, (three timeslots) 7AM, 8AM, 5PM
Where: The Mirage

re:Play Party
When: Thursday, November 30, 8PM-12AM
Where: The Park at the Linq Lot

Expert Tip: Don’t forget to bring plenty of business cards.  With so many people to meet, opportunities to connect with peers and experts, and after-hours parties to attend, you’ll want to make sure to pack extra cards to avoid running out early in the week.  When you receive a business card from someone else, try to immediately take a photo of it with your smartphone and save it to a photo album dedicated solely to networking.  This will ensure that you have the details stored somewhere should you happen to misplace an important contact’s business card.

7.  Avoid Forgetting to Pack That All-Too-Important Item

Whether you’re staying at The Venetian, Mirage, Encore, or other property, your hotel room will be your home away from home for nearly an entire week.  Of course, every hotel will have in-room amenities and travel essentials, but inevitably, we all will forget that one important item that we won’t be able to live without, especially for an entire week.  Our experts have pulled together a check list to help you pack for your trip and ensure you have all the comforts of home and office during your week in Vegas.

Your Favorite Toiletries: 

Not everyone is in love with the in-room toiletries that hotels have to offer in each of their suites. If you have a favorite, be sure to bring it. Here is a quick list to ensure you don’t forget something:

  • Shampoo
  • Conditioner
  • Soap
  • Shave Cream
  • After Shave
  • Razor
  • Deodorant
  • Lotion
  • Toothbrush
  • Toothpaste
  • Mouthwash
  • Floss
  • Hair Styling Products (if that’s your thing)
  • Contact Case & Solution
  • Spare Pair of Contacts
  • Cologne/Perfume/Body Spray

First Aid: 

Whether your headache or hangover cure calls for aspirin, ibuprofen, or something stronger, it’s a good idea to pack your preferred treatment along with any other first aid remedies and prescription medications you might need. Band-Aids, blister protectors, and antihistamines are also recommended.

Chapstick & Lotion: 

It is the desert, after all, and with dry air circulating throughout the venues, your skin (including your lips) is bound to dry out.  We recommend bringing medicated Chapstick and fragrance-free lotion (fragrances in most lotions can often dry out your skin even more!) and keeping a spare with you at all times.

Breath Mints and/or Mint-flavored Gum:

No explanation necessary.

Business cards:

This is a repeat from one of our other tips but an important one to remember, so we don’t mind mentioning it again.

Chargers & Battery Packs: 

Nothing is worse than being in between sessions with a 10% cell phone or laptop battery and realizing you left your chargers back in your room. We recommend bringing at least two phone chargers and two laptop chargers: One for your room and one for the backpack or briefcase you’ll be carrying throughout the conference.  Additionally, while there will be several charging stations throughout re:Invent (and outlets on most every wall), it’s a good idea to bring a battery pack with several hours of charging time just in case you can’t find an open spot to plug in.

Water Bottle:

You will definitely want to stay hydrated throughout the week, and the tiny cups offered at the water stations just won’t quench your thirst quite the way you will need them to.  It’s a good idea to pack a water bottle (we recommend one that can hold 17 oz) so that you avoid having to refill often and have plenty of thirst-quenching liquid to keep you hydrated throughout the day.

Comfortable shoes: 

Your shoes will be your saving grace by the end of each day, so be sure to bring a pair or two that you feel comfortable walking several thousands of steps in.

Appropriate Attire: 

While business casual attire is often recommended at re:Invent, there can be many interpretations of what is appropriate.  Our advice is to pack clothing that you would feel confident wearing should you run into your boss or someone you wish to impress.  Jeans are perfectly acceptable in either case, but make sure to use good judgement overall when selecting your attire for sessions, dinners and parties you plan to attend.

Cash: 

In addition to needing cash for meals on the go, bar tabs or that faux diamond-encrusted figurine you’ve been eyeing in the gift shop, you’ll want to bring a little extra cash if you plan to try your luck at the casinos.  There are ATMs on the casino floors, but they typically charge a service fee in the amount of $3-$5 in addition to your bank’s own service fees.

Notebook & Pen/Pencil:

It’s always a good idea to bring a good ole’ fashioned notebook with you to your sessions.  Not only is it a fail-proof way to capture the handy tips and tricks you’ll be learning, it’s also the quietest way to track those notable items that you don’t want to forget.  Think about it – if 100 people in your breakout session were all taking notes on a laptop, it would be pretty distracting.  Be bold. Be respectful. Be the guy/gal that uses paper and a pen.

A Few Final Thoughts

Whether this is your first trip to AWS re:Invent or you’re a seasoned re:Invent pro, you’re sure to walk away with an increased knowledge of how cloud computing can better help your business, tips and tricks for navigating new AWS products and features, and a week’s worth of memories that will last a lifetime.  We hope you make the most of your re:Invent 2017 experience and take advantage of the incredible education and networking opportunities that AWS has in store this year.

Last but certainly not least, we hope you take a moment during your busy week to visit 2nd Watch in booth #1104 of the Expo Hall where we will be showcasing our customers’ successes.  You can explore 2nd Watch’s Managed Cloud Solutions, pick up a coveted 2nd Watch t-shirt and find out how you can win one of our daily contest giveaways—a totally custom, totally rad 2nd Watch skateboard!

Expert Tip: Make sure you get time with one of 2nd Watch’s Cloud Journey Masters while at re:Invent.  Plan ahead and schedule a meeting with one of 2nd Watch’s AWS Professional Certified Architects, DevOps, or Engineers.  Last but not least, 2nd Watch will be hosting its annual re:Invent after party on Wednesday, November 29. If you haven’t RSVP’d for THE AWS re:Invent Partner Party, click here to request to be added to our waitlist.  We look forward to seeing you at AWS re:Invent 2017!

 

-Katie Ellis, Marketing Manager

 


2nd Watch Enterprise Cloud Expertise On Display at AWS re:Invent 2017

AWS re:Invent is less than twenty days away and 2nd Watch is proud to be a 2017 Platinum Sponsor for the sixth consecutive year.  As an Amazon Web Services (AWS) Partner Network Premier Consulting Partner, we look forward to attending and demonstrating the strength of our cloud design, migration, and managed services offerings for enterprise organizations at AWS re:Invent 2017 in Las Vegas, Nevada.

About AWS re:Invent

Designed for AWS customers, enthusiasts and even cloud computing newcomers, the nearly week-long conference is a great source of information and education for attendees of all skill levels. AWS re:Invent is THE place to connect, engage, and discuss current AWS products and services via breakout sessions ranging from introductory and advanced to expert as well as to hear the latest news and announcements from key AWS executives, partners, and customers. This year’s agenda offers a full additional day of content for even more learning opportunities, more than 1,000 breakout sessions, an expanded campus, hackathons, boot camps, hands-on labs, workshops, expanded Expo hours, and the always popular Amazonian events featuring broomball, Tatonka Challenge, fitness activities, and the attendee welcome party known as re:Play.

2nd Watch at re:Invent 2017

2nd Watch has been a Premier Consulting Partner in the AWS Partner Network (APN) since 2012 and was recently named a leader in Gartner’s Magic Quadrant for Public Cloud Infrastructure Managed Service Providers, Worldwide (March 2017). We hold AWS Competencies in Financial Services, Migration, DevOps, Marketing & Commerce, Life Sciences, and Microsoft Workloads, and have recently completed the AWS Managed Service Provider (MSP) Partner Program Audit for the third year in a row. Over the past decade, 2nd Watch has migrated and managed AWS deployments for companies such as Crate & Barrel, Condé Nast, Lenovo, Motorola, and Yamaha.

The 2nd Watch breakout session—Continuous Compliance on AWS at Scale—will be led by cloud security experts Peter Meister and Lars Cromley. The session will focus on the need for continuous security and compliance in cloud migrations, and attendees will learn how a managed cloud provider can use automation and cloud expertise to successfully control these issues at scale in a constantly changing cloud environment. Registered re:Invent Full Conference Pass holders can add the session to their agendas here.

In addition to our breakout session, 2nd Watch will be showcasing our customers’ successes in the Expo Hall located in the Sands Convention Center (between The Venetian and The Palazzo hotels).  We invite you to stop by booth #1104 where you can explore 2nd Watch’s Managed Cloud Solutions, pick up a coveted 2nd Watch t-shirt and find out how you can win one of our daily contest giveaways—a totally custom 2nd Watch skateboard!

Want to make sure you get time with one of 2nd Watch’s Cloud Journey Masters while at re:Invent?  Plan ahead and schedule a meeting with one of 2nd Watch’s AWS Professional Certified Architects, DevOps, or Engineers.  Last but not least, 2nd Watch will be hosting its annual re:Invent after party on Wednesday, November 29. If you haven’t RSVP’d for THE AWS re:Invent Partner Party, click here to request your invitation (event has passed).

AWS re:Invent is sure to be a week full of great technical learning, networking, and social opportunities.  We know you will have a packed schedule but look forward to seeing you there!  Be on the lookout for my list of “What to Avoid at re:Invent 2017” in the coming days…it’s sure to help you plan for your trip and get the most out of your AWS re:Invent experience.

 

–Katie Laas-Ellis, Marketing Manager, 2nd Watch

 

Gartner Disclaimer

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

About 2nd Watch

2nd Watch is an AWS Premier tier Partner in the AWS Partner Network (APN) providing managed cloud to enterprises. The company’s subject matter experts, software-enabled services, and cutting-edge solutions provide companies with tested, proven, and trusted solutions, allowing them to fully leverage the power of the cloud. 2nd Watch solutions are high performing and robust; they increase operational excellence, decrease time to market, accelerate growth, and lower risk. Its patent-pending, proprietary tools automate everyday workload management processes for big data analytics, digital marketing, line-of-business, and cloud-native workloads. 2nd Watch is a new breed of business that helps enterprises design, deploy, and manage cloud solutions and monitors business-critical workloads 24×7. 2nd Watch has more than 400 enterprise workloads under its management and more than 200,000 instances in its managed public cloud. The venture-backed company is headquartered in Seattle, Washington. To learn more about 2nd Watch, visit www.2ndwatch.com or call 888-317-7920.


Best Practices: Developing a Tag Strategy for AWS Billing

Tag Strategy is key to Cost Allocation for Cloud Applications.

Have you ever been out to dinner with a group of friends, and at the end of the dinner the waiter comes back with one bill?  Most of us have experienced this.  Depending on the group of friends, it’s not a big deal, and everyone drops in a credit card so the bill can be split evenly.  Other times, someone invites Harry and Sally, and they scrutinize the bill line by line.  Inevitably they protest that they only had one glass of wine and Sally only had the salad.  You recall that Sally was a little ‘handsy’ with the sampler platter, but you sit quietly.  It’s in that moment you remember why the group didn’t invite Harry and Sally to last year’s New Year’s dinner.  No need to start the new year with an audit, am I right?

This situation is eerily similar in many ways to cloud billing in a large enterprise.  It is evident that Amazon Web Services (AWS) has changed the way organizations use computing resources.  However, AWS has also delivered on the promise of truly enabling ‘chargeback’ or ‘showback’ in the enterprise, so that the business units themselves are stakeholders in what was traditionally siloed in an IT Department budget.

Now multiple stakeholders across many organizations have a stake in the cost and usage of an app that resides in AWS.  Luckily, there are tools like 2nd Watch’s Cloud Management Platform (CMP) that can provide visibility into the cost of their app, or even what their entire infrastructure is costing them, at the click of a button.

2nd Watch’s CMP tools are great for showing an organization’s costs and can even be used to set budget notifications so that a business unit doesn’t inadvertently spend more than it has budgeted on an environment.  CMP is a powerful tool that can deliver valuable insights to your business, and it becomes even more powerful when you implement a thorough tagging strategy.
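The budget-notification idea is simple enough to sketch in a few lines. The thresholds and figures below are illustrative assumptions, not CMP’s actual API or defaults:

```python
# Hypothetical sketch of a budget-threshold check, similar in spirit to the
# notifications a cost-management tool might send. Thresholds are made up.

def budget_alerts(spend_to_date, monthly_budget, thresholds=(0.5, 0.8, 1.0)):
    """Return the alert levels (as fractions of budget) that spend has crossed."""
    used = spend_to_date / monthly_budget
    return [t for t in thresholds if used >= t]

# A business unit that has spent $9,200 of a $10,000 budget has crossed
# the 50% and 80% marks, but not yet 100%.
print(budget_alerts(9200, 10000))  # → [0.5, 0.8]
```

A real system would fire a notification the first time each threshold is crossed rather than recomputing the full list, but the comparison logic is the same.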

Tag, you’re it…

We live in a world of tags and hashtags.  Seemingly overnight, tags have made their way into everyday language.  This is no accident: interactions on Facebook and Twitter have become so commonplace that they have altered the world’s language.

Beyond their emergence in our everyday vernacular, tags have a key function.  In AWS, applying tags to various cloud resources like EC2 and RDS is key to quality accounting when allocating charges.  Our team of experts at 2nd Watch can work with you to ensure that your tagging strategy is implemented in the most effective manner for your organization.  After all, a tagging strategy can and will vary by organization; it depends on how you want to be able to report on your resources.  Do you want to report on resources by cost center, application, environment type (like dev or prod), owner, department, geographic area, or by whether the resource is managed by a managed service provider like 2nd Watch?
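Whatever dimensions you choose, a practical first step is auditing which resources even carry the required keys. Here is a minimal sketch; the tag schema and resource records are hypothetical examples, not output from any AWS API:

```python
# Minimal sketch: find resources missing any of the required tag keys.
# The required keys and the resource list are illustrative assumptions.

REQUIRED_TAGS = {"CostCenter", "Application", "Environment", "Owner"}

def missing_tags(resources):
    """Map resource id -> set of required tag keys it lacks."""
    report = {}
    for res in resources:
        gaps = REQUIRED_TAGS - set(res.get("Tags", {}))
        if gaps:
            report[res["Id"]] = gaps
    return report

resources = [
    {"Id": "i-0001", "Tags": {"CostCenter": "1001", "Application": "web",
                              "Environment": "prod", "Owner": "mktg"}},
    {"Id": "i-0002", "Tags": {"Application": "etl"}},  # under-tagged
]
print(missing_tags(resources))
# i-0002 is missing CostCenter, Environment, and Owner
```

In practice you would feed this kind of check with the output of a resource inventory (for example, a describe-instances call) and run it on a schedule.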

Without a well-thought-out tagging strategy, your invoicing discussions will sound much like the fictional dinner described above.  Who pays for what, and why?

Tag Strategy and Hygiene…

Implementing a sound tagging strategy at the outset, when a resource or environment is deployed, is the first step.  At the inception it’s important to know some “gotchas” that can derail a tagging implementation.  One of these is that tags are case sensitive: mktg will report separately from Mktg.  Also keep in mind that, in today’s ever-changing business environment, organizations are forced to adjust and reorganize themselves to stay competitive.
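To see why case sensitivity matters for reporting, consider this small sketch (the tag values and costs are invented):

```python
from collections import defaultdict

# Illustrative cost records tagged by department; note "mktg" vs "Mktg".
records = [
    {"Department": "mktg", "cost": 120.0},
    {"Department": "Mktg", "cost": 80.0},
    {"Department": "eng",  "cost": 300.0},
]

def costs_by_tag(records, key="Department", normalize=False):
    """Sum costs per tag value, optionally lower-casing to merge case variants."""
    totals = defaultdict(float)
    for rec in records:
        tag = rec[key].lower() if normalize else rec[key]
        totals[tag] += rec["cost"]
    return dict(totals)

print(costs_by_tag(records))                  # marketing splits into two buckets
print(costs_by_tag(records, normalize=True))  # normalizing rejoins them
```

AWS itself will not merge the two spellings for you, which is why agreeing on exact tag casing up front is part of the strategy.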

Revisiting your tagging strategy from time to time is necessary to ensure tags stay relevant.  If a stakeholder moves out of a role, gets promoted, or retires from the organization altogether, you will need to stay on top of the tagging for their environment to be sure it is still relevant to the new organization.

What about the un-taggables?

Having standardization and a tag plan works great for taggable AWS resources like EC2 and RDS, as explained above.  But what about untaggable resources: network transfer charges, and items like a NAT gateway or a VPC endpoint?  There will be shared resources like these in your application’s environment.  It is best to review these shared, untagged resources early on and decide where to best allocate their cost.

At 2nd Watch, we have these very discussions with our clients on a regular basis. We can easily guide them through the resources associated with the app and where to allocate each cost.  With a tool like CMP we can configure a client’s cost allocation hierarchy so they can view their ongoing costs in real time.
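One common approach to those shared, untaggable charges is to allocate them in proportion to each application’s directly tagged spend. A small sketch of the arithmetic (all figures are invented):

```python
# Sketch: spread a shared charge (e.g. a NAT gateway) across applications
# in proportion to their directly tagged spend. All numbers are made up.

def allocate_shared(shared_cost, direct_spend):
    """Return each app's share of a shared cost, weighted by direct spend."""
    total = sum(direct_spend.values())
    return {app: round(shared_cost * spend / total, 2)
            for app, spend in direct_spend.items()}

direct = {"web": 600.0, "etl": 300.0, "api": 100.0}
print(allocate_shared(50.0, direct))  # → {'web': 30.0, 'etl': 15.0, 'api': 5.0}
```

Whether you weight by spend, by usage, or simply split evenly is a business decision, which is exactly the conversation to have early on.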

For its part, Amazon does a great job providing an up-to-date user guide for which resources can be tagged.  Click here for reference documentation to help while you develop your tag strategy.

Rinse and repeat as necessary

Your tagging strategy can’t be a ‘fire and forget’ pronouncement.  To be effective, your organization will need to enforce it on a consistent basis.  For instance, as new DevOps personnel are brought into the organization, consistent enforcement will be key to ensuring tagging stays under control.

These are the types of discussions to which 2nd Watch adds a lot of value.  Our AWS expertise with large enterprises will ensure that you are able to precisely account for your cloud infrastructure spend at the click of a button through CMP.

After all, we all want to enjoy our meal and move on with the next activity. Stop by and visit us at re:Invent booth #1104 for more help.

— Paul Wells, Account Manager, 2nd Watch


What to Expect at AWS re:Invent 2017

The annual Amazon Web Services (AWS) re:Invent conference is just around the corner (the show kicks off November 27 in Las Vegas). Rest assured, there will be lots of AWS-related products, partners, and customer news. Not to mention, more than a few parties. Here’s what to expect at AWS re:Invent 2017—and a few more topics we hope to hear about.

1.)  Focus on IoT, Machine Learning, and Big Data

IoT, Machine Learning, and Big Data are top of mind for much of the industry (insert your own Mugatu “so hot right now” meme here), and we expect all three to be front and center at this year’s re:Invent conference. These Amazon Web Services offerings are ripe for adoption, as most IT shops lack the capabilities to deploy these types of services on their own.  We expect to see advancements in AWS IoT usability and features. We’ve already seen some early enhancements to AWS Greengrass, most notably support for additional programming languages, and we expect additional progress to be displayed at re:Invent. Other products where we expect to see advancements are Amazon Athena and AWS Glue.

In the Machine Learning space, we were certainly excited about the recent partnership between Amazon Web Services and Microsoft around Gluon, and we expect a number of follow-up announcements geared toward making it easier to adopt ML into one’s applications. As for Big Data, we expect Amazon Web Services to continue drawing on open source tools that can be used to develop compelling services. We would also be eager to see more use of AWS Lambda for in-flight ETL work, and perhaps a long-running Lambda option for batch jobs.

2.)  Enterprise Security

To say that data security has been a hot topic these past several months would be a gross understatement. From ransomware to the Equifax breach to the unsecured storage of private keys, data security has certainly been in the news. In our September Enterprise Security Survey, 73% of IT-professional respondents said they don’t fully understand the public cloud shared responsibility model.

Last month, we announced our collaboration with Palo Alto Networks to help enterprises realize the business and technical benefits of securely moving to the public cloud. The 2nd Watch Enterprise Cloud Security Service blends 2nd Watch’s Amazon Web Services expertise and architectural guidance with Palo Alto Networks’ industry-leading VM-Series security products. The combination delivers a proven enterprise cloud security offering designed to protect customer organizations from cyberattacks in hybrid or cloud architectures. 2nd Watch is recognized as the first public cloud-native managed security provider to join the Palo Alto Networks NextWave Channel Partner Program. To learn more about security and compliance, join our re:Invent breakout session, “Continuous Compliance on AWS at Scale,” by registering for session ID SID313 in the AWS re:Invent Session Catalogue. We are truly excited about this new service and collaboration, and we hope you will visit our booth (#1104) or Palo Alto Networks’ (#2409) to learn more.

As for Amazon Web Services, we fully expect to see a raft of announcements. Consistent with our expectations around ML and Big Data, we expect to hear about enhanced ML-based anomaly detection, logging and log analytics, and the like. We also expect to see advancements to AWS Shield and AWS Organizations, which were both announced at last year’s show. Similarly, we wouldn’t be surprised to see new functionality announced for their web application firewall, AWS WAF. A few things we know customers would like are easier, less labor-intensive management and even greater integration into SecDevOps workflows. Additionally, customers are looking for better integration with third-party and in-house security technologies – especially application scanning and SIEM solutions – for a more cohesive security monitoring, analysis, and compliance workflow.

The dynamic nature of the cloud creates specific challenges for security. Better security and visibility for ephemeral resources such as containers, and especially for AWS Lambda, are a particular challenge, and we would be extremely surprised not to see some announcements in this area.

Lastly, the General Data Protection Regulation (GDPR) will be taking effect soon, and it is critical that companies get on top of it. We expect Amazon Web Services to make several announcements about improved, secure storage and access, especially with respect to data sovereignty. More broadly, we expect Amazon Web Services to announce improved tools and services around compliance and governance, particularly with respect to mapping deployed or planned infrastructure against the control matrices of various regulatory schemes.

3.)  Parties!

We don’t need to tell you that AWS’s re:Play Party is always amazing: a veritable visual and auditory playground.  Last year, we played classic Street Fighter II while listening to Martin Garrix bring the house down (Coin might have gotten ROFLSTOMPED playing Ken, but it was worth it!).  Amazon Web Services always pulls out all the stops, and we expect this year to be the best yet.

2nd Watch will be hosting its annual party for customers at the Rockhouse at the Palazzo.  There will be great food, an open bar, an awesome DJ, and of course, a mechanical bull. If you’re not yet on the guest list, request your invitation TODAY! We’d love to connect with you, and it’s a party you will not want to miss.

Bonus: A wish list of things 2nd Watch would like to see released at AWS re:Invent 2017

Blockchain – Considering the growing popularity of blockchain technologies, we wouldn’t be surprised if Amazon Web Services launched a Blockchain-as-a-Service (BaaS) offering, or at least signaled their intent to do so, especially since Azure already has a BaaS offering.

Multi-region Database Option – This is something that would be wildly popular but is incredibly hard to accomplish. Having an active-active database strategy across regions is critical for production workloads that operate nationwide and require high uptime.  Azure already offers this with Cosmos DB (think of it as a synchronized, multi-region DynamoDB), and we doubt Amazon Web Services will let that challenge stand much longer. It is highly likely that Amazon Web Services already operates this pattern internally, and customer demand is how new AWS services are born.

NiFi – Industry interest in NiFi data-flow orchestration, often analogized to the way parcel services move and track packages, has been accelerating for many reasons, including its applicability to IoT and its powerful capabilities around provenance. We would love to see AWS Data Pipeline re-released as NiFi, but with all the usual Amazon Web Services provider integrations built in.

If even half our expectations for this year’s re:Invent are met, you can easily see why the 2nd Watch team is truly excited about what Amazon Web Services has in store for everyone. We are just as excited about what we have to offer to our customers, and so we hope to see you there!

Schedule a meeting with one of our AWS-certified Architects, DevOps specialists, or Engineers, and don’t forget to come visit us in booth #1104 in the Expo Hall!  See you at re:Invent 2017!

 

— Coin Graham, Senior Cloud Consultant and John Lawler, Senior Product Manager, 2nd Watch


Migrating to Terraform v0.10.x

When it comes to managing cloud-based resources, it’s hard to find a better tool than Hashicorp’s Terraform. Terraform is an ‘infrastructure as code’ application, marrying configuration files with backing APIs to provide a nearly seamless layer over your various cloud environments. It allows you to declaratively define your environments and their resources through a process that is structured, controlled, and collaborative.

One key advantage Terraform provides over other tools (like AWS CloudFormation) is having a rapid development and release cycle fueled by the open source community. This has some major benefits: features and bug fixes are readily available, new products from resource providers are quickly incorporated, and you’re able to submit your own changes to fulfill your own requirements.

Hashicorp recently released v0.10.0 of Terraform, introducing some fundamental changes in the application’s architecture and functionality. We’ll review the three most notable of these changes and how to incorporate them into your existing Terraform projects when migrating to Terraform v0.10.x.

  1. Terraform Providers are no longer distributed as part of the main Terraform distribution
  2. New auto-approve flag for terraform apply
  3. Existing terraform env commands replaced by terraform workspace

A brief note on Terraform versions:

Even though Terraform uses a style of semantic versioning, their ‘minor’ versions should be treated as ‘major’ versions.

1. Terraform Providers are no longer distributed as part of the main Terraform distribution

The biggest change in this version is the removal of provider code from the core Terraform application.

Terraform Providers are responsible for understanding API interactions and exposing resources for a particular platform (AWS, Azure, etc.). They know how to initialize and call their applications or CLIs, handle authentication and errors, and convert HCL into the appropriate underlying API calls.

It was a logical move to split the providers out into their own distributions. The core Terraform application can now add features and release bug fixes at a faster pace, new providers can be added without affecting the existing core application, and new features can be incorporated and released to existing providers without as much effort. Having split providers also allows you to update your provider distribution and access new resources without necessarily needing to update Terraform itself. One downside of this change is that you have to keep up to date with features, issues, and releases of more projects.

The provider repos can be accessed via the Terraform Providers organization in GitHub. For example, the AWS provider can be found here.

Custom Providers

An extremely valuable side-effect of having separate Terraform Providers is the ability to create your own, custom providers. A custom provider allows you to specify new or modified attributes for existing resources in existing providers, add new or unsupported resources in existing providers, or generate your own resources for your own platform or application.

You can find more information on creating a custom provider from the Terraform Provider Plugin documentation.

1.1 Configuration

The nicest part of this change is that it doesn’t really require any additional modifications to your existing Terraform code if you were already using a Provider block.

If you don’t already have a provider block defined, you can find their configurations from the Terraform Providers documentation.

You simply need to call the terraform init command before you can perform any other action. If you fail to do so, you’ll receive an error informing you of the required actions (img 1a).

After successfully reinitializing your project, you will be provided with the list of providers that were installed as well as the versions requested (img 1b).

You’ll notice that Terraform suggests versions for the providers we are using – this is because we did not specify any specific versions of our providers in code. Since providers are now independently released entities, we have to tell Terraform what code it should download and use to run our project.

(Image 1a: Notice of required reinitialization)

(Image 1b: Response from successful reinitialization)

Providers are released separately from Terraform itself, and maintain their own version numbers.

You can specify the version(s) you want to target in your existing provider blocks by adding the version property (code block 1). These versions should follow the semantic versioning specification (similar to node’s package.json or python’s requirements.txt).

For production use, it is recommended to limit the acceptable provider versions to ensure that new versions with breaking changes are not automatically installed.
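To make the “limit the acceptable provider versions” advice concrete, here is a Python sketch of how a pessimistic (“~>”) constraint behaves: the rightmost stated component may increase, while everything to its left must match exactly. This mimics the semantics for illustration only; it is not Terraform’s actual implementation:

```python
def pessimistic_match(version, constraint):
    """Check a '~>'-style (pessimistic) constraint. The constraint is passed
    without the '~>' prefix, e.g. '0.1.4' meaning '~> 0.1.4'. Sketch only."""
    v = [int(x) for x in version.split(".")]
    c = [int(x) for x in constraint.split(".")]
    # Components left of the last stated one must match exactly.
    if v[:len(c) - 1] != c[:-1]:
        return False
    # The last stated component may only grow.
    return v[len(c) - 1] >= c[-1]

print(pessimistic_match("0.1.7", "0.1.4"))  # True: patch bumps are allowed
print(pessimistic_match("0.2.0", "0.1.4"))  # False: a minor bump is blocked
```

So a constraint like `version = "~> 0.1.4"` would accept 0.1.7 but reject 0.2.0, which is exactly the protection against surprise breaking changes described above.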

(Code Block 1: Provider Config)

provider "aws" {
  version = "0.1.4"
  allowed_account_ids = ["1234567890"]
  region = "us-west-2"
}

(Image 1c: Currently defined provider configuration)

2. New auto-approve flag for terraform apply

In previous versions, running terraform apply would immediately apply any changes between your project and saved state.

Your normal workflow would likely be to run terraform plan followed by terraform apply, and hope nothing changed in between.

This version introduced a new auto-approve flag which will control the behavior of terraform apply.

Deprecation Notice

This flag is currently set to true to maintain backwards compatibility, but it will default to false in the near future.

2.1 auto-approve=true (current default)

When set to true, terraform apply will work like it has in previous versions.

If you want to maintain this functionality, you should update your scripts, build systems, etc. now, as this default value will change in a future Terraform release.

(Code Block 2: Apply with default behavior)

# Apply changes immediately without plan file
terraform apply --auto-approve=true

2.2 auto-approve=false

When set to false, Terraform will present the user with the execution plan and pause for interactive confirmation (img 2a).

If the user provides any response other than yes, terraform will exit without applying any changes.

If the user confirms the execution plan with a yes response, Terraform will then apply the planned changes (and only those changes).

If you are trying to automate your Terraform scripts, you might want to consider producing a plan file for review, then providing explicit approval to apply the changes from the plan file.

(Code Block 3: Apply plan with explicit approval)

# Create and save the plan for review
terraform plan -out=tfplan

# Apply the approved plan file (a saved plan is applied without prompting)
terraform apply tfplan

(Image 2a: Terraform apply with execution plan)

3. Existing terraform env commands replaced by terraform workspace

The terraform env family of commands was replaced with terraform workspace to help alleviate some confusion about their functionality. Workspaces are very useful and can do much more than just split up environment state (which isn’t necessarily what they’re for). I recommend checking them out and seeing if they can improve your projects.

There is not much to do here other than switch the command invocation; the previous commands still work for now, but they are deprecated.

 


 

— Steve Byerly, Principal SDE (IV), Cloud, 2nd Watch


High Performance Computing (HPC) – An Introduction

When we talk about high performance computing (HPC), we are typically trying to solve some type of problem. These problems generally fall into one of four types:

  • Compute Intensive – A single problem requiring a large amount of computation.
  • Memory Intensive – A single problem requiring a large amount of memory.
  • Data Intensive – A single problem operating on a large data set.
  • High Throughput – Many unrelated problems to be computed in bulk.

 

In this post, I will provide an introduction to high performance computing (HPC) that can help organizations solve the common problem types listed above.

Compute Intensive Workloads

First, let us take a look at compute intensive problems. The goal is to distribute the work for a single problem across multiple CPUs to reduce the execution time as much as possible. In order to do this, we need to execute steps of the problem in parallel. Each process or thread takes a portion of the work and performs the computations concurrently. The CPUs typically need to exchange information rapidly, requiring specialized communication hardware. Examples of these types of problems can be found when analyzing data for tasks like financial modeling and risk exposure, in both traditional business and healthcare use cases. This is probably the largest portion of HPC problem sets and is the traditional domain of HPC.

When attempting to solve compute intensive problems, we may think that adding more CPUs will always reduce our execution time. This is not true. Most parallel code bases have what we call a “scaling limit.” This is due in no small part to the system overhead of managing more copies, but also to more fundamental constraints.

This is summed up brilliantly in Amdahl’s law.

In computer architecture, Amdahl’s law is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It is named after computer scientist Gene Amdahl, and was presented at the AFIPS Spring Joint Computer Conference in 1967.

Amdahl’s law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence, the theoretical speedup is limited to at most 20 times (1/(1 − p) = 20). For this reason, parallel computing with many processors is useful only for very parallelizable programs.

– Wikipedia

Amdahl’s law can be formulated the following way:

S_latency(s) = 1 / ((1 − p) + p / s)

where

  • S_latency is the theoretical speedup of the execution of the whole task.
  • s is the speedup of the part of the task that benefits from improved system resources.
  • p is the proportion of the original execution time occupied by the part that benefits from improved resources.

Chart Example: If 95% of the program can be parallelized, the theoretical maximum speed up using parallel computing would be 20 times.

Bottom line: As you make more sections of your problem able to run concurrently, you can split the work between more processors and achieve greater benefits. However, due to complexity and overhead, at some point adding more CPUs becomes detrimental rather than helpful.
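The arithmetic above can be checked with a few lines of Python (a straightforward transcription of the formula, nothing AWS-specific):

```python
def amdahl_speedup(p, s):
    """Theoretical speedup when a fraction p of a task is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# 95% parallelizable: adding processors shows rapidly diminishing returns,
# approaching the 1 / (1 - p) = 20x ceiling no matter how many CPUs we add.
for procs in (2, 8, 64, 4096):
    print(procs, round(amdahl_speedup(0.95, procs), 2))
```

Going from 8 to 64 processors roughly triples the speedup, while going from 64 to 4,096 barely moves it, which is the scaling limit in action.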

There are libraries that help with parallelization, like OpenMP or Open MPI, but before moving to these libraries, we should strive to optimize performance on a single CPU, then make p as large as possible.

Memory Intensive Workloads

Memory intensive workloads require large pools of memory rather than multiple CPUs. In my opinion, these are some of the hardest problems to solve, and they typically require great care when building machines for your system. Coding and porting are easier because memory appears seamless, allowing for a single system image.  Optimization becomes harder, however, as your machines age, because component uniformity erodes. Traditionally, in the data center, you don’t replace every single server every three years, so if we want more resources in our cluster and we want performance to be uniform, we have a problem: non-uniform memory introduces real latency. We also have to think about the interconnect between the CPU and the memory.

Nowadays, many of these concerns have been eliminated by commodity servers. We can ask for thousands of the same instance type with the same specs and hardware, and companies like Amazon Web Services are happy to let us use them.

Data Intensive Workloads

This is probably the most common workload we find today, and probably the type with the most buzz: so-called “Big Data” workloads. Data intensive workloads are the type suitable for software packages like Hadoop or MapReduce. We distribute the data for a single problem across multiple CPUs to reduce the overall execution time. The same work may be done on each data segment, though that is not always the case. This is essentially the inverse of a memory intensive workload, in that rapid movement of data to and from disk matters more than the interconnect. The problems solved by these workloads tend to be in the life sciences (genomics) in the academic field, and they have a wide reach in commercial applications, particularly around user data and interactions.
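The “same work on each data segment” pattern is easy to sketch. This toy map-reduce tallies items across independent data segments in pure Python; no Hadoop required, and the data is invented for illustration:

```python
from collections import Counter
from functools import reduce

# Toy map-reduce: each segment is processed independently (the "map" step),
# then the partial results are merged (the "reduce" step).
segments = [
    ["GET", "POST", "GET"],
    ["GET", "PUT"],
    ["POST", "GET", "GET"],
]

def map_segment(segment):
    return Counter(segment)   # work done per data segment

def combine(a, b):
    return a + b              # merge two partial results

totals = reduce(combine, map(map_segment, segments))
print(dict(totals))  # → {'GET': 5, 'POST': 2, 'PUT': 1}
```

In a real data intensive system, each `map_segment` call would run on a different node close to its slice of the data, which is where disk throughput becomes the bottleneck described above.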

High Throughput Workloads

Batch processing jobs (jobs with almost trivial operations to perform in parallel and little to no inter-CPU communication) are considered high throughput workloads. In high throughput workloads, we emphasize throughput over a period of time rather than performance on any single problem. We distribute multiple problems independently across multiple CPUs to reduce overall execution time. These workloads should:

  • Break up naturally into independent pieces.
  • Have little or no inter-CPU communication.
  • Be performed in separate processes or threads on separate CPUs (concurrently).

 

Compute intensive workloads can often be broken into high throughput jobs; however, high throughput jobs are not necessarily CPU intensive.
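Those properties map directly onto a standard worker-pool pattern. Here is a toy sketch using Python’s concurrent.futures: each job is independent and shares nothing with the others. (Real high throughput work would be distributed across machines, and CPU-bound jobs would use processes rather than threads; the job itself is a stand-in.)

```python
from concurrent.futures import ThreadPoolExecutor

# Toy high throughput batch: many unrelated jobs, no inter-job communication.
def run_job(job_id):
    return job_id, sum(i * i for i in range(1000))  # stand-in for real work

# Each job is submitted independently and could run anywhere.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(run_job, range(8)))

print(len(results), "jobs completed")  # → 8 jobs completed
```

Because no job waits on any other, adding workers scales throughput almost linearly, which is exactly why this workload type escapes the Amdahl’s law ceiling discussed earlier.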

HPC On Amazon Web Services

Amazon Web Services (AWS) provides on-demand scalability and elasticity for a wide variety of computational and data-intensive workloads, including workloads that represent many of the world’s most challenging computing problems: engineering simulations, financial risk analyses, molecular dynamics, weather prediction, and many more.   

– AWS: An Introduction to High Performance Computing on AWS

Amazon has practically everything you could want in an HPC platform. For every type of workload listed here, AWS has one or more instance classes to match, and numerous sizes in each class, allowing you to get very granular in provisioning your clusters.

Speaking of provisioning, there is even a tool called CfnCluster for building and managing HPC clusters on AWS. Once a cluster is created, you can log into it via the master node, where you will have access to standard HPC tools such as schedulers, shared storage, and an MPI environment.

For data intensive workloads, there are a number of options to help get your data closer to your compute resources:

  • S3
  • Redshift
  • DynamoDB
  • RDS

 

EBS is even a viable option for creating large-scale parallel file systems to meet the high-volume, high-performance, and throughput requirements of demanding workloads.

HPC Workloads & 2nd Watch

2nd Watch can help you solve complex science, engineering, and business problems using applications that require high bandwidth, enhanced networking, and very high compute capabilities.

Increase the speed of research by running high performance computing (HPC) in the cloud, and reduce costs by paying for only the resources you use, without large capital investments. With 2nd Watch, you have access to a full-bisection, high-bandwidth network for tightly coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications. Contact us today to learn more about high performance computing (HPC).

2nd Watch Customer Success

Celgene is an American biotechnology company that manufactures drug therapies for cancer and inflammatory disorders. Read more about their cloud journey and how they went from doing research jobs that previously took weeks or months, to just hours. Read the case study.

We have also helped a global finance & insurance firm prove their liquidity time and time again in the aftermath of the 2008 recession. By leveraging the batch computing solution that we provided for them, they are now able to scale out their computations across 120,000 cores while validating their liquidity with no CAPEX investment. Read the case study.

 

– Lars Cromley, Director of Engineering, Automation, 2nd Watch


Standardizing & Automating Infrastructure Development Processes

Introduction

Let’s start with a small look at the current landscape of technology and how we arrived here. There aren’t very many areas of tech that have not been, or are not currently, in a state of fluctuation. Everything from software delivery vehicles and development practices, to infrastructure creation has experienced some degree of transformation over the past several years. From VMs to Containers, it seems like almost every day the technology tool belt grows a little bigger, and our world gets a little better (though perhaps more complex) due to these advancements. For me, this was incredibly apparent when I began to delve into configuration management which later evolved into what we now call “infrastructure as code”.

The transformation of the development process began with simple systems that we once used to manage a few machines (like bash scripts or Makefiles), which then morphed into more complex systems (CFEngine, Puppet, and Chef) to manage thousands of systems. As configuration management software became more mature, engineers and developers began leaning on it to do more things. With the advent of hypervisors and the rise of virtual machines, it was only a short time before hardware requests became API requests, and thus infrastructure as a service (IaaS) was born. With all the new capabilities and options in this brave new world, we once again started to lean on our configuration management systems—this time for provisioning, and not just convergence.

Provisioning & Convergence

I mentioned two terms that I want to clarify: provisioning and convergence. Say you were a car manufacturer and you wanted to make a car. Provisioning would be the step in which you request the raw materials to make the parts for your automobile. This is where we would use tools like Terraform, CloudFormation, or Heat. Convergence, meanwhile, is the assembly line by which we check each part and assemble the final product (utilizing configuration management software).

By and large, the former tends to be declarative with little in the way of conditionals or logic, while the latter is designed to be robust and malleable software that supports all the systems we run and plan on running. This is the frame for the remainder of what we are going to talk about.

By separating the concerns of our systems, we can create a clear delineation of the purpose for each tool so we don’t feel like we are trying to jam everything into an interface that doesn’t have the most support for our platform or more importantly our users. The remainder of this post will be directed towards the provisioning aspect of configuration management.

Standards and Standardization

These are two different things in my mind. Standardization is extremely prescriptive and can often seem particularly oppressive to professional knowledge workers, such as engineers or developers. It can be seen as taking the innovation away from the job. Whereas standards provide boundaries, frame the problem, and allow for innovative ways of approaching solutions. I am not saying standardization in some areas is entirely bad, but we should let the people who do the work have the opportunity to grow and innovate in their own way with guidance. The topic of standards and standardization is part of a larger conversation about culture and change. We intend to follow up with a series of blog articles relating to organizational change in the era of the public cloud in the coming weeks.

So, let’s say that we make a standard for our new EC2 instances running Ubuntu. We’ll say that all instances must be running the latest official Canonical Ubuntu 14.04 AMI and must have these three tags: Owner, Environment, and Application. How can we enforce that in the development of our infrastructure? On AWS, we can create AWS Config Rules, but that approach is reactive and requires ad-hoc remediation. What we really want is a more prescriptive approach that brings our standards closer to the development pipeline. One of the ways I like to solve this issue is by creating an abstraction. Say we have a Terraform template that looks like this:

# Create a new instance of the latest Ubuntu 14.04 on a t2.micro node
provider "aws" {
  region = "us-west-2"
}

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}

resource "aws_instance" "web" {
  ami           = "${data.aws_ami.ubuntu.id}"
  instance_type = "t2.micro"

  tags {
    Owner       = "DevOps Ninja"
    Environment = "Dev"
    Application = "Web01"
  }
}

This would meet the standard that we have set forth, but we are relying on the developer or engineer to adhere to it. What if we enforce the standard by codifying it in an abstraction? Let’s take that existing template and turn it into a Terraform module instead.

Module

# Create a new instance of the latest Ubuntu 14.04

variable "aws_region" {}
variable "ec2_owner" {}
variable "ec2_env" {}
variable "ec2_app" {}
variable "ec2_instance_type" {}

provider "aws" {
  region = "${var.aws_region}"
}

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}

resource "aws_instance" "web" {
  ami           = "${data.aws_ami.ubuntu.id}"
  instance_type = "${var.ec2_instance_type}"

  tags {
    Owner       = "${var.ec2_owner}"
    Environment = "${var.ec2_env}"
    Application = "${var.ec2_app}"
  }
}

Now we can have our developers and engineers leverage our tf_ubuntu_ec2_instance module.

New Terraform Plan

module "Web01" {
  source = "git::ssh://git@github.com/SomeOrg/tf_ubuntu_ec2_instance"

  aws_region        = "us-west-2"
  ec2_owner         = "DevOps Ninja"
  ec2_env           = "Dev"
  ec2_app           = "Web01"
  ec2_instance_type = "t2.micro"
}

This doesn’t enforce the usage of the module, but it does create an abstraction that provides an easy way to maintain standards without a ton of overhead. It also provides an example for the further creation of modules that enforce these particular standards.

This leads us to another method of implementing standards, one that is more prescriptive and falls into the category of standardization (eek!): AWS Service Catalog, which has to be one of the most underutilized services in the AWS product stable.

AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS. These IT services can include everything from virtual machine images, servers, software, and databases to complete multi-tier application architectures. AWS Service Catalog allows you to centrally manage commonly deployed IT services, and helps you achieve consistent governance and meet your compliance requirements, while enabling users to quickly deploy only the approved IT services they need.
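To make this concrete, here is a minimal, hypothetical sketch (Python with boto3) of launching an approved Service Catalog product on behalf of an end user. The product and artifact IDs are placeholders, and the `to_provisioning_parameters` helper is our own convenience function, not part of the Service Catalog API:

```python
def to_provisioning_parameters(parameters):
    """Reshape a plain dict into the Key/Value list that
    servicecatalog.provision_product expects."""
    return [{"Key": k, "Value": v} for k, v in parameters.items()]

def provision_approved_product(product_id, artifact_id, name, parameters):
    """Launch an approved Service Catalog product.

    A sketch only: the caller must be an IAM principal that has been
    granted access to the portfolio containing this product.
    """
    import boto3  # imported here so the helper above stays testable offline
    client = boto3.client("servicecatalog")
    return client.provision_product(
        ProductId=product_id,                # e.g. a "prod-..." identifier
        ProvisioningArtifactId=artifact_id,  # a specific product version
        ProvisionedProductName=name,
        ProvisioningParameters=to_provisioning_parameters(parameters),
    )
```

Because the catalog admin controls the product’s underlying template, the user gets self-service provisioning while the organization keeps its standards enforced.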

The Interface

Once we have a few of these projects in place (e.g. a service catalog or a repo full of composable modules for infrastructure that meet our standards), how do we serve them out? How you spur adoption of these tools and how they are consumed can be very different depending on your organizational structure. We don’t want to upset workflow and how work gets done; we just want it to go faster and be more reliable. This is what we talk about when we mention the interface. Whichever way work flows in, we should supplement it with some type of software or automation to link those pieces of work together. Here are a few examples of how this might look (depending on your organization):

1.) Central IT Managed Provisioning

If you have an organization that manages requests for infrastructure, this new paradigm shift might seem daunting. The interface in this case is the ticketing system. This is where we would create an integration with our ticketing software to automatically pull the correct project from the service catalog or module repo based on some criteria in the ticket. The interface doesn’t change but is instead supplemented by some automation to answer these requests, saving time and providing faster delivery of service.
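As a sketch of what that supplemental automation might look like, the snippet below validates a ticket and renders the tf_ubuntu_ec2_instance module invocation shown earlier. The ticket field names are hypothetical, not any real ticketing system’s API:

```python
# Hypothetical ticket fields -> Terraform module invocation. The field
# names (owner, environment, application, region) are assumptions about
# what a provisioning ticket might carry, not a real ticketing API.
REQUIRED_FIELDS = ("owner", "environment", "application", "region")

def render_module_block(ticket):
    """Validate a ticket and render a tf_ubuntu_ec2_instance module block."""
    missing = [f for f in REQUIRED_FIELDS if f not in ticket]
    if missing:
        raise ValueError(f"ticket is missing fields: {missing}")
    return (
        f'module "{ticket["application"]}" {{\n'
        f'  source     = "git::ssh://git@github.com/SomeOrg/tf_ubuntu_ec2_instance"\n'
        f'\n'
        f'  aws_region = "{ticket["region"]}"\n'
        f'  ec2_owner  = "{ticket["owner"]}"\n'
        f'  ec2_env    = "{ticket["environment"]}"\n'
        f'  ec2_app    = "{ticket["application"]}"\n'
        f'}}\n'
    )

ticket = {"owner": "DevOps Ninja", "environment": "Dev",
          "application": "Web01", "region": "us-west-2"}
print(render_module_block(ticket))
```

The rendered block could then be committed to a repo and applied by a pipeline, closing the ticket automatically on success.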

2.) Full Stack Engineers

If you have engineers that develop both software and the infrastructure that runs their applications, this is the easiest scenario to address in some regards and the hardest in others. Your interface might be a build server, or it could simply be the adoption of an internal open source model where each team develops modules and shares them in a common place, constantly trying to save time and not re-invent the wheel.

Supplementing with software or automation can be done in a ton of ways. Check out an example Kelsey Hightower wrote using Jira.

“A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.” – John Gall

All good automation starts out with a manual and well-defined process. Standardizing and automating infrastructure development begins with understanding how our internal teams can best be organized to perform work efficiently, before we begin automating anything. Work with your teammates to create a value stream map so you understand the process entirely before putting effort into automating a workflow.

With 2nd Watch designs and automation, you can deploy quicker, learn faster, and modify as needed with Continuous Integration / Continuous Deployment (CI/CD). Our Workload Solutions transform on-premises workloads into digital solutions in the public cloud with next-generation products and services. To accelerate your infrastructure development so that you can deploy faster, learn more often, and adapt to customer requirements more effectively, speak with a 2nd Watch cloud deployment expert today.

– Lars Cromley, Director of Engineering, Automation, 2nd Watch


Automating Windows Patching of EC2 Autoscaling Group Instances

Background

Dealing with Windows patching can be a royal pain, as you may know. At least once a month, Windows machines are subject to system security and stability patches, thanks to Microsoft’s Patch Tuesday. With Windows 10 (and its derivatives), Microsoft has shifted towards more of a Continuous Delivery model in how it manages system patching. It is a welcome change; however, it still doesn’t guarantee that Windows patching won’t require a system reboot.

Rebooting an EC2 instance that is a member of an Auto Scaling Group (depending upon how you have your Auto Scaling health-check configured) is something that will typically cause an Elastic Load Balancing (ELB) HealthCheck failure and result in instance termination (this occurs when Auto Scaling notices that the instance is no longer reporting “in service” with the load balancer). Auto Scaling will of course replace the terminated instance with a new one, but the new instance will be launched using an image that is presumably unpatched, thus leaving your Windows servers vulnerable.

The next patch cycle will once again trigger a reboot, and the vicious cycle continues. Furthermore, if the patching and reboots aren’t carefully coordinated, it could severely impact your application performance and availability (think multiple Auto Scaling Group members rebooting simultaneously). If you are running an earlier version of Windows (e.g. Windows Server 2012 R2), rebooting at least once a month on Patch Tuesday is almost a certainty.

Another major problem with utilizing the AWS stock Windows AMIs with Auto Scaling is that AWS makes those AMIs unavailable after just a few months. This means that unless you update your Auto Scaling Launch Configuration to use the newer AMI IDs on a continual basis, future Auto Scaling instance launches will fail as they try to access an AMI that is no longer accessible. Anguish.

Automatically and Reliably Patch your Auto-Scaled Windows instances

Given the aforementioned scenario, how on earth are you supposed to automatically and reliably patch your Auto-Scaled Windows instances?!

One approach would be to write some sort of orchestration layer that detects when Auto Scaling members have been patched and are awaiting their obligatory reboot, suspends the Auto Scaling processes that would detect and replace perceived failed instances (e.g. HealthCheck), and then reboots the instances one by one. This would be rather painful to orchestrate and has a potentially severe drawback: cluster capacity is reduced to N-1 during the rebooting (maybe less if you don’t take into account service availability between reboots).

Reducing capacity to N-1 might not be a big deal if you have a cluster of 20 instances, but if you are running a smaller cluster of, say, 4, 3, or 2 instances, that is a significant hit to your overall cluster capacity. And if you are running an Auto Scaling Group with a single instance (not as uncommon as you might think), then your application is completely down during the reboot of that single member. This, of course, also doesn’t solve the issue of expired stock AWS AMIs.
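The capacity arithmetic above is simple but worth spelling out:

```python
def capacity_loss_pct(cluster_size, rebooting=1):
    """Percentage of cluster capacity lost while `rebooting` members are down."""
    if not 0 < rebooting <= cluster_size:
        raise ValueError("rebooting count must be between 1 and cluster_size")
    return 100.0 * rebooting / cluster_size

# One member down at a time: a 20-node cluster barely notices (5%),
# but small clusters take a large hit, and a single-instance group
# is a full outage while its one member reboots.
for n in (20, 4, 3, 2, 1):
    print(f"{n}-instance cluster: {capacity_loss_pct(n):.0f}% of capacity lost")
```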

Another approach is to maintain and patch a “golden image” that the Auto Scaling Launch Configuration uses to create new instances. If you are unfamiliar with the term, a golden image is an operating system image that has everything pre-installed, configured, and saved in a pre-baked image file (an AMI, in the case of Amazon EC2). Making this happen in a reasonably automated fashion requires a significant amount of work and has numerous potential pitfalls.

While it sidesteps the expired-AMI problem by replacing the unavailable public AMI with your own copy, you still need a way to reliably and automatically handle this process. Using a tool like HashiCorp’s Packer can get you partially there, but you would still have to write a number of provisioners to handle the installation of Windows Updates and anything else you need to do in order to prep the system for imaging. In the end, you would still have to develop or employ a fair number of tools and processes to completely automate the entire process of detecting new Windows Updates, creating a patched AMI with those updates, and orchestrating the update of your Auto Scaling Groups.

A Cloud-Minded Approach

I believe that Auto Scaling Windows servers intelligently requires a paradigm shift. One assumption we have to make is that some form of configuration management (e.g. Puppet, Chef)—or at least a basic bootstrap script executed via cfn-init/UserData—is automating the configuration of the operating system, applications, and services upon instance launch. If configuration management or bootstrap scripts are not in play, then it is likely that a golden-image is being utilized. Without one of these two approaches, you don’t have true Auto Scaling because it would require some kind of human interaction to configure a server (ergo, not “auto”) every time a new instance was created.

Both approaches (launch-time configuration vs. golden image) have their pros and cons. I generally prefer launch-time configuration, as it allows for more flexibility, provides for better governance/compliance, and enables pushing changes dynamically. But (and this is especially true of Windows servers) sometimes launch-time configuration simply takes longer than is acceptable, and the golden-image approach must be used to allow for more rapid deployment of new Auto Scaling Group instances.

Either approach can be easily automated using a solution like the one I am about to outline, and thankfully AWS publishes new stock Windows Server AMIs immediately following every Patch Tuesday. This means that, if you aren’t using a golden image, patching your instances is as simple as updating your Auto Scaling Launch Configuration to use the new AMI(s) and performing a rolling replacement of the instances. Even if you are using a golden image or applying some level of customization to the stock AMI, you can easily integrate Packer into the process to create a new patched image that includes your customizations.
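Because Launch Configurations are immutable, “updating” one really means creating a successor that is identical except for the AMI, then pointing the Auto Scaling Group at it. Here is a hedged boto3-style sketch: the field list is a simplified subset of a real launch configuration, and the timestamp-suffix naming scheme is our own convention, not an AWS requirement:

```python
import time

# A simplified subset of launch configuration fields worth carrying over;
# a real config has more (block device mappings, monitoring, etc.).
COPYABLE_FIELDS = ("InstanceType", "KeyName", "SecurityGroups",
                   "IamInstanceProfile", "UserData")

def successor_launch_config(old, new_ami_id, now=None):
    """Build kwargs for a replacement launch configuration: a copy of
    `old` with a fresh timestamped name and the newly patched AMI ID."""
    stamp = int(now if now is not None else time.time())
    new = {k: old[k] for k in COPYABLE_FIELDS if k in old}
    new["LaunchConfigurationName"] = f'{old["LaunchConfigurationName"]}-{stamp}'
    new["ImageId"] = new_ami_id
    return new
```

The kwargs would then be passed to `autoscaling.create_launch_configuration(**new)` followed by `update_auto_scaling_group(..., LaunchConfigurationName=...)`; with CloudFormation or Terraform, this whole step collapses into changing an ImageId parameter and applying.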

The Solution

At a high level, the solution can be summarized as:

  1. An Orchestration Layer (e.g. AWS SNS and Lambda, Jenkins, AWS Step Functions) that detects and responds when new patched stock Windows AMIs have been released by Amazon.
  2. A Packer Launcher process that manages launching Packer jobs in order to create custom AMIs. Note: this step is only required if copying the AWS stock AMIs into your own AWS account is desired, or if you want to apply customization to the stock AMI; either use case requires that the custom images remain available indefinitely. We solved this by creating an EC2 instance with a Python UserData script that launches Packer jobs (in parallel) to copy the new stock AMIs into our AWS account. If you are using something like Jenkins, this could instead be handled by having Jenkins launch a local script or even a Docker container to manage the Packer jobs.
  3. A New AMI Messaging Layer (e.g. Amazon SNS) to publish notifications when new/patched AMIs have been created.
  4. Some form of an Auto Scaling Group Rolling Updater that replaces existing Auto Scaling Group instances with new ones based on the patched AMI.
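For the orchestration layer in step 1, one option (an assumption on our part, not the only way to do it) is to poll the public SSM Parameter Store entries AWS maintains for its stock Windows AMIs and publish to the messaging layer when the ID changes:

```python
def detect_new_ami(current_ami_id, last_seen_ami_id):
    """True when the stock AMI has changed since the last check."""
    return bool(current_ami_id) and current_ami_id != last_seen_ami_id

def latest_stock_windows_ami(parameter_name):
    """Look up the current stock Windows AMI ID from AWS's public SSM
    parameters (names under /aws/service/ami-windows-latest/)."""
    import boto3  # imported here so the pure helper above stays testable offline
    ssm = boto3.client("ssm")
    return ssm.get_parameter(Name=parameter_name)["Parameter"]["Value"]

def check_and_notify(parameter_name, last_seen_ami_id, topic_arn):
    """Sketch of the worker loop body: on change, notify the New AMI
    messaging layer (an SNS topic) so downstream workers can react."""
    current = latest_stock_windows_ami(parameter_name)
    if detect_new_ami(current, last_seen_ami_id):
        import boto3
        boto3.client("sns").publish(TopicArn=topic_arn, Message=current)
    return current
```

The same shape works whether the poller runs as a scheduled Lambda, a Jenkins job, or a Step Functions state; only the trigger changes.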

Great news for anyone using AWS CloudFormation: it inherently supports Rolling Updates for Auto Scaling Groups! Utilizing this requires attaching an UpdatePolicy and adding a UserData or cfn-init script to notify CloudFormation when the instance has finished its configuration and is reporting as healthy (e.g. InService on the ELB). There are some pretty good examples of how to accomplish this using CloudFormation out there, including one that AWS itself provides.

If you aren’t using CloudFormation, all hope is not lost. Despite HashiCorp Terraform’s ever-increasing popularity for deploying and managing AWS infrastructure as code, Terraform has yet to implement a Rolling Update feature for AWS Auto Scaling Groups. There is a Terraform feature request from a few years ago for this exact feature, but as of today it is not yet available, nor do the Terraform developers have any short-term plans to implement it. However, several people (including HashiCorp’s own engineers) have developed a number of ways to work around the lack of an integrated Auto Scaling Group Rolling Updater in Terraform. Here are a few I like:

Of course, you can always roll your own solution using a combination of AWS services (e.g. SNS, Lambda, Step Functions), or whatever tooling best fits your needs. Creating your own solution will allow you added flexibility if you have additional requirements that can’t be met by CloudFormation, Terraform, or other orchestration tool.

The following is an example framework for performing automated Rolling Updates to Auto Scaling Groups utilizing AWS SNS and AWS Lambda:

a.  An Auto Scaling Launch Config Modifier worker that subscribes to the New AMI messaging layer and updates the Auto Scaling Launch Configuration(s) when a new AMI is released. In this use case, we are using an AWS Lambda function subscribed to an SNS topic. Upon notification of new AMIs, the worker must update the predefined (or programmatically derived) Auto Scaling Launch Configurations to use the new AMI. This is best handled with infrastructure templating tools like CloudFormation or Terraform, which make updating the Launch Configuration ImageId as simple as updating a parameter/variable in the template and performing an update/apply operation.

b.  An Auto Scaling Group Instance Cycler messaging layer (again, an Amazon SNS topic) to be notified when an Auto Scaling Launch Configuration ImageId has been updated by the worker.

c.  An Auto Scaling Group Instance Cycler worker that replaces the Auto Scaling Group instances in a safe, reliable, and automated fashion. For example, another AWS Lambda function that subscribes to the SNS topic and triggers new instances by increasing the Auto Scaling Desired Instance count to twice the current number of ASG instances.

d.  Once the scale-up event generated by the Auto Scaling Group Instance Cycler worker has completed and the new instances are reporting as healthy, another message will be published to the Auto Scaling Group Instance Cycler SNS topic indicating scale-up has completed.

e.  The Auto Scaling Group Instance Cycler worker will respond to the prior event and return the Auto Scaling group back to its original size which will terminate the older instances leaving the Auto Scaling Group with only the patched instances launched from the updated AMI. This assumes that we are utilizing the default AWS Auto Scaling Termination Policy which ensures that instances launched from the oldest Launch Configurations are terminated first.
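The capacity arithmetic in steps (c) and (e) is worth pinning down, since doubling only works if the group’s MaxSize allows it. A small sketch; the MaxSize-capping behavior is our assumption about how a constrained group would be handled:

```python
def scale_up_target(desired, max_size):
    """Step (c): capacity needed to run a patched replacement alongside
    every current instance, capped at the group's MaxSize. If MaxSize is
    less than double the desired count, not every instance can be
    cycled in a single pass."""
    if desired < 1:
        raise ValueError("an active Auto Scaling Group needs desired >= 1")
    return min(desired * 2, max_size)

def scale_down_target(original_desired):
    """Step (e): return to the original size; with an age-based
    termination policy, the unpatched instances are the ones removed."""
    return original_desired
```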

NOTE: The AWS Auto Scaling default termination policy will not guarantee that the older instances are terminated first! If the Auto Scaling Group spans multiple Availability Zones (AZs) and there is an imbalance in the number of instances per AZ, it will terminate the extra instance(s) in the overloaded AZ before terminating based on the oldest Launch Configuration, so relying on the default policy will not necessarily ensure that the oldest instances are replaced first. My recommendation is to use the OldestInstance termination policy to make absolutely certain that the oldest (i.e. unpatched) instances are terminated during the Instance Cycler scale-down process. Consult the AWS documentation on Auto Scaling termination policies for more on this topic.
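Pinning the group to the OldestInstance policy is a one-call change. A minimal boto3 sketch (the group name is a placeholder):

```python
def oldest_instance_policy_update(asg_name):
    """Kwargs for autoscaling.update_auto_scaling_group(...) that pin an
    Auto Scaling Group to the OldestInstance termination policy, so the
    Instance Cycler's scale-down removes unpatched instances first."""
    return {
        "AutoScalingGroupName": asg_name,
        "TerminationPolicies": ["OldestInstance"],
    }

def apply_termination_policy(asg_name):
    import boto3  # imported here so the kwargs builder stays testable offline
    boto3.client("autoscaling").update_auto_scaling_group(
        **oldest_instance_policy_update(asg_name))
```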

In Conclusion

Whichever solution you choose to implement for the Rolling Updates to your Auto Scaling Group, the approach outlined above will provide you with a sure-fire way to ensure your Auto Scaled Windows servers are always patched automatically, minimizing the operational overhead of maintaining patch compliance and server security. And the good news is that the heavy lifting is already handled by AWS Auto Scaling and HashiCorp Packer. There is a bit of trickery to getting the Packer configs and provisioners working just right with the EC2Config service and Windows Sysprep, but there are a number of good examples out on GitHub to get you headed in the right direction. The one I referenced in building our solution can be found here.

One final word of caution... if you do not disable the EC2Config Set Computer Name option when baking a custom AMI, your Windows hostname will ALWAYS be reset to the EC2Config default upon reboot. This is especially problematic for configuration management tools like Puppet or Chef which may use the hostname as the SSL Client Certificate subject name (default behavior), or for deriving the system role/profile/configuration.

Here is my ec2config.ps1 Packer provisioner script which disables the Set Computer Name option:

$EC2SettingsFile = "C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml"
$xml = [xml](Get-Content $EC2SettingsFile)
$xmlElement = $xml.get_DocumentElement()
$xmlElementToModify = $xmlElement.Plugins
foreach ($element in $xmlElementToModify.Plugin)
{
    if ($element.name -eq "Ec2SetPassword")
    {
        $element.State = "Enabled"
    }
    elseif ($element.name -eq "Ec2SetComputerName")
    {
        $element.State = "Disabled"
    }
    elseif ($element.name -eq "Ec2HandleUserData")
    {
        $element.State = "Enabled"
    }
    elseif ($element.name -eq "Ec2DynamicBootVolumeSize")
    {
        $element.State = "Enabled"
    }
}
$xml.Save($EC2SettingsFile)

Hopefully, at this point, you have a pretty good idea of how you can leverage existing software, tools, and services—combined with a bit of scripting and automation workflow—to reliably and automatically manage the patching of your Windows Auto Scaling Group EC2 instances!  If you require additional assistance, are resource-bound for getting something implemented, or you would just like the proven Cloud experts to manage Automating Windows Patching of your EC2 Autoscaling Group Instances, contact 2nd Watch today!

 

Disclaimer

We strongly advise that processes like the ones described in this article be performed in a non-production environment first, to properly validate that the changes have not negatively affected your application’s functionality, performance, or availability.

This validation is something that your orchestration layer in the first step should be able to handle. It should also integrate well with a Continuous Integration and/or Delivery workflow.

 

-Ryan Kennedy, Principal Cloud Automation Architect, 2nd Watch