How to Federate Amazon Redshift Access with Azure Active Directory

Single sign-on (SSO) is a tool that solves fundamental problems, especially in midsize and large organizations with lots of users.

End users do not want to have to remember too many username and password combinations. IT administrators do not want to have to create and manage too many different login credentials across enterprise systems. It is a far more manageable and secure approach to federate access and authentication through a single identity provider (IdP).

As today’s enterprises rely on a wide range of cloud services and legacy systems, they have increasingly adopted SSO via an IdP as a best practice for IT management. All access and authentication essentially flows through the IdP wherever it is supported. Employees do not have to remember multiple usernames and passwords to access the tools they need to do their jobs. Just as importantly, IT teams prevent an administrative headache. They manage a single identity per user, which makes tasks like removing access when a person leaves the organization much simpler and less prone to error.

The same practice extends to AWS. As we see more customers migrate to the cloud platform, we hear a growing need for the ability to federate access to Amazon Redshift when they use it for their data warehouse needs.

Database administration used to be a more complex effort. Administrators had to figure out which groups a user belonged to, which objects a user or group were authorized to use, and other needs—in manual fashion. These user and group lists—and their permissions—were traditionally managed within the database itself, and there was often a lot of drift between the database and the company directory.

Amazon Redshift administrators face similar challenges if they opt to manage everything within Redshift itself. There is a better way, though. They can use an enterprise IdP to federate Redshift access, managing users and groups within the IdP and passing the credentials to Amazon Redshift at login.

We increasingly hear from our clients, “We use Azure Active Directory (AAD) for identity management—can we essentially bring it with us as our IdP to Amazon Redshift?”

They want to use AAD with Redshift the way they use it elsewhere, to manage their users and groups in a single place to reduce administrative complexity. With Redshift, specifically, they also want to be able to continue managing permissions for those groups in the data warehouse itself. The good news is you can do this and it can be very beneficial.

Without a solution like this, you would approach database administration in one of two alternative ways:

  1. You would provision and manage users using AWS Identity and Access Management (IAM). This means, however, you will have another identity provider to maintain—credentials, tokens, and the like—separate from an existing IdP like AAD.
  2. You would do all of this within Redshift itself, creating users (and their credentials) and groups and doing database-level management. But this creates similar challenges to legacy database management, and when you have thousands of users, it simply does not scale.

Our technical white paper outlines how to federate access to Amazon Redshift using Azure Active Directory as your IdP, passing user and group information through to the database at login.
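
To make the flow concrete, here is a minimal sketch in Python of the credential exchange that federation ultimately performs, assuming a SAML integration between AAD and an IAM role is already in place. Every ARN, user, group, and cluster name below is a placeholder, and in practice the Redshift JDBC/ODBC driver typically performs this exchange for you.

import boto3

# Hypothetical values - replace with your own role, IdP, user, groups, and cluster
saml_assertion = "<base64-encoded SAML response from Azure AD>"

sts = boto3.client("sts")
creds = sts.assume_role_with_saml(
    RoleArn="arn:aws:iam::123456789012:role/redshift-federated-role",
    PrincipalArn="arn:aws:iam::123456789012:saml-provider/AzureAD",
    SAMLAssertion=saml_assertion,
)["Credentials"]

# Use the temporary AWS credentials to request temporary database credentials
redshift = boto3.client(
    "redshift",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

db_creds = redshift.get_cluster_credentials(
    DbUser="jane.doe@example.com",
    DbGroups=["analysts", "marketing"],   # groups passed through from the IdP
    ClusterIdentifier="my-redshift-cluster",
    AutoCreate=True,
)

print(db_creds["DbUser"], db_creds["Expiration"])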

Download the technical white paper

-Rob Whelan, Data & Analytics Practice Director

How to Federate Amazon Redshift Access with Okta

Single sign-on (SSO) is a tool that solves fundamental problems, especially in midsize and large organizations with lots of users.

End users do not want to have to remember too many username and password combinations. IT administrators do not want to have to create and manage too many different login credentials across enterprise systems. It is a far more manageable and secure approach to federate access and authentication through a single identity provider (IdP).

As today’s enterprises rely on a wide range of cloud services and legacy systems, they have increasingly adopted SSO via an IdP as a best practice for IT management. All access and authentication essentially flows through the IdP wherever it is supported. Employees do not have to remember multiple usernames and passwords to access the tools they need to do their jobs. Just as importantly, IT teams prevent an administrative headache: They manage a single identity per user, which makes tasks like removing access when a person leaves the organization much simpler and less prone to error.

The same practice extends to AWS. As we see more customers migrate to the cloud platform, we hear a growing need for the ability to federate access to Amazon Redshift when they use it for their data warehouse needs.

Database administration used to be a more complex effort. Administrators had to figure out which groups a user belonged to, which objects a user or group were authorized to use, and other needs—in manual fashion. These user and group lists—and their permissions—were traditionally managed within the database itself, and there was often a lot of drift between the database and the company directory.

Amazon Redshift administrators face similar challenges if they opt to manage everything within Redshift itself. There is a better way, though. They can use an enterprise IdP to federate Redshift access, managing users and groups within the IdP and passing the credentials to Amazon Redshift at login.

We increasingly hear from our clients, “We use Okta for identity management—can we essentially bring it with us as our IdP to Amazon Redshift?” They want to use Okta with Redshift the way they use it elsewhere, to manage their users and groups in a single place to reduce administrative complexity. With Redshift, specifically, they also want to be able to continue managing permissions for those groups in the data warehouse itself. The good news is you can do this and it can be very beneficial.

Without a solution like this, you would approach database administration in one of two alternative ways:

  1. You would provision and manage users using AWS Identity and Access Management (IAM). This means, however, you will have another identity provider to maintain—credentials, tokens, and the like—separate from an existing IdP like Okta.
  2. You would do all of this within Redshift itself, creating users (and their credentials) and groups and doing database-level management. But this creates similar challenges to legacy database management, and when you have thousands of users, it simply does not scale.

Our technical white paper covers how to federate access to Amazon Redshift using Okta as your IdP, passing user and group information through to the database at login. We outline the step-by-step process we follow when we implement this solution for 2nd Watch clients, including the modifications we found were necessary to ensure everything worked properly. We explain how to set up a trial account at Okta.com, build users and groups within the organization’s directory, and enable single sign-on (SSO) into Amazon Redshift.

Download the technical white paper

-Rob Whelan, Data & Analytics Practice Director

Top Enterprise IT Trends for 2021

Between the global pandemic and the resulting economic upheaval, it’s fair to say many businesses spent 2020 in survival mode. Now, as we turn the page to 2021, we wonder what life will look like in this new normal. Whether it is employees working from home, the shift from brick and mortar to online sales and delivery, or the need to accelerate digital transformation efforts to remain competitive, 2021 will be a year of re-invention for most companies.

How might the new normal impact your company? Here are five of the top technology trends we predict will drive change in 2021:

1. The pace of cloud migration will accelerate

Most companies, by now, have started the journey to the public cloud or to a hybrid cloud environment. The events of 2020 have added fuel to the fire, creating an urgency to maximize cloud usage within companies that now understand that the speed, resilience, security, and universal access provided by cloud services are vital to the success of the organization.

“By the end of 2021, based on lessons learned in the pandemic, most enterprises will put a mechanism in place to accelerate their shift to cloud-centric digital infrastructure and application services twice as fast as before the pandemic,” says Rick Villars, group vice president, worldwide research at IDC. “Spending on cloud services, the hardware and software underpinning cloud services, and professional and managed services opportunities around cloud services will surpass $1 trillion in 2024.”

The progression for most companies will be to ensure customer-facing applications take priority. In the next phase of cloud migration, back-end functionality embodied in ERP-type applications will move to the cloud. The easiest and fastest way to move applications to the cloud is the simple lift-and-shift, where applications remain essentially unchanged. Companies looking to improve and optimize business processes, though, will most likely refactor, containerize, or completely re-write applications. They will turn to “cloud native” approaches to their applications.

2. Artificial intelligence (AI) and machine learning (ML) will deliver business insight

Faced with the need to boost revenue, cut waste, and squeeze out more profits during a period of economic and competitive upheaval, companies will continue turning to AI and machine learning to extract business insight from the vast trove of data most collect routinely, but don’t always take advantage of.

According to a recent PwC survey of more than 1,000 executives, 25% of companies reported widespread adoption of AI in 2020, up from 18% in 2019. Another 54% are moving quickly toward AI. Either they have started implementing limited use cases or they are in the proof-of-concept phase and are looking to scale up. Companies report the deployment of AI is proving to be an effective response to the challenges posed by the pandemic.

Ramping up AI and ML capabilities in-house can be a daunting task, but the major hyperscale cloud providers have platforms that enable companies to perform AI and ML in the cloud. Examples include Amazon’s SageMaker, Microsoft’s Azure AI and Google’s Cloud AI.

3. Edge computing will take on greater importance

For companies that can’t move to the cloud because of regulatory or data security concerns, edge computing is emerging as an attractive option. With edge computing, data processing is performed where the data is generated, which reduces latency and provides actionable intelligence in real time. Common use cases include manufacturing facilities, utilities, transportation, oil and gas, healthcare, retail and hospitality.

The global edge computing market is expected to reach $43.4 billion by 2027, fueled by an annual growth rate of nearly 40%, according to a report from Grand View Research.

The underpinning of edge computing is IoT, the instrumentation of devices (everything from autonomous vehicles to machines on the factory floor to a coffee machine in a fast-food restaurant) and the connectivity between the IoT sensor and the analytics platform. IoT platforms generate a vast amount of real-time data, which must be processed at the edge because it would be too expensive and impractical to transmit that data to the cloud.

Cloud services providers recognize this reality and are now bringing forth specific managed service offerings for edge computing scenarios, such as Amazon’s new IoT Greengrass service that extends cloud capabilities to local devices, or Microsoft’s Azure IoT Edge.

4. Platform-as-a-Service will take on added urgency

To increase the speed of business, companies are shifting to cloud platforms for application development rather than developing apps in-house. PaaS offers a variety of benefits, including the ability to take advantage of serverless computing, which delivers scalability, flexibility, and quicker time to develop and release new apps. Popular serverless platforms include AWS Lambda and Microsoft’s Azure Functions.

5. IT Automation will increase

Automating processes across the entire organization is a key trend for 2021, with companies prioritizing and allocating money for this effort. Automation can cut costs and increase efficiency in a variety of areas – everything from Robotic Process Automation (RPA) to automate low-level business processes, to the automation of security procedures such as anomaly detection or incident response, to automating software development functions with new DevOps tools.

Gartner predicts that, through 2024, enhancements in analytics and automatic remediation capabilities will refocus 30% of IT operations efforts from support to continuous engineering. And by 2023, 40% of product and platform teams will use AIOps for automated change risk analysis in DevOps pipelines, reducing unplanned downtime by 20%.

Tying it all together

These trends are not occurring in isolation.  They’re all part of the larger digital transformation effort that is occurring as companies pursue a multi-cloud strategy encompassing public cloud, private cloud and edge environments. Regardless of where the applications live or where the processing takes place, organizations are seeking ways to use AI and machine learning to optimize processes, conduct predictive maintenance and gain critical business insight as they try to rebound from the events of 2020 and re-invent themselves for 2021 and beyond.

Where will 2021 take you? Contact us for guidance on how you can take hold of these technology trends to maximize your business results and reach new goals.

-Mir Ali, Field CTO

Demystifying DevOps

My second week at 2nd Watch, it happened.  I was minding my own business, working to build a framework for our products and how they’d be released when suddenly, an email dinged into my inbox. I opened it and gasped. It read in sum:

  • We want to standardize on our DevOps tooling
  • We have this right now https://landscape.cncf.io/images/landscape.png
  • We need to pick one from each bucket
  • We want to do this next week with everyone in the room, so we don’t create silos

This person sent me the CNCF Landscape and asked us to choose which tool to use from each area to enable “DevOps” in their org. Full stop.

I spent the better part of the day attempting to respond to the message without condescension or simply saying ‘no.’ I mean, the potential client had a point. He said they had a “planet of tools” and wanted to narrow it down to one for each area. A noble cause. They wanted a single resource from 2nd Watch who had used all the tools to help them make a decision. They had very honorable intentions, even if they were clearly misguided from the start.

The thing is, this situation hasn’t just come up once in my career in the DevOps space. It’s a rampant misunderstanding of the part tooling, automation, and standardization play in a DevOps transformation. Normalizing the tech stack across your enterprise wastes time and creates enemies. This is not DevOps.

In this blog, I’ll attempt to share my views on how to approach your transformation. Hint: We’re not spending time in the CNCF Landscape.

Let’s start with a definition.

DevOps is a set of cultural values and organizational practices that improve business outcomes by increasing collaboration and feedback between Business Stakeholders, Development, QA, IT Operations, and Security.

A true DevOps transformation includes an evolution of your company culture, automation and tooling, processes, collaboration, measurement systems, and organizational structure.

DevOps does not have an ‘end state’ or level of maturity. The goal is to create a culture of continuous improvement.

DevOps methods were initially formed to bridge the gap between Development and Operations so that teams could increase speed to delivery as well as quality of product at the same time. So, it does seem fair that many feel a tool can bridge this gap. However, without increasing communication and shaping company culture to enable reflection and improvement, tooling ends up being a one-time fix for a consistently changing and evolving challenge. Imagine trying to patch a hole in a boat with the best patch available while the rest of the crew is poking holes in the hull. That’s a good way to imagine how implementing tooling to fix cultural problems works.

Simply stated – No, you cannot implement DevOps by standardizing on tooling.

Why is DevOps so difficult?

Anytime you have competing priorities between departments, you end up with conflict and difficulty. Development is focused on speed and new features, while Operations is required to keep the systems up and running. The two teams have different goals, some of which undo each other.

For example: As the Dev team is trying to get more features to market faster, the Ops team is often causing a roadblock so that they can keep to their Service Level Agreement (SLA). An exciting new feature delivered by Dev may be unstable, causing downtime in Ops. When there isn’t a shared goal of uptime and availability, both teams suffer.

CHART: Dev vs. Ops

Breaking Down Walls

Many liken the conflict between Dev and Ops to a wall. It’s a barrier between the two teams, and oftentimes Dev, in their effort to increase velocity, will throw the new feature or application over the proverbial wall to Ops and expect them to deal with running the app at scale.

Companies attempt to fix this problem by increasing communication between the two teams. However, without shared responsibility and goals, the “fast to market” Dev team and the “keep the system up and running” Ops team are still at odds.

CI/CD ≠ DevOps

I’ll admit, technology can enable your DevOps transformation, though don’t be fooled by the marketing that suggests simply implementing continuous integration and continuous delivery (CI/CD) tooling will kick-start your DevOps journey. Experience shows us that implementing automation without changing culture just results in speeding up our time to failure.

CI/CD can enable better communication, automate the toil or rework that exists, and make your developers and ops team happier. Though, it takes time and effort to train them on the tooling, evaluate processes and match or change your company’s policies, and integrate security. One big area that is often overlooked is executive leadership acceptance and support of these changes.

The Role of Leadership in DevOps

The changes implemented during a DevOps transformation all have a quantifiable impact on a company’s profit, productivity, and market share. It is no wonder that organizational leaders and executives have an impact on these changes. The challenge is that their impact is often overlooked up front in the process, causing a transformation to stall considerably and produce mixed results.

According to the book Accelerate: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, PhD, Jez Humble, and Gene Kim, the “characteristics of transformational leadership are highly correlated with software delivery performance.” Leaders need to up their game by enabling cross-functional collaboration, building a climate of learning, and supporting effective use and choice of tools (not choosing for the team like we saw at the beginning of this blog).

Leaders should be asking themselves upfront “how can I create a culture of learning and where do we start?” The answer to this question is different for every organization, though some similarities exist. One of the best ways to figure out where to start is to figure out where you are right now.

2nd Watch offers a 2-week DevOps Assessment and Strategy engagement that helps you identify and assess a single value stream to pinpoint the areas for growth and transformation. This is a good start to identify key areas for improvement. This assessment follows the CALMS framework originally developed by Damon Edwards and John Willis at DevOpsDays 2010 and then enhanced by Jez Humble. It’s an acronym representing the fundamental elements of DevOps:  Culture, Automation, Lean, Metrics, and Sharing.

  • Culture – a culture of shared responsibility, owned by everyone
  • Automation – eliminating toil, make automation continuous
  • Lean – visualize work in progress, limit batch sizes and manage queue lengths
  • Metrics – collect data on everything and provide visibility into systems and data
  • Sharing – encourage ongoing communication, build channels to enable this

While looking at each of these pillars we overlay the people, process and technology aspects to ensure full coverage of the methodology.

  • People: We assess your current team organizational structure, culture, and the role security plays. We also assess what training may be required to get your team off on the right foot.
  • Process: We review your collaboration methods, processes, and best practices.
  • Technology: We identify any gaps in your current tool pipeline, including image creation, CI/CD, monitoring, alerting, log aggregation, security and compliance, and configuration management.

What’s Next?

DevOps is over 10 years old, and it’s moving out of the phase of concept and ‘nice to have’ into a necessity for companies to keep up in the digital economy. New technologies like serverless, containers, and machine learning, along with a focus on security, are shifting the landscape, making it more essential that companies adopt the growth mindset expected in a DevOps culture.

Don’t be left behind. DevOps is no longer a fad diet, it’s a lifestyle change. And if you can’t get enough of Demystifying DevOps, check out our webinar on ‘Demystifying DevOps: Identifying Your Path to Faster Releases and Higher Quality’ on-demand.

-Stefana Muller, Sr Product Manager

Troubleshooting VMware HCX

I was on a project recently where we had to set up VMware HCX in an environment to connect the on-premises datacenter to VMware Cloud on AWS for a Proof of Concept migration.  The workloads were varied, ranging from 100 MB to 5TB in size.  The customer wanted to stretch two L2 subnets and have the ability to migrate slowly, in waves.  During the POC, we found problems with latency between the two stretched networks and decided that the best course of action would be to NOT stretch the networks and instead, do an all-at-once migration.

While setting up this POC, I had occasion to do some troubleshooting on HCX due to connectivity issues.  I’m going to walk through some of the troubleshooting I needed to do.

The first thing we did was enable SSH on the HCX manager.  To perform this action, you go into the HCX manager appliance GUI and, under Appliance Summary, start the SSH service.  Once SSH is enabled, you can then log in to the appliance CLI, which is where the real troubleshooting can begin.

You’ll want to log in to the appliance using “admin” as the username and the password entered when you installed the appliance.  Then su to “root” and enter the “root” password.  This gives you access to the appliance, which has a limited set of Linux commands.

You’ll want to enter the HCX Central CLI (CCLI) to use the HCX commands.  Since you’re already logged in as “root,” you just type “ccli” at the command prompt.  After you’re in the CCLI, you can type “help” to get a list of commands.

One of the first tests to run would be the Health Checker. Type “hc” at the command prompt, and the HCX manager will run through a series of tests to check on the health of the environment.

“list” will give you a list of the HCX appliances that have been deployed.

You’ll want to connect to an appliance to run the commands specific to that appliance.  For example, if you want to connect to the Interconnect appliance, you would type “go 0,” which connects you to node 0.  From here, you can run a ton of commands, such as “show ipsec status,” which will show a plethora of information related to the tunnel.  Type “q” to exit this command.

You can also run the Health Check on this node from here, export a support bundle (under the debug command), and a multitude of other “show” commands.  Under the “show” command, you can get firewall information, flow runtime information, and a lot of other useful troubleshooting info.

If you need to actually get on the node and run Linux commands for troubleshooting, you’ll enter “debug remoteaccess enable,” which enables SSH on the remote node.  Then you can just type “ssh” and it will connect you to the interconnect node.
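
Putting the commands above together, a typical troubleshooting session looks roughly like the following (the hostname and node number are placeholders):

ssh admin@hcx-manager.example.local   # SSH service started from the appliance GUI
su -                                  # switch to root
ccli                                  # enter the HCX Central CLI
hc                                    # run the Health Checker
list                                  # list the deployed HCX appliances
go 0                                  # connect to node 0 (the Interconnect appliance)
hc                                    # health check against this node
show ipsec status                     # tunnel details; type q to exit
debug remoteaccess enable             # enable SSH on the remote node
ssh                                   # connect to the node for Linux-level troubleshooting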

Have questions about this process? Contact us or leave a reply.

-Michael Moore, Associate Cloud Consultant

Automating Windows Server Builds with Packer

It might not be found on a curated list of urban legends, but trust me, IT IS possible (!!!) to fully automate the building and configuration of Windows virtual machines for AWS.  While Windows Server may not be your first choice of operating systems for immutable infrastructure, sometimes you as the IT professional are not given the choice due to legacy environments, limited resources for re-platforming, corporate edicts, or lack of internal *nix knowledge.

Complain all you like, but after the arguing, fretting, crying, and gnashing of teeth is done, at some point you will still need to build an automated deployment pipeline that requires Windows Server 2016.  With some PowerShell scripting and HashiCorp Packer, it is relatively easy and painless to securely build and configure Windows AMIs for your particular environment regardless of environment variables.

Configuration Options

Let’s dig into an example of how to build a custom configured Windows Server 2016 AMI.  You will need access to an AWS account and have sufficient permissions to create and manage EC2 instances.  If you need an AWS account, you can create one for free.

I am using VS Code and its built-in terminal with Windows Subsystem for Linux to create the Packer template and run Packer; however, Packer is available for several Linux distros, macOS, and Windows.

First, download and unzip Packer:

/*
~/packer-demo:\> wget https://releases.hashicorp.com/packer/1.3.4/packer_1.3.4_linux_amd64.zip 
https://releases.hashicorp.com/packer/1.3.4/packer_1.3.4_linux_amd64.zip 
Resolving releases.hashicorp.com (releases.hashicorp.com)... 151.101.1.183, 151.101.65.183, 151.101.129.183, ... 
Connecting to releases.hashicorp.com (releases.hashicorp.com)|151.101.1.183|:443... connected. 
HTTP request sent, awaiting response... 200 OK 
Length: 28851840 (28M) [application/zip] 
Saving to: ‘packer_1.3.4_linux_amd64.zip’ 
‘packer_1.3.4_linux_amd64.zip’ saved [28851840/28851840] 
~/packer-demo:\> unzip packer_1.3.4_linux_amd64.zip 
Archive:  packer_1.3.4_linux_amd64.zip   
 inflating: packer
*/

Now that we have Packer unzipped, verify that it is executable by checking the version:

/*
~/packer-demo:\> packer --version
1.3.3
*/

Packer can build machine images for a number of different platforms.  We will focus on the amazon-ebs builder, which will create an EBS-backed AMI.  At a high level, Packer performs these steps:

  1. Reads configuration settings from a JSON file
  2. Uses the AWS API to stand up an EC2 instance
  3. Connects to the instance and provisions it
  4. Shuts down and snapshots the instance
  5. Creates an Amazon Machine Image (AMI) from the snapshot
  6. Cleans up the mess

The amazon-ebs builder can create temporary keypairs, security group rules, and establish a basic communicator for provisioning.  However, in the interest of tighter security and control, we will want to be prescriptive for some of these settings and use a secure communicator to reduce the risk of eavesdropping while we provision the machine.

There are two communicators Packer uses to upload scripts and files to a virtual machine: SSH (the default) and WinRM.  While there is a nice Win32 port of OpenSSH for Windows, it is not currently installed by default on Windows machines, but WinRM is available natively in all current versions of Windows, so we will use that to provision our Windows Server 2016 machine.

Let’s create and edit the userdata file that Packer will use to bootstrap the EC2 instance:

/*
 
# USERDATA SCRIPT FOR AMAZON SOURCE WINDOWS SERVER AMIS
# BOOTSTRAPS WINRM VIA SSL
 
Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope LocalMachine -Force -ErrorAction Ignore
$ErrorActionPreference = "stop"
 
# Remove any existing Windows Management listeners
Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
 
# Create self-signed cert for encrypted WinRM on port 5986
$Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "packer-ami-builder"
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force
 
# Configure WinRM
cmd.exe /c winrm quickconfig -q
cmd.exe /c winrm set "winrm/config" '@{MaxTimeoutms="1800000"}'
cmd.exe /c winrm set "winrm/config/winrs" '@{MaxMemoryPerShellMB="1024"}'
cmd.exe /c winrm set "winrm/config/service" '@{AllowUnencrypted="false"}'
cmd.exe /c winrm set "winrm/config/client" '@{AllowUnencrypted="false"}'
cmd.exe /c winrm set "winrm/config/service/auth" '@{Basic="true"}'
cmd.exe /c winrm set "winrm/config/client/auth" '@{Basic="true"}'
cmd.exe /c winrm set "winrm/config/service/auth" '@{CredSSP="true"}'
cmd.exe /c winrm set "winrm/config/listener?Address=*+Transport=HTTPS" "@{Port=`"5986`";Hostname=`"packer-ami-builder`";CertificateThumbprint=`"$($Cert.Thumbprint)`"}"
cmd.exe /c netsh advfirewall firewall add rule name="WinRM-SSL (5986)" dir=in action=allow protocol=TCP localport=5986
cmd.exe /c net stop winrm
cmd.exe /c sc config winrm start= auto
cmd.exe /c net start winrm
 
*/

There are four main things going on here:

  1. Set the execution policy and error handling for the script (if an error is encountered, the script terminates immediately)
  2. Clear out any existing WS Management listeners, just in case there are any preconfigured insecure listeners
  3. Create a self-signed certificate for encrypting the WinRM communication channel, and then bind it to a WS Management listener
  4. Configure WinRM to:
    • Require an encrypted (SSL) channel
    • Enable basic authentication (usually this is not secure as the password goes across in plain text, but we are forcing encryption)
    • Configure the listener to use port 5986 with the self-signed certificate we created earlier
    • Add a firewall rule to open port 5986

Now that we have userdata to bootstrap WinRM, let’s create a Packer template:

/*
~/packer-demo:\> touch windows2016.json
*/

Open this file with your favorite text editor and add this text:

/*
{
    "variables": {
        "build_version": "{{isotime \"2006.01.02.150405\"}}",
        "aws_profile": null,
        "vpc_id": null,
        "subnet_id": null,
        "security_group_id": null
    },
    "builders": [
        {
            "type": "amazon-ebs",
            "region": "us-west-2",
            "profile": "{{user `aws_profile`}}",
            "vpc_id": "{{user `vpc_id`}}",
            "subnet_id": "{{user `subnet_id`}}",
            "security_group_id": "{{user `security_group_id`}}",
            "source_ami_filter": {
                "filters": {
                    "name": "Windows_Server-2016-English-Full-Base-*",
                    "root-device-type": "ebs",
                    "virtualization-type": "hvm"
                },
                "most_recent": true,
                "owners": [
                    "801119661308"
                ]
            },
            "ami_name": "WIN2016-CUSTOM-{{user `build_version`}}",
            "instance_type": "t3.xlarge",
            "user_data_file": "userdata.ps1",
            "associate_public_ip_address": true,
            "communicator": "winrm",
            "winrm_username": "Administrator",
            "winrm_port": 5986,
            "winrm_timeout": "15m",
            "winrm_use_ssl": true,
            "winrm_insecure": true
        }
    ]
}
*/

/* { "variables": { "build_version": "{{isotime \"2006.01.02.150405\"}}", "aws_profile": null, "vpc_id": null, "subnet_id": null, "security_group_id": null }, "builders": [ { "type": "amazon-ebs", "region": "us-west-2", "profile": "{{user `aws_profile`}}", "vpc_id": "{{user `vpc_id`}}", "subnet_id": "{{user `subnet_id`}}", "security_group_id": "{{user `security_group_id`}}", "source_ami_filter": { "filters": { "name": "Windows_Server-2016-English-Full-Base-*", "root-device-type": "ebs", "virtualization-type": "hvm" }, "most_recent": true, "owners": [ "801119661308" ] }, "ami_name": "WIN2016-CUSTOM-{{user `build_version`}}", "instance_type": "t3.xlarge", "user_data_file": "userdata.ps1", "associate_public_ip_address": true, "communicator": "winrm", "winrm_username": "Administrator", "winrm_port": 5986, "winrm_timeout": "15m", "winrm_use_ssl": true, "winrm_insecure": true } ] } */

Couple of things to call out here.  First, the variables block at the top references some values that are needed in the template.  Insert values here that are specific to your AWS account and VPC (or pass them in at build time, as shown after this list):

  • aws_profile: I use a local credentials file to store IAM user credentials (the file is shared between WSL and Windows). Specify the name of a credential block that Packer can use to connect to your account.  The IAM user will need permissions to create and modify EC2 instances, at a minimum
  • vpc_id: Packer will stand up the instance in this VPC
  • aws_region: Your VPC should be in this region. As an exercise, change this value to be set by a variable instead
  • user_data_file: We created this file earlier, remember? If you saved it in another location, make sure the path is correct
  • subnet_id: This should belong to the VPC specified above. Use a public subnet if you do not have a Direct Connect or VPN connection
  • security_group_id: This security group should belong to the VPC specified above. This security group will be attached to the instance that Packer stands up.  It will need, at a minimum, inbound TCP 5986 from where Packer is running
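
If you prefer not to hard-code these values as defaults in the template, they can also be supplied at build time with -var flags (or a -var-file); the IDs below are placeholders:

/*
~/packer-demo:\> packer build \
    -var 'aws_profile=default' \
    -var 'vpc_id=vpc-xxxxxxxx' \
    -var 'subnet_id=subnet-xxxxxxxx' \
    -var 'security_group_id=sg-xxxxxxxx' \
    windows2016.json
*/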

Next, let’s validate the template to make sure the syntax is correct, and we have all required fields:

/*
~/packer-demo:\> packer validate windows2016.json
Template validation failed. Errors are shown below.
Errors validating build 'amazon-ebs'. 1 error(s) occurred:
* An instance_type must be specified
*/

Whoops, we missed instance_type.  Add that to the template and specify a valid EC2 instance type.  I like using beefier instance types so that the builds get done quicker, but that’s just me.

            "ami_name": "WIN2016-CUSTOM-{{user `build_version`}}",
            "instance_type": "t3.xlarge",
            "user_data_file": "userdata.ps1",
*/

"ami_name": "WIN2016-CUSTOM-{{user `build_version`}}", "instance_type": "t3.xlarge", "user_data_file": "userdata.ps1", */

Now validate again:

/*
~/packer-demo:\> packer validate windows2016.json
Template validated successfully.
*/

Awesome.  Couple of things to call out in the template:

  • source_ami_filter: we are using a base Amazon AMI for Windows Server 2016. Note the wildcard in the AMI name (Windows_Server-2016-English-Full-Base-*) and the owner (801119661308).  This filter will always pull the most recent match for the AMI name.  Amazon updates its AMIs about once a month
  • associate_public_ip_address: I specify true because I don’t have Direct Connect or VPN from my workstation to my sandbox VPC. If you are building your AMI in a public subnet and want to connect to it over the internet, do the same
  • ami_name: this is a required property, and it must also be unique within your account. We use a special function here (isotime) to generate a timestamp, ensuring that the AMI name will always be unique (e.g. WIN2016-CUSTOM-2019.03.01.000042)
  • winrm_port: the default unencrypted port for WinRM is 5985. We disabled 5985 in userdata and enabled 5986, so be sure to call it out here
  • winrm_use_ssl: as noted above, we are encrypting communication, so set this to true
  • winrm_insecure: this is a rather misleading property name. It really means that Packer should not check whether the encryption certificate is trusted.  We are using a self-signed certificate in userdata, so set this to true to skip certificate validation

Now that we have our template, let’s inspect it to see what will happen:

/*
~/packer-demo:\> packer inspect windows2016.json
Optional variables and their defaults:
  aws_profile       = default
  build_version     = {{isotime "2006.01.02.150405"}}
  security_group_id = sg-0e1ca9ba69b39926
  subnet_id         = subnet-00ef2a1df99f20c23
  vpc_id            = vpc-00ede10ag029c31e0
Builders:
  amazon-ebs
Provisioners:
 
Note: If your build names contain user variables or template
functions such as 'timestamp', these are processed at build time,
and therefore only show in their raw form here.
*/

Looks good, but we are missing a provisioner!  If we ran this template as is, all that would happen is Packer would make a copy of the base Windows Server 2016 AMI and make you the owner of the new private image.

There are several different Packer provisioners, ranging from very simple (the windows-restart provisioner just reboots the machine) to complex (the chef client provisioner installs chef client and does whatever Chef does from there).  We’ll run a basic powershell provisioner to install IIS on the instance, reboot the instance using windows-restart, and then we’ll finish up the provisioning by executing sysprep on the instance to generalize it for re-use.

We will use the PowerShell cmdlet Enable-WindowsOptionalFeature to install the web server role and IIS with defaults. Add this text to your template below the builders [ ] section:

/*
"provisioners": [
        {
            "type": "powershell",
            "inline": [
                "Enable-WindowsOptionalFeature -Online -FeatureName IIS-WebServerRole",
                "Enable-WindowsOptionalFeature -Online -FeatureName IIS-WebServer"
            ]
        },
        {
            "type": "windows-restart",
            "restart_check_command": "powershell -command \"& {Write-Output 'Machine restarted.'}\""
        },
        {
            "type": "powershell",
            "inline": [
                "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
                "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SysprepInstance.ps1 -NoShutdown"
            ]
        }
    ]
*/

/* "provisioners": [ { "type": "powershell", "inline": [ "Enable-WindowsOptionalFeature -Online -FeatureName IIS-WebServerRole", "Enable-WindowsOptionalFeature -Online -FeatureName IIS-WebServer" ] }, { "type": "windows-restart", "restart_check_command": "powershell -command \"& {Write-Output 'Machine restarted.'}\"" }, { "type": "powershell", "inline": [ "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule", "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SysprepInstance.ps1 -NoShutdown" ] } ] */

Couple things to call out about this section:

  • The first powershell provisioner uses the inline method: Packer simply appends these lines, in order, into a file, then transfers and executes the file on the instance using PowerShell
  • The second provisioner, windows-restart, simply reboots the machine while Packer waits. While this isn’t always necessary, it is helpful to catch instances where settings do not persist after a reboot, which was probably not your intention
  • The final powershell provisioner executes two PowerShell scripts that are present on Amazon Windows Server 2016 AMIs and part of the EC2Launch application (earlier versions of Windows use a different application called EC2Config). They are helper scripts that you can use to prepare the machine for generalization, and then execute sysprep

After validating your template again, let’s build the AMI!

/*
~/packer-demo:\> packer validate windows2016.json
Template validated successfully.
*/
/*
~/packer-demo:\> packer build windows2016.json
amazon-ebs output will be in this color.
==> amazon-ebs: Prevalidating AMI Name: WIN2016-CUSTOM-2019.03.01.000042
    amazon-ebs: Found Image ID: ami-0af80d239cc063c12
==> amazon-ebs: Creating temporary keypair: packer_5c78762a-751e-2cd7-b5ce-9eabb577e4cc
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Adding tags to source instance
    amazon-ebs: Adding tag: "Name": "Packer Builder"
    amazon-ebs: Instance ID: i-0384e1edca5dc90e5
==> amazon-ebs: Waiting for instance (i-0384e1edca5dc90e5) to become ready...
==> amazon-ebs: Waiting for auto-generated password for instance...
    amazon-ebs: It is normal for this process to take up to 15 minutes,
    amazon-ebs: but it usually takes around 5. Please wait.
    amazon-ebs:  
    amazon-ebs: Password retrieved!
==> amazon-ebs: Using winrm communicator to connect: 34.220.235.82
==> amazon-ebs: Waiting for WinRM to become available...
    amazon-ebs: WinRM connected.
    amazon-ebs: #> CLIXML
    amazon-ebs: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first 
use.0-1-1Completed-1 1Preparing modules for first use.0-1-1Completed-1 
==> amazon-ebs: Connected to WinRM!
==> amazon-ebs: Provisioning with Powershell...
==> amazon-ebs: Provisioning with powershell script: /tmp/packer-powershell-provisioner561623209
    amazon-ebs:
    amazon-ebs:
    amazon-ebs: Path          :
    amazon-ebs: Online        : True
    amazon-ebs: RestartNeeded : False
    amazon-ebs:
    amazon-ebs: Path          :
    amazon-ebs: Online        : True
    amazon-ebs: RestartNeeded : False
    amazon-ebs:
==> amazon-ebs: Restarting Machine
==> amazon-ebs: Waiting for machine to restart...
    amazon-ebs: Machine restarted.
    amazon-ebs: EC2AMAZ-LJV703F restarted.
    amazon-ebs: #> CLIXML
    amazon-ebs: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first 
use.0-1-1Completed-1 
==> amazon-ebs: Machine successfully restarted, moving on
==> amazon-ebs: Provisioning with Powershell...
==> amazon-ebs: Provisioning with powershell script: /tmp/packer-powershell-provisioner928106343
    amazon-ebs:
    amazon-ebs: TaskPath                                       TaskName                          State
    amazon-ebs: --------                                       --------                          -----
    amazon-ebs: \                                              Amazon Ec2 Launch - Instance I... Ready
==> amazon-ebs: Stopping the source instance...
    amazon-ebs: Stopping instance, attempt 1
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating unencrypted AMI WIN2016-CUSTOM-2019.03.01.000042 from instance i-0384e1edca5dc90e5
    amazon-ebs: AMI: ami-0d6026ecb955cc1d6
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' finished.
 
==> Builds finished. The artifacts of successful builds are:
==> amazon-ebs: AMIs were created:
us-west-2: ami-0d6026ecb955cc1d6
*/

The entire process took approximately 5 minutes from start to finish.  If you inspect the output, you can see where the first PowerShell provisioner installed IIS (Provisioning with powershell script: /tmp/packer-powershell-provisioner561623209), where sysprep was executed (Provisioning with powershell script: /tmp/packer-powershell-provisioner928106343) and where Packer yielded the AMI ID for the image (ami-0d6026ecb955cc1d6) and cleaned up any stray artifacts.

Note: most of the issues that occur during a Packer build are related to firewalls and security groups.  Verify that the machine that is running Packer can reach the VPC and EC2 instance using port 5986.  You can also use packer build -debug windows2016.json to step through the build process manually, and you can even connect to the machine via RDP to troubleshoot a provisioner.

Now, if you launch a new EC2 instance using this AMI, it will already have IIS installed and configured with defaults.  Building and securely configuring Windows AMIs with Packer is easy!

For help getting started securely building and configuring Windows AMIs for your particular environment, contact us.

-Jonathan Eropkin, Cloud Consultant

How to use waiters in boto3 and how to write your own

What is Boto3?

Boto3 is the Python SDK for interacting with the AWS API. Boto3 makes it easy to use the Python programming language to manipulate AWS resources and automate infrastructure.

What are Boto3 Waiters and How Do I Use Them?

A number of requests in AWS using boto3 are not instant. Common examples of boto3 requests are deploying a new server or RDS instance. For some long running requests, we are ok to initiate the request and then check for completion at some later time. But in many cases, we want to wait for the request to complete before we move on to the subsequent parts of the script that may rely on a long running process to have been completed. One example would be a script that might copy an AMI to another account by sharing all the snapshots. After sharing the snapshots to the other account, you would need to wait for the local snapshot copies to complete before registering the AMI in the receiving account. Luckily a snapshot completed waiter already exists, and here’s what that waiter would look like in Python:
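
A minimal sketch, assuming the snapshots have already been shared into the receiving account (the snapshot IDs and region below are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Snapshots that were copied into this account (placeholder IDs)
snapshot_ids = ["snap-0123456789abcdef0", "snap-0fedcba9876543210"]

waiter = ec2.get_waiter("snapshot_completed")
waiter.wait(
    SnapshotIds=snapshot_ids,
    # Optional overrides of the default polling behavior
    WaiterConfig={"Delay": 15, "MaxAttempts": 40},
)

# Safe to register the AMI from these snapshots now
print("All snapshot copies are complete.")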

As far as the default configuration for the waiters and how long they wait, you can view the information in the boto3 docs on waiters, but it’s 600 seconds in most cases. Each one is configurable to be as short or long as you’d like.

Writing your Own Custom Waiters

As you can see, using boto3 waiters is an easy way to setup a loop that will wait for completion without having to write the code yourself. But how do you find out if a specific waiter exists? The easiest way is to explore the particular boto3 client on the docs page and check out the list of waiters at the bottom. Let’s walk through the anatomy of a boto3 waiter. The waiter is actually instantiated in botocore and then abstracted to boto3. So looking at the code there we can derive what’s needed to generate our own waiter:

  1. Waiter Name
  2. Waiter Config
  3. Delay
  4. Max Attempts
  5. Operation
  6. Acceptors
  7. Waiter Model

The first step is to name your custom waiter. You’ll want it to be something descriptive and, in our example, it will be “CertificateIssued”. This will be a waiter that waits for an ACM certificate to be issued (note there is already a CertificateValidated waiter, but this is only to showcase the creation of the waiter). Next, we pick out the configuration for the waiter, which boils down to 4 parts. Delay is the amount of time it will take between tests in seconds. Max Attempts is how many attempts it will try before it fails. Operation is the boto3 client operation that you’re using to get the result you’re testing. In our example, we’re calling “DescribeCertificate”. Acceptors is how we test the result of the Operation call. Acceptors are probably the most complicated portion of the configuration. They determine how to match the response and what result to return. Acceptors have 4 parts: Matcher, Expected, Argument, State.

  • State: This is what the acceptor will return based on the result of the matcher function.
  • Expected: This is the expected response that you want from the matcher to return this result.
  • Argument: This is the argument sent to the matcher function to determine if the result is expected.
  • Matcher: Matchers come in 5 flavors. Path, PathAll, PathAny, Status, and Error. The Status and Error matchers will effectively check the status of the HTTP response and check for an error respectively. They return failure states and short circuit the waiter so you don’t have to wait until the end of the time period when the command has already failed. The Path matcher will match the Argument to a single expected result. In our example, if you run DescribeCertificate you would get back a “Certificate.Status” as a result. Taking that as an argument, the desired expected result would be “ISSUED”. Notice that if the expected result is “PENDING_VALIDATION” we set the state to “retry” so it will continue to keep trying for the result we want. The PathAny/PathAll matchers work with operations that return a python list result. PathAny will match if any item in the list matches, PathAll will match if all the items in the list match.

 

Boto3 Example

Once we have the configuration complete, we feed it into a Waiter Model and call the “create_waiter_with_client” request. Now our custom waiter is ready to wait. If you need more examples of waiters and how they are configured, check out the botocore GitHub repository and poke through the various services. If they have waiters configured, they will be in a file called waiters-2.json. Here’s the finished code for our custom waiter.
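
A sketch along those lines, using botocore’s WaiterModel and create_waiter_with_client (the delay, attempt count, and certificate ARN below are placeholder values):

import boto3
from botocore.waiter import WaiterModel, create_waiter_with_client

# Waiter definition in the same shape as a waiters-2.json entry
waiter_config = {
    "version": 2,
    "waiters": {
        "CertificateIssued": {
            "operation": "DescribeCertificate",
            "delay": 10,          # seconds between attempts
            "maxAttempts": 30,    # give up after this many attempts
            "acceptors": [
                {"matcher": "path", "argument": "Certificate.Status",
                 "expected": "ISSUED", "state": "success"},
                {"matcher": "path", "argument": "Certificate.Status",
                 "expected": "PENDING_VALIDATION", "state": "retry"},
                {"matcher": "path", "argument": "Certificate.Status",
                 "expected": "FAILED", "state": "failure"},
            ],
        }
    },
}

acm = boto3.client("acm")
waiter_model = WaiterModel(waiter_config)
custom_waiter = create_waiter_with_client("CertificateIssued", waiter_model, acm)

# Blocks until the certificate is issued, or fails/times out
custom_waiter.wait(
    CertificateArn="arn:aws:acm:us-east-1:123456789012:certificate/example-placeholder"
)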

And that’s it. Custom waiters will allow you to create automation in a series without having to build redundant code or complicated loops. Have questions about writing custom waiters or boto3? Contact us for more information.

-Coin Graham, Principal Cloud Consultant

Using Docker Containers to Move Your Internal IT Orgs Forward

Many people are looking to take advantage of containers to isolate their workloads on a single system. Unlike traditional hypervisor-based virtualization, where each virtual machine runs its own operating system and packages, containers share the host operating system while allowing you to segment off multiple applications, each with its own set of processes, on the same instance.

Let’s walk through some grievances that many of us have faced at one time or another in our IT organizations:

Say, for example, your development team is setting up a web application. They want to set up a traditional 3 tier system with an app, database, and web servers. They notice there is a lot of support in the open source community for their app when it is run on Ubuntu Trusty (Ubuntu 14.04 LTS) and later. They’ve developed the app in their local sandbox with an Ubuntu image they downloaded, however, their company is a RedHat shop.

Now, depending on the type of environment you’re in, chances are you’ll have to wait for the admins to provision an environment for you. This often entails (but is not limited to) spinning up an instance, reviewing the most stable version of the OS, creating a new hardened AMI, adding it to Packer, figuring out which configs to manage, and refactoring provisioning scripts to use aptitude and Ubuntu’s directory structure (e.g., Debian has over 50,000 packages to choose from and manage). On top of that, the most stable version of Ubuntu is missing some newer packages you’ve tested in your sandbox that need to be pulled from source or another repository. At this point, the developers are writing configuration runbooks to support the app while the admin gets up to speed with the OS (not difficult, but time-consuming nonetheless).

You can see my point here. A significant amount of overhead has been introduced, and it’s stagnating development. And think about the poor sysadmins. They have other environments that they need to secure, networking spaces to manage, operations to improve, and existing production stacks they have to monitor and support while getting bogged down supporting this app that is still in the very early stages of development. This could mean that mission-critical apps are potentially losing visibility and application modernization is stagnating. Nobody wins in this scenario.

Now let us revisit the same scenario with containers:

I was able to run my Jenkins build server and an NGINX web proxy on a hardened CentOS 7 AMI, provided by the systems engineers, with Docker installed. From there I executed a docker pull command pointed at our local repository and deployed two Docker images with Debian as the underlying OS.

$ docker pull my.docker-repo.com:4443/jenkins
$ docker pull my.docker-repo.com:4443/nginx

$ docker ps

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

7478020aef37 my.docker-repo.com:4443/jenkins/jenkins:lts "/sbin/tini -- /us…" 16 minutes ago Up 16 minutes 8080/tcp, 0.0.0.0:80->80/tcp, 50000/tcp jenkins

d68e3b96071e my.docker-repo.com:4443/nginx/nginx:lts "nginx -g 'daemon of…" 16 minutes ago Up 16 minutes 0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp nginx

$ sudo systemctl status jenkins-docker

jenkins-docker.service – Jenkins
Loaded: loaded (/etc/systemd/system/jenkins-docker.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2018-11-08 17:38:06 UTC; 18min ago
Process: 2006 ExecStop=/usr/local/bin/jenkins-docker stop (code=exited, status=0/SUCCESS)

The processes above were executed on the actual instance. Note how I’m able to cat the OS release file from within the container:

$ sudo docker exec d68e3b96071e cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

I was able to do so because Docker containers do not have their own kernel; rather, they share the kernel of the underlying host and interact with it through Linux system calls (e.g., setuid, stat, umount, ls) like any other application. These system calls (or syscalls for short) are standard across Linux kernels, and Docker requires kernel version 3.10 or higher. In the event older syscalls are deprecated and replaced with new ones, you can update the kernel of the underlying host, which can be done independently of a full OS upgrade. Inside the container, the binaries and package management tools are the same as if you had installed that distribution directly on an EC2 instance (or VM).

Q: But I’m running a Windows environment. Those hosts don’t have a Linux kernel.

Yes, some developers may want to reduce the cost overhead associated with Windows licenses by exploring running their apps on a Linux OS. Others may simply want to modernize their .NET applications by testing the latest versions in containers. Docker allows you to run Linux containers on Windows 10 and Server 2016. Because Docker was initially written to run on Linux distributions, taking advantage of multi-tenant hosting on Windows means running Hyper-V containers, which provision a thin VM on top of your hosts. You can then manage your mixed environment of Windows and Linux containers via the --isolation option. More information can be found in the Microsoft and Docker documentation.

Conclusion:

IT teams need to be able to help drive the business forward. New technologies and security patches arrive on a daily basis. Developers need to be able to work freely on modernizing their code and applications. Concurrently, operations needs to be able to support and enhance the pipelines and platforms that get the code out quickly and securely. Leveraging Docker containers in conjunction with these pipelines helps ensure both are happening in parallel without unnecessary overhead. This allows teams to work independently in the early stages of the development cycle, and more collaboratively to get releases out the door.

For help getting started leveraging your environment to take advantage of containerization, contact us.

-Sabine Blair, Systems Engineer & Cloud Consultant

Fully Coded And Automated CI/CD Pipelines: The Weeds

The Why

In my last post, we went over why we’d want to go the CI/CD/automated route and the cultural reasons why it is so beneficial. In this post, we’ll delve a bit deeper and examine the technical side of tooling. Remember, a primary point of doing a release is mitigating risk. CI/CD is all about mitigating risk… fast.

There’s a Process

The previous article noted that you can’t do CI/CD without building on a set of steps, and I’m going to take this approach here as well. Unsurprisingly, we’ll follow the steps we laid out in the “Why” article, and tackle each in turn.

Step I: Automated Testing

You must automate your testing. There is no other way to describe this. In this particular step, however, we can concentrate on unit testing: testing the small chunks of code you produce (usually functions or methods). There’s some chatter about TDD (Test-Driven Development) vs. BDD (Behavior-Driven Development) in the development community, but I don’t think it really matters, so long as you are writing test code alongside your production code. On our team, we prefer the BDD-style testing paradigm. I’ve always liked the semantically descriptive nature of BDD tests over strictly code-driven ones. However, it should be said that both are effective and either is better than none, so this is more of a personal preference. On our team we’ve been coding in golang, and our BDD framework of choice is the Ginkgo/Gomega combo.

Here’s a snippet of one of our tests that’s not entirely simple:

Describe("IsValidFormat", func() {
  for _, check := range AvailableFormats {
    Context("when checking "+check, func() {
      It("should return true", func() {
        Ω(IsValidFormat(check)).To(BeTrue())
      })
    })
  }
 
  Context("when checking foo", func() {
    It("should return false", func() {
      Ω(IsValidFormat("foo")).To(BeFalse())
    })
  })
})

Describe("IsValidFormat", func() { for _, check := range AvailableFormats { Context("when checking "+check, func() { It("should return true", func() { Ω(IsValidFormat(check)).To(BeTrue()) }) }) } Context("when checking foo", func() { It("should return false", func() { Ω(IsValidFormat("foo")).To(BeFalse()) }) }) )

As you can see, the Ginkgo (i.e., BDD) formatting is pretty descriptive about what’s happening. I can instantly understand what’s expected: the function IsValidFormat should return true for each entry in the range (list) of AvailableFormats, and a format of foo (which is not a valid format) should return false. It’s both tested and understandable to the future change agent (me or someone else).

Step II: Continuous Integration

Continuous Integration takes Step I further: it brings all the changes to your codebase together at a single point and builds an artifact for deployment. This means you’ll need an external system to automatically handle merges and pushes. We use Jenkins as our automation server, running it in Kubernetes using the Pipeline style of job description. I’ll get into the way we do our builds using Make in a bit, but the fact that we can include our build code with our projects is a huge win.

Here’s a (modified) Jenkinsfile we use for one of our CI jobs:

def notifyFailed() {
  slackSend (color: '#FF0000', message: "FAILED: '${env.JOB_NAME} [${env.BUILD_NUMBER}]' (${env.BUILD_URL})")
}
 
podTemplate(
  label: 'fooProject-build',
  containers: [
    containerTemplate(
      name: 'jnlp',
      image: 'some.link.to.a.container:latest',
      args: '${computer.jnlpmac} ${computer.name}',
      alwaysPullImage: true,
    ),
    containerTemplate(
      name: 'image-builder',
      image: 'some.link.to.another.container:latest',
      ttyEnabled: true,
      alwaysPullImage: true,
      command: 'cat'
    ),
  ],
  volumes: [
    hostPathVolume(
      hostPath: '/var/run/docker.sock',
      mountPath: '/var/run/docker.sock'
    ),
    hostPathVolume(
      hostPath: '/home/jenkins/workspace/fooProject',
      mountPath: '/home/jenkins/workspace/fooProject'
    ),
    secretVolume(
      secretName: 'jenkins-creds-for-aws',
      mountPath: '/home/jenkins/.aws-jenkins'
    ),
    hostPathVolume(
      hostPath: '/home/jenkins/.aws',
      mountPath: '/home/jenkins/.aws'
    )
  ]
)
{
  node ('fooProject-build') {
    try {
      checkout scm
 
      wrap([$class: 'AnsiColorBuildWrapper', 'colorMapName': 'XTerm']) {
        container('image-builder'){
          stage('Prep') {
            sh '''
              cp /home/jenkins/.aws-jenkins/config /home/jenkins/.aws/.
              cp /home/jenkins/.aws-jenkins/credentials /home/jenkins/.aws/.
              make get_images
            '''
          }
 
          stage('Unit Test'){
            sh '''
              make test
              make profile
            '''
          }
 
          step([
            $class:              'CoberturaPublisher',
            autoUpdateHealth:    false,
            autoUpdateStability: false,
            coberturaReportFile: 'report.xml',
            failUnhealthy:       false,
            failUnstable:        false,
            maxNumberOfBuilds:   0,
            sourceEncoding:      'ASCII',
            zoomCoverageChart:   false
          ])
 
          stage('Build and Push Container'){
            sh '''
              make push
            '''
          }
        }
      }
 
      stage('Integration'){
        container('image-builder') {
          sh '''
            make deploy_integration
            make toggle_integration_service
          '''
        }
        try {
          wrap([$class: 'AnsiColorBuildWrapper', 'colorMapName': 'XTerm']) {
            container('image-builder') {
              sh '''
                sleep 45
                export KUBE_INTEGRATION=https://fooProject-integration
                export SKIP_TEST_SERVER=true
                make integration
              '''
            }
          }
        } catch(e) {
          container('image-builder'){
            sh '''
              make clean
            '''
          }
          throw(e)
        }
      }
 
      stage('Deploy to Production'){
        container('image-builder') {
          sh '''
            make clean
            make deploy_dev
          '''
        }
      }
    } catch(e) {
      container('image-builder'){
        sh '''
          make clean
        '''
      }
      currentBuild.result = 'FAILED'
      notifyFailed()
      throw(e)
    }
  }
}


There’s a lot going on here, but the important part to notice is that I grabbed this from the project repo. The build instructions are included with the project itself. It’s creating an artifact, running our tests, etc., but it’s all part of our project codebase. It’s checked into git. It’s code like all the other code we mess with. The individual steps are somewhat inconsequential at this level of discussion, but they work. We also have it set up to run when there’s a push to GitHub (AND nightly). This ensures that we are continuously running this build and integrating everything that’s happened to the repo in a day. It helps us keep on top of all the possible changes to the repo as well as our environment.

Hey… what’s all that make crap?

Make

Our team uses a lot of tools. We subscribe to the maxim: use what’s best for the particular situation. I can’t remember every tool we use. Neither can my teammates. Neither can 90% of the people that “do the devops.” I’ve heard a lot of folks say, “No! We must standardize on our toolset!” Let your teams use what they need to get the job done the right way. Now, the fear of tool “overload” seems like a legitimate one in this scenario, but the problem isn’t the number of tools… it’s how you manage and use them.

Enter Makefiles! (aka: make)

Make has been a mainstay in the UNIX world for a long time (especially in the C world). It is a build tool used to satisfy dependencies, create system-specific configurations, and compile code from various sources independent of platform. That’s fantastic, except we couldn’t care less about any of that in the context of our CI/CD pipelines. We use it because it’s great at running “buildy” commands.

Make is our unifier. It links our Jenkins CI/CD build functionality with our Dev functionality. Specifically, opening up the docker port here in the Jenkinsfile:

volumes: [
  hostPathVolume(
    hostPath: '/var/run/docker.sock',
    mountPath: '/var/run/docker.sock'
  ),


…allows us to run THE SAME COMMANDS WHEN WE’RE DEVELOPING AS WE DO IN OUR CI/CD PROCESS. This socket allows us to run containers from within containers, and since Jenkins itself runs in a container, we can run our toolset containers in Jenkins using the same commands we’d use in our local dev environment. On our local dev machines, we use Docker almost exclusively as a wrapper around our tools. This ensures we have library, version, and platform consistency across all of our dev environments as well as our build system. We use containers for our production microservices, so production is part of that “chain of consistency” as well. It ensures that we see consistent behavior across the horizon of application development through production. It’s a beautiful thing! We use the Makefile as the means to consistently interface with the Docker “tool” across differing environments.

Ok, I know your interest is piqued at this point. (Or at least I really hope it is!)
So here’s a generic Makefile we use for many of our projects:

CONTAINER=$(shell basename $$PWD | sed -E 's/^ia-image-//')
.PHONY: install install_exe install_test_exe deploy test
 
install:
    docker pull sweet.path.to.a.repo/$(CONTAINER)
    docker tag sweet.path.to.a.repo/$(CONTAINER):latest $(CONTAINER):latest
 
install_exe:
    if [[ ! -d $(HOME)/bin ]]; then mkdir -p $(HOME)/bin; fi
    echo "docker run -itP -v \$$PWD:/root $(CONTAINER) \"\$$@\"" > $(HOME)/bin/$(CONTAINER)
    chmod u+x $(HOME)/bin/$(CONTAINER)
 
install_test_exe:
    if [[ ! -d $(HOME)/bin ]]; then mkdir -p $(HOME)/bin; fi
    echo "docker run -itP -v \$$PWD:/root $(CONTAINER)-test \"\$$@\"" > $(HOME)/bin/$(CONTAINER)
    chmod u+x $(HOME)/bin/$(CONTAINER)
 
test:
    docker build -t $(CONTAINER)-test .
 
deploy:
    captain push


This is a Makefile we use to build our tooling images. It’s much simpler than our project Makefiles, but I think it illustrates how you can use Make to wrap EVERYTHING you use in your development workflow. It also allows us to settle on consistent terminology across different projects. %> make test? That’ll run the tests regardless of whether we’re working on a golang project, a Python Lambda project, or, in this case, building a test container and tagging it as whatever-test. Make unifies “all the things.”

It also codifies how to execute the commands, i.e., what arguments to pass, what inputs to provide, etc. If I can’t even remember the name of a command, I’m certainly not going to remember its arguments. To remedy that, I just open up the Makefile and can instantly see what’s expected.

Step III: Continuous Deployment

After the last post (you read it, right?), some might have noticed that I skipped the “Delivery” portion of the “CD” pipeline. As far as I’m concerned, there is no “Delivery” in a “Deployment” pipeline; the “Delivery” is the actual deployment of your artifact. Since the ultimate goal should be Deployment, I’ve just skipped over that intermediate step.

Okay, sure, if you want to hold off on deploying automatically to Prod, then have that gate. But Dev, Int, QA, etc? Deployment to those non-prod environments should be automated just like the rest of your code.

If you guessed we use make to deploy our code, you’d be right! We put all our deployment code with the project itself, just like the rest of the code concerning that particular object. For services, we use a Dockerfile that describes the service container and several yaml files (e.g. deployment_<env>.yaml) that describe the configurations (e.g. ingress, services, deployments) we use to configure and deploy to our Kubernetes cluster.

Here’s an example:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: sweet-aws-service
    stage: dev
  name: sweet-aws-service-dev
  namespace: sweet-service-namespace
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: sweet-aws-service
      name: sweet-aws-service
    spec:
      containers:
      - name: sweet-aws-service
        image: path.to.repo.for/sweet-aws-service:latest
        imagePullPolicy: Always
        env:
          - name: PORT
            value: "50000"
          - name: TLS_KEY
            valueFrom:
              secretKeyRef:
                name: grpc-tls
                key: key
          - name: TLS_CERT
            valueFrom:
              secretKeyRef:
                name: grpc-tls
                key: cert


This is an example of a deployment into Kubernetes for dev. That %> make deploy_dev from the Jenkinsfile above? That’s pushing this to our Kubernetes cluster.

Conclusion

There is a lot of information to take in here, but there are two points to really take home:

  1. It is totally possible.
  2. Use a unifying tool to… unify your tools. (“one tool to rule them all”)

For us, Point 1 is moot… it’s what we do. For Point 2, we use Make, and we use Make THROUGH THE ENTIRE PROCESS. I use Make locally in dev and on our build server. It ensures we’re using the same commands, the same containers, the same tools to do the same things. Test, integrate (test), and deploy. It’s not just about writing functional code anymore. It’s about writing a functional process to get that code, that value, to your customers!

And remember, as with anything, this stuff gets easier with practice. Once you start doing it, you’ll get the hang of it, and life becomes easier and better. If you’d like some help getting started, download our datasheet to learn about our Modern CI/CD Pipeline.

-Craig Monson, Sr Automation Architect

 

How We Organize Terraform Code at 2nd Watch

When IT organizations adopt infrastructure as code (IaC), the benefits in productivity, quality, and ability to function at scale are manifold. However, the first few steps on the journey to full automation and immutable infrastructure bliss can be a major disruption to a more traditional IT operations team’s established ways of working. One of the common problems faced in adopting infrastructure as code is how to structure the files within a repository in a consistent, intuitive, and scalable manner. Even IT operations teams whose members have development skills will still face this anxiety-inducing challenge simply because adopting IaC involves new tools whose conventions differ somewhat from more familiar languages and frameworks.

In this blog post, we’ll go over how we structure our IaC repositories within 2nd Watch professional services and managed services engagements, with a particular focus on Terraform, an open-source tool by HashiCorp for provisioning infrastructure across multiple cloud providers with a single interface.

First Things First: README.md and .gitignore

The first task in any new repository is to create a README file. Many git repositories (especially on GitHub) have adopted Markdown as a de facto standard format for README files. A good README file will include the following information:

  1. Overview: A brief description of the infrastructure the repo builds. A high-level diagram is often an effective method of expressing this information. 2nd Watch uses LucidChart for general diagrams (exported to PNG or a similar format) and mscgen_js for sequence diagrams.
  2. Pre-requisites: Installation instructions (or links thereto) for any software that must be installed before building or changing the code.
  3. Building The Code: What commands to run in order to build the infrastructure and/or run the tests when applicable. 2nd Watch uses Make in order to provide a single tool with a consistent interface to build all codebases, regardless of language or toolset. If using Make in Windows environments, Windows Subsystem for Linux is recommended for Windows 10 in order to avoid having to write two sets of commands in Makefiles: Bash, and PowerShell.

It’s important that you do not neglect this basic documentation for two reasons (even if you think you’re the only one who will work on the codebase):

  1. The obvious: Writing this critical information down in an easily viewable place makes it easier for other members of your organization to onboard onto your project and will prevent the need for a panicked knowledge transfer when projects change hands.
  2. The not-so-obvious: The act of writing a description of the design clarifies your intent to yourself and will result in a cleaner design and a more coherent repository.

All repositories should also include a .gitignore file with the appropriate settings for Terraform. GitHub’s default Terraform .gitignore is a decent starting point, but in most cases you will not want to ignore .tfvars files because they often contain environment-specific parameters that allow for greater code reuse as we will see later.
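As a rough sketch of what that might look like (adjust to your own workflow), a Terraform .gitignore along these lines keeps local caches and state out of git while leaving .tfvars files tracked:

# Local Terraform working directories and plugin caches
.terraform/

# State files and backups (never commit state)
*.tfstate
*.tfstate.*

# Crash logs
crash.log

# Unlike the GitHub default, we intentionally do NOT ignore *.tfvars,
# because our env/*.tfvars files are meant to be committed.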

Terraform Roots and Multiple Environments

A Terraform root is the unit of work for a single terraform apply command. We group our infrastructure into multiple terraform roots in order to limit our “blast radius” (the amount of damage a single errant terraform apply can cause).

  • Repositories with multiple roots should contain a roots/ directory with a subdirectory for each root (e.g. VPC, one per application), each with a main.tf file as its primary entry point.
  • Note that the roots/ directory is optional for repositories that only contain a single root, e.g. infrastructure for an application team which includes only a few resources which should be deployed in concert. In this case, modules/ may be placed in the same directory as main.tf.
  • Roots which are deployed into multiple environments should include an env/ subdirectory at the same level as main.tf. Each environment corresponds to a .tfvars file under env/ named after the environment, e.g. staging.tfvars. Each .tfvars file contains parameters appropriate for each environment, e.g. EC2 instance sizes.

Here’s what our roots directory might look like for a sample with a VPC and 2 application stacks, and 3 environments (QA, Staging, and Production):
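The names below are illustrative placeholders; the shape of the layout is what matters:

roots/
├── vpc/
│   ├── main.tf
│   └── env/
│       ├── qa.tfvars
│       ├── staging.tfvars
│       └── production.tfvars
├── app1/
│   ├── main.tf
│   └── env/
│       ├── qa.tfvars
│       ├── staging.tfvars
│       └── production.tfvars
└── app2/
    ├── main.tf
    └── env/
        ├── qa.tfvars
        ├── staging.tfvars
        └── production.tfvars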

Terraform modules

Terraform modules are self-contained packages of Terraform configurations that are managed as a group. Modules are used to create reusable components, improve organization, and to treat pieces of infrastructure as a black box. In short, they are the Terraform equivalent of functions or reusable code libraries.

Terraform modules come in two flavors:

  1. Internal modules, whose source code is consumed by roots that live in the same repository as the module.
  2. External modules, whose source code is consumed by roots in multiple repositories. The source code for external modules lives in its own repository, separate from any consumers and separate from other modules to ensure we can version the module correctly.

In this post, we’ll only be covering internal modules.

  • Each internal module should be placed within a subdirectory under modules/.
  • Module subdirectories/repositories should follow the standard module structure per the Terraform docs.
  • External modules should always be pinned at a version: a git revision or a version number. This practice allows for reliable and repeatable builds. Failing to pin module versions may cause a module to be updated between builds, breaking the build without any obvious changes in our code. Even worse, failing to pin our module versions might cause a plan to be generated with changes we did not anticipate.

Here’s what our modules directory might look like:
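As a sketch (the module names here are made up for illustration), each internal module follows the standard module structure:

modules/
├── security_groups/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
└── vpc_peering/
    ├── main.tf
    ├── variables.tf
    └── outputs.tf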

Terraform and Other Tools

Terraform is often used alongside other automation tools within the same repository. Some frequent collaborators include Ansible for configuration management and Packer for compiling identical machine images across multiple virtualization platforms or cloud providers. When using Terraform in conjunction with other tools within the same repo, 2nd Watch creates a directory per tool from the root of the repo:
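For example, a repo combining Terraform with Ansible and Packer might be laid out roughly like this (tool directories are illustrative):

ansible/
packer/
terraform/
├── modules/
└── roots/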

Putting it all together

The following illustrates a sample Terraform repository structure with all of the concepts outlined above:
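A rough sketch tying the pieces together might look like this (all names are illustrative placeholders):

.
├── .gitignore
├── README.md
├── Makefile
├── ansible/
├── packer/
└── terraform/
    ├── modules/
    │   └── security_groups/
    │       ├── main.tf
    │       ├── variables.tf
    │       └── outputs.tf
    └── roots/
        ├── vpc/
        │   ├── main.tf
        │   └── env/
        │       ├── qa.tfvars
        │       ├── staging.tfvars
        │       └── production.tfvars
        └── app1/
            ├── main.tf
            └── env/
                ├── qa.tfvars
                ├── staging.tfvars
                └── production.tfvars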

Conclusion

There’s no single repository format that’s optimal, but we’ve found that this standard works for the majority of our use cases in our extensive use of Terraform on dozens of projects. That said, if you find a tweak that works better for your organization – go for it! The structure described in this post will give you a solid and battle-tested starting point to keep your Terraform code organized so your team can stay productive.

Additional resources

  • The Terraform Book by James Turnbull provides an excellent introduction to Terraform all the way through repository structure and collaboration techniques.
  • The Hashicorp AWS VPC Module is one of the most popular modules in the Terraform Registry and is an excellent example of a well-written Terraform module.
  • The source code for James Nugent’s Hashidays NYC 2017 talk is an exemplary Terraform repository. Although it’s based on an older version of Terraform (before providers were broken out from the main Terraform executable), the code structure, formatting, and use of Makefiles are still current.

For help getting started adopting Infrastructure as Code, contact us.

-Josh Kodroff, Associate Cloud Consultant