How We Organize Terraform Code at 2nd Watch

When IT organizations adopt infrastructure as code (IaC), the benefits in productivity, quality, and ability to function at scale are manifold. However, the first few steps on the journey to full automation and immutable infrastructure bliss can be a major disruption to a more traditional IT operations team’s established ways of working. One of the common problems faced in adopting infrastructure as code is how to structure the files within a repository in a consistent, intuitive, and scalable manner. Even IT operations teams whose members have development skills will still face this anxiety-inducing challenge simply because adopting IaC involves new tools whose conventions differ somewhat from more familiar languages and frameworks.

In this blog post, we’ll go over how we structure our IaC repositories within 2nd Watch professional services and managed services engagements, with a particular focus on Terraform, an open-source tool by HashiCorp for provisioning infrastructure across multiple cloud providers with a single interface.

First Things First: README.md and .gitignore

The first task in any new repository is to create a README file. Many git repositories (especially on GitHub) have adopted Markdown as a de facto standard format for README files. A good README file will include the following information:

  1. Overview: A brief description of the infrastructure the repo builds. A high-level diagram is often an effective method of expressing this information. 2nd Watch uses LucidChart for general diagrams (exported to PNG or a similar format) and mscgen_js for sequence diagrams.
  2. Pre-requisites: Installation instructions (or links thereto) for any software that must be installed before building or changing the code.
  3. Building The Code: What commands to run in order to build the infrastructure and/or run the tests when applicable. 2nd Watch uses Make in order to provide a single tool with a consistent interface to build all codebases, regardless of language or toolset. If using Make in Windows environments, Windows Subsystem for Linux is recommended for Windows 10 in order to avoid having to write two sets of commands in Makefiles, one for Bash and one for PowerShell.

It’s important that you do not neglect this basic documentation for two reasons (even if you think you’re the only one who will work on the codebase):

  1. The obvious: Writing this critical information down in an easily viewable place makes it easier for other members of your organization to onboard onto your project and will prevent the need for a panicked knowledge transfer when projects change hands.
  2. The not-so-obvious: The act of writing a description of the design clarifies your intent to yourself and will result in a cleaner design and a more coherent repository.

All repositories should also include a .gitignore file with the appropriate settings for Terraform. GitHub’s default Terraform .gitignore is a decent starting point, but in most cases you will not want to ignore .tfvars files because they often contain environment-specific parameters that allow for greater code reuse as we will see later.

Terraform Roots and Multiple Environments

A Terraform root is the unit of work for a single terraform apply command. We group our infrastructure into multiple terraform roots in order to limit our “blast radius” (the amount of damage a single errant terraform apply can cause).

  • Repositories with multiple roots should contain a roots/ directory with a subdirectory for each root (e.g. VPC, one per application). Each root subdirectory should contain a main.tf file as the primary entry point.
  • Note that the roots/ directory is optional for repositories that only contain a single root, e.g. infrastructure for an application team which includes only a few resources which should be deployed in concert. In this case, modules/ may be placed in the same directory as main.tf.
  • Roots which are deployed into multiple environments should include an env/ subdirectory at the same level as main.tf. Each environment corresponds to a .tfvars file under env/ named after the environment, e.g. staging.tfvars. Each .tfvars file contains parameters appropriate for its environment, e.g. EC2 instance sizes.
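
As a rough illustration of how a root and its per-environment .tfvars files fit together, the sketch below declares two variables in a root and sets them for one environment. The file paths and variable names are assumptions chosen for the example, not part of the standard described here, and the two files are shown in one listing only for brevity.

```hcl
# ---- File: roots/app1/variables.tf (hypothetical path and names) ----
# Declare the parameters that differ between environments.
variable "instance_type" {
  description = "EC2 instance size for the application servers"
}

variable "instance_count" {
  description = "Number of application servers to run"
}

# ---- File: roots/app1/env/staging.tfvars (hypothetical path and values) ----
# Values that apply to the staging environment only.
instance_type  = "t2.small"
instance_count = 2
```

With a layout like this, the environment is chosen when the plan is generated, for example by running terraform plan -var-file=env/staging.tfvars from the root's directory.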

Here’s what our roots directory might look like for a sample with a VPC and 2 application stacks, and 3 environments (QA, Staging, and Production):

Terraform modules

Terraform modules are self-contained packages of Terraform configurations that are managed as a group. Modules are used to create reusable components, improve organization, and to treat pieces of infrastructure as a black box. In short, they are the Terraform equivalent of functions or reusable code libraries.

Terraform modules come in two flavors:

  1. Internal modules, whose source code is consumed by roots that live in the same repository as the module.
  2. External modules, whose source code is consumed by roots in multiple repositories. The source code for external modules lives in its own repository, separate from any consumers and separate from other modules to ensure we can version the module correctly.

In this post, we’ll only be covering internal modules.

  • Each internal module should be placed within a subdirectory under modules/.
  • Module subdirectories/repositories should follow the standard module structure per the Terraform docs.
  • External modules should always be pinned at a version: a git revision or a version number. This practice allows for reliable and repeatable builds. Failing to pin module versions may cause a module to be updated between builds, breaking the build without any obvious changes in our code. Even worse, failing to pin our module versions might cause a plan to be generated with changes we did not anticipate.
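
To make the difference between the two flavors concrete, here is a minimal sketch of how a root might reference each one. The module names, organization, repository, and tag are placeholders invented for the example. The relative path assumes the root lives two levels below the repository root (e.g. roots/app1/).

```hcl
# Internal module: referenced by relative path within the same repository.
# "network" is a hypothetical module name.
module "network" {
  source = "../../modules/network"
}

# External module: lives in its own repository and is pinned to a git tag
# using the ref query parameter. The URL and tag are placeholders.
module "bastion" {
  source = "git::ssh://git@github.com/example-org/tf_bastion.git?ref=v1.2.0"
}
```

Because the external source names an explicit ref, upgrading the module is a deliberate, reviewable change to the root rather than something that happens silently between builds.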

Here’s what our modules directory might look like:

Terraform and Other Tools

Terraform is often used alongside other automation tools within the same repository. Some frequent collaborators include Ansible for configuration management and Packer for compiling identical machine images across multiple virtualization platforms or cloud providers. When using Terraform in conjunction with other tools within the same repo, 2nd Watch creates a directory per tool from the root of the repo:

Putting it all together

The following illustrates a sample Terraform repository structure with all of the concepts outlined above:

Conclusion

There’s no single repository format that’s optimal, but we’ve found that this standard works for the majority of our use cases in our extensive use of Terraform on dozens of projects. That said, if you find a tweak that works better for your organization – go for it! The structure described in this post will give you a solid and battle-tested starting point to keep your Terraform code organized so your team can stay productive.

Additional resources

  • The Terraform Book by James Turnbull provides an excellent introduction to Terraform all the way through repository structure and collaboration techniques.
  • The Hashicorp AWS VPC Module is one of the most popular modules in the Terraform Registry and is an excellent example of a well-written Terraform module.
  • The source code for James Nugent’s Hashidays NYC 2017 talk code is an exemplary Terraform repository. Although it’s based on an older version of Terraform (before providers were broken out from the main Terraform executable), the code structure, formatting, and use of Makefiles is still current.

For help getting started adopting Infrastructure as Code, contact us.

  • Josh Kodroff, Associate Cloud Consultant

CI/CD for Infrastructure as Code with Terraform and Atlantis

In this post, we’ll go over a complete workflow for continuous integration (CI) and continuous delivery (CD) for infrastructure as code (IaC) with just two tools: Terraform and Atlantis.

What is Terraform?

So what is Terraform? According to the Terraform website:

Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.

In practice, this means that Terraform allows you to declare what you want your infrastructure to look like – in any cloud provider – and will automatically determine the changes necessary to make it so. Because of its simple syntax and cross-cloud compatibility, it’s 2nd Watch’s choice for infrastructure as code.

Pain You May Be Experiencing Working With Terraform

When you have multiple collaborators (individuals, teams, etc.) working on a Terraform codebase, some common problems are likely to emerge:

  1. Enforcing peer review becomes difficult. In any codebase, you’ll want to ensure that your code is peer reviewed in order to ensure better quality in accordance with The Second Way of DevOps: Feedback. The role of peer review in IaC codebases is even more important. IaC is a powerful tool, but it is double-edged: we are clearly more productive for using it, but that increased productivity also means that a simple typo could potentially cause a major issue with production infrastructure. In order to minimize the potential for bad code to be deployed, you should require peer review on all proposed changes to a codebase (e.g. GitHub Pull Requests with at least one reviewer required). Terraform’s open source offering has no facility to enforce this rule.
  2. Terraform plan output is not easily integrated in code reviews. In all code reviews, you must examine the source code to ensure that your standards are followed, that the code is readable, that it’s reasonably optimized, etc. In this aspect, reviewing Terraform code is like reviewing any other code. However, Terraform code has the unique requirement that you must also examine the effect the code change will have upon your infrastructure (i.e. you must also review the output of a terraform plan command). When you potentially have multiple feature branches in the review process, it becomes critical that you are assured that the terraform plan output is what will be executed when you run terraform apply. If the state of infrastructure changes between a run of terraform plan and a run of terraform apply, the effect of this difference in state could range from inconvenient (the apply fails) to catastrophic (a significant production outage). Terraform itself offers locking capabilities but does not provide an easy way to integrate locking into a peer review process in its open source product.
  3. Too many sets of privileged credentials. Highly-privileged credentials are often required to perform Terraform actions, and the greater the number of principals you have with privileged access, the larger your attack surface becomes. Therefore, from a security standpoint, we’d like to have fewer sets of admin credentials which can potentially be compromised.

What is Atlantis?

And what is Atlantis? Atlantis is an open source tool that allows safe collaboration on Terraform projects by making sure that proposed changes are reviewed and that the proposed change is the actual change which will be executed on your infrastructure. Atlantis is compatible (at the time of writing) with GitHub and GitLab, so if you’re not using either of these Git hosting systems, you won’t be able to use Atlantis.

How Atlantis Works With Terraform

Atlantis is deployed as a single binary executable with no system-wide dependencies. An operator adds a GitHub or GitLab token for a repository containing Terraform code. The Atlantis installation process then adds hooks to the repository which allows communication to the Atlantis server during the Pull Request process.

You can run Atlantis in a container or a small virtual machine; the only requirement is that the Atlantis instance can communicate with both your version control system (e.g. GitHub) and the infrastructure (e.g. AWS) you’re changing. Once Atlantis is configured for a repository, the typical workflow is:

  1. A developer creates a feature branch in git, makes some changes, and creates a Pull Request (GitHub) or Merge Request (GitLab).
  2. The developer enters atlantis plan in a PR comment.
  3. Via the installed web hooks, Atlantis locally runs terraform plan. If there are no other Pull Requests in progress, Atlantis adds the resulting plan as a comment to the Merge Request.
    • If there are other Pull Requests in progress, the command fails because we can’t ensure that the plan will be valid once applied.
  4. The developer ensures the plan looks good and adds reviewers to the Merge Request.
  5. Once the PR has been approved, the developer enters atlantis apply in a PR comment. This will trigger Atlantis to run terraform apply and the changes will be deployed to your infrastructure.
    • The command will fail if the Merge Request has not been approved.

The following sequence diagram illustrates the sequence of actions described above:

Atlantis sequence diagram

We can see how our pain points in Terraform collaboration are addressed by Atlantis:

  1. In order to enforce code review, you can launch Atlantis with the --require-approval flag: https://github.com/runatlantis/atlantis#approvals
  2. In order to ensure that your terraform plan accurately reflects the change to your infrastructure that will be made when you run terraform apply, Atlantis performs locking on a project or workspace basis: https://github.com/runatlantis/atlantis#locking
  3. In order to prevent creating multiple sets of privileged credentials, you can deploy Atlantis on a single server that holds the privileged credentials (e.g. in AWS, an EC2 instance with a privileged IAM role in its instance profile). In this way, all of your Terraform commands run through a single set of privileged credentials, which obviates the need to distribute multiple sets: https://github.com/runatlantis/atlantis#aws-credentials
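
The third point can be sketched in Terraform itself. The following is one possible way to define a role and instance profile for the server that runs Atlantis. The resource names are invented for the example, and AdministratorAccess is used only as a placeholder for whatever permissions your Terraform code actually needs.

```hcl
# IAM role that EC2 instances are allowed to assume.
# The name "atlantis" is a placeholder.
resource "aws_iam_role" "atlantis" {
  name = "atlantis"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Placeholder permissions; scope this down to what your roots require.
resource "aws_iam_role_policy_attachment" "atlantis" {
  role       = "${aws_iam_role.atlantis.name}"
  policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
}

# Instance profile that is attached to the EC2 instance running Atlantis.
resource "aws_iam_instance_profile" "atlantis" {
  name = "atlantis"
  role = "${aws_iam_role.atlantis.name}"
}
```

An EC2 instance launched with this instance profile obtains its credentials from the role, so no long-lived access keys need to be handed to individual engineers.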

Conclusion

You can see that with minimal additional infrastructure you can establish a safe and reliable CI/CD pipeline for your infrastructure as code, enabling you to get more done safely! To find out how you can deploy a CI/CD pipeline in less than 60 days, Contact Us.

-Josh Kodroff, Associate Cloud Consultant


Migrating to Terraform v0.10.x

When it comes to managing cloud-based resources, it’s hard to find a better tool than Hashicorp’s Terraform. Terraform is an ‘infrastructure as code’ application, marrying configuration files with backing APIs to provide a nearly seamless layer over your various cloud environments. It allows you to declaratively define your environments and their resources through a process that is structured, controlled, and collaborative.

One key advantage Terraform provides over other tools (like AWS CloudFormation) is having a rapid development and release cycle fueled by the open source community. This has some major benefits: features and bug fixes are readily available, new products from resource providers are quickly incorporated, and you’re able to submit your own changes to fulfill your own requirements.

HashiCorp recently released v0.10.0 of Terraform, introducing some fundamental changes in the application’s architecture and functionality. We’ll review the three most notable of these changes and how to incorporate them into your existing Terraform projects when migrating to Terraform v0.10.x.

  1. Terraform Providers are no longer distributed as part of the main Terraform distribution
  2. New auto-approve flag for terraform apply
  3. Existing terraform env commands replaced by terraform workspace

A brief note on Terraform versions:

Even though Terraform uses a style of semantic versioning, their ‘minor’ versions should be treated as ‘major’ versions.

1. Terraform Providers are no longer distributed as part of the main Terraform distribution

The biggest change in this version is the removal of provider code from the core Terraform application.

Terraform Providers are responsible for understanding API interactions and exposing resources for a particular platform (AWS, Azure, etc). They know how to initialize and call their applications or CLIs, handle authentication and errors, and convert HCL into the appropriate underlying API calls.

It was a logical move to split the providers out into their own distributions. The core Terraform application can now add features and release bug fixes at a faster pace, new providers can be added without affecting the existing core application, and new features can be incorporated and released to existing providers without as much effort. Having split providers also allows you to update your provider distribution and access new resources without necessarily needing to update Terraform itself. One downside of this change is that you have to keep up to date with features, issues, and releases of more projects.

The provider repos can be accessed via the Terraform Providers organization in GitHub. For example, the AWS provider can be found here.

Custom Providers

An extremely valuable side-effect of having separate Terraform Providers is the ability to create your own, custom providers. A custom provider allows you to specify new or modified attributes for existing resources in existing providers, add new or unsupported resources in existing providers, or generate your own resources for your own platform or application.

You can find more information on creating a custom provider from the Terraform Provider Plugin documentation.

1.1 Configuration

The nicest part of this change is that it doesn’t really require any additional modifications to your existing Terraform code if you were already using a Provider block.

If you don’t already have a provider block defined, you can find their configurations from the Terraform Providers documentation.

You simply need to call the terraform init command before you can perform any other action. If you fail to do so, you’ll receive an error informing you of the required actions (img 1a).

After successfully reinitializing your project, you will be provided with the list of providers that were installed as well as the versions requested (img 1b).

You’ll notice that Terraform suggests versions for the providers we are using – this is because we did not specify any specific versions of our providers in code. Since providers are now independently released entities, we have to tell Terraform what code it should download and use to run our project.

(Image 1a: Notice of required reinitialization)

(Image 1b: Response from successful reinitialization)

Providers are released separately from Terraform itself, and maintain their own version numbers.

You can specify the version(s) you want to target in your existing provider blocks by adding the version property (code block 1). These versions should follow the semantic versioning specification (similar to Node’s package.json or Python’s requirements.txt).

For production use, it is recommended to limit the acceptable provider versions to ensure that new versions with breaking changes are not automatically installed.

(Code Block 1: Provider Config)

provider "aws" {
  version = "0.1.4"
  allowed_account_ids = ["1234567890"]
  region = "us-west-2"
}
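
Code block 1 pins a single exact version. If you would rather accept compatible updates while still excluding breaking releases, a version constraint can be used instead. The sketch below uses the provider-block syntax from this version of Terraform, and the version numbers are illustrative rather than a recommendation.

```hcl
provider "aws" {
  # "~> 1.0" accepts any release from 1.0 up to, but not including, 2.0.
  version = "~> 1.0"
  region  = "us-west-2"
}
```

After changing a constraint, terraform init needs to be run again so that a matching provider release is downloaded.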

(Image 1c: Currently defined provider configuration)

2. New auto-approve flag for terraform apply

In previous versions, running terraform apply would immediately apply any changes between your project and saved state.

Your normal workflow would likely be:
run terraform plan followed by terraform apply and hope nothing changed in between.

This version introduced a new auto-approve flag which will control the behavior of terraform apply.

Deprecation Notice

This flag currently defaults to true to maintain backwards compatibility, but the default will change to false in a future release.

2.1 auto-approve=true (current default)

When set to true, terraform apply will work like it has in previous versions.

If you want to maintain this functionality, you should update your scripts, build systems, etc. now to pass the flag explicitly, as this default value will change in a future Terraform release.

(Code Block 2: Apply with default behavior)

# Apply changes immediately without plan file
terraform apply --auto-approve=true

2.2 auto-approve=false

When set to false, Terraform will present the user with the execution plan and pause for interactive confirmation (img 2a).

If the user provides any response other than yes, terraform will exit without applying any changes.

If the user confirms the execution plan with a yes response, Terraform will then apply the planned changes (and only those changes).

If you are trying to automate your Terraform scripts, you might want to consider producing a plan file for review, then providing explicit approval to apply the changes from the plan file.

(Code Block 3: Apply plan with explicit approval)

# Create a plan file for review
terraform plan -out=tfplan

# Apply the approved plan (options must come before the plan file argument)
terraform apply --auto-approve=true tfplan

(Image 2a: Terraform apply with execution plan)

3. Existing terraform env commands replaced by terraform workspace

The terraform env family of commands was replaced with terraform workspace to help alleviate some confusion in functionality. Workspaces are very useful, and can do much more than just split up environment state (which they aren’t necessarily used for). I recommend checking them out and seeing if they can improve your projects.

There is not much to do here other than switch the command invocation. The previous terraform env commands still work for now, but they are deprecated.
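
The name of the current workspace is also available inside your configuration as terraform.workspace, which is one way workspaces can do more than separate state. The sketch below varies an instance size by workspace. The variable name, workspace name, and instance types are assumptions made up for the example.

```hcl
# "ami_id" is a hypothetical variable for this example.
variable "ami_id" {}

resource "aws_instance" "web" {
  ami = "${var.ami_id}"

  # Use a larger instance only in the workspace named "production".
  instance_type = "${terraform.workspace == "production" ? "m4.large" : "t2.micro"}"

  tags = {
    Workspace = "${terraform.workspace}"
  }
}
```

Running terraform workspace new staging followed by terraform apply would then build the smaller variant, with its state kept separate from the other workspaces.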

 


 

— Steve Byerly, Principal SDE (IV), Cloud, 2nd Watch


Standardizing & Automating Infrastructure Development Processes

Introduction

Let’s start with a small look at the current landscape of technology and how we arrived here. There aren’t very many areas of tech that have not been, or are not currently, in a state of fluctuation. Everything from software delivery vehicles and development practices, to infrastructure creation has experienced some degree of transformation over the past several years. From VMs to Containers, it seems like almost every day the technology tool belt grows a little bigger, and our world gets a little better (though perhaps more complex) due to these advancements. For me, this was incredibly apparent when I began to delve into configuration management which later evolved into what we now call “infrastructure as code”.

The transformation of the development process began with simple systems that we once used to manage a few machines (like bash scripts or Makefiles), which then morphed into more complex systems (CFEngine, Puppet, and Chef) to manage thousands of systems. As configuration management software became more mature, engineers and developers began leaning on it to do more things. With the advent of hypervisors and the rise of virtual machines, it was only a short time before hardware requests changed to API requests, and thus came the birth of infrastructure as a service (IaaS). With all the new capabilities and options in this brave new world, we once again started to lean on our configuration management systems, this time for provisioning and not just convergence.

Provisioning & Convergence

I mentioned two terms that I want to clarify: provisioning and convergence. Say you were a car manufacturer and you wanted to make a car. Provisioning would be the step in which you request the raw materials to make the parts for your automobile. This is where we would use tools like Terraform, CloudFormation, or Heat. Convergence, by contrast, is the assembly line by which we check each part and assemble the final product (utilizing config management software).

By and large, the former tends to be declarative with little in the way of conditionals or logic, while the latter is designed to be robust and malleable software that supports all the systems we run and plan on running. This is the frame for the remainder of what we are going to talk about.

By separating the concerns of our systems, we can create a clear delineation of the purpose for each tool so we don’t feel like we are trying to jam everything into an interface that doesn’t have the most support for our platform or more importantly our users. The remainder of this post will be directed towards the provisioning aspect of configuration management.

Standards and Standardization

These are two different things in my mind. Standardization is extremely prescriptive and can often seem particularly oppressive to professional knowledge workers, such as engineers or developers. It can be seen as taking the innovation away from the job. Whereas standards provide boundaries, frame the problem, and allow for innovative ways of approaching solutions. I am not saying standardization in some areas is entirely bad, but we should let the people who do the work have the opportunity to grow and innovate in their own way with guidance. The topic of standards and standardization is part of a larger conversation about culture and change. We intend to follow up with a series of blog articles relating to organizational change in the era of the public cloud in the coming weeks.

So, let’s say that we make a standard for our new EC2 instances running Ubuntu. We’ll say that all instances must be running the latest official Canonical Ubuntu 14.04 AMI and must have these three tags: Owner, Environment, and Application. How can we enforce that in development of our infrastructure? On AWS, we can create AWS Config Rules, but that is reactive and requires ad-hoc remediation. What we really want is a more prescriptive approach bringing our standards closer to the development pipeline. One of the ways I like to solve this issue is by creating an abstraction. Say we have a Terraform template that looks like this:

# Create a new instance of the latest Ubuntu 14.04
provider "aws" {
  region = "us-west-2"
}

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}

resource "aws_instance" "web" {
  ami           = "${data.aws_ami.ubuntu.id}"
  instance_type = "t2.micro"

  tags {
    Owner       = "DevOps Ninja"
    Environment = "Dev"
    Application = "Web01"
  }
}

This would meet the standard that we have set forth, but we are relying on the developer or engineer to adhere to that standard. What if we enforce this standard by codifying it in an abstraction? Let’s take that existing template and turn it into a terraform module instead.

Module

# Create a new instance of the latest Ubuntu 14.04

variable "aws_region" {}
variable "ec2_owner" {}
variable "ec2_env" {}
variable "ec2_app" {}
variable "ec2_instance_type" {}

provider "aws" {
  region = "${var.aws_region}"
}

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}

resource "aws_instance" "web" {
  ami           = "${data.aws_ami.ubuntu.id}"
  instance_type = "${var.ec2_instance_type}"

  tags {
    Owner       = "${var.ec2_owner}"
    Environment = "${var.ec2_env}"
    Application = "${var.ec2_app}"
  }
}

Now we can have our developers and engineers leverage our tf_ubuntu_ec2_instance module.

New Terraform Plan

module "Web01" {
  source = "git::ssh://git@github.com/SomeOrg/tf_ubuntu_ec2_instance"

  aws_region        = "us-west-2"
  ec2_owner         = "DevOps Ninja"
  ec2_env           = "Dev"
  ec2_app           = "Web01"
  ec2_instance_type = "t2.micro" # required: the module declares no default
}

This doesn’t enforce the usage of the module, but it does create an abstraction that provides an easy way to maintain standards without a ton of overhead. It also provides an example for further creation of modules that enforce these particular standards.

This leads us into another method of implementing standards but becomes more prescriptive and falls into the category of standardization (eek!). One of the most underutilized services in the AWS product stable has to be Service Catalog.

AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS. These IT services can include everything from virtual machine images, servers, software, and databases to complete multi-tier application architectures. AWS Service Catalog allows you to centrally manage commonly deployed IT services, and helps you achieve consistent governance and meet your compliance requirements, while enabling users to quickly deploy only the approved IT services they need.

The Interface

Once we have a few of these projects in place (e.g. a service catalog or a repo full of composable modules for infrastructure that meet our standards) how do we serve them out? How you spur adoption of these tools and how they are consumed can be very different depending on your organization structure. We don’t want to upset workflow and how work gets done, we just want it to go faster and be more reliable. This is what we talk about when we mention the interface. Whichever way work flows in, we should supplement it with some type of software or automation to link those pieces of work together. Here are a few examples of how this might look (depending on your organization):

1.) Central IT Managed Provisioning

If you have an organization that manages requests for infrastructure, having this new shift in paradigm might seem daunting. The interface in this case is the ticketing system. This is where we would create an integration with our ticketing software to automatically pull the correct project from service catalog or module repo based on some criteria in the ticket. The interface doesn’t change but is instead supplemented by some automation to answer these requests, saving time and providing faster delivery of service.

2.) Full Stack Engineers

If you have engineers that develop software and the infrastructure that runs their applications this is the easiest scenario to address in some regards and the hardest in others. Your interface might be a build server, or it could simply be the adoption of an internal open source model where each team develops modules and shares them in a common place, constantly trying to save time and not re-invent the wheel.

Supplementing with software or automation can be done in a ton of ways. Check out an example Kelsey Hightower wrote using Jira.

“A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.” – John Gall

All good automation starts out with a manual and well-defined process. Standardizing & automating infrastructure development processes begins with understanding how our internal teams can be best organized.  This allows us to efficiently perform work before we can begin automating. Work with your teammates to create a value stream map to understand the process entirely before doing anything towards the efforts of automating a workflow.

With 2nd Watch designs and automation you can deploy quicker, learn faster and modify as needed with Continuous Integration / Continuous Deployment (CI/CD). Our Workload Solutions transform on-premises workloads to digital solutions in the public cloud with next generation products and services.  To accelerate your infrastructure development so that you can deploy faster, learn more often and adapt to customer requirements more effectively, speak with a 2nd Watch cloud deployment expert today.

– Lars Cromley, Director of Engineering, Automation, 2nd Watch


Migrating Terraform Remote State to a “Backend” in Terraform v.0.9+

(AKA: Where the heck did ‘terraform remote config’ go?!!!)

If you are working with cloud-based architectures or working in a DevOps shop, you’ve no doubt been managing your infrastructure as code. It’s also likely that you are familiar with tools like Amazon CloudFormation and Terraform for defining and building your cloud architecture and infrastructure. For a good comparison on Amazon CloudFormation and Terraform check out Coin Graham’s blog on the matter: AWS CFT vs. Terraform: Advantages and Disadvantages.

If you are already familiar with Terraform, then you may have encountered a recent change to the way remote state is handled, starting with Terraform v0.9. Continue reading to find out more about migrating Terraform Remote State to a “Backend” in Terraform v.0.9+.

First off… if you are unfamiliar with what remote state is check out this page.

Remote state is a big ol’ blob of JSON that stores the configuration details and state of your Terraform configuration and infrastructure that has actually been deployed. This is pretty dang important if you ever plan on changing your environment (which is “likely”, to put it lightly) and especially important if you want to have more than one person managing/editing/maintaining the infrastructure, or if you have even the most basic rationale as it pertains to backup and recovery.

Terraform supports almost a dozen backend types (as of this writing) including:

  • Artifactory
  • Azure
  • Consul
  • Etcd
  • GCS
  • HTTP
  • Manta
  • S3
  • Swift
  • Terraform Enterprise (AKA: Atlas)

 

Why not just keep the Terraform state in the same git repo I keep the Terraform code in?

You don’t want to store the state file in a code repository, because it may contain sensitive information like DB passwords, and because the state changes frequently, which makes it easy to forget to push those changes to your git repo every time you run Terraform.
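As a minimal sketch of keeping state out of the repository, the shell function below appends the usual Terraform state patterns to a .gitignore file if they are not already there. The function name ignore_tf_state is hypothetical, and the pattern list is an assumption about a typical project layout rather than anything Terraform prescribes.

```shell
#!/bin/sh
# Hypothetical helper: make sure Terraform state files are ignored by git.
# Usage: ignore_tf_state [path/to/.gitignore]
ignore_tf_state() {
  gitignore="${1:-.gitignore}"
  touch "$gitignore"
  # Local state, its backup, and the .terraform working directory.
  for pattern in '*.tfstate' '*.tfstate.backup' '.terraform/'; do
    # -x matches the whole line, -F treats the pattern as a literal string,
    # so running the function twice does not duplicate entries.
    grep -qxF -e "$pattern" "$gitignore" || printf '%s\n' "$pattern" >> "$gitignore"
  done
}
```

Running ignore_tf_state once from the root of the repository should be enough. This only stops new state files from being added, so a state file that was already committed still has to be removed from the repository history separately.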

So, what happened to terraform remote anyway?

If you’re like me, you probably run the latest version of HashiCorp’s Terraform tool as soon as it is available (we actually have a hook in our team Slack channel that notifies us when a new version is released). With the release of Terraform v.0.9 last month, we were endowed with the usual heaping helping of excellent new features and bug-fixes we’ve come to expect from the folks at HashiCorp, but were also met with an unexpected change in the way remote state is handled.

Unless you are religious about reading the release notes, you may have missed an important change in v.0.9 around remote state. While the release notes don’t specifically call out the removal (not even deprecation, but FULL removal) of the prior method (i.e., terraform remote config), the Upgrade Guide does call out the process for migrating from the legacy method to the new method of managing remote state. More specifically, it provides a link to a guide for migrating from the legacy remote state config to the new backend system. The steps are pretty straightforward and the new approach is much improved over the prior method for managing remote state. So, while the change is good, a deprecation warning in v.0.8 would have been much appreciated. At least it is still backwards compatible with the legacy remote state files (up to version 0.10), making the migration process much less painful.

Prior to v.0.9, you may have been managing your Terraform remote state in an S3 bucket using the terraform remote config command. You could provide arguments like backend and backend-config to configure things like the S3 region, bucket, and key where you wanted to store your remote state. Most often, this looked like a shell script in the root of your Terraform directory that you ran whenever you wanted to initialize or configure your backend for that project.

Something like…

Terraform Legacy Remote S3 Backend Configuration Example

#!/bin/sh
export AWS_PROFILE=myprofile
terraform remote config \
--backend=s3 \
--backend-config="bucket=my-tfstates" \
--backend-config="key=projectX.tfstate" \
--backend-config="region=us-west-2"

This was a bit clunky but functional. Regardless, it was rather annoying having some configuration elements outside of the normal terraform config (*.tf) files.

Along came Terraform v.0.9

The introduction of Terraform v.0.9 with its newfangled “Backends” makes things much more seamless and transparent. Now we can replicate that same remote state backend configuration with a backend block inside the terraform block of a Terraform configuration, like so:

Terraform S3 Backend Configuration Example

terraform {
  backend "s3" {
    bucket = "my-tfstates"
    key    = "projectX.tfstate"
    region = "us-west-2"
  }
}

A Migration Example

So, using our examples above let’s walk through the process of migrating from a legacy “remote config” to a “backend”.  Detailed instructions for the following can be found here.

1. (Prior to upgrading to Terraform v.0.9+) Pull remote config with pre v.0.9 Terraform

> terraform remote pull
Local and remote state in sync

2. Back up your terraform.tfstate file

> cp .terraform/terraform.tfstate /path/to/backup

3. Upgrade Terraform to v.0.9+

4. Configure the new backend

terraform {
  backend "s3" {
    bucket = "my-tfstates"
    key    = "projectX.tfstate"
    region = "us-west-2"
  }
}

5. Run terraform init

> terraform init
Downloading modules (if any)...
 
Initializing the backend...
New backend configuration detected with legacy remote state!
 
Terraform has detected that you're attempting to configure a new backend.
At the same time, legacy remote state configuration was found. Terraform will
first configure the new backend, and then ask if you'd like to migrate
your remote state to the new backend.
 
 
Do you want to copy the legacy remote state from "s3"?
  Terraform can copy the existing state in your legacy remote state
  backend to your newly configured backend. Please answer "yes" or "no".
 
  Enter a value: no
  
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
 
Terraform has been successfully initialized!
 
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
 
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your environment. If you forget, other
commands will detect it and remind you to do so if necessary.

6. Verify the new state is copacetic

> terraform plan
 
...
 
No changes. Infrastructure is up-to-date.
 
This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, Terraform
doesn't need to do anything.

7.  Commit and push
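The manual steps above can also be sketched as a few small shell helpers. This is only an illustration under stated assumptions: the function names are hypothetical, backend.tf is an assumed file name for the terraform { backend ... } block, and the commit message is made up. The -detailed-exitcode flag is a standard option of terraform plan.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2, 6 and 7 of the migration above.

# Step 2: copy the state file to a timestamped backup and print the new path.
backup_state() {
  src="${1:-.terraform/terraform.tfstate}"
  dest_dir="${2:-.}"
  [ -f "$src" ] || { echo "no state file at $src" >&2; return 1; }
  dest="$dest_dir/terraform.tfstate.$(date +%Y%m%d%H%M%S).bak"
  cp "$src" "$dest" && echo "$dest"
}

# Step 6: succeed only when the plan reports no pending changes.
# With -detailed-exitcode, terraform plan exits 0 for "no changes",
# 1 for an error and 2 when changes are pending.
state_is_clean() {
  terraform plan -detailed-exitcode >/dev/null
}

# Step 7: commit the new backend block and push it.
# "backend.tf" is an assumed file name, not something Terraform requires.
commit_backend() {
  git add "${1:-backend.tf}" &&
    git commit -m "Migrate remote state to an S3 backend" &&
    git push
}
```

With these in place, the sequence would be backup_state /path/to/backup before the upgrade, then terraform init && state_is_clean && commit_backend afterwards, so that nothing is committed unless the plan comes back empty.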

In closing…

Managing your infrastructure as code isn’t rocket science, but it also isn’t trivial.  Having a solid understanding of cloud architectures, the Well-Architected Framework, and DevOps best practices can greatly impact the success you have.  A lot goes into architecting and engineering solutions in a way that maximizes your business value, application reliability, agility, velocity, and key differentiators.  This can be a daunting task, but it doesn’t have to be!  2nd Watch has the people, processes, and tools to make managing your cloud infrastructure as code a breeze! Contact us today to find out how.

 

— Ryan Kennedy, Principal Cloud Automation Architect, 2nd Watch

 


2nd Watch Meets Customer Demands and Prepares for Continued Growth and Acceleration with Amazon Aurora

The Product Development team at 2nd Watch is responsible for many technology environments that support our software and solutions and, ultimately, our customers. These environments need to be easily built, maintained, and kept in sync. In 2016, 2nd Watch performed an analysis of the amount of AWS billing data that we had collected and the number of payer accounts we had processed over the course of the previous year.  Our analysis showed that these measurements had more than tripled from 2015, and projections showed that we would continue to grow at the same rapid pace, with AWS usage and client onboarding increasing daily. Knowing that the storage of data is critical for many systems, our Product Development team undertook an evaluation of the database architecture used to house our company’s billing data: a single SQL Server instance running a Web edition of SQL Server with the maximum number of EBS volumes attached.

During the evaluation, areas such as performance, scaling, availability, maintenance and cost were considered and deemed most important for future success. The evaluation revealed that our current billing database architecture could not meet the criteria laid out to keep pace with growth.  Considerations were made to increase the storage capacity by one VM to the maximum family size or potentially upgrade to MS SQL Enterprise. In either scenario, the cost of the MS SQL instance doubled.  The only option for scaling without substantially increasing our cost was to scale vertically; however, doing so would yield diminishing performance gains. Maintenance of the database had become a full-time job that was increasingly difficult to manage.

Ultimately, we chose the cloud-native solution, Amazon Aurora, for its scalability and its low-risk, easy-to-use technology.  Amazon Aurora is a MySQL-compatible relational database that provides speed and reliability at a lower cost. It offers greater than 99.99% availability and can store up to 64TB of data. Aurora is self-healing and fully managed, which, along with the other key features, made Amazon Aurora an easy choice as we continue to meet the AWS billing usage demands of our customers and prepare for future growth.

The conversion from MS SQL to Amazon Aurora was successfully completed in early 2017 and, with the benefits and features that Amazon Aurora offers, many gains were made in multiple areas. Product Development can now reduce the complexity of database schemas because of the way Aurora stores data. For example, a database with one hundred tables and hundreds of stored procedures was reduced to one table with 10 stored procedures. Gains were made in performance as well. The billing system produces thousands of queries per minute, and Amazon Aurora handles the load with the ability to scale to accommodate the increasing number of queries. Maintenance of the Amazon Aurora system is now almost entirely handled by the managed service. Tasks such as database backups are automated without the complicated task of managing disks. Additionally, data is copied across six replicas in three availability zones, which ensures availability and durability.

With Amazon Aurora, every environment is now easily built and set up using Terraform. All infrastructure is automatically set up, from the web tier to the database tier, with Amazon CloudWatch logs to alert the company when issues occur. Data can easily be imported using automated processes and even anonymized if there is sensitive data or the environment is used to demo to our customers. With the conversion of our database architecture from a single MS SQL Server instance to Amazon Aurora, our Product Development team can now focus on accelerating development instead of maintaining its data storage system.


AWS CFT vs. Terraform: Advantages and Disadvantages

UPDATE:  AWS CloudFormation now supports YAML.  To be sure, this is a huge improvement over JSON in terms of formatting and use of comments.  This will also simplify Windows and Linux userdata scripts.  So for teams that are just starting with AWS and don’t need any of the additional benefits of Terraform, YAML would be the best place to start.  Existing teams will likely still have a cache of JSON templates that they will need to recreate and should consider whether the other benefits of Terraform warrant a move away from CFT.

If you’re familiar with AWS CloudFormation Templates (CFTs) and how they work but have been considering Terraform, this guide is for you.  This basic guide will introduce you to some of the advantages and disadvantages of Terraform in comparison to CFT to determine if you should investigate further and try it yourself.  If you don’t have a rudimentary familiarity with Terraform, head over to https://www.terraform.io/intro/index.html for a quick overview.

Advantages

Formatting – This is far and away the strongest advantage of Terraform.  JSON is not a coding language, and it shows.  It’s common for CFTs to be 3000 lines long, and most of that is just JSON braces and brackets.  Terraform uses a simple (but custom) language, HCL, for creating templates, and it makes it easy to document and comment your code.  Whole sections can be moved to a folder structure for design and clarity.  This makes your infrastructure feel a bit more like actual code.  Lastly, you won’t need to convert Userdata bash and PowerShell scripts to JSON only to deploy and discover you forgot one last escaping backslash.  Userdata scripts can be written in separate files exactly as you would write them on the server locally.  For example, here’s a comparison of JSON to Terraform for creating an instance:

Instance in CFT


"StagingInstance": {
  "Type": "AWS::EC2::Instance",
  "Properties": {
    "UserData": {
      "Fn::Base64": {
        "Fn::Join": ["", [
          "#!/bin/bash -v\n",
          "yum update -y aws*\n",
          "yum update --sec-severity=critical -y\n",
          "yum install -y aws-cfn-bootstrap\n",
          "# download data and install file\n",
          "/opt/aws/bin/cfn-init -s ", {
            "Ref": "AWS::StackName"
          }, " -r StagingInstance ",
          "    --region ", {
            "Ref": "AWS::Region"
          },
          " || error_exit 'Failed to run cfn-init'\n"
        ]]
      }
    },
    "SecurityGroupIds": [{
      "Ref": "StagingSecurityGroup"
    }],
    "ImageId": {
      "Ref": "StagingAMI"
    },
    "KeyName": {
      "Ref": "InstancePrivateKeyName"
    },
    "InstanceType": {
      "Ref": "StagingInstanceType"
    },
    "IamInstanceProfile": {
      "Ref": "StagingInstanceProfile"
    },
    "Tags": [{
      "Key": "Name",
      "Value": {
        "Fn::Join": ["-", [
          "staging", {
            "Ref": "AWS::StackName"
          }, "app-instance"
        ]]
      }
    }],
    "SubnetId": {
      "Ref": "PrivateSubnet1"
    }
  }
}

Instance in Terraform


# Create the staging instance
resource "aws_instance" "staging" {
  ami                    = "${var.staging_instance_ami}"
  instance_type          = "${var.staging_instance_type}"
  subnet_id              = "${var.private_subnet_id_1}"
  vpc_security_group_ids = ["${aws_security_group.staging.id}"]
  iam_instance_profile   = "${aws_iam_instance_profile.staging.name}"
  key_name               = "${var.instance_private_key_name}"

  tags {
    Name = "staging-${var.stack_name}-instance"
  }

  user_data = "${file("instances/staginguserdatascript.sh")}"
}

Managing State – This is the second advantage for Terraform.  Terraform knows the state of the environment from the last run, so you can run “terraform plan” and see exactly what has changed with the items that Terraform has created.  With an update to a CFT, you only know that an item will be “Modified,” but not how.  At that point you’ll need to audit the modified item and manually compare to the existing CFT to determine what needs to be updated.
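To make the plan-then-review workflow concrete, here is a minimal sketch of a wrapper that saves the plan to a file and applies exactly that plan only after confirmation. The function name review_and_apply and the default file name tfplan are hypothetical. The -out option of terraform plan and passing a saved plan file to terraform apply are standard Terraform CLI usage.

```shell
#!/bin/sh
# Hypothetical wrapper: save the plan, show it, and apply only on confirmation.
review_and_apply() {
  plan_file="${1:-tfplan}"
  # -out saves the exact set of changes so that apply runs what was reviewed.
  terraform plan -out="$plan_file" || return 1
  printf 'Apply this plan? (yes/no) '
  read -r answer
  if [ "$answer" = "yes" ]; then
    terraform apply "$plan_file"
  else
    echo "Plan not applied."
  fi
}
```

Applying the saved plan file rather than re-running a fresh apply means that what gets executed is the same change set that was reviewed, even if someone else has modified the environment in between.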

Multi-Provider Support – Depending on how you utilize AWS and other providers, this can be a very big deal.  Terraform gives you a centralized location to manage multiple providers.  Maybe your DNS is in Azure but your servers are in AWS.  You could build an ELB and update the Azure DNS all in the same run.  Or maybe you want to update your AWS infrastructure and your Datadog monitoring at the same time.  If you needed a provider they didn’t have, you could presumably add it since the code is open source.

Short learning curve – While they did introduce custom formatting for Terraform templates, the CFT and API nomenclature is *mostly* preserved.  For example, when creating an instance in CFT you need an InstanceType and KeyName. In Terraform this is instance_type and key_name.  Words are separated by underscores and all lowercase.  This makes it somewhat easy to migrate existing CFTs.  All told, it took about a day of experimentation with Terraform to feel comfortable.
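The naming rule described above is mechanical enough to sketch in a few lines of shell. The helper name cfn_to_snake is hypothetical, and the conversion is only a starting point when porting a CFT, not an official mapping.

```shell
#!/bin/sh
# Convert a CloudFormation-style CamelCase property name to snake_case.
cfn_to_snake() {
  printf '%s\n' "$1" |
    # Insert an underscore wherever a lowercase letter or digit meets an uppercase letter.
    sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' |
    tr '[:upper:]' '[:lower:]'
}

cfn_to_snake InstanceType        # instance_type
cfn_to_snake KeyName             # key_name
cfn_to_snake IamInstanceProfile  # iam_instance_profile
```

The "mostly" matters: in the instance example earlier, ImageId becomes ami and SecurityGroupIds becomes vpc_security_group_ids, so the converted names still need to be checked against the Terraform documentation for each resource.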

Open Source – The general terraform tool is open source, which brings all the good and bad to the table that you normally associate with open source.  As mentioned previously, if you have GoLang resources, the world is your oyster.  Terraform can be made to do whatever you want it to do, and adding back to the repository will enhance it for everyone else.  You can check out the git repo to see that it has pretty active development.

Challenges

Cost – The open source version of Terraform is free, but the enterprise version is expensive.  Of course the enterprise version adds a lot of bells and whistles, but I would recommend doing a serious evaluation to determine if they are worth the cost.

No Rollback – Rolling back a CFT deployment or upgrade is sometimes a blessing and sometimes a curse, but with CFT at least you have an option.  With Terraform, there is never an automatic rollback.  You have to figure out what went wrong and plow forward, or first roll back your code and then re-deploy.  Either way it can be messy.  However, rollback for AWS CFT can be messy too, especially when changes are introduced that make CFT deployment and reconfiguration incompatible.  This invariably leads to the creation of an AWS support ticket to make adjustments to the CFT that are not possible otherwise.

CFT is “tightly coupled” with AWS, while Terraform is not.  This is the YANG to the open source YIN.  Amazon has a dedicated team to continue to improve and update CFTs.  They won’t just focus on the most popular items and will have access to internal resources to vet and prove out their approach.

Conclusion

While this article only scratches the surface of the differences between utilizing AWS CFT and Terraform, it provides a good starting point when evaluating both.  If you’re looking for a better “infrastructure as code,” state management, or multi-provider support, Terraform is definitely worth a look.  We are here to help our customers, so if you need help developing a cloud-first strategy, contact us here.

-Coin Graham, Sr Cloud Consultant, 2nd Watch