When IT organizations adopt infrastructure as code (IaC), the benefits in productivity, quality, and ability to function at scale are manifold. However, the first few steps on the journey to full automation and immutable infrastructure bliss can be a major disruption to a more traditional IT operations team’s established ways of working. One of the common problems faced in adopting infrastructure as code is how to structure the files within a repository in a consistent, intuitive, and scalable manner. Even IT operations teams whose members have development skills will still face this anxiety-inducing challenge simply because adopting IaC involves new tools whose conventions differ somewhat from more familiar languages and frameworks.
In this blog post, we’ll go over how we structure our IaC repositories within 2nd Watch professional services and managed services engagements with a particular focus on Terraform, an open-source tool by Hashicorp for provisioning infrastructure across multiple cloud providers with a single interface.
First Things First: README.md and .gitignore
The first task in any new repository is to create a README file. Many git repositories (especially on GitHub) have adopted Markdown as a de facto standard format for README files. A good README file will include the following information:
- Overview: A brief description of the infrastructure the repo builds. A high-level diagram is often an effective method of expressing this information. 2nd Watch uses LucidChart for general diagrams (exported to PNG or a similar format) and mscgen_js for sequence diagrams.
- Pre-requisites: Installation instructions (or links thereto) for any software that must be installed before building or changing the code.
- Building The Code: What commands to run in order to build the infrastructure and/or run the tests when applicable. 2nd Watch uses Make in order to provide a single tool with a consistent interface to build all codebases, regardless of language or toolset. If using Make in Windows environments, Windows Subsystem for Linux is recommended for Windows 10 in order to avoid having to write two sets of commands in Makefiles: one for Bash and one for PowerShell.
It’s important that you do not neglect this basic documentation for two reasons (even if you think you’re the only one who will work on the codebase):
- The obvious: Writing this critical information down in an easily viewable place makes it easier for other members of your organization to onboard onto your project and will prevent the need for a panicked knowledge transfer when projects change hands.
- The not-so-obvious: The act of writing a description of the design clarifies your intent to yourself and will result in a cleaner design and a more coherent repository.
All repositories should also include a .gitignore file with the appropriate settings for Terraform. GitHub’s default Terraform .gitignore is a decent starting point, but in most cases you will not want to ignore .tfvars files because they often contain environment-specific parameters that allow for greater code reuse as we will see later.
Terraform Roots and Multiple Environments
A Terraform root is the unit of work for a single terraform apply command. We group our infrastructure into multiple Terraform roots in order to limit our “blast radius” (the amount of damage a single errant terraform apply can cause).
- Repositories with multiple roots should contain a roots/ directory with a subdirectory for each root (e.g. a VPC root and one root per application), each with a .tf file as its primary entry point.
- Note that the roots/ directory is optional for repositories that only contain a single root, e.g. infrastructure for an application team which includes only a few resources which should be deployed in concert. In this case, modules/ may be placed in the same directory as tf.
- Roots which are deployed into multiple environments should include an env/ subdirectory at the same level as tf. Each environment corresponds to a tfvars file under env/ named after the environment, e.g. staging.tfvars. Each .tfvars file contains parameters appropriate for each environment, e.g. EC2 instance sizes.
Here’s what our roots directory might look like for a sample with a VPC and 2 application stacks, and 3 environments (QA, Staging, and Production):
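One layout that follows these rules can be scaffolded with a short Python script. This is a sketch only: the root names (vpc, app1, app2), the main.tf entry-point name, and the scaffold_roots helper are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical names, chosen only for illustration.
ROOTS = ["vpc", "app1", "app2"]
ENVIRONMENTS = ["qa", "staging", "production"]
ENTRY_POINT = "main.tf"  # assumed name of each root's primary entry point


def scaffold_roots(repo: Path) -> List[str]:
    """Create roots/<root>/<entry point> and roots/<root>/env/<env>.tfvars."""
    created = []
    for root in ROOTS:
        root_dir = repo / "roots" / root
        (root_dir / "env").mkdir(parents=True, exist_ok=True)
        files = [root_dir / ENTRY_POINT]
        files += [root_dir / "env" / f"{env}.tfvars" for env in ENVIRONMENTS]
        for path in files:
            path.touch()
            created.append(path.relative_to(repo).as_posix())
    return sorted(created)


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list the resulting files.
    with tempfile.TemporaryDirectory() as tmp:
        for line in scaffold_roots(Path(tmp)):
            print(line)
```

Each root ends up with its own entry point and one .tfvars file per environment, so a single terraform apply only ever touches one root in one environment.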
Terraform modules are self-contained packages of Terraform configurations that are managed as a group. Modules are used to create reusable components, improve organization, and to treat pieces of infrastructure as a black box. In short, they are the Terraform equivalent of functions or reusable code libraries.
Terraform modules come in two flavors:
- Internal modules, whose source code is consumed by roots that live in the same repository as the module.
- External modules, whose source code is consumed by roots in multiple repositories. The source code for external modules lives in its own repository, separate from any consumers and separate from other modules to ensure we can version the module correctly.
In this post, we’ll only be covering internal modules.
- Each internal module should be placed within a subdirectory under modules/.
- Module subdirectories/repositories should follow the standard module structure per the Terraform docs.
- External modules should always be pinned at a version: a git revision or a version number. This practice allows for reliable and repeatable builds. Failing to pin module versions may cause a module to be updated between builds, breaking the build without any obvious changes in our code. Even worse, failing to pin our module versions might cause a plan to be generated with changes we did not anticipate.
Here’s what our modules directory might look like:
Terraform and Other Tools
Terraform is often used alongside other automation tools within the same repository. Some frequent collaborators include Ansible for configuration management and Packer for compiling identical machine images across multiple virtualization platforms or cloud providers. When using Terraform in conjunction with other tools within the same repo, 2nd Watch creates a directory per tool from the root of the repo:
Putting it all together
The following illustrates a sample Terraform repository structure with all of the concepts outlined above:
There’s no single repository format that’s optimal, but we’ve found that this standard works for the majority of our use cases in our extensive use of Terraform on dozens of projects. That said, if you find a tweak that works better for your organization – go for it! The structure described in this post will give you a solid and battle-tested starting point to keep your Terraform code organized so your team can stay productive.
- The Terraform Book by James Turnbull provides an excellent introduction to Terraform all the way through repository structure and collaboration techniques.
- The Hashicorp AWS VPC Module is one of the most popular modules in the Terraform Registry and is an excellent example of a well-written Terraform module.
- The source code for James Nugent’s Hashidays NYC 2017 talk is an exemplary Terraform repository. Although it’s based on an older version of Terraform (before providers were broken out from the main Terraform executable), the code structure, formatting, and use of Makefiles are still current.
For help getting started adopting Infrastructure as Code, contact us.
-Josh Kodroff, Associate Cloud Consultant
With increased focus on security and governance in today’s digital economy, I want to highlight a simple but important use case that demonstrates how to use AWS Identity and Access Management (IAM) with Security Token Service (STS) to give trusted AWS accounts access to resources that you control and manage.
Security Token Service is an extension of IAM and is one of several web services offered by AWS that does not incur any costs to use. But, unlike IAM, there is no user interface on the AWS console to manage and interact with STS. Rather, all interaction is done entirely through one of several extensive SDKs or directly over HTTP. I will be using Terraform to create some simple resources in my sandbox account and the .NET Core SDK to demonstrate how to interact with STS.
The main purpose and function of STS is to issue temporary security credentials for AWS resources to trusted and authenticated entities. These credentials operate identically to the long-term keys that typical IAM users have, with a couple of special characteristics:
- They automatically expire and become unusable after a short and defined period of time elapses
- They are issued dynamically
These characteristics offer several advantages in terms of application security and development and are useful for cross-account delegation and access. STS solves two problems for owners of AWS resources:
- Meets the IAM best-practices requirement to regularly rotate access keys
- You do not need to distribute access keys to external entities or store them within an application
One common scenario where STS is useful involves sharing resources between AWS accounts. Let’s say, for example, that your organization captures and processes data in S3, and one of your clients would like to push large amounts of data from resources in their AWS account to an S3 bucket in your account in an automated and secure fashion.
While you could create an IAM user for your client, your corporate data policy requires that you rotate access keys on a regular basis, and this introduces challenges for automated processes. Additionally, you would like to limit the distribution of access keys to your resources to external entities. Let’s use STS to solve this!
To get started, let’s create some resources in your AWS cloud. Do you even Terraform, bro?
Let’s create a new S3 bucket and set the bucket ACL to be private, meaning nobody but the bucket owner (that’s you!) has access. Remember that bucket names must be unique across all existing buckets, and they should comply with DNS naming conventions. Here is the Terraform HCL syntax to do this:
Great! We now have a bucket… but for now, only the owner can access it. This is a good start from a security perspective (i.e. “least permissive” access).
What an empty bucket may look like
Let’s create an IAM role that, once assumed, will allow IAM users with access to this role to have permissions to put objects into our bucket. Roles are a secure way to grant trusted entities access to your resources. You can think about roles in terms of a jacket that an IAM user can wear for a short period of time, and while wearing this jacket, the user has privileges that they wouldn’t normally have when they aren’t wearing it. Kind of like a bright yellow Event Staff windbreaker!
For this role, we will specify that users from our client’s AWS account are the only ones that can wear the jacket. This is done by including the client’s AWS account ID in the Principal statement. AWS Account IDs are not considered to be secret, so your client can share this with you without compromising their security. If you don’t have a client but still want to try this stuff out, put your own AWS account ID here instead.
Great, now we have a role that our trusted client can wear. But, right now our client can’t do anything except wear the jacket. Let’s give the jacket some special powers, such that anyone wearing it can put objects into our S3 bucket. We will do this by creating a security policy for this role. This policy will specify what exactly can be done to S3 buckets that it is attached to. Then we will attach it to the bucket we want our client to use. Here is the Terraform syntax to accomplish this:
A couple things to note about this snippet – First, we are using Terraform interpolation to inject values from previous terraform statements into a couple of places in the policy – specifically the ARN from the role and bucket we created previously. Second, we are specifying a condition for the s3 policy – one that requires a specific object ACL for the action s3:PutObject, which is accomplished by including the HTTP request header x-amz-acl to have a value of bucket-owner-full-control with the PUT object request. By default, objects PUT in S3 are owned by the account that created them, even if it is stored in someone else’s bucket. For our scenario, this condition will require your client to explicitly grant ownership of objects placed in your bucket to you, otherwise the PUT request will fail.
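To make the two policy documents described above concrete, here is a minimal Python sketch that builds them as plain dictionaries and prints the JSON. The client account ID is a made-up placeholder, the role ARN and bucket name are the example values used in this post, and the helper names are hypothetical; the actual Terraform resources may differ in detail.

```python
import json

# Placeholder values for illustration only.
CLIENT_ACCOUNT_ID = "210987654321"
ROLE_ARN = "arn:aws:iam::123456789012:role/sts-delegate-role"
BUCKET_NAME = "d4h2123b9-xaccount-bucket"


def trust_policy(client_account_id):
    """Trust policy: only principals in the client's account may assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{client_account_id}:root"},
            "Action": "sts:AssumeRole",
        }],
    }


def bucket_policy(role_arn, bucket_name):
    """Bucket policy: the role may PUT objects only with the required canned ACL."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": role_arn},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
            "Condition": {
                "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
            },
        }],
    }


if __name__ == "__main__":
    print(json.dumps(trust_policy(CLIENT_ACCOUNT_ID), indent=2))
    print(json.dumps(bucket_policy(ROLE_ARN, BUCKET_NAME), indent=2))
```

The Condition block is what forces the x-amz-acl header: a PUT without bucket-owner-full-control does not match the Allow statement and is therefore rejected.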
So, now we have a bucket, a policy in place on our bucket, and a role that assumes that policy. Now your client needs to get to work writing some code that will allow them to assume the role (wear the jacket) and start putting objects into your bucket. Your client will need to know a couple of things from you before they get started:
- The bucket name and the region it was created in (the example above created a bucket named d4h2123b9-xaccount-bucket in us-west-2)
- The ARN for the role (Terraform can output this for you). It will look something like this but will have your actual AWS Account ID: arn:aws:iam::123456789012:role/sts-delegate-role
They will also need to create an IAM User in their account and attach a policy allowing the user to assume roles via STS. The policy will look similar to this:
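As a minimal sketch, such a policy could be built and printed in Python as follows. The role ARN is the example value from this post and the assume_role_policy helper is a hypothetical name; the client's real policy may be broader or narrower.

```python
import json


def assume_role_policy(role_arn):
    """Identity policy letting an IAM user assume one specific role via STS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": role_arn,
        }],
    }


if __name__ == "__main__":
    arn = "arn:aws:iam::123456789012:role/sts-delegate-role"  # example ARN
    print(json.dumps(assume_role_policy(arn), indent=2))
```

Scoping Resource to the single role ARN keeps the client's user from assuming any other role it might otherwise be trusted for.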
Let’s help your client out a bit and provide some C# code snippets for .NET Core 2.0 (available for Windows, macOS, and Linux). To get started, install the .NET SDK for your OS, then fire up a command prompt in a favorite directory and run these commands:
The first command will create a new console app in the subdirectory s3cli. Then switch context to that directory and import the AWS SDK for .NET Core, and then add packages for SecurityToken and S3 services.
Once you have the libraries in place, fire up your favorite IDE or text editor (I use Visual Studio Code), then open Program.cs and add some code:
This snippet sends a request to STS for temporary credentials using the specified ARN. Note that the client must provide IAM user credentials to call STS, and that IAM user must have a policy applied that allows it to assume a role from STS.
This next snippet takes the STS credentials, bucket name, and region name, and then uploads the Program.cs file that you’re editing and assigns it a random key/name. Also note that it explicitly applies the Canned ACL that is required by the sts-delegate-role:
So, to put this all together, run this code block and make the magic happen! Of course, you will have to define and provide proper variable values for your environment, including securely storing your credentials.
Try it out from the command prompt:
If all goes well, you will have a copy of Program.cs in the bucket. Not very useful itself, but it illustrates how to accomplish the task.
What a bucket with something in it may look like
Here is a high-level document of what we put together:
Putting it all together
- Your client uses their IAM user to call AWS STS and requests to assume the role whose ARN you gave them.
- STS authenticates the client’s IAM user and verifies the policy for the role ARN, then issues a temporary credential to the client.
- The client can use the temporary credentials to access your S3 bucket (they will expire soon), and since they are now wearing the Event Staff jacket, they can successfully PUT stuff in your bucket!
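The three steps above can also be sketched in Python rather than C#. This is an illustrative alternative, not the code from this post: it assumes the boto3 library, and the function names and session name are hypothetical. The helpers accept client objects as parameters so they can be exercised without AWS access.

```python
import uuid
from pathlib import Path


def assume_delegate_role(sts_client, role_arn, session_name="s3cli-session"):
    """Steps 1 and 2: ask STS for temporary credentials for the given role."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    return response["Credentials"]


def upload_with_owner_acl(s3_client, bucket, path):
    """Step 3: upload a file under a random key with the required canned ACL."""
    key = str(uuid.uuid4())
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=Path(path).read_bytes(),
        ACL="bucket-owner-full-control",
    )
    return key


def main(role_arn, bucket, region, path):
    """Wire the two helpers together using real boto3 clients."""
    import boto3  # third-party; imported here so the helpers work without it

    creds = assume_delegate_role(boto3.client("sts"), role_arn)
    s3 = boto3.client(
        "s3",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    return upload_with_owner_acl(s3, bucket, path)
```

Calling main with the role ARN, bucket name, region, and a local file path would perform the whole exchange, provided the caller's own IAM user credentials are available to boto3 in the usual way.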
There are many other use-cases for STS. This is just one very simplistic example. However, with this brief introduction to the concepts, you should now have a decent idea of how STS works with IAM roles and policies, and how you can use STS to give access to your AWS resources for trusted entities. For more tips like this, contact us.
-Jonathan Eropkin, Cloud Consultant
There is a feature in the Linux kernel that is relevant to VMs hosted on Xen servers called the “steal percentage.” When the guest OS requests use of the host system’s CPU and the host CPU is currently tied up with another VM, the Xen server will send an increment to the guest Linux instance, which increases the steal percentage. This is a great feature as it shows exactly how busy the host system is, and it is available on many AWS instances because they are hosted on Xen. It is actually said that Netflix will terminate an AWS instance when the steal percentage crosses a certain threshold and start it up again, which will cause the instance to spin up on a new host server, as a proactive step to ensure their system is utilizing its resources to the fullest.
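On Linux, the cumulative steal counter is exposed as the eighth value on the cpu line of /proc/stat, and monitoring tools derive a percentage from the change between two samples. The following Python sketch shows that calculation on two sample lines; the sample numbers and function names are invented for illustration and this is not the code of any particular agent.

```python
def parse_cpu_line(line):
    """Return the first eight counters of a /proc/stat 'cpu' line:
    user, nice, system, idle, iowait, irq, softirq, steal."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:9]]


def steal_percent(before, after):
    """Percentage of elapsed CPU time between two samples that was stolen."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


if __name__ == "__main__":
    # Invented sample values for two readings taken a few seconds apart.
    first = "cpu  100 0 50 800 10 0 0 40 0 0"
    second = "cpu  150 0 70 1200 20 0 0 60 0 0"
    print(steal_percent(first, second))  # 4.0
```

On a real system the two lines would be read from /proc/stat a few seconds apart instead of being hard-coded.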
What I wanted to discuss here is that it turns out there is a bug in the Linux kernel versions 4.8, 4.9 and 4.10 where the steal percentage can be corrupted during a live migration on the physical Xen server, which causes the CPU utilization to be reported as 100% by the agent.
When looking at Top you will see something like this:
As you can see in the screenshot of Top, the %st metric on the CPU(s) line shows an obviously incorrect number.
During a live migration on the physical Xen server, the steal time gets a little out of sync and ends up decrementing the time. If the time was already at or close to zero, it causes the time to become negative and, due to type conversions in the code, it causes an overflow.
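The effect of that type conversion can be illustrated in a few lines of Python. This sketch assumes the counter ends up in an unsigned 64-bit integer, which is an assumption made for illustration rather than something stated here; the to_uint64 helper is hypothetical.

```python
UINT64_MASK = (1 << 64) - 1


def to_uint64(value):
    """Reinterpret a (possibly negative) integer as an unsigned 64-bit value."""
    return value & UINT64_MASK


if __name__ == "__main__":
    steal = 2      # a small steal counter, close to zero
    steal -= 5     # the out-of-sync decrement pushes it to -3
    print(to_uint64(steal))  # 18446744073709551613
```

A value that should read as roughly zero is reported instead as an enormous number, which is why the steal percentage appears pegged.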
CloudWatch’s CPU Utilization monitor calculates that utilization by adding the System and User percentages together. However, this only gives a partial view into your system. With our agent, we can see what the OS sees.
That is the Steal percentage spiking due to that corruption. Normally this metric could be monitored and actioned as desired, but with this bug it causes noise and false positives. If Steal were legitimately high, then the applications on that instance would be running much slower.
There is some discussion online about how to fix this issue, and there are some kernel patches to say “if the steal time is less than zero, just make it zero.” Eventually this fix will make it through the Linux releases and into the latest OS version, but until then it needs to be dealt with.
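The idea behind those patches can be sketched in Python as a simple clamp. This illustrates the logic only and is not the kernel code; the function name is hypothetical.

```python
def apply_steal_delta(total_steal, delta):
    """Add a steal-time delta to the running total, treating negatives as zero."""
    return total_steal + max(delta, 0)


if __name__ == "__main__":
    print(apply_steal_delta(3, 2))   # 5
    print(apply_steal_delta(3, -5))  # 3
```

With the clamp in place, an out-of-sync reading after a live migration leaves the counter unchanged instead of driving it below zero.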
We have found that a reboot will clear the corrupted percentage. The other option is to patch the kernel… which also requires a reboot. If a reboot is just not possible at the time, the only impact to the system is that it makes monitoring the steal percentage impossible until the number is reset.
It is not a very common issue, but due to the large number of instances we monitor here at 2nd Watch, it is something that we’ve come across frequently enough to investigate in detail and develop a process around.
If you have any questions as to whether or not your servers hosted in the cloud might be affected by this issue, please contact us to discuss how we might be able to help.
-James Brookes, Product Manager
Picking up where we left off…
In my previous blog I gave a fairly high-level overview of what automated AWS account management could (or rather should) entail. This blog will drill deeper into the processes and give you some real-world code samples of what this looks like.
AWS Organizations and Linked Account Creation:
As mentioned in my last blog, AWS recently announced the general availability of AWS Organizations, allowing you to create linked or nested AWS accounts under a master account and apply policy-based management under the umbrella of the root account. It also allows for hierarchical management (up to five levels deep) of linked accounts by Organizational Units (OU). Policies can be applied at the global level, OU level, and individual account level. It is important to note that conflicting policies always defer to the parent entity’s permission set. This means an IAM user/role in an account may have permissions to perform some action, but if the account, OU, or global settings at the Organizations level deny that action, the action will be denied for the IAM resource. In other words, the effective permissions for a resource are the intersection of the resource’s direct permissions assigned in IAM and the permissions that are allowed by Organizations. This means you can lock linked accounts down to do things like “only manage DNS Route53 resources” or “only manage S3 resources” using Organizations policies. Pretty nice way of segmenting off security and reducing the potential blast radius.
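The rule that an action must be permitted both in IAM and at the Organizations level can be modelled as a set intersection. The Python sketch below is a deliberately simplified model: it ignores wildcards, explicit denies, and policy inheritance, and the action lists are invented examples.

```python
def effective_permissions(iam_allowed, org_allowed):
    """Actions usable in an account: allowed by IAM and by Organizations."""
    return set(iam_allowed) & set(org_allowed)


if __name__ == "__main__":
    iam = {"s3:PutObject", "ec2:RunInstances", "route53:ChangeResourceRecordSets"}
    org = {"s3:PutObject", "s3:GetObject"}  # an "only manage S3" style policy
    print(effective_permissions(iam, org))  # {'s3:PutObject'}
```

Even though the IAM user was granted EC2 and Route 53 actions, only the S3 action survives because it is the only one Organizations also allows.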
I am going to pick the most common denominator for my following examples… AWS CLI. Though I rarely use it for actual automation code, I figure most folks are familiar with it and it has a pretty intuitive syntax.
Step 1: Enable Organizations on your root account
Ensure that your AWS Profile environment variable is set to your desired root account AWS profile that has the necessary permissions to work with AWS Organizations. Alternatively, if you don’t want to use an environment variable, you can either ensure the default AWS Profile is the one which has permissions on your root account or you can specify the --profile argument when typing your AWS CLI commands. I’m going to use the AWS_DEFAULT_PROFILE environment variable in my examples here (output redacted).
> export AWS_DEFAULT_PROFILE=myrootacctadmin
This of course assumes you have a profile set up under your HOME dir in the .aws/credentials file named myrootacctadmin.
Minimally, this will look something like this:
[myrootacctadmin]
aws_access_key_id = AKI?????????????????
aws_secret_access_key = somesecretaccesskey0somesecretaccesskey0
Now that we have our environment set we can get on with running the AWS CLI commands to create our organization.
Let’s be safe and make sure we don’t already have an organization created under our root account:
> aws organizations list-roots
An error occurred (AWSOrganizationsNotInUseException) when calling the ListRoots operation: Your account is not a member of an organization.
As the error message indicates, this account is not currently a part of any organization and will need to be configured to use organizations if we want to use this as our master account and create linked accounts underneath it.
Easy enough, let’s just create our organization…
> aws organizations create-organization
Now that we have created an organization let’s try our list-roots command again to see if we get something different this time…
> aws organizations list-roots
Indeed! Our myrootacctadmin account is listed as the root (i.e. master) of our entire organization. This is exactly what we wanted. Now let’s see what AWS accounts are identified as part of this organization…
> aws organizations list-accounts
"Name": "Satoshi Nakamoto",
As expected, just our root account. It looks kind of lonely there all by itself, so let’s go ahead and create a Linked account underneath it.
Step 2. Create a Linked Account under your Organization
> aws organizations create-account --email firstname.lastname@example.org --account-name brawndo
The actual creation of the account is not instantaneous, and the API responds to the create-account call before the new account creation is complete. While it is pretty quick to complete, unless we ensure that it is completed before performing any additional automation against it, we may receive an error from the API indicating the account is not yet ready. So prior to performing additional configuration on the new account, we need to ensure the State has reached SUCCEEDED. You will generally just loop until the State is equal to SUCCEEDED in your automation code before moving on to the next step. Also, it might be a good idea to catch failures (e.g. State == “FAILED”) and handle those gracefully. The account creation status can be retrieved as follows:
> aws organizations describe-create-account-status --create-account-request-id car-0123456789abcdef0123456789abcdef
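In automation code, the polling loop described above might look like the following Python sketch. It assumes a boto3 Organizations client (or anything exposing the same describe_create_account_status call) is passed in; the function name, polling interval, and attempt limit are hypothetical choices.

```python
import time


def wait_for_account(org_client, request_id, poll_seconds=5.0,
                     max_attempts=60, sleep=time.sleep):
    """Poll the create-account request until it succeeds, fails, or times out."""
    for _ in range(max_attempts):
        status = org_client.describe_create_account_status(
            CreateAccountRequestId=request_id
        )["CreateAccountStatus"]
        state = status["State"]
        if state == "SUCCEEDED":
            return status["AccountId"]
        if state == "FAILED":
            reason = status.get("FailureReason", "unknown")
            raise RuntimeError(f"Account creation failed: {reason}")
        sleep(poll_seconds)  # still IN_PROGRESS; wait before asking again
    raise TimeoutError(f"Account creation {request_id} did not finish in time")
```

With boto3 installed, the client would typically come from boto3.client("organizations") using the root account profile, and the request ID is the one returned by the create-account call.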
Congratulations! You’ve just enabled AWS Organizations and created your first linked account!
At this point you should have a couple of emails from AWS in the inbox of the email address used to create the new account. They are standard boiler-plate emails. One of which is a “Welcome to Amazon Web Services” email and the other tells you that your account is ready and has some “getting started” type links.
Step 3: Reset New Linked Account Root Password
Now that your linked account has been created you will need to go through the AWS Reset Root Account Password workflow to make your new account accessible from either the AWS Web Console or the AWS APIs. The recommended approach here is to reset the root account password, enable MFA, create an IAM user with Administrator privileges, store the root account secrets in a VERY secure place, and only use them as a last resort for account access.
Here’s a shortened URL that will take you directly to the root account password reset page: http://amzn.pw/45Nxe
Step 4: (Optionally) Create Organizational Units
Let’s go through a couple of examples of Organizational Units.
- OU for only allowing S3 services
- OU for only allowing services in us-west-2 and us-east-1 regions
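As a rough sketch, the Organizations policies behind those two OUs could be expressed as the following documents, built here as Python dictionaries. These are illustrative assumptions rather than tested policies: an allow-list such as the S3 example only takes effect once broader allow policies are no longer attached, and a real region restriction usually needs exemptions for global services.

```python
import json

# Allow-list style policy: only S3 actions are permitted in the OU.
ALLOW_ONLY_S3 = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}


def deny_outside_regions(regions):
    """Deny-list style policy: block requests to any region not listed."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": list(regions)}
            },
        }],
    }


if __name__ == "__main__":
    print(json.dumps(ALLOW_ONLY_S3, indent=2))
    print(json.dumps(deny_outside_regions(["us-west-2", "us-east-1"]), indent=2))
```

Each document would be created as a policy in Organizations and then attached to the corresponding OU.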
“What if I want to bring my existing accounts under the umbrella of Organizations?” you ask
Good news! You can invite existing AWS accounts to join your organization. Using the API you can issue an invitation to an existing account by Account ID, Email, or Organization. For the sake of simplicity, let’s use an Account ID (222222222222) for the following example (again, using the root/master account AWS profile):
> aws organizations invite-account-to-organization --target Id=222222222222,Type=ACCOUNT
"Value": "Satoshi Nakamoto",
A couple of things of note: the handshake Id is what will be required to accept the invitation on the linked account side. Also notice the difference between the RequestedTimestamp (epoch 1524610827.55) and the ExpirationTimestamp (epoch 1525906827.55): 1296000 seconds. Divide that by 86400 seconds in a day and we get 15 days.
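The same arithmetic can be checked in a few lines of Python, using the two epoch values quoted above (the variable names are illustrative):

```python
SECONDS_PER_DAY = 86400

requested = 1524610827.55  # RequestedTimestamp
expires = 1525906827.55    # ExpirationTimestamp

# Round to whole seconds to avoid floating-point noise in the subtraction.
lifetime_seconds = round(expires - requested)
lifetime_days = lifetime_seconds / SECONDS_PER_DAY

print(lifetime_seconds)  # 1296000
print(lifetime_days)     # 15.0
```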
At this point you have 15 days to issue an acceptance of the invitation (aka: handshake), from the target AWS account. You could simply log in to the AWS Web Console, navigate to Organizations, and accept the invitation, but that’s not what this article is about now is it? We’re talking automation here! And, as all good DevOpsers know, we utilize security entities that employ PoLP (Principle of Least Privilege) to perform process-specific tasks.
This means we aren’t going to do something ludicrous like adding AWS Access Keys to our root account login (please don’t ever do this). Nor are we going to create an IAM User with Administrator access for this very specific task. You can either create a User or a Role in the target account to accept the handshake, although creating a Role will require you to assume that Role using STS, which might be overkill. On the other hand, you might use a Lambda function to automate the handshake, in which case you most certainly would utilize an IAM Role. Either way, the following IAM Policy Document will provide the User/Role with the required permissions to accept (or delete) the invitation:
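As a minimal sketch, a policy along those lines could be built and printed in Python as follows. The action list is an assumption based on the Organizations handshake operations: it treats declining as the way the target account rejects an invitation, and depending on how the organization is configured, additional permissions (for example to create the Organizations service-linked role) may also be needed.

```python
import json

# Assumed least-privilege policy for handling an invitation in the target account.
HANDSHAKE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "organizations:AcceptHandshake",
            "organizations:DeclineHandshake",
            "organizations:ListHandshakesForAccount",
        ],
        "Resource": "*",
    }],
}

if __name__ == "__main__":
    print(json.dumps(HANDSHAKE_POLICY, indent=2))
```

Because the policy grants nothing beyond handshake operations, the User or Role can be left in place or removed after the account has joined without exposing anything else.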
Using the AWS CLI (leveraging a profile of a User/Role with the aforementioned permissions under the existing target account), you would issue the following command to accept the invitation/handshake:
> aws organizations accept-handshake --handshake-id h-0123456789abcdef0123456789abcdef
"Value": "Satoshi Nakamoto",
The returned JSON struct is the exact same handshake struct returned by the API when we issued the invitation with one important difference. The State property is now reflecting a value of ACCEPTED.
That’s it. You’ve successfully linked an existing account into your Organization under the master billing account.
In the next installment, I will go into depth on the processes involved in automating the Account Initialization, Configuration, and Continuous Compliance.
Thanks for tuning in!
-Ryan Kennedy, Principal Cloud Automation Architect
Why do it?
Alexa gets a lot of use in our house, and it is very apparent to me that the future is not a touch screen or a mouse, but voice. Creating an Alexa skill is easy to learn by watching videos and such, but actually creating the skill is a great way to understand the ins and outs of the process and what the backend systems (like AWS Lambda) are capable of.
First you need a problem
To get started, you need a problem to solve. Once you have the problem, you’ll need to think about the solution before you write a line of code. What will your skill do? You need to define the requirements. For my skill, I wanted to ask Alexa to “park my cloud” and have her stop all EC2 instances or RDS databases in my environment.
Building a solution one word at a time
Now that I’ve defined the problem and have an idea for the requirements of the solution, it’s time to start building the skill. The first thing you’ll notice is that the Alexa skill portal is not part of the standard AWS portal. You need to go to developer.amazon.com/Alexa and create a developer account and sign in there. Once inside, there is a lot of good information and videos on creating Alexa skills that are worth reviewing. Click the “Create Skill” button to get started. In my example, I’m building a custom skill.
The process for building a skill is broken into major sections: Build, Test, Launch, Measure. In each one you’ll have a number of things to complete before moving on to the next section. The major areas of each section are broken down on the left-hand side of the console. On the initial dashboard you’re also presented with the “Skill builder checklist” on the right as a visual reminder of what you need to do before moving on.
This is the first area you’ll work on in the Build phase of your Alexa skill. This is setting up how your users will interact with your skill.
Invocation will set up how your users will launch your skill. For simplicity’s sake, this is often just the name of the skill. The common patterns will be “Alexa, ask [my skill] [some request],” or “Alexa, launch [my skill].” You’ll want to make sure the invocation for your skill sounds natural to a native speaker.
I think of intents as the “functions” or “methods” for my Alexa skill. There are a number of built-in intents that should always be included (Cancel, Help, Stop) as well as your custom intents that will compose the main functionality of your skill. Here my intent is called “park” since that will have the logic for parking my AWS systems. The name here will only be exposed to your own code, so it isn’t necessarily important what it is.
Utterances are the patterns you define for how people will use your skill. You’ll want to focus on natural language and normal patterns of speech for native users in your target audience. I would recommend doing some research and speaking to a diversity of people to get a good cross section of utterances for your skill. More is better.
Amazon also provides the option to use slots (variables) in your utterances. This allows your skill to do things that are dynamic in nature. When you create a variable in an utterance you also need to create a slot and give it a slot type. This is like providing a type to a variable in a programming language (Number, String, etc.) and will allow Amazon to understand what to expect when hearing the utterance. In our simple example, we don’t need any slots.
Interfaces allow you to interface your skill with other services to provide audio, display, or video options. These aren’t needed for a simple skill, so you can skip it.
Here’s where you’ll connect your Alexa skill to the endpoint you want to handle the logic for your skill. The easiest setup is to use AWS Lambda. There are lots of example Lambda blueprints using different programming languages and doing different things. Use those to get started because the JSON response formatting can be difficult otherwise. If you don’t have an Alexa skill id here, you’ll need to Save and Build your skill first. Then a skill id will be generated, and you can use it when configuring your Lambda triggers.
AWS Account Lambda
Assuming you already have an AWS account, you’ll want to deploy a new Lambda from a blueprint that looks somewhat similar to what you’re trying to accomplish with your skill (deployed in US-East-1). Even if nothing matches well, pick any one of them as they have the JSON return formatting set up so you can use it in your code. This will save you a lot of time and effort. Take a look at the information here and here for more information about how to set up and deploy Lambda for Alexa skills. You’ll want to configure your Alexa skill as the trigger for the Lambda in the configuration, and here’s where you’ll copy in your skill id from the developer console “Endpoints” area of the Build phase.
While the actual coding of the Lambda isn’t the purpose of the article, I will include a couple of highlights that are worth mentioning. Below, see the part of the code from the AWS template that would block the Lambda from being run by any Alexa skill other than my own. While the chances of this are rare, there’s no reason for my Lambda to be open to everyone. Here’s what that code looks like in Python:
if event['session']['application']['applicationId'] != "amzn1.ask.skill.000000000000000000000000":
    raise ValueError("Invalid Application ID")
Quite simply, if the Alexa application ID passed in the session doesn’t match my known Alexa skill ID, then raise an error. The other piece of advice I’d give about the Lambda is to create a separate method for each intent to keep the logic separated and easy to follow. Make sure you remove any response language from your code that is left over from the original blueprint. If your responses are inconsistent, Amazon will fail your skill (I had this happen multiple times because I borrowed from the “Color Picker” Lambda blueprint and had some generic responses left in the code). Also, you’ll want to handle your Cancel, Help, and Stop requests correctly. Lastly, as is best practice in all code, add copious logging to CloudWatch so you can diagnose issues. Note the ARN of your Lambda function, as you’ll need it for configuring the endpoints in the developer portal.
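As a rough illustration of that advice, the sketch below combines the application ID check with one function per intent, handling for Help, Cancel, and Stop, and logging. It is not the blueprint code itself: HelloIntent, the response wording, and the helper names are assumptions made up for this example, and the application ID is the same placeholder used in this article.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # In Lambda, log output ends up in CloudWatch Logs

# Placeholder ID; replace with the skill ID from your developer console.
EXPECTED_APPLICATION_ID = "amzn1.ask.skill.000000000000000000000000"


def build_response(speech_text, should_end_session=True):
    """Wrap plain text in the JSON envelope Alexa expects back."""
    return {
        "version": "1.0",
        "sessionAttributes": {},
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": should_end_session,
        },
    }


def handle_hello(request):
    # Hypothetical custom intent for this sketch.
    return build_response("Hello from my skill.")


def handle_help(request):
    # Keep the session open so the user can follow up.
    return build_response("You can ask me to say hello.", should_end_session=False)


def handle_stop(request):
    return build_response("Goodbye.")


# One function per intent keeps the logic separated and easy to follow.
INTENT_HANDLERS = {
    "HelloIntent": handle_hello,
    "AMAZON.HelpIntent": handle_help,
    "AMAZON.CancelIntent": handle_stop,
    "AMAZON.StopIntent": handle_stop,
}


def lambda_handler(event, context):
    app_id = event["session"]["application"]["applicationId"]
    if app_id != EXPECTED_APPLICATION_ID:
        raise ValueError("Invalid Application ID")

    request = event["request"]
    logger.info("Request type: %s", request["type"])

    if request["type"] == "LaunchRequest":
        return handle_help(request)
    if request["type"] == "IntentRequest":
        intent_name = request["intent"]["name"]
        logger.info("Intent: %s", intent_name)
        handler = INTENT_HANDLERS.get(intent_name)
        if handler is None:
            raise ValueError("Unsupported intent: " + intent_name)
        return handler(request)

    # SessionEndedRequest (or anything else): log it and return no speech.
    logger.info("No response returned for request type %s", request["type"])
    return None
```

Because every response passes through one helper, it is easier to spot leftover blueprint wording before you submit the skill.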
Once your Lambda is deployed in AWS, you can go back into the developer portal and begin testing the skill. First, put your Lambda function ARN into the endpoint configuration for your skill. Next, click over to the Test phase at the top and choose “Alexa Simulator.” You can try recording your voice on your computer microphone or typing in the request. I recommend you do both to get a sense of how Alexa will interpret what you say and respond. Note that I’ve found the actual Alexa is better at natural language processing than the test options using a microphone on my laptop. When you do a test, the console will show you the JSON input and output. You can take this INPUT pane and copy that information to build a Lambda test script on your Lambda function. If you need to do a lot of work on your Lambda, it’s a lot easier to test from there than to flip back and forth. Pay special attention to your utterances. You’ll learn quickly that your proposed utterances weren’t as natural as you thought. Make updates to the utterances and Lambda as needed and keep testing.
Now you wait. Amazon seems to have a number of automated processes that catch glaring issues, but you will likely end up with some back and forth between you and an Amazon employee regarding some part of your skill that needs to be updated. It took about a week to get my final approval and have my skill posted.
Creating your own simple Alexa skill is a fun and easy way to get some experience creating applications that respond to voice and understand what’s possible on the platform. Good luck!
-Coin Graham, Senior Cloud Consultant
Originally, I thought I’d give a deep dive into the mechanics of some of our automated workflows at 2nd Watch, but I really think it’s best to start at the very beginning. We need to understand “why” we need to automate our delivery. In enterprise organizations, this delivery is usually set up in a very waterfall way. The artifact is handed off to another team to push to the different environments and to QA to test. Sometimes it works, but usually not so much. In the “not so much” case, it’s handed back to DEV, which interrupts their current work.
That back and forth between teams is known as waste in the Lean/Agile world. It’s also known as “throwing it over the wall” or “hand-offs.” It is one of the primary things any Agile process intends to eliminate, and it is what led to the DevOps movement.
Now DevOps is a loaded term, much like “Agile” and “Scrum.” It has its ultimate meaning, but most companies go part way and then call it won. The changes that have the biggest positive effects are cultural, but many look at it and see the shiny “automation” as the point of it all. Automation helps, but the core of your automation should be driven by a culture of quality above all. Just keep that in mind as you read through this article, which is specifically about all that yummy automation.
There’s a Process
Baby steps here. Each step builds on the one before it, so there is a series of things that need to happen before you make it to a fully automated process.
Before that though, we need to look at what a deployment is and what the components of a deployment are.
First and foremost, so long as you have separate environments, development has no impact on the customer and therefore no impact on the business at large. There is, essentially, no risk while a feature is in development. However, the business assumes ALL the risk when there is a deployment. Once you cross that line, the customer will interact with the change and either love it, hate it, or ignore it. From a development standpoint, you work to minimize that risk before you cross the deployment line: different environments, testing, a release process, etc. These are all things that can be automated, but only when that risk has been sufficiently minimized.
Step I: Automated testing
You can’t do CI or CD without testing, and it’s the logical first step. In order to help minimize the deployment risk, you should automate ALL of your testing. This will greatly increase your confidence that the changes you introduce will not impact the product in ways you don’t know about BEFORE you cross that risk point in deployment. The closer an error is caught to the time at which it was introduced, the better. Automated testing greatly reduces this gap by giving the implementer faster feedback along with repeatable results.
Step II: Continuous Integration
Your next step is to automate your integration (and integration tests, right?), which should further increase your confidence in the changes that you and your peers have introduced. The smaller the gap between integrations (just as with testing), the better, as you’ll provide feedback to the implementers faster. This means you can fix any problems while your changes are fresh in your mind. Utilizing multiple build strategies for the same product can help as well: for instance, running integration on every push to your SCM (source control management) system, as well as in nightly builds.
Remember, this is shrinking that risk factor before deployment.
Step III: Continuous Deployment
With Continuous Deployment you take traditional Continuous Delivery a step further by automatically pushing the artifacts created by a Continuous Delivery process into production. This automation of deployments is the final step and is another important process in mitigating that risk for when you push to production. Deploying to each environment and then running that environment’s specific set of tests is your final check before you are able to say with confidence that a change did not introduce a fault. Remember, you can automate the environments as well by using infrastructure as code tooling around virtual technology (i.e. The Cloud).
Continuous Deployment is the ultimate goal, as a change introduced into the system triggers all of your confidence-building tools to minimize the risk to your customers once it’s been deployed to the production system. Automating it all not only improves quality, but also shortens the feedback time to the implementer, increasing efficiency as well.
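The idea of deploying to each environment, running that environment’s tests, and only then moving on can be sketched in a few lines. This is a toy model rather than our actual tooling: the Environment class, the promote function, and the stub environments are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Environment:
    name: str
    deploy: Callable[[str], None]   # pushes an artifact to this environment
    run_tests: Callable[[], bool]   # runs this environment's test suite


def promote(artifact: str, environments: List[Environment]) -> List[str]:
    """Deploy to each environment in order, stopping at the first test failure.

    Returns the names of the environments whose tests passed.
    """
    passed = []
    for env in environments:
        env.deploy(artifact)
        if not env.run_tests():
            break  # never carry a failing change forward to the next environment
        passed.append(env.name)
    return passed


def stub_environment(name: str, tests_pass: bool, log: List[str]) -> Environment:
    """Build a fake environment that records deployments instead of performing them."""
    return Environment(
        name=name,
        deploy=lambda artifact: log.append(f"{artifact} -> {name}"),
        run_tests=lambda: tests_pass,
    )


if __name__ == "__main__":
    deployments: List[str] = []
    pipeline = [
        stub_environment("dev", True, deployments),
        stub_environment("qa", False, deployments),
        stub_environment("prod", True, deployments),
    ]
    print(promote("build-42", pipeline))
    print(deployments)
```

In this sketch a failure in the QA tests means the production deploy is never attempted, which is exactly the risk reduction the steps above are building toward.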
I hope that’s a good introduction! In our next post, we’ll take a more technical look at the tooling and automation we use in one of our workflows.
-Craig Monson, Sr Automation Architect (Primary)