Monday, March 20, 2017

Ensure EC2 auto-recovery by rebuilding AMI with Packer

Problem: Amazon EC2 auto recovery fails if the machine image defines ephemeral block device mappings, even when no instance store volumes are attached.

Solutions:
  1. Specify a custom block device mapping when launching the instance.
  2. Rebuild the Ubuntu AMI without those mappings (with automation).
This post shows how to achieve the latter using Packer by HashiCorp.


I don't like being woken up at 3 AM to restart a server, especially when it's the cloud provider's hardware under your instances that decides to misbehave. Back in 2015, AWS made it possible to automatically recover EC2 instances from issues such as:
  • Loss of network connectivity
  • Loss of system power
  • Software issues on the physical host
  • Hardware issues on the physical host that impact network reachability
All you have to do is create a CloudWatch alarm for the StatusCheckFailed_System metric and choose the "Recover this instance" action.
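If you prefer the CLI over the console, the same alarm can be created with aws cloudwatch put-metric-alarm. The sketch below wraps it in a helper; the name create_recover_alarm, the alarm naming pattern and the period/evaluation values are illustrative assumptions, while arn:aws:automate:<region>:ec2:recover is the documented action ARN for instance recovery.

```shell
#!/usr/bin/env bash
# Sketch: create a CloudWatch alarm that triggers EC2 auto-recovery.
# create_recover_alarm is a hypothetical helper; the alarm name pattern,
# period and evaluation periods are illustrative assumptions.
create_recover_alarm() {
  local instance_id="$1"
  local region="$2"

  # Alarm when the system status check fails for 2 consecutive minutes,
  # then ask EC2 to recover the instance onto healthy hardware.
  aws cloudwatch put-metric-alarm \
    --region "$region" \
    --alarm-name "recover-${instance_id}" \
    --alarm-description "Recover ${instance_id} on system status check failure" \
    --namespace AWS/EC2 \
    --metric-name StatusCheckFailed_System \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 2 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions "arn:aws:automate:${region}:ec2:recover"
}

# Usage (placeholder instance ID):
#   create_recover_alarm i-0123456789abcdef0 ap-southeast-1
```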

With some not unreasonable limitations, of course:
  • Use a C3, C4, M3, M4, R3, R4, T2, or X1 instance type
  • Run in a VPC (not EC2-Classic)
  • Use shared tenancy (not dedicated hardware)
  • Use EBS volumes (not ephemeral instance store volumes)
One limitation, however, is not mentioned in the docs or the troubleshooting guide at the time of this writing: you cannot auto-recover an instance created from an AMI that defines ephemeral (a.k.a. instance store) block device mappings. It doesn't matter whether you actually attach ephemeral volumes to your instance, or whether the instance family supports instance store volumes at all (M4 doesn't). If your AMI defines ephemeral devices, auto-recovery fails.

So if you're using official Ubuntu AMIs, you have a problem...

Let's examine the latest Ubuntu Trusty server AMI from Canonical (HVM, EBS-backed, x86_64). Run the following command (feel free to replace the --region value with one you prefer):

aws ec2 describe-images \
  --region ap-southeast-1 \
  --owners 099720109477 \
  --filters \
    Name=root-device-type,Values=ebs \
    Name=virtualization-type,Values=hvm \
    Name=architecture,Values=x86_64 \
    Name=name,Values='*ubuntu-trusty-14.04*' \
  --query 'sort_by(Images, &CreationDate)[-1]'

Take note of the "BlockDeviceMappings" array in the output. You should see two ephemeral devices defined:

  {
    "DeviceName": "/dev/sdb",
    "VirtualName": "ephemeral0"
  },
  {
    "DeviceName": "/dev/sdc",
    "VirtualName": "ephemeral1"
  }

Oops... This means that if you're using Canonical AMIs, your instances will not auto-recover.

My team and I have been unfortunate enough to confirm this a couple of times in the past month. AWS support and this forum post suggest overriding the block device mapping when launching the instance. However, this assumes you launch instances with the AWS CLI or the AWS Console; what if you're launching your instances into an Auto Scaling Group?
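For a one-off launch with the AWS CLI, that override could look like the sketch below. It suppresses the two ephemeral devices listed above (/dev/sdb and /dev/sdc) by giving each an empty NoDevice value. The helper name launch_without_ephemeral is hypothetical, and a real launch would also need your key pair, security groups and so on.

```shell
#!/usr/bin/env bash
# Sketch: launch an instance while suppressing the AMI's ephemeral mappings.
# launch_without_ephemeral is a hypothetical helper; the device names match
# the ephemeral0/ephemeral1 entries of the Canonical Trusty AMI.
launch_without_ephemeral() {
  local image_id="$1"
  local instance_type="$2"
  local subnet_id="$3"

  # An empty "NoDevice" value removes that device from the launch mapping.
  aws ec2 run-instances \
    --image-id "$image_id" \
    --instance-type "$instance_type" \
    --subnet-id "$subnet_id" \
    --block-device-mappings '[{"DeviceName":"/dev/sdb","NoDevice":""},{"DeviceName":"/dev/sdc","NoDevice":""}]'
}

# Usage (placeholder subnet ID):
#   launch_without_ephemeral ami-19c87b7a m4.large subnet-xxxxxxxx
```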

The alternative is to bake your own AMI without ephemeral block device mappings, which is what this post is about.

I'm a big fan of HashiCorp tools and have used Packer in the past to build my own VirtualBox/VMware Vagrant boxes, so I decided to revisit it as an AMI baking tool.

The Packer getting started example gave me a good starting point; I just had to make a few small alterations to accommodate my environment and security constraints.

Notable deviations in my Packer template from the original example:
  1. Remove "access_key" and "secret_key"; these are read from ~/.aws/credentials or passed via the environment with awsudo.
  2. Replace "source_ami" with "source_ami_filter" to automatically use the latest AMI.
  3. Add "ami_block_device_mappings" to remove the ephemeral device mappings with the help of the "no_device" property.
  4. Add "ami_description" and "tags", because metadata is good.
  5. Explicitly set "vpc_id", "subnet_id" and "associate_public_ip_address" via an external variables file.
  6. Use "force_deregister" and "force_delete_snapshot" to replace existing AMIs with the same name.
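Putting the deviations above together, a template along these lines should produce the same kind of AMI. This is a sketch, not the exact template used here: instance_type, the tag values, the variable names and the source AMI name pattern are assumptions. It is written as shell heredocs so the template and an example vars file (with placeholder IDs) can be generated in one step.

```shell
#!/usr/bin/env bash
# Sketch: generate a Packer template and a vars file for an Ubuntu 14.04 AMI
# without ephemeral block device mappings. Values are illustrative assumptions.
# User variables are strings; Packer is assumed to coerce "true"/"false"
# into the boolean associate_public_ip_address field.
cat > ubuntu-14.04-x86_64.json <<'EOF'
{
  "variables": {
    "region": "ap-southeast-1",
    "vpc_id": "",
    "subnet_id": "",
    "associate_public_ip_address": "false"
  },
  "builders": [{
    "type": "amazon-ebs",
    "region": "{{user `region`}}",
    "vpc_id": "{{user `vpc_id`}}",
    "subnet_id": "{{user `subnet_id`}}",
    "associate_public_ip_address": "{{user `associate_public_ip_address`}}",
    "source_ami_filter": {
      "filters": {
        "virtualization-type": "hvm",
        "root-device-type": "ebs",
        "name": "ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"
      },
      "owners": ["099720109477"],
      "most_recent": true
    },
    "instance_type": "t2.micro",
    "ssh_username": "ubuntu",
    "ami_name": "hvm-ssd/ubuntu-trusty-14.04-x86_64-server",
    "ami_description": "Ubuntu 14.04 LTS without ephemeral block device mappings",
    "ami_block_device_mappings": [
      { "device_name": "/dev/sdb", "no_device": true },
      { "device_name": "/dev/sdc", "no_device": true }
    ],
    "tags": {
      "OS": "Ubuntu",
      "Release": "14.04"
    },
    "force_deregister": true,
    "force_delete_snapshot": true
  }]
}
EOF

# Hypothetical per-environment variables, passed with -var-file=vars-dev.json.
cat > vars-dev.json <<'EOF'
{
  "vpc_id": "vpc-xxxxxxxx",
  "subnet_id": "subnet-xxxxxxxx",
  "associate_public_ip_address": "true"
}
EOF
```

Running packer validate -var-file=vars-dev.json ubuntu-14.04-x86_64.json before the build catches template errors without launching anything.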
The rest is simple: configure your AWS credentials, change the variables section to match your VPC, and run:

packer build ubuntu-14.04-x86_64.json

In my dev environment I run:

awsudo -u sa@dev -- packer build -var-file=vars-dev.json ubuntu-14.04-x86_64.json

and a couple of minutes later you have a new Ubuntu AMI without those pesky device mappings:

amazon-ebs output will be in this color.

==> amazon-ebs: Force Deregister flag found, skipping prevalidating AMI Name
    amazon-ebs: Found Image ID: ami-19c87b7a
==> amazon-ebs: Creating temporary keypair: packer_58cec6dc-dff7-c642-cc8f-3e1fecd15019
==> amazon-ebs: Creating temporary security group for this instance...
==> amazon-ebs: Authorizing access to port 22 the temporary security group...
==> amazon-ebs: Launching a source AWS instance...
    amazon-ebs: Instance ID: i-0587ac0941f65f4f4
==> amazon-ebs: Waiting for instance (i-0587ac0941f65f4f4) to become ready...
==> amazon-ebs: Adding tags to source instance
==> amazon-ebs: Waiting for SSH to become available...
==> amazon-ebs: Connected to SSH!
==> amazon-ebs: Stopping the source instance...
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Deregistered AMI hvm-ssd/ubuntu-trusty-14.04-x86_64-server, id: ami-2af74a49
==> amazon-ebs: Deleted snapshot: snap-08dc6ce8b2e28c50d
==> amazon-ebs: Creating the AMI: hvm-ssd/ubuntu-trusty-14.04-x86_64-server
    amazon-ebs: AMI: ami-13e85570
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Modifying attributes on AMI (ami-13e85570)...
    amazon-ebs: Modifying: description
==> amazon-ebs: Modifying attributes on snapshot (snap-01d7b1b146e61d64c)...
==> amazon-ebs: Adding tags to AMI (ami-13e85570)...
==> amazon-ebs: Tagging snapshot: snap-01d7b1b146e61d64c
==> amazon-ebs: Creating AMI tags
==> amazon-ebs: Creating snapshot tags
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary security group...
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' finished.

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:

ap-southeast-1: ami-13e85570
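Copying the AMI ID out of this output by hand gets old quickly. With the -machine-readable flag, Packer prints comma-separated lines (timestamp,target,type,data...), including an artifact line whose id field looks like ap-southeast-1:ami-13e85570. The sketch below extracts the bare AMI ID; it assumes a single-region build, since multi-region IDs are joined with an escaped comma, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read `packer build -machine-readable` output on stdin and print
# the AMI ID. Assumes a single-region build ("<region>:<ami-id>").
# ami_id_from_machine_readable is a hypothetical helper name.
ami_id_from_machine_readable() {
  # Field 3 is the message type, field 5 the artifact attribute, field 6 its value.
  awk -F, '$3 == "artifact" && $5 == "id" { print $6 }' |
    sed 's/^[^:]*://'   # drop the "<region>:" prefix
}

# Usage:
#   ami_id="$(packer build -machine-readable -var-file=vars-dev.json \
#     ubuntu-14.04-x86_64.json | ami_id_from_machine_readable)"
```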

To confirm, run this command (replacing ami-XXXXXXXX with the AMI ID returned by Packer, and using the region you built in):

aws ec2 describe-images --region ap-southeast-1 --image-ids ami-XXXXXXXX
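If you'd rather get a pass/fail answer than eyeball the JSON, the following sketch narrows BlockDeviceMappings to entries that have a VirtualName (that is, ephemeral devices) using a JMESPath --query filter. The helper name check_ami_recoverable and its return codes are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report whether an AMI still defines ephemeral (instance store)
# block device mappings. check_ami_recoverable is a hypothetical helper.
# Returns 0 if none are found, 1 if some are, 2 if the AWS call fails.
check_ami_recoverable() {
  local image_id="$1"
  local region="$2"
  local mappings

  # [?VirtualName] keeps only mappings with a VirtualName such as ephemeral0;
  # entries suppressed with NoDevice, if listed at all, have none.
  mappings="$(aws ec2 describe-images \
    --region "$region" \
    --image-ids "$image_id" \
    --query 'Images[0].BlockDeviceMappings[?VirtualName].[DeviceName,VirtualName]' \
    --output text)" || return 2

  if [ -n "$mappings" ]; then
    echo "${image_id}: ephemeral mappings found, auto-recovery will fail:"
    echo "$mappings"
    return 1
  fi
  echo "${image_id}: no ephemeral mappings"
}

# Usage:
#   check_ami_recoverable ami-XXXXXXXX ap-southeast-1
```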