Codifying brownfield infrastructure into Terraform: the import flow that works

Nobody starts greenfield. Most Terraform work is taking infrastructure already in the cloud and putting it under code. The import flow I've used on customer engagements, the gotchas that bite every time, and why 'good' Terraform from scratch can make brownfield import worse.

Every Terraform tutorial starts the same way: terraform init, write some HCL, terraform apply, watch a fresh VPC appear in your AWS account. Greenfield. Clean room. Nobody else has touched the cloud.

Nobody starts there. Not in real environments. The actual job (the one I spend most of my customer engagements on) is taking infrastructure that already exists, written by hand or by some long-forgotten script or by the last engineer’s CloudFormation experiment, and putting it under code without breaking anything.

That work is called codification, and it is the unglamorous heart of every IaC adoption project I’ve ever seen.

I want to write down what the import flow actually looks like, including the parts the docs gloss over, because I keep meeting teams who are stuck halfway through it and don’t realize that the problem they’re hitting is the same problem everyone hits.

The setup

Picture the demo environment I keep rebuilding for these engagements. An AWS account with a scatter of test resources someone created by clicking in the console six months ago. Resources with names like ec2testsid, sidtest-codify2, sid-ec2-module. A handful of VPCs, one or two unused, a security group whose ingress rules nobody can explain, an S3 bucket with public access settings that match no team’s policy.

That account is a small, controllable version of every customer environment I have ever walked into. The names are toy names; the shape of the problem is the real thing.

The question is the same every time: how do we get this stuff under Terraform without:

  1. Destroying any of it.
  2. Causing a re-create of resources that look right but aren’t quite identical to what we’d write by hand.
  3. Spending six months on it.

The manual import flow

The way Terraform’s docs walk you through this is:

  1. Write a resource block in HCL for the thing you want to import. Empty or near-empty is fine.
  2. Run terraform import resource_type.resource_name <cloud_id>. The cloud ID is the AWS resource ID, the Azure resource ID, the GCP self-link.
  3. Run terraform plan. Look at the diff between the HCL you wrote and the actual state of the resource as it exists in the cloud.
  4. Edit your HCL until the plan is clean.
  5. Repeat for every resource.
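
Concretely, steps 1 through 4 look roughly like this for the toy sidtest-codify2 bucket from earlier. This is a sketch, not runnable without AWS credentials and the bucket actually existing:

```shell
# Step 1: a near-empty resource block gives the import a target address.
cat > s3.tf <<'EOF'
resource "aws_s3_bucket" "imported" {
  bucket = "sidtest-codify2"
}
EOF

# Step 2: bind the real bucket to that address in state.
terraform import aws_s3_bucket.imported sidtest-codify2

# Steps 3-4: diff the HCL against reality, edit s3.tf, repeat until clean.
terraform plan
```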

That works. It also takes forever, and step 4 is where everybody gets stuck.

The reason step 4 is hard is that real cloud resources have dozens of attributes you didn’t write down because you didn’t have to. Default values that AWS or Azure or GCP picked for you when the resource was created in the console. Attributes that were copied from a parent. Attributes that were set by some integration you forgot existed. terraform plan faithfully reports every one of these as a diff between your HCL and reality.

So you sit there for an hour reconciling forty-seven attributes on a single security group, half of which you didn’t know existed.

The gotcha: lifecycle blocks for the attributes you can’t reproduce

Here’s the thing the docs don’t teach you well: some of those attributes literally cannot be reproduced in HCL. No matter what you write, they show up as a diff on every plan.

The clearest example is timestamps. creation_date on an S3 bucket. created_at on an RDS instance. The cloud sets these. You don’t. If they show up as part of the resource state, Terraform will dutifully diff them and try to “fix” them on the next apply, which will either fail or, worse, force a recreate.

The fix is lifecycle { ignore_changes = [...] }. You tell Terraform: yes, this attribute exists, no, don’t try to manage it, leave it alone.

resource "aws_s3_bucket" "imported" {
  bucket = "sidtest-codify2"

  lifecycle {
    ignore_changes = [
      tags["CreatedAt"],
      grant,
    ]
  }
}

The list of things you end up ignoring in a real brownfield import is longer than you’d think. Default VPC tags that AWS adds. Console-set lifecycle rules on S3 buckets. The auto-generated aws:cloudformation:* tags on anything that was ever touched by a CFN stack. EBS volume types that have changed default behavior between API versions. The list goes on, and it is specific to the cloud, the region, and the resource type.

The rule of thumb I tell teams: every lifecycle.ignore_changes block is a place where Terraform has stopped managing reality. That isn’t bad; sometimes it’s the right answer. But you should know how many of them you’ve added, because every one of them is a small region of your infrastructure where IaC is no longer authoritative.
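
A cheap way to keep that count honest. This is a sketch; it assumes a repo of .tf files and counts occurrences rather than parsing HCL, so it overcounts if a line mentions ignore_changes in a comment:

```shell
# Count ignore_changes occurrences across every .tf file in the repo.
# Each hit is a spot where Terraform has stopped managing reality.
grep -rc 'ignore_changes' --include='*.tf' . | awk -F: '{s += $2} END {print s}'
```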

The harder you tried to write good Terraform, the worse this gets

This is the pattern that took me a while to articulate, and I want to say it plainly because it surprises people: the more carefully you’ve structured your Terraform conventions, the more painful brownfield import is.

Here’s why. If you’ve built a sid-ec2-module that takes opinionated inputs (instance type, AMI, security group references, tagging conventions) and produces a “correctly shaped” EC2 instance, then trying to import an existing EC2 instance into that module almost always fails. The imported resource doesn’t match the module’s opinions. The tags don’t conform. The security group references are by ID, not by module output. The AMI is from six months ago and isn’t in your input list. The user data has a comment that doesn’t match your template.

You then face a choice you didn’t want to make:

  • Bend the module to fit the resource. This pollutes the module with one-off escape hatches for legacy stuff.
  • Bend the resource to fit the module. This means recreating things, which was the thing you were trying to avoid.
  • Import outside the module. Now you have two Terraform conventions in the same repo: module-based for new stuff, raw-resource for imported stuff.

In practice everybody picks option three, at least at first. Then six months in, when the import has stabilized, they refactor toward the module. The transition phase is the messy part.
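
For that later refactor, Terraform 1.1+ lets you record the state move declaratively instead of hand-running terraform state mv. A hypothetical sketch, assuming the raw import was named aws_instance.imported and the module wraps a single aws_instance.this:

```hcl
# After the import has stabilized, migrate the state address into the module
# without destroying or recreating the instance.
moved {
  from = aws_instance.imported
  to   = module.sid_ec2.aws_instance.this
}
```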

The lesson I’ve drawn: if you know you’re going to inherit brownfield, design your modules with optional inputs and generous defaults, not strict required inputs. Make the module accommodating before you need it to be.
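
What "accommodating" looks like in practice, as a sketch with variable names that are my own placeholders:

```hcl
# Optional inputs with generous defaults: an imported instance that doesn't
# conform simply leaves these unset instead of fighting the module.
variable "instance_type" {
  type    = string
  default = "t3.micro" # a default, not a hard requirement
}

variable "extra_tags" {
  type    = map(string)
  default = {} # nonconforming legacy tags can be passed straight through
}
```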

Auto-generation tools

The manual flow is so painful that several tools have grown up to automate the “generate the .tf from cloud state” part of it. The category goes by names like cloud-to-code or reverse Terraform or codify-from-cloud. They all do roughly the same thing:

  1. Read the actual state of resources in a cloud account.
  2. Generate Terraform HCL that matches that state.
  3. Optionally, write the corresponding state file entries so you can skip the terraform import step.
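
Worth knowing alongside the third-party category: Terraform itself grew a native version of steps 2 and 3 in version 1.5, via declarative import blocks plus config generation. A sketch, reusing the toy bucket name:

```hcl
# import.tf: declare the binding instead of running `terraform import`.
import {
  to = aws_s3_bucket.imported
  id = "sidtest-codify2"
}
```

A subsequent terraform plan -generate-config-out=generated.tf then asks Terraform to write the matching HCL for you. The generated output still needs the same hand-editing pass as any other generator's.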

The best of these tools handle the lifecycle-block problem automatically: they know which attributes are cloud-managed and which aren’t, and they generate ignore_changes blocks for the former. They also handle resource relationships, generating module references where appropriate.

I will not name the specific tool I use most often on engagements; let’s just say the category is real and the good tools are good. The point I want to leave you with: if your codification project is more than a few dozen resources, manual terraform import is the wrong tool. Find a generator. Even if you have to rewrite the output by hand to fit your conventions, starting from a generated baseline is hours of work, not weeks.

What I tell customers

When a team says “we want to put our existing infrastructure under Terraform,” the conversation I have with them is:

Inventory first. Before you import a single thing, list what you have. By account, by region, by resource type, by criticality. A 200-resource codification project is fundamentally different from a 20,000-resource one.
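
For the AWS case, the Resource Groups Tagging API gives a cheap first pass at that inventory. A sketch; it assumes a configured AWS CLI, and it only sees resource types that API supports:

```shell
# List every ARN the tagging API knows about in one region.
aws resourcegroupstaggingapi get-resources --region us-east-1 \
  --query 'ResourceTagMappingList[].ResourceARN' --output text \
  | tr '\t' '\n' | sort
```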

Triage by blast radius. Production database with five years of state? Last to be imported, if ever. A test EC2 instance someone forgot about? First. The order matters.

Use the generator. Manual import is for the dozen weird resources at the end. Don’t start there.

Plan for ignore_changes to grow. Budget time for the second pass where you go through every lifecycle block you generated and decide which ones to keep, which ones to tighten, and which ones to remove because the underlying issue got fixed.

Don’t rewrite the imported HCL into modules in the same PR as the import. Import. Stabilize. Refactor. Three steps, three changes, three review cycles. Conflating them is how brownfield imports turn into multi-month projects.

Run the plan in a pipeline, not locally. The plan-output-in-the-PR pattern I’ve been pushing on every customer this year matters extra for brownfield. You want every reviewer to see what the import is actually going to do before it does it.
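
The mechanical core of that pattern is small. A sketch of the pipeline step, with the artifact name as a placeholder:

```shell
# Produce the plan once, then render it as a reviewable text artifact
# for the pipeline to attach to the PR.
terraform plan -no-color -out=tfplan
terraform show -no-color tfplan > plan.txt
```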

The longer thread

Codification is the bridge between the world of cloud-as-clicked-buttons and the world of cloud-as-code. Every team I’ve worked with has to cross that bridge, and every one of them underestimates how long the bridge is.

There is also a category of problem that shows up only after the import is done: the resources whose state in the file doesn’t match the cloud anymore. Drift in one direction, ghost resources in the other. Those are the next two pieces I want to write, because the brownfield work is what creates the conditions for both of them.

For now: if you’re staring at an AWS account full of ec2testsid-style legacy resources, take a breath. The pattern is solvable. It is also boring. That’s why nobody writes about it.

- Sid