GKE clusters via Terraform: the variables that actually matter
Most GKE-via-Terraform modules expose every knob the API has, then bury the few that matter. Here's the split I've landed on after a year of customer demos: what to parameterize per environment, what to hardcode and forget, and the regional-vs-zonal trade-off nobody likes to talk about.
The GKE demo I’ve been running on customer calls for the last few months is small but deliberately complete. One module, one google_container_cluster resource, one google_container_node_pool resource, and a variables.tf file that’s been quietly shrinking every time I review it.
The shrinking is the point.
When you first build a Terraform module for GKE, the temptation is to expose every variable the Google provider supports, because the API surface is large, and you don’t want to lock anyone out of a knob they might need. Three months in, the module has forty variables. Six months in, somebody onboards onto it and gives up after twenty, because they can’t figure out which five they actually need to set.
The split I want to write down is the one I’ve arrived at after running this module against a few customer scenarios: which variables actually vary per environment, and which ones should be hardcoded inside the module and forgotten about.
The cluster resource, abbreviated
For grounding, the shape of the google_container_cluster block in the demo:
resource "google_container_cluster" "primary" {
name = var.cluster_name
location = var.location
initial_node_count = 1
remove_default_node_pool = true
cluster_ipv4_cidr = var.cluster_ipv4_cidr
master_authorized_networks_config {
dynamic "cidr_blocks" {
for_each = var.master_authorized_cidrs
content {
cidr_block = cidr_blocks.value.cidr
display_name = cidr_blocks.value.name
}
}
}
enable_shielded_nodes = true
master_auth {
client_certificate_config {
issue_client_certificate = false
}
}
logging_config {
enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
}
monitoring_config {
enable_components = ["SYSTEM_COMPONENTS"]
managed_prometheus { enabled = true }
}
# ... and a few more
}
Five things vary by environment in that block. Everything else is the same in every environment, on purpose.
The five variables that earn their keep
cluster_name. Obvious. Every cluster needs a unique name within its project. The naming convention I push for on engagements is <env>-<region>-<purpose>, e.g., prod-uscentral1-platform, because it puts the most-different parts on the left and the most-similar parts on the right, which is what kubectl config get-contexts is going to display.
location. Region or zone. This is the variable that drives the biggest single decision about the cluster, and most teams get it wrong by accident. We’ll come back to it.
cluster_ipv4_cidr. The pod CIDR. This needs to be unique across every cluster that ever needs to talk to every other cluster, because GKE pod IPs are routable within the VPC (and across anything peered to it). Get this wrong and three years from now somebody is renumbering pods because two clusters tried to use the same /14. Pick wide ranges and assign them deliberately. I keep a spreadsheet on engagements that tracks every pod CIDR allocated across the estate.
master_authorized_cidrs. The list of IP ranges that are allowed to reach the control plane. Per-environment because the source IPs differ: your office VPN, your CI runner egress, your bastion. This is the variable that gets edited most often after the cluster is initially built, and it's the one worth having a documented PR process for, because mistakes here are direct security findings.
Node pool sizing. Min nodes, max nodes, machine type, disk size. These vary across dev/staging/prod and across workload shapes. They live in a separate google_container_node_pool resource, parameterized via its own variable block. Worth noting: initial_node_count on the cluster is set to 1 and the default pool is immediately removed, because the default pool is harder to manage cleanly than an explicit named one.
That’s it. Five variables. Everything else in the cluster block is a constant.
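For concreteness, a minimal sketch of the variables.tf that list implies; the types mirror the resource blocks above, and the node-pool defaults (e2-standard-4, one to five nodes) are illustrative assumptions, not recommendations:

variable "cluster_name" {
  type        = string
  description = "Unique within the project. Convention: <env>-<region>-<purpose>."
}

variable "location" {
  type        = string
  description = "Region (us-central1) or zone (us-central1-a). See the regional-vs-zonal section."
}

variable "cluster_ipv4_cidr" {
  type        = string
  description = "Pod CIDR. Must be unique across every cluster in the estate."
}

variable "master_authorized_cidrs" {
  type = list(object({
    cidr = string
    name = string
  }))
  description = "IP ranges allowed to reach the control plane."
}

variable "node_pool_machine_type" {
  type    = string
  default = "e2-standard-4"
}

variable "node_pool_initial_count" {
  type    = number
  default = 1
}

variable "node_pool_min" {
  type    = number
  default = 1
}

variable "node_pool_max" {
  type    = number
  default = 5
}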
The things you should hardcode and forget
These are the values I see teams parameterize “for flexibility,” then never vary. They are also the values where the wrong setting causes the most damage. So they belong inside the module, with no variable in front of them.
enable_shielded_nodes = true. Shielded GKE nodes verify the integrity of the node image at boot. There is no reason to turn this off in any environment. If somebody on your team wants to disable shielded nodes, the right answer is to find out what problem they’re trying to solve, not to expose a var.enable_shielded_nodes knob. Hardcode true.
master_auth.client_certificate_config.issue_client_certificate = false. Client certificates on the GKE control plane are a legacy auth mechanism. They cannot be rotated cleanly. They should never be issued. Hardcode false.
logging_config.enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]. GKE’s Cloud Logging integration. If you turn it off, you can’t debug anything. The cost is real but acceptable. Hardcode it on, with both components.
monitoring_config.enable_components = ["SYSTEM_COMPONENTS"] plus managed_prometheus { enabled = true }. Same shape. The cost is acceptable, the diagnostic value is high, the trap of having half your clusters with monitoring off is not worth the savings. Hardcode it.
Workload Identity. Configure it once in the module (a sketch follows this list). Don't expose a flag.
remove_default_node_pool = true and initial_node_count = 1. The default node pool is a footgun. You can’t change its node count in-place without recreating it, and you can’t customize most of its attributes after creation. Always create the cluster with one default node, remove the default pool, then attach named node pools as separate resources. This is a module-level pattern, not a per-environment knob.
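The abbreviated cluster block above elides the Workload Identity wiring, so for completeness, here is roughly what the hardcoded version looks like; var.project_id is an assumed module input, not one of the five:

  # In google_container_cluster: bind the cluster to the project's workload pool.
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # In the node pool's node_config: nodes use the GKE metadata server.
  workload_metadata_config {
    mode = "GKE_METADATA"
  }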
The principle I keep coming back to: if a variable’s right answer is the same in every environment you’ve ever cared about, that variable doesn’t belong as a variable. It belongs as a constant in the module, with maybe a comment explaining why.
The variables that matter are the business decisions
The five variables I just listed are not really “the GKE knobs.” They are the organizational business decisions expressed in Terraform vocabulary. Node-pool sizing, machine types, region selection, pod CIDRs, control-plane network policy, those are the decisions the company has already made about what “a production Kubernetes cluster” means. The Terraform module is where those decisions get projected into the GKE primitives that implement them.
This is Decisions as Code (DaC) at the variables surface. It's the methodology behind nearly every self-service and automation system I've designed: extract business decisions out of platform configuration into a small, curated layer; often five real decisions where the raw config exposed eighty-nine. The remaining choices get absorbed into templates and defaults the platform owns. (I called this Property Toolkit during my OneFuse days; the foundation is different, the shape isn't.)
The variables-surface application is direct. Define the business decisions once (what “small” means, what “production” means, what the approved OS template is) and let each consumer project them through a platform-specific adapter. The consumers used to be vRA7, vRA8, and Terraform, each with their own reserved namespace pulling the same standard values into the shape each platform expected. Today the consumers are GKE, EKS, and AKS modules, each with their own variables.tf pulling from a shared standards module exposing module.standards.size["small"] and module.standards.tags.production. Same methodology: centralize, project, don't duplicate. The vocabulary changed, the structural problem didn't.
The reason this matters for the “which variables matter” question is that the variables that matter are exactly the ones that belong in the standards module, not in the GKE module. Machine types, autoscaling ranges, regional defaults, those are organization-level decisions. The GKE module should consume them, not re-define them. When the organization’s prod machine-type policy changes, you change it in one place and every Kubernetes flavor picks it up. That’s the test for whether a variable is “the right kind of variable”, does changing it once propagate to every consumer that needs it, or does it require touching N modules in N repos.
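A sketch of what that consumption looks like, assuming a shared standards module whose output shapes match the names above; the source path and the definition of "small" are hypothetical:

# Inside the shared standards module:
output "size" {
  value = {
    small = { machine_type = "e2-standard-4", min = 1, max = 3 }
  }
}

output "tags" {
  value = {
    production = { env = "prod", managed_by = "platform" }
  }
}

# In the GKE module:
module "standards" {
  source = "../modules/standards" # hypothetical path
}

resource "google_container_node_pool" "platform" {
  # ...
  node_config {
    machine_type = module.standards.size["small"].machine_type
    labels       = module.standards.tags.production # projected as node labels here
  }
}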
Regional vs zonal, the conversation nobody wants to have
The location variable accepts either a region (us-central1) or a zone (us-central1-a). The difference is enormous and most teams discover it the wrong way around.
A zonal cluster has a single control plane in a single zone. If that zone has an outage, your cluster’s control plane is unreachable. The nodes might keep running (workloads usually do), but you can’t kubectl against the cluster, you can’t autoscale, you can’t recover from anything.
A regional cluster has the control plane replicated across three zones in the region. Single-zone outages don’t affect the control plane. Node pools can be regional too, in which case nodes are distributed across zones automatically.
The cost difference: under current pricing the control-plane management fee is the same flat $0.10 per cluster per hour for zonal and regional clusters, about seventy-three dollars a month either way ($0.10 × ~730 hours). The gap comes from the GKE free tier, whose credit covers one zonal or Autopilot cluster per billing account but never a regional one, plus the extra nodes if you let a regional node pool replicate across zones. Call it roughly seventy dollars a month per cluster in 2024 pricing, meaningful for a dev sandbox, lost in the noise for production.
The recommendation I make on every customer call:
- Production: regional. Always. The cost is irrelevant compared to the cost of a zone outage taking your prod control plane down.
- Staging: regional, if budget allows. Zonal is a defensible second choice if you’re confident you can rebuild quickly.
- Dev: zonal is fine. Especially for ephemeral dev sandboxes that get destroyed and recreated regularly.
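In tfvars terms the whole split is just the shape of one value; these files are illustrative, not from a real engagement:

# prod.tfvars: a region, so a regional control plane
cluster_name = "prod-uscentral1-platform"
location     = "us-central1"

# dev.tfvars: a zone, so a cheaper zonal control plane
cluster_name = "dev-uscentral1-sandbox"
location     = "us-central1-a"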
The reason this matters as a Terraform module conversation: the same location variable accepts both shapes. A typo (us-central1 vs us-central1-a) silently changes the cluster topology. I’ve seen teams discover, six months after building a “production” cluster, that they accidentally built a zonal one because the value got truncated in a config file. The fix involves recreating the cluster. There is no in-place migration.
Lifecycle hooks help here. On the cluster resource:
lifecycle {
  prevent_destroy = true

  precondition {
    condition     = length(split("-", var.location)) <= 2
    error_message = "Production clusters must use a region, not a zone. Got '${var.location}'."
  }
}
The precondition checks that the variable looks like a region (us-central1, two dash-separated parts) rather than a zone (us-central1-a, three parts). It’s not bulletproof, but it catches the most common typo. Combined with prevent_destroy, you can’t accidentally tear down a production cluster, and you can’t accidentally build a zonal one when you meant regional.
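If the module only ever builds production clusters, you can push the check onto the variable itself and reject zones before the resource is even evaluated; the regex is my assumption about GCP's region naming pattern, not an official contract:

variable "location" {
  type = string

  validation {
    # Regions look like us-central1; zones add a letter suffix (us-central1-a).
    condition     = can(regex("^[a-z]+-[a-z]+[0-9]+$", var.location))
    error_message = "location must be a region (us-central1), not a zone (us-central1-a)."
  }
}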
Node pools vs the cluster, kept apart
The other architectural decision worth being deliberate about: keep node pools in separate resources from the cluster.
The shape:
resource "google_container_cluster" "primary" {
# ... cluster config
remove_default_node_pool = true
initial_node_count = 1
}
resource "google_container_node_pool" "platform" {
cluster = google_container_cluster.primary.name
location = google_container_cluster.primary.location
initial_node_count = var.node_pool_initial_count
autoscaling {
min_node_count = var.node_pool_min
max_node_count = var.node_pool_max
}
node_config {
machine_type = var.node_pool_machine_type
# ...
}
}
The reason: node pool changes need to be safe to apply independently of cluster changes. If you bundle the node pool into the cluster resource's node_pool block (which the provider permits but its docs recommend against), changes to node config can force the entire cluster to be recreated. With separate resources, you can change the machine type or adjust the autoscaling range without touching the cluster.
This also lets you have multiple named node pools (platform, gpu, spot) each with its own configuration, each managed by its own resource. Adding a new node pool is a new resource, not a refactor of an existing block.
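For instance, a GPU pool is just another resource alongside the platform one; everything below (the name, the machine type, the accelerator choice) is a hypothetical sketch:

resource "google_container_node_pool" "gpu" {
  name     = "gpu"
  cluster  = google_container_cluster.primary.name
  location = google_container_cluster.primary.location

  autoscaling {
    min_node_count = 0
    max_node_count = 2
  }

  node_config {
    machine_type = "n1-standard-8"
    guest_accelerator {
      type  = "nvidia-tesla-t4"
      count = 1
    }
  }
}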
The customer-scenario framing
This pattern shows up most often on engagements where an organization is picking a GKE path over EKS. The conversation runs predictably: the team wants Kubernetes, has reasons for picking GCP over AWS for this workload, and has not previously operated a managed control plane at scale. The thing they want from the module is not flexibility. It’s a paved road that does the obvious thing for them while making it hard to do the wrong thing.
Five variables, sensible defaults baked in, regional-by-default for anything labeled production, node pools as separate resources, prevent_destroy on the cluster itself. That's the module. Everything else is internal.
The version that ships from a “let’s make every option configurable” instinct has twenty more variables and three more lifecycle hooks and ends up being worse, because the surface area is wider, and the team has to develop opinions about every knob.
The version that ships from “what would the on-call engineer at 2 a.m. actually want to change” has five. Pick the five.