Decisions as Code: The Methodology Behind Every Self-Service System I Design

Every platform team buries the business in an eighty-nine-row YAML and calls it self-service. That isn't self-service; it's a configuration handoff. Decisions as Code is the discipline of extracting the five decisions that actually matter and letting the platform absorb the rest.

Every platform team I’ve worked with eventually does the same thing. They build a powerful platform, expose its full configuration surface to their users, write a Confluence page that says “self-service,” and announce that they’ve delivered the developer portal. Six months later the users are filing tickets again, because nobody on a product team wants to spend twenty minutes filling out an eighty-nine-row YAML to ship a service. That isn’t self-service. That’s a configuration handoff dressed in a self-service costume.

The shape I keep returning to is what I now call Decisions as Code. DaC for short. It is the methodology behind nearly every self-service and automation system I’ve designed. The core move is small and the consequences are large: extract the business decisions out of the platform’s raw configuration into a small, curated layer, and let the platform absorb everything else through templates and defaults the platform owns.

In practice it usually means five decisions where the raw config exposed eighty-nine. The user reads five lines and ships. The platform team owns the remaining eighty-four. Nobody is worse off; almost everyone is materially better off. This is the through-line of how I think about every platform I build.

The hook: configuration handoffs are not self-service

The reason “self-service” platforms keep failing the people they’re built for is that the platform team and the user disagree about which decisions the user actually wants to make. The platform team sees a hundred knobs and dutifully exposes them all because they don’t want to lock anyone out. The user sees a hundred knobs and gives up, because they came to the platform with a request (“I need a small production-class database in the EU region”) that they could express in one sentence.

That gap between the user’s sentence and the platform’s hundred knobs is where DaC lives. The user’s sentence has maybe five real decisions in it: workload class, environment, region, sizing, ownership. Everything else (the replication factor, the parameter group, the encryption settings, the lifecycle policy, the security group rules, the IAM bindings, the backup schedule, the tag set, the network policy, the maintenance window) is a platform decision dressed up as a user decision. The user neither knows nor cares which Postgres parameter group is the right one; they just want a production-class database. The platform should know and care, on the user’s behalf.

DaC asks the platform team to take ownership of the platform’s decisions. Curate the surface the user actually has to make decisions on, and bury the rest behind sensible defaults, templates, and conventions. The platform stops being a configuration shop and starts being a decision shop. That’s the whole shape.

The methodology

These are the five primitives of DaC, the things I find myself reaching for every time I build one of these systems, regardless of foundation.

Centralization of decisions. There is one source of truth for the small set of decisions the business actually makes. Workload class. Sizing tier. Environment. Region. Ownership. Data classification. Whatever the company has decided are the categories that matter. They live in one place. Every consumer of the platform pulls from that place. When the business changes its mind about what “production” means, you change it in one file and every consumer picks it up on the next render.
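As a minimal sketch of what that one source of truth can look like (every name and value here is illustrative, not from any particular platform), the centralized layer can be as small as a single structure that every consumer reads:

```python
# A minimal sketch of a centralized decision layer: one structure,
# one source of truth. All category names and values are illustrative.
STANDARDS = {
    "env": {
        "prod": {"replicas": 3, "backup": "daily", "monitoring": "full"},
        "dev":  {"replicas": 1, "backup": "none",  "monitoring": "basic"},
    },
    "sizing": {
        "small":  {"cpu": 2, "memory_gb": 4},
        "medium": {"cpu": 4, "memory_gb": 16},
    },
    "region": {
        "eu": {"cloud_region": "eu-west-1", "data_residency": "EU"},
    },
}

def lookup(category: str, name: str) -> dict:
    # Every consumer pulls from this one place; change what "prod"
    # means here and every consumer picks it up on the next render.
    return STANDARDS[category][name]
```

The point is not the data structure; it's that there is exactly one of it.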

Platform-aware shapes via reserved namespaces. Different consuming platforms need the same decision in different shapes. The decision “this is a production workload” lands as a label on a Kubernetes Deployment, a tag on a Terraform resource, a metadata field on a Backstage entity, and a property on whatever else needs it. The centralized layer holds the standard decision; per-platform adapters project it into the platform-correct vocabulary. Reserved namespaces in the standard structure make it predictable which slice each platform will pull.
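Sketching the adapter idea in the same vein (the key names and label schemes are hypothetical), the standard decision is held once and each adapter projects it into its platform's vocabulary:

```python
# A sketch of per-platform adapters: one standard decision, projected
# into platform-correct shapes. Label and tag keys are illustrative.
decision = {"workload_class": "production"}

def to_kubernetes_labels(d: dict) -> dict:
    # Lands as a label on a Deployment.
    return {"metadata": {"labels": {"standards/workload-class": d["workload_class"]}}}

def to_terraform_tags(d: dict) -> dict:
    # Lands as a tag on a Terraform-managed resource.
    return {"tags": {"WorkloadClass": d["workload_class"]}}

def to_backstage_metadata(d: dict) -> dict:
    # Lands as an annotation on a catalog entity.
    return {"metadata": {"annotations": {"standards/workload-class": d["workload_class"]}}}
```

Each adapter owns its platform's spelling; none of them owns the decision.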

Template and variable interpolation. The decisions get pulled into the platform-specific artifacts through a templating mechanism. Helm templates, Terraform locals, a config-as-data renderer, whatever the foundation provides. The standard value is referenced, never copied. Change the standard value, every dependent artifact picks it up.
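The reference-never-copy rule can be shown with nothing more than standard-library templating (the paths and the manifest shape are made up for illustration):

```python
from string import Template

# A sketch of reference-never-copy: the artifact holds a reference
# (a template variable), and the standard value is substituted at
# render time. Paths and values are illustrative.
STANDARDS = {"sizing.small.cpu": "2", "sizing.small.memory_gb": "4"}

manifest = Template("cpu: ${sizing_small_cpu}\nmemory: ${sizing_small_memory_gb}Gi")

def render(template: Template) -> str:
    # Flatten dotted paths into template-safe variable names.
    values = {k.replace(".", "_"): v for k, v in STANDARDS.items()}
    return template.substitute(values)

# Change STANDARDS["sizing.small.cpu"] once and every dependent
# artifact picks up the new value on the next render.
```

Helm templates, Terraform locals, and config-as-data renderers are all richer versions of this same substitution.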

Composition and nesting. Decisions reference other decisions. The “production application” decision references the “production environment” decision, which references the “EU region” decision. Changing the standard “EU region” definition propagates through every dependent layer without anyone touching the layers in between. This is what makes the methodology scale: without composition you end up flattening every decision into every consumer, which is exactly the duplication problem you were trying to escape.
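A toy resolver makes the propagation concrete (the "$ref" convention and all the names here are invented for the sketch):

```python
# A sketch of composed decisions: a decision may reference another
# decision by "$ref"; resolution follows references recursively.
# The "$ref" convention and all names are illustrative.
STANDARDS = {
    "region.eu": {"cloud_region": "eu-west-1"},
    "env.prod": {"replicas": 3, "region": {"$ref": "region.eu"}},
    "app.prod-web": {"sizing": "small", "env": {"$ref": "env.prod"}},
}

def resolve(node):
    if isinstance(node, dict):
        if "$ref" in node:
            return resolve(STANDARDS[node["$ref"]])
        return {k: resolve(v) for k, v in node.items()}
    return node

# Changing region.eu propagates through env.prod into app.prod-web
# without touching either intermediate layer.
resolved = resolve(STANDARDS["app.prod-web"])
```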

A discovery convention. Adapters need to find what they’re looking for. A predictable naming scheme (standards.sizing.small, standards.env.prod, standards.classification.pii) lets templates address decisions by path. No surprises. No archaeology. The convention is part of the contract.
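Path-based lookup is a few lines once the convention exists (again a sketch; the paths mirror the examples above):

```python
# A sketch of path-based discovery: templates address decisions by a
# predictable dotted path, e.g. standards.env.prod. Names illustrative.
STANDARDS = {
    "standards": {
        "sizing": {"small": {"cpu": 2}},
        "env": {"prod": {"replicas": 3}},
        "classification": {"pii": {"encryption": "required"}},
    }
}

def get(path: str):
    node = STANDARDS
    for part in path.split("."):
        node = node[part]  # a KeyError here means a broken contract
    return node
```

The KeyError-as-broken-contract behavior is the point: a consumer that asks for a path outside the convention fails loudly at render time, not silently in production.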

Those five primitives are what I find myself rebuilding every time. Substrates change. The primitives don’t.

The lineage

I called this Property Toolkit when I built the first version of it in 2021, during the OneFuse days. The OneFuse Property Toolkit was a structured property surface that let you define organizational business standards once and project them onto vRA7, vRA8, Terraform, and anything else that needed them. T-shirt sizing in one file, consumed as VirtualMachine.CPU.Count by vRA7, flavor by vRA8, vcpu by Terraform. Nested property sets so changing the standard “production Linux template” propagated to every dependent application.

The tool is in maintenance now. The companies have changed hands. The methodology has aged better than the product. I’ve since used the same shape (same five primitives) in:

  • Terraform, where module.standards is the standard decisions layer and every infrastructure module is a consumer.
  • Helm, where a library chart plus values.schema.json exposes the standard decisions and every application chart calls into its named templates.
  • Crossplane, where Compositions are the platform-aware adapters and the XR (Composite Resource) is the curated decision surface.
  • OPA / Gatekeeper, where ConstraintTemplates are the policy class and Constraints are the per-consumer instantiation.
  • Backstage, where Software Templates are the form surface that absorbs everything past the five user decisions.
  • Argo Workflows, where WorkflowTemplates are the standard pipeline shape and per-project Workflows pull from them.
  • AI platform layers, where the standard decisions are model class, eval thresholds, GPU sizing, cost-center labels, and per-environment routing.

In every case the methodology is unchanged. The foundation is different. The methodology is what travels.

The test

The test for whether you’ve gotten the design right is conversational. Sit the business owner (the person who actually owns the decisions you’re trying to surface) down in front of the decision surface and ask them to walk through it. If they can read it in five minutes and tell you what choice they’re making, the design is right. If they need a translator (you, the platform engineer, sitting next to them explaining what each field means), the platform is leaking. You haven’t extracted the decisions; you’ve just relabeled the configuration.

When the surface is right, the conversation shifts. The business owner says “I want a production-class workload” and the platform takes care of the rest. They don’t ask what the network policy is, because the network policy is a platform decision, not a business decision. They don’t ask which IAM role to attach, because that’s a platform decision too. They make the decisions that are theirs to make; the platform makes the decisions that are its to make. Both sides own less. Both sides ship faster.

This test is the cheapest one I know. It’s also the one that platform teams almost never run, because it requires admitting that the platform is leaking decisions it should own. It’s worth running anyway.

What it isn’t

DaC is not a framework. There is no DaC tool to install. There are tools that help you do DaC well (every one of the substrates above can be the implementation) but the methodology lives in the discipline, not in any of them. If somebody tries to sell you “the DaC platform,” they’re selling you the foundation; the methodology is yours to apply on top.

It is not BPMN, business-process modeling, or anything that lives in a workflow-diagramming tool. BPMN was an attempt to draw business processes; DaC is the discipline of curating the decision surface a business process touches. Different problem, different shape.

It is not low-code. Low-code is about making the platform’s full configuration surface clickable instead of typeable. DaC is about deciding that the user shouldn’t see most of the platform’s configuration surface at all. The user touches the five decisions; the platform handles the rest, whether the platform is HCL or YAML or clickable forms.

It is not config-as-code. Config-as-code is the foundation: it’s how the platform stores its configuration. DaC is the methodology applied at the boundary between the platform and the business. You can have config-as-code without DaC (most platform teams do; that’s why their “self-service” portals are eighty-nine-row YAMLs), and in principle you could do DaC without config-as-code, though I’ve never seen that work in practice. The two are complementary. Config-as-code is the medium; DaC is the discipline.

And it is not a one-time exercise. The decisions shift over time. What counted as a business decision two years ago might be a platform decision now (because the platform has matured) or vice versa. DaC requires the platform team to keep curating. The decision surface is a living artifact, not a one-time deliverable.

The through-line

The reason this methodology runs through almost everything I’ve shipped is that the underlying problem doesn’t go away. Every era of platform engineering has had the same shape: a powerful foundation, a set of users who don’t want to touch the foundation’s full surface, and a platform team in the middle trying to mediate. The substrates rotate: vSphere, vRA, OneFuse, Terraform, Crossplane, Helm, KServe, Backstage. The mediation problem is constant. DaC is what I’ve found that survives the rotation.

If you’re a platform engineer reading this and you’re about to ship a “self-service” portal, the question to ask before you ship is: what are the five decisions on this surface that actually matter to my users? If you can name them and the rest of the configuration is below the surface, the design is right. If you can’t, the platform is leaking, and the users will tell you so by filing tickets.

If you’re a business owner reading this and you’re about to inherit a platform whose surface you can’t read in five minutes, push back. Ask the platform team which five decisions are yours to make. If they can’t answer cleanly, they haven’t extracted them yet, and you have leverage to insist they do.

The methodology is older than any of the substrates I’ve used to implement it. It will outlast all of them. That’s the through-line of this blog: the shape of the problem and the shape of the answer are stable. The vocabulary changes. The discipline doesn’t.

Centralize the decisions. Project them everywhere. Validate at the boundary. Govern at admission. That’s the whole shape. Once you see it, you see it everywhere.

, Sid