The five-not-eighty-nine test for any decision surface
If your self-service form has eighty-nine questions on it, you haven't designed a decision surface, you've handed off a configuration job. The five-not-eighty-nine test: can the business owner read the surface in five minutes and know what they're choosing? Here's how to apply it.
Sit a business owner in front of the self-service portal your platform team just shipped. Ask them to walk through provisioning the thing they came to provision. Count the questions on the page.
If the count is closer to five than to eighty-nine, you've designed a decision surface and your platform is doing the work it's supposed to do. If the count is closer to eighty-nine, you haven't designed a decision surface, you've handed off a configuration job and called it self-service.
This is the five-not-eighty-nine test, and it's the cheapest test I know for whether a platform is actually serving the people it's built for.
It's also the test that platform teams almost never run. The reason it's worth a piece on its own is that running it honestly is uncomfortable, the result is usually unflattering, and the fix is structural rather than cosmetic. If you do run it and the answer is "eighty-nine," the right response is not to add a wizard.
Where the eighty-nine number comes from
Eighty-nine isn't a literal count. It's the rough size of the configuration surface a real production-grade resource exposes when you let the underlying platform's full schema show through. A Postgres in RDS has dozens of parameter group settings, a security group, a subnet group, a backup configuration, an encryption configuration, a maintenance window, a tag set, an IAM policy. A Kubernetes Deployment has resource requests and limits, tolerations, node selectors, security contexts, network policies, sidecars, init containers, liveness and readiness probes. A vRA blueprint, in the era I lived it, had whatever fields the catalog item designer dropped onto the form, which was usually "all of them, just in case."
Eighty-nine is the number of fields the user has to look at when the platform is a thin wrapper over the base layer's full surface. It's the number that shows up when the platform team's mental model is "expose the configuration; let the user choose," and the user's mental model is "I want a small production-class database in EU-West, please don't make me read the AWS docs."
Five is the number that shows up when the platform team has done the curating work. Workload class. Environment. Region. Sizing. Ownership. Maybe a sixth, data classification, or a cost-center attribution. Not many more than that. The remaining eighty-four fields exist; they just live below the surface, owned by the platform, expressed as defaults and templates the user doesn't have to read.
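To make that concrete, here is a minimal sketch of the curated surface expressed as a typed request object. The field names and allowed values are illustrative, not a prescribed schema; the point is that nothing on it requires the user to know what a parameter group or a subnet is.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Illustrative field names and values, not a prescribed schema. Every field
# here is a business decision, phrased in the user's vocabulary.
@dataclass
class DatabaseRequest:
    workload_class: Literal["experimental", "internal", "production"]
    environment: Literal["dev", "staging", "prod"]
    region: Literal["eu", "us", "apac"]         # a geography, not a subnet ID
    sizing: Literal["small", "medium", "large"]
    owner: str                                  # the team that answers for it
    data_classification: Optional[str] = None   # the "maybe a sixth" field

# "A small production-class Postgres in EU for the recommendations service":
request = DatabaseRequest(
    workload_class="production",
    environment="prod",
    region="eu",
    sizing="small",
    owner="recommendations-team",
)
```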
The approach I keep coming back to for getting from eighty-nine to five is Decisions as Code (DaC). The five-not-eighty-nine test is the cheapest way to find out whether you've actually applied DaC, or whether you've just labelled your form differently and called it done.
How to actually run the test
The test is conversational. It's not a spreadsheet. It's not an audit. The reason it works is that the failure modes show up immediately when a real business owner reads the surface out loud.
Pick the right person. The "business owner" for the test is the person whose decisions you're claiming to expose. For an internal database self-service, it's the application team lead, the person who's deciding "I need a small production-class Postgres for the recommendations service." It's not the platform engineer. The platform engineer can read eighty-nine fields and tell you what each one is. That's exactly why they're the wrong test subject. You need the person who came to the platform with a sentence, not the person who came to the platform with the docs open.
Have them walk through the form, out loud. Not silently. Out loud. The narration is the data. If they get to a field and say "I don't know what this is," the field has failed the test. If they get to a field and say "uh, what should I put here, what's everyone else putting," the field has failed the test. If they get to a field and say "wait, I don't make this decision, my SRE does," the field has especially failed the test. That's a decision the user shouldn't be making at all.
Time it. Set a five-minute timer when they start. The test isn't about whether they finish, it's about what's left on the form when the timer fires. The fields they got past are decisions they actually had context for. The fields they're still staring at are the ones the platform should own.
Count the real decisions, not the field count. Some forms have many fields that are really one decision broken into pieces. If "EU-West-1, EU-West-2, EU-West-3" is three radio buttons but the user is really choosing "EU," that's one decision. The count for the test is the number of decisions, not the number of widgets. The other way around: "tags" looking like one field but really being eight independent choices counts as eight, not as one.
Note the panic responses. When the user gets stuck on a field, watch what they do. Do they ask a colleague? Do they ping you in Slack? Do they pick the default and hope? Do they pick whatever the example showed? Each of those is a signal that the field doesn't belong on the surface. The default is the platform's opinion; the field is the user's decision. If the answer is "use the default," the field shouldn't be a field.
The output of the test is a count and a list. The count is the number of fields that really got a considered answer in the five minutes. The list is the fields that didn't, the ones that got skipped, defaulted, panicked over, or bounced to a colleague. The list is what the platform team has to take back into the platform.
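If it helps to write that output down in a consistent shape, something as small as this is enough; the failure categories and field names are illustrative, not a required format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Literal

# Illustrative record of one test run: which fields got a considered answer
# inside the five minutes, and how the rest failed.
FailureMode = Literal["skipped", "defaulted", "panicked", "asked a colleague"]

@dataclass
class TestRun:
    answered: List[str] = field(default_factory=list)
    failed: Dict[str, FailureMode] = field(default_factory=dict)

run = TestRun(
    answered=["workload class", "region", "sizing"],
    failed={
        "parameter group": "panicked",
        "subnet group": "asked a colleague",
        "maintenance window": "defaulted",
    },
)
print(len(run.answered), "considered decisions;", len(run.failed), "fields to take back")
```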
What "no" looks like, and what to do about it
If the count is fifteen, twenty-five, eighty-nine, the answer to "is this a decision surface" is no. The instinct in this moment is to add a wizard, or add tooltips, or add a "talk to a platform engineer" button. Don't. None of those is the fix. The fix is to take the failed list back to the platform and decide which of those fields the platform owns now.
The decision-by-decision pass looks like this. For each field on the failed list, the platform team asks: is this a business decision that a different application team would make differently from this one? If yes, it stays on the surface, and the platform team owes that team better tooling for making it (better docs, better defaults, a clearer label). If no, if every team would pick the same value, or pick from a small set of pre-defined values, or pick whatever the platform recommended, then the field comes off the surface. It moves into the standards repo as a default. It gets templated. It becomes a platform decision projected onto the workload via the DaC pattern.
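A sketch of that pass, with hypothetical field names and placeholder defaults; the only question asked per field is whether a different application team would genuinely answer it differently.

```python
# Hypothetical triage over the failed list. True means the answer varies by
# application team and the field stays on the surface; False means it moves
# below the line and becomes a platform default.
VARIES_BY_TEAM = {
    "sizing": True,
    "backup retention": False,
    "parameter group": False,
    "maintenance window": False,
}

# Fields that don't vary land in the standards repo as defaults the platform
# owns; the values here are placeholders, not recommendations.
PLATFORM_DEFAULTS = {
    "backup retention": "14 days",
    "parameter group": "pg16-standard",
    "maintenance window": "sun 03:00-05:00",
}

surface = [name for name, varies in VARIES_BY_TEAM.items() if varies]
```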
The other piece: for the fields that genuinely belong on the surface, are they the right level of abstraction? "Replication factor: 3" is the wrong level. "Workload class: production" is the right level, and "production" is the standards repo's responsibility to define, including what replication factor it implies. The user picks the class; the standards library projects the class onto the base layer's primitives. Two different sentences, two different jobs. The test is whether the user's sentence is a sentence they actually wanted to say.
And then there's vocabulary. Are the names on the surface in the user's words or the platform's words? "Pod anti-affinity" is the platform's vocabulary. "Spread across racks" might be the user's. "InstanceClass: db.t3.medium" is the platform's vocabulary. "Sizing: small" is the user's. The standards library is the place the translation lives. The form is the place the user's vocabulary shows up.
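Pulling those last two points together, here is a sketch of the translation the standards library might own, reusing the DatabaseRequest from the earlier sketch. The specific mappings are illustrative, not recommended values.

```python
# Hypothetical translation table owned by the standards library: the user's
# vocabulary on the left, the base layer's primitives on the right.
WORKLOAD_CLASS = {
    "production":   {"replication_factor": 3, "multi_az": True,  "backup_retention_days": 14},
    "internal":     {"replication_factor": 2, "multi_az": False, "backup_retention_days": 7},
    "experimental": {"replication_factor": 1, "multi_az": False, "backup_retention_days": 1},
}

SIZING = {"small": "db.t3.medium", "medium": "db.m6g.large", "large": "db.m6g.2xlarge"}

REGION = {"eu": "eu-west-1", "us": "us-east-1", "apac": "ap-southeast-1"}

def project(request) -> dict:
    """Project the user's handful of decisions onto the base layer's surface.

    `request` is the DatabaseRequest from the earlier sketch.
    """
    spec = dict(WORKLOAD_CLASS[request.workload_class])
    spec["instance_class"] = SIZING[request.sizing]
    spec["region"] = REGION[request.region]
    spec["tags"] = {"owner": request.owner, "environment": request.environment}
    return spec
```

The user never sees the right-hand side. Changing what "production" means is a change to this table, reviewed on the platform side, not a change to anyone's form.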
A platform team that runs this loop honestly across the failed list will, in my experience, come out the other side with a surface that has between five and ten fields on it for almost any provisioning task. Not because five is a magic number. Because that's the number that falls out when you've drawn the line correctly between business decisions and platform decisions.
The OneFuse echo
The first time I ran a version of this test in earnest, in the OneFuse era (around the time we were building what I now call the DaC pattern, the Property Toolkit lineage), the "deploy a new application stack" form had thirty-seven fields on it. The user (a project manager, not a platform engineer) got through eight of them in five minutes. The other twenty-nine, the failed list, were a mix of "I don't know," "I don't know what this means," and "I usually just put whatever Joe puts." Joe was the platform engineer two desks over who ended up de facto co-filling every form.
We took the failed list back to the platform side, decided which of the twenty-nine were genuinely platform decisions (about twenty-three of them), wrote them into the standard property surface as defaults, and rebuilt the form. The next version had nine fields. The next test took four minutes. The platform engineer got back the time he'd been spending hand-holding form fills. The project manager was visibly relieved.
The lesson wasn't "fewer fields are better." The lesson was that the fields had been on the wrong side of a line. The line was always between business decisions and platform decisions. The platform team had drawn the line in the wrong place, exposing platform decisions because nobody had plainly said "those are ours, not theirs." The five-not-eighty-nine test surfaces the line.
When five is too few
The mirror failure is also worth naming, because I've seen it. A platform team reads the test as "fewer is better, monotonically," and ends up with a surface that has three fields on it because they collapsed every choice into the default. The user gets a database. They didn't get a small database, they got the medium one because medium was the platform's default, and they didn't have a way to override it. They didn't get a production-class database, they got dev-class because the platform decided every new request was dev until somebody said otherwise.
Five-not-eighty-nine cuts both ways. The right number of fields is the number of decisions the user actually wants to make. If the user has five real decisions, the surface has five fields. If the user has three, the surface has three. If the user has ten, the surface has ten, and that's fine, because the test isn't "minimize the surface," it's "match the surface to the decisions."
The failure mode at both ends is the same failure mode: the platform team has guessed wrong about which decisions are the user's. The fix at both ends is the same fix: ask the user, watch them work, and move the line.
Why this is the test that matters
Every methodology paper I've ever read about platform engineering eventually arrives at "be user-centric." Sure. The five-not-eighty-nine test is what user-centricity looks like operationally. It's the question you can answer in fifteen minutes that tells you whether the approach has reached the surface or stopped at the strategy deck.
If your platform passes the test, you have a decision surface and the approach is doing its job. If your platform fails the test, you have a configuration handoff in self-service clothing, and now you know what the platform team's next sprint is.
Either way, you ran the test, which is more than most platforms ever do. That's the cheapest progress you can make.
– Sid