The enterprise AI stack: substrate, platform, applications
Three layers, top to bottom: applications (the AI features users see), platform (model registry, serving, observability, governance), substrate (K8s, GPUs, storage). Decisions as Code runs through every layer. Centralize the decisions. Project them everywhere.
I've spent the back half of 2025 writing about specific corners of the enterprise AI stack: the Kubernetes patterns that hold up, Backstage as the catalog layer, Helm values as the Decisions as Code surface, the model-serving foundation question, GPU scheduling in K8s. Each piece is a slice. Here's the framing piece: the three-layer mental model I keep coming back to when I think about how those pieces fit together, and the approach that runs through all three.
The three layers, top to bottom:
- Applications. The AI features users actually see. The summarization endpoint embedded in the CRM. The chat assistant. The agentic workflow that drafts the report. The recommendation engine. This is the layer that matters to the business.
- Platform. Model registry. Serving framework. Eval pipeline. Observability. Governance. The reusable plumbing that lets every application team ship without reinventing the operational story.
- Substrate. Kubernetes. GPUs. Storage. Networking. Identity. The compute and connectivity that everything else runs on top of.
The trap that's killed several enterprise AI programs I've watched from a distance is treating these as if they're independent. They aren't. Each layer consumes contracts from the layer below it and projects standards into the layer above. Get the projection wrong and the whole thing fragments into per-team bespoke stacks that don't share anything except a logo.
The approach that makes this work is Decisions as Code (DaC). DaC is the approach behind nearly every self-service and automation system I've designed: pull business decisions out of platform configuration into a small, curated layer (often five real decisions where the raw config exposed eighty-nine) and let the platform absorb the rest through templates and defaults the platform owns. (I called this Property Toolkit during my OneFuse days; the shape of the idea hasn't changed, only the foundation under it.) DaC applies at every layer. Same shape. Different primitives. The approach is what travels.
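As a minimal sketch of what that curated surface can look like, here's a hypothetical five-decision values file. Every key name here is illustrative, not a real schema; the point is the ratio between what a team declares and what the platform absorbs:

```yaml
# Hypothetical decision surface: the five values a team actually chooses.
# Everything else (probes, security contexts, quotas, topology spread)
# is absorbed by templates and defaults the platform owns.
team: fraud-ml
env: prod
size: medium              # resolved to concrete CPU/GPU specs by the platform
dataClassification: internal
onCall: fraud-ml-oncall
```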
Want to go deeper on the methodology itself? The Decisions as Code piece walks through the centralize-and-project pattern, and the Helm values article is the Kubernetes-specific version.
Layer 1: The substrate
The bottom layer is the boring layer, and boring is exactly what you want in a foundation. The foundation is:
- Kubernetes as the workload orchestrator. Multi-cluster if you need it. One regional cluster per environment if you don't.
- GPU nodes with proper taints, the NVIDIA device plugin or the GPU Operator, and the scheduling primitives covered earlier (see the sketch after this list).
- Storage. Block for stateful platform components, NFS or parallel filesystems for model weights and large artifacts, object storage for everything else.
- Networking. A service mesh if the workload count justifies it, otherwise plain Service / Ingress with a sensible default network policy.
- Identity. Workload identity for service-to-service auth, IAM integration for the data path, secrets management that isn't environment variables in a YAML file.
- Cost telemetry. A Prometheus / cost-exporter setup that ties spend to labels you control.
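For the GPU bullet above, here's a sketch of what "proper taints" means in practice, assuming a nvidia.com/gpu taint key and illustrative pool labels. In real clusters the node provisioner usually sets these rather than a hand-written manifest:

```yaml
# Sketch: one node in the GPU pool. Workloads that don't tolerate the
# taint never schedule here; the device plugin advertises nvidia.com/gpu
# as an allocatable resource once it's running on the node.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-01            # illustrative
  labels:
    node-pool: gpu-a100        # illustrative pool label
    cost-center: ml-platform   # feeds the cost telemetry above
spec:
  taints:
    - key: nvidia.com/gpu
      value: "present"
      effect: NoSchedule
```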
The foundation isn't AI-specific. Almost none of it is. The interesting thing about an AI-ready foundation is that it's an honest one: the discipline that's been documented for backend services since 2018 applies word-for-word. The two additions are the GPU node pool and the storage path for model weights.
DaC shows up at the foundation layer as the labels and the resource conventions. Centralize the standard label set (app, env, team, cost-center, data-classification) and project it everywhere. Centralize the resource tiers (small / medium / large for CPU; small / medium / large for GPU) and project them everywhere. The foundation is the place where these decisions are most consequential and most often lost.
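A sketch of that centralized standards source, assuming a single platform-owned file that every chart and module consumes. The tier numbers are placeholders, not sizing advice:

```yaml
# standards.yaml (hypothetical): the one file every layer consumes.
labels:
  required: [app, env, team, cost-center, data-classification]
tiers:
  cpu:
    small:  {cpu: "500m", memory: 1Gi}
    medium: {cpu: "2",    memory: 8Gi}
    large:  {cpu: "8",    memory: 32Gi}
  gpu:
    small:  {"nvidia.com/gpu": 1}
    medium: {"nvidia.com/gpu": 2}
    large:  {"nvidia.com/gpu": 4}
```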
Layer 2: The platform
The middle layer is the layer that earns or loses its keep based on whether application teams want to use it. The platform is:
- Model registry. Where models live as first-class artifacts with versions, lineage, and ownership. MLflow, Weights & Biases, or a model-as-code repo backed by object storage. The catalog I wrote about in the Backstage piece sits in front of the registry and projects it as a navigable surface.
- Serving framework. KServe + vLLM for the LLM case; Triton or BentoML for the mixed-framework case; a thin Deployment-and-Service for the single-model case. The foundation question covered in the serving piece lives here.
- Eval pipeline. Where evals run on every model version. The eval results are first-class metadata on the catalog. Drift detection runs on schedule against production traffic.
- Observability. Prometheus + Grafana for metrics, OpenTelemetry for traces, the model-specific metrics (tokens-per-second, queue depth, p99 latency, prompt + completion logging where compliance allows). Cost dashboards that tie back to the foundation's cost telemetry.
- Governance. OPA + Gatekeeper or Kyverno at admission. Policy gates on what can deploy. Data-classification enforcement. Identity-aware routing. (A minimal policy sketch follows this list.)
- Developer portal. Backstage, ideally. The catalog of models, agents, RAG pipelines, eval reports, runbooks, on-call rotations.
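For the governance bullet, here's a minimal admission-policy sketch in Kyverno that enforces the standard label set from the substrate section. The policy name and message are illustrative; Gatekeeper would express the same rule in Rego:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-standard-labels    # illustrative
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-labels
      match:
        any:
          - resources:
              kinds: [Deployment, InferenceService]
      validate:
        message: "Workloads must carry the standard label set."
        pattern:
          metadata:
            labels:
              app: "?*"            # "?*" means non-empty in Kyverno patterns
              team: "?*"
              cost-center: "?*"
              data-classification: "?*"
```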
This is the layer where DaC is most visible, because the platform is the projection layer. The standard decisions from the foundation need to project onto the platform's primitives: labels become InferenceService metadata, environment becomes Knative scaler configs, sizing becomes resource specs, security context becomes pod security standards. The platform team is in the catalog-and-adapter business. The foundation underneath is different from anything I worked with in the OneFuse era; the approach is identical.
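Concretely, that projection can land as a single InferenceService like this one. The model name, format, and storage URI are placeholders; the scaling and resource values are what the platform resolves from env and size:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: summarizer                  # placeholder
  labels:                           # the standard label set, projected
    app: summarizer
    env: prod
    team: support-ai
    cost-center: support
    data-classification: internal
spec:
  predictor:
    minReplicas: 1                  # env: prod projected into a scaling floor
    maxReplicas: 4
    tolerations:                    # lands on the tainted GPU pool
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
    model:
      modelFormat:
        name: huggingface           # placeholder
      storageUri: s3://models/summarizer/v3   # placeholder
      resources:
        limits:
          "nvidia.com/gpu": 1       # gpu-small, from the tier table above
```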
The application-facing contract is the platform's product. "How does a team ship a model?" should have a single, unambiguous answer: a single Backstage template, a single Helm library chart consumed by the application chart, and a single set of conventions for ownership, eval cadence, and on-call. When that contract is clean, teams use the platform. When it isn't, teams build their own, and you end up with three serving stacks and four eval pipelines.
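That single Backstage template can be a short scaffolder definition. The parameter names and skeleton path here are assumptions, not a real template:

```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: ship-a-model               # illustrative
  title: Ship a model
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Model
      required: [modelName, team, size]
      properties:
        modelName: {type: string}
        team: {type: string}
        size: {type: string, enum: [small, medium, large]}
  steps:
    - id: scaffold
      action: fetch:template
      input:
        url: ./skeleton            # platform-owned chart skeleton (assumed)
        values:
          modelName: ${{ parameters.modelName }}
          team: ${{ parameters.team }}
          size: ${{ parameters.size }}
```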
Layer 3: The applications
The top layer is where the business value lives. Applications are:
- The customer-facing chat assistant.
- The internal summarization API that the support tooling calls.
- The retrieval-augmented Q&A endpoint feeding the help center.
- The agentic workflow that drafts a quarterly report from a dozen data sources.
- The recommendation engine that learns from clickstream data.
Each application is a small thing that consumes the platform. The application team:
- Picks a model from the registry (or registers a new one).
- Declares an InferenceService manifest, or a Deployment if the platform's KServe layer doesn't fit.
- Wires up retrieval against the platform's vector store.
- Writes the prompts and the eval set.
- Ships.
That's the contract. The application team isn't responsible for the serving infrastructure, the GPU scheduling, the model lifecycle plumbing, or the observability story. The platform handles all of it. The application focuses on the AI feature.
DaC shows up at the application layer as the consumption surface. Application charts depend on the standards library chart from the platform. The chart's templates call into the library helpers. The result is a workload manifest that's correctly labeled, correctly sized, correctly tolerated onto the right GPU node pool, correctly governed at admission. The application team writes thirty lines of values; the standard machinery does the rest.
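A sketch of that consumption surface, assuming a platform library chart named standards that exposes standards.labels and standards.resources helpers; all three names are hypothetical:

```yaml
# Chart.yaml (application chart, excerpt): depend on the platform library
dependencies:
  - name: standards                # hypothetical library chart
    version: 1.x.x
    repository: oci://registry.example.com/platform

# templates/inferenceservice.yaml (excerpt): the helpers do the projection
metadata:
  labels:
    {{- include "standards.labels" . | nindent 4 }}
spec:
  predictor:
    model:
      resources:
        {{- include "standards.resources" (dict "size" .Values.size "type" "gpu") | nindent 8 }}
```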
How the projection works across layers
If I had to sketch DaC across all three layers in one paragraph: the standard business decisions (env, sizing, labels, security, ownership) live in a centralized standards source. The foundation projects them as node labels, resource quotas, network policies, IAM bindings. The platform projects them as InferenceService configs, eval scheduling, catalog metadata, on-call rotations. The applications project them as Deployment manifests, Service definitions, prompt templates, retrieval scopes. The standards source is the single point of truth; each layer is an adapter that translates the standard values into platform-correct shapes.
This is the DaC pattern running at three layers instead of one. The shape doesn't change. The vocabulary does.
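The standards source stays trustworthy only because it's validated at the boundary, and that validation can be as small as a JSON Schema over what teams declare. A sketch, shown in YAML form; the field names follow the earlier sketches and are assumptions:

```yaml
# Hypothetical schema for what teams are allowed to declare, enforced in
# CI and as a Helm values schema before anything reaches a cluster.
$schema: "https://json-schema.org/draft/2020-12/schema"
type: object
required: [team, env, size, dataClassification]
properties:
  team: {type: string, minLength: 1}
  env: {enum: [dev, staging, prod]}
  size: {enum: [small, medium, large]}
  dataClassification: {enum: [public, internal, restricted]}
additionalProperties: false
```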
What makes enterprise AI different from enterprise anything-else
Two things, mainly. The first is that the cost concentration is sharp: GPUs cost meaningfully more than CPUs, and the platform layer can either save or waste a lot of money depending on whether it gets multi-tenancy right. The second is that the eval / governance story is newer and less mature, so the platform layer carries operational responsibility that older platforms (databases, message queues) had decades to grow into.
Both of those are arguments for taking the platform layer seriously. Not "we'll let each team figure it out" seriously. Real platform-team-with-funded-staffing seriously. The companies I've watched succeed at enterprise AI in 2025 didn't get there by being smarter about model choices. They got there by building a platform layer with the same discipline they'd apply to any other production foundation, and then letting application teams ship against the platform's contracts.
Does a small shop need all three layers?
A natural objection: do you really need all three layers if you're a small shop? No. The three-layer model scales down. At the smallest scale, "the platform" is one engineer who maintains a Helm chart and a docs page. At medium scale, it's a small team. At fleet scale, it's a real platform org with KServe operators, a Backstage catalog, and OPA policies. The shape doesn't change; the staffing does.
The risk is small shops trying to build the medium-shop platform too early, and large shops not building any platform at all because every team rolled their own first. The former wastes engineering hours; the latter wastes GPU hours, which are more expensive.
The frugal AI piece covers this for the small-shop case. The framing here works for both ends: pick the layer scope that matches your scale, but don't skip the approach. The centralization-and-projection discipline is cheap. The drift it prevents isn't.
What I keep coming back to
I've been working on the same problem in different costumes for two decades. K-6 IT systems where Active Directory was the decisions source and login scripts were the projection. Virtualization at Pierce where the VM template library was the decisions source and the deployment workflow was the projection. The OneFuse era where the curated decision surface fed every platform adapter. Terraform modules where module.standards is the decisions source and every infrastructure module is a consumer. Now enterprise AI where the decisions source is a small repo of YAML and JSON Schema, and the projection happens at three layers.
The technology rotates. The approach persists. If you take one framing from this whole batch, take this: enterprise AI is a three-layer problem (substrate, platform, applications), and Decisions as Code is what makes it tractable. The companies that internalize this end up with an AI platform that ships features. The ones that don't end up with fifteen bespoke stacks, six different opinions about how to log inference, and a quarterly conversation about why the GPU bill is so high.
Centralize the decisions and project them everywhere. Validate at the boundary, and govern at admission. That's how the enterprise AI stack holds together, and it's how every enterprise stack worth building holds together.