Backstage as the developer portal for AI services
AI services need a catalog the same way every other internal platform does. The wiki approach falls over the moment you have more than a handful of models. Backstage with a thin AI plugin layer is the pattern that holds: a direct callback to the catalog discipline.
Every organization I've watched stand up an AI platform eventually hits the same wall: there are now twelve models in flight, six teams shipping them, three eval frameworks, and a Confluence page that's eight months out of date. Somebody asks "who owns the customer-summarization model and which version is in prod?" and nobody knows. The wiki is lying. The README in the model repo is lying. The Slack thread where the decision happened is gone.
The fix isn't a better wiki. The fix is a service catalog. AI services need the same discovery, ownership, and metadata discipline that platform teams figured out for backend services a decade ago. Backstage (Spotify's open-source developer portal, now a CNCF incubating project) is the obvious foundation, and there's enough of an AI plugin layer in 2025 to make it useful out of the box.
I've been running a Backstage instance on engine-01 in the homelab for a few months, mostly to see how the catalog metaphor lands when the "services" being cataloged are model endpoints, agent runtimes, and RAG pipelines instead of plain old microservices. The short answer: the metaphor lands fine; the work is in defining the right entity types.
The catalog metaphor, one more time
Anybody who's read my older work is going to recognize the shape of this argument. This is Decisions as Code (DaC), the approach behind nearly every self-service and automation system I've designed, projected onto the form surface: extract business decisions out of platform configuration into a small, curated layer (often five real decisions where the raw config exposed eighty-nine) and let the platform absorb the rest through templates and defaults. (I called this Property Toolkit during my OneFuse days; the foundation is different, the shape isn't.)
Backstage is one of the cleanest projections of DaC I've seen. The Backstage Software Templates are literally curated forms: a handful of inputs the developer actually has to decide, and a templated scaffold that absorbs the rest. The catalog itself is a structured, discoverable, central source of decisions (owner, classification, lifecycle, on-call) that every consuming platform can pull from. The discipline of "name your things, declare their owners, make them discoverable, project them onto whatever platform needs to consume them" is the discipline DaC has always asked for; Backstage is what it looks like in the developer-portal lane.
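To make the "curated form" concrete, here's a minimal scaffolder template sketch. The apiVersion, kind, parameters, and steps syntax are standard Backstage scaffolder schema; the template name, owner group, and skeleton path are illustrative:

```yaml
# template.yaml: the form is the parameters block; the skeleton absorbs the rest
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: model-service            # illustrative name
  title: New model-backed service
spec:
  owner: group:default/ml-platform
  type: service
  parameters:
    - title: The only decisions the developer makes
      required: [name, owner, lifecycle]
      properties:
        name:
          type: string
          description: Short name for the new service
        owner:
          type: string
          ui:field: OwnerPicker   # standard scaffolder field extension
        lifecycle:
          type: string
          enum: [experimental, production]
  steps:
    - id: scaffold
      name: Scaffold from skeleton
      action: fetch:template      # built-in scaffolder action
      input:
        url: ./skeleton           # illustrative path; publish/register steps omitted
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          lifecycle: ${{ parameters.lifecycle }}
```

Three real inputs; everything else (CI wiring, Dockerfile, entity registration) lives in the skeleton the template stamps out.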
Entities (Components, APIs, Resources, Systems, Domains) get YAML descriptors, those descriptors live in git, and the catalog ingests them and presents the result as a navigable portal. Ownership is first-class. Relationships are first-class. The thing you wanted from the wiki ("what is this, who owns it, where does it live, what depends on it") comes for free.
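Concretely, one of those descriptors is a short YAML file in the service's repo. A minimal sketch (the names are illustrative; the fields are the standard catalog schema):

```yaml
# catalog-info.yaml: lives next to the code, ingested by the catalog
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: customer-summarization
  description: Summarizes support conversations for the CRM views.
spec:
  type: service
  lifecycle: production
  owner: group:default/ml-platform          # ownership is first-class
  system: customer-intelligence             # rolls up into a System entity
  dependsOn:
    - resource:default/embeddings-endpoint  # relationships are first-class
```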
The leap for AI services is that you need a couple more entity kinds (a Model entity, an Eval entity), and you need the catalog to surface metadata that a backend service doesn't have (model card, eval scores, training data lineage, on-call rotation specific to model issues). Backstage's plugin model handles both.
What goes into the AI plugin
The right Backstage configuration for AI services, at minimum, covers the following (the descriptor sketch after this list pulls the pieces into a single entity file):
Model versions. Each model has a Component entity (or a custom Model kind, registered with the catalog). Versions are tracked: production, staging, deprecated. Lineage to the upstream base model is captured. The Hugging Face registry, an internal MLflow, or a model-as-code repo can all back the catalog; the entity declaration just needs to point at the source of truth.
Ownership. Every model has a team. Every team has an escalation path. When the customer-summarization model misbehaves at 3am, the on-call knows who to wake up. This is the single most valuable thing the catalog does, and it's the thing the wiki is worst at: by the time the wiki page lists an owner, that owner has changed teams twice.
Eval reports. Recent eval runs surface as entries on the model's entity page. The latest score on the held-out set, the latest red-team report, the latest drift check. Not the eval results buried in a CI log; the eval results as a first-class piece of catalog metadata. A small custom plugin reads from your eval system and renders the result.
Runbooks. What to do when this model misbehaves. How to roll back. How to escalate. How to fail over to a previous version. The Backstage TechDocs feature renders Markdown from the repo directly, so the runbook lives next to the code that needs it.
On-call rotation. The PagerDuty, OpsGenie, and Grafana OnCall integrations are among the older Backstage plugins. Surface the active on-call for the model's owning team right on the entity page. The "who is on call for the embedding service" question dies the moment that information is one click away.
Dependencies and consumers. Which agents call this model? Which RAG pipelines use this embedding endpoint? The Backstage relationship graph answers this if your entity declarations are honest about dependsOn and dependencyOf. The "we deprecated that endpoint six months ago and three production services are still calling it" surprise dies.
Cost. Optional but increasingly worth it: cost per 1k tokens, cost per request, or cost per hour for self-hosted serving. Wire it to your billing or your Prometheus cost-exporter; surface it on the entity page. Teams that see cost behave differently than teams that don't.
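Pulled together, most of that list lands in a single descriptor. The sketch below grows the minimal example from earlier: spec, dependsOn, and the TechDocs annotation are standard Backstage; the PagerDuty key follows that plugin's documented annotation; everything under acme.example is loudly hypothetical, standing in for whatever keys your own eval and billing plugins would define:

```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: customer-summarization
  title: Customer Summarization Model (v3)
  labels:
    # lineage to the upstream base model; these label keys are hypothetical
    model.acme.example/base-model: llama-3.1-8b-instruct
    model.acme.example/version: v3
  annotations:
    backstage.io/techdocs-ref: dir:.          # runbook renders from this repo
    pagerduty.com/integration-key: abc123     # on-call surfaced on the entity page
    # hypothetical keys read by the custom eval and cost plugins described above
    evals.acme.example/report-url: https://evals.internal/customer-summarization/latest
    cost.acme.example/prometheus-query: llm_cost_dollars:rate1d{model="customer-summarization"}
spec:
  type: ml-model            # spec.type is free-form; pick a convention and lint for it
  lifecycle: production
  owner: group:default/ml-platform
  system: customer-intelligence
  dependsOn:
    - resource:default/embeddings-endpoint   # consumers show up on the relationship graph
```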
Why a Backstage portal beats a wiki
The honest pitch:
The catalog is enforced. Backstage entities are YAML in git. They get reviewed in PRs. They get linted. They don't drift the way a wiki page drifts, because the catalog is generated from the source of truth instead of being a parallel write.
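The enforcement is ordinary CI. A minimal sketch in GitHub Actions syntax (Forgejo Actions accepts essentially the same shape); yamllint catches syntax errors, and the trailing comment marks where a schema-level entity validator would slot in:

```yaml
# .github/workflows/catalog-lint.yml: illustrative; paths and names are mine
name: catalog-lint
on:
  pull_request:
    paths:
      - '**/catalog-info.yaml'
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint entity YAML
        run: |
          pip install yamllint
          yamllint --strict .
      # A schema validator is the step that actually catches a missing owner.
      # Community Backstage entity validators exist for this; wire in whichever
      # one your platform team standardizes on.
```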
Discovery is structured. A search across the catalog returns entities with metadata, not pages with text matches. "Show me all production models owned by the platform team that haven't been eval'd in 30 days" is a real query against the catalog, not a Slack archaeology project.
Ownership is unambiguous. Every entity has an owner. The owner is a Group or a User. Groups have members. Members have email addresses and chat handles. The on-call for any entity is one query away.
Plugins do real work. Backstage plugins exist for Kubernetes, Argo CD, GitHub, PagerDuty, Sentry, and Grafana. The AI-specific plugins (MLflow, Weights & Biases, Hugging Face, LangSmith) are newer but real. The result is a single pane of glass that the wiki can't match, because the wiki is just text.
Scorecards work. Backstage's Scorecards / Soundcheck patterns let you express "every production model must have an owner, a runbook, an eval less than 14 days old, and an on-call rotation." The catalog tells you which entities are out of compliance. The wiki tells you nothing.
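Whether you buy Soundcheck or build on the open-source Tech Insights plugin, the checks you declare have roughly this shape. The YAML below is a pseudo-schema of my own, not either plugin's actual config format:

```yaml
# Illustrative scorecard: pseudo-schema, not Soundcheck or Tech Insights syntax
scorecard: production-model-readiness
applies-to:
  kind: Component
  spec.type: ml-model
  spec.lifecycle: production
checks:
  - id: has-owner
    rule: spec.owner is set
  - id: has-runbook
    rule: annotation backstage.io/techdocs-ref is set
  - id: eval-freshness
    rule: latest eval run is newer than 14 days
  - id: on-call-wired
    rule: owning group has an active on-call rotation
```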
What it looks like on the homelab
Backstage runs on engine-01, a single Deployment behind an Ingress, with Postgres for the catalog database. The entity catalog pulls from a small monorepo that holds catalog-info.yaml files for everything I run: the local Whisper service, the embedding pipeline on node-01, the vLLM endpoint on the Mac Studio for inference, the n8n instance on store-01.
Each entity declaration is twenty or thirty lines of YAML. Owner: me. On-call: also me (the homelab on-call rotation is short). Annotations point at the Grafana dashboard, the Forgejo repo, the recent eval results.
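The wiring behind that is unremarkable app-config. The database block is the standard Backstage Postgres configuration; the catalog location points at the monorepo (the Forgejo URL reflects my layout, not a convention):

```yaml
# app-config.production.yaml (excerpt)
backend:
  database:
    client: pg                      # standard Backstage Postgres config
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}

catalog:
  rules:
    - allow: [Component, API, Resource, System, Domain, Group, User, Location]
  locations:
    # a single Location file in the monorepo fans out to every catalog-info.yaml
    - type: url
      target: https://forgejo.internal/homelab/catalog/raw/branch/main/all.yaml
```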
The catalog isn't doing anything magical. What it's doing is forcing me to declare, in YAML, the things I'd otherwise leave implicit. The discipline is the value. By the time a hundred-team enterprise is running this, the discipline is the difference between "we know what's in prod" and "we used to."
What I keep coming back to
The catalog problem is the same DaC problem in a different costume. The decisions used to be "what's the standard name for the production-Linux template" with consumers vRA, vRO, and Terraform. Now it's "what's the standard owner for the customer-summarization model" with consumers PagerDuty, Grafana, the eval pipeline, and a human on-call. The form surface that absorbs everything else moves with the foundation underneath.
The approach (declare the decisions once, centralize them, project them onto the consuming platforms via predictable conventions) is older than the tools. Backstage is just the cleanest version of it for the developer-portal use case in 2025. If you're running more than a handful of AI services and you don't have a catalog, that's where the next quarter of platform pain is going to come from. Stand the portal up first; the rest of the AI platform discipline gets easier when the catalog is the source of truth.