Self-hosted Forgejo and Harbor: the sovereign AI substrate

If your AI infra depends on third-party container images, you don't control your supply chain. Forgejo on store-01 as the source-of-truth git host, Harbor on engine-01 as the registry plus image-signing layer. The sovereign-infra argument, and why mirroring is non-negotiable now.

The 2024 supply-chain-attack year quietly rewrote the assumptions a lot of teams were running on. The XZ Utils backdoor was the headline event: months of patient social engineering against a single open-source maintainer, a payload that very nearly landed in every major distro, caught almost by accident. But the ambient noise was just as loud: typosquatted PyPI packages targeting AI tooling, malicious VS Code extensions, npm dependency-confusion attacks, container images on public registries pulling in cryptominers through side layers. None of it was novel. The volume was.

If you're running AI workloads (and especially if you're running them in any kind of regulated context), the question that comes out of that year is simple: do you control your supply chain or do you not? The honest answer for most teams is "not really." Pulls from docker.io, models from public Hugging Face mirrors, base images from random OCI registries, build dependencies from PyPI and npm. The supply chain is a hundred third parties wide and one CI run deep.

[Figure: the sovereign AI stack inside a dashed perimeter. From the bottom up: Hardware (your servers, your colo, your DC); Kubernetes (your cluster, your scheduler); Forgejo (Git) + Harbor (Registry) (your code, your images); Model + dataset registry (weights and data live here); AI workloads (inference, fine-tuning, eval). If it leaves the dashed line, it leaves the perimeter. Caption: What stays inside your perimeter.]

The argument I want to make in this piece is that the answer for serious AI infra is the same answer that worked for serious software infra a decade ago: own your source of truth, own your registry, mirror your dependencies, sign what you publish. The tools are mature now. The cost is bounded. The benefit is sovereignty.

I run this stack at home: Forgejo on store-01 (the Synology DS1019+, via Container Manager, with the repos backed by the NAS volume) as the git source of truth, and Harbor on engine-01 (the Linux box with the GTX 1080 Ti; its main job is GPU work, but it also hosts the registry and a handful of services) as the OCI registry and signing layer. It's not a fleet-scale deployment. It is a pretty honest test of whether the pattern is achievable at small scale, which it is.

The argument: own your supply chain

The supply chain is the set of things you depend on to get a workload from a developer's laptop to a running pod. For an AI workload, that's:

  • Source code (yours and your dependencies').
  • Container base images.
  • Runtime libraries (CUDA, PyTorch, TensorRT, vLLM, etc.).
  • Model weights.
  • Tokenizers and config files.
  • Build tools.
  • Sidecar / init containers (loggers, metrics agents, mesh proxies).

Every one of those is a place where something can land in your runtime that you didn't put there yourself. The mitigation isn't paranoia; it's the same mitigation enterprises have always applied: a controlled, mirrored, audited path between upstream and your runtime.

The minimum bar for "owning your supply chain" in 2025 looks roughly like this:

  1. Source of truth on infrastructure you control. Not a vendor's git host. The vendor can host a mirror; the source of truth is yours.
  2. Container registry on infrastructure you control. Mirrored bases, mirrored deps, signed images, vulnerability-scanned at push.
  3. Model registry on infrastructure you control. Mirrored weights, hashed and signed, with a metadata trail back to the upstream.
  4. Build pipeline that pulls from your registries, never from upstream. Air-gappable.
  5. Egress controls on your runtime nodes so the running pod can't reach back out to a public registry to pull a side image at runtime.

Most teams have one or two of those. Few have all five. The five together are the bar.

Forgejo as the source of truth

Forgejo is the community-led fork of Gitea. Same shape: a self-hosted, lightweight git server with issues, PRs, releases, a container registry, and CI runner integration. Lives in a single binary or a small container. Backs onto Postgres or SQLite. Runs comfortably on hardware most people already have lying around.

I run mine on store-01 (the Synology DS1019+) as a Container Manager stack. The Synology already has the storage, the backup, the snapshots, the off-box replication. Putting Forgejo in front of those primitives gets you a git host with NAS-grade durability for free. The Forgejo Actions runner runs on the same box; for AI repos that need GPU, the runner farms out to engine-01 over the homelab network.
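
A trimmed sketch of what that stack looks like as a compose file. The hostnames, published ports, volume paths, and the Postgres sidecar are my local choices, not anything Forgejo requires:

```yaml
# Forgejo stack on store-01 (Synology Container Manager).
# Paths under /volume1 and the published ports are homelab-specific.
services:
  forgejo:
    image: codeberg.org/forgejo/forgejo:9     # pin a release, never :latest
    restart: unless-stopped
    environment:
      - USER_UID=1000
      - USER_GID=1000
      - FORGEJO__database__DB_TYPE=postgres
      - FORGEJO__database__HOST=db:5432
      - FORGEJO__database__NAME=forgejo
      - FORGEJO__database__USER=forgejo
      - FORGEJO__database__PASSWD=${FORGEJO_DB_PASSWORD}
    volumes:
      - /volume1/docker/forgejo/data:/data    # repos land on the NAS volume, snapshotted and replicated
    ports:
      - "3000:3000"                           # web UI, fronted by the reverse proxy
      - "2222:22"                             # SSH for git push/pull
    depends_on:
      - db
  db:
    image: postgres:16
    restart: unless-stopped
    environment:
      - POSTGRES_USER=forgejo
      - POSTGRES_PASSWORD=${FORGEJO_DB_PASSWORD}
      - POSTGRES_DB=forgejo
    volumes:
      - /volume1/docker/forgejo/postgres:/var/lib/postgresql/data
```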

Why Forgejo and not GitHub Enterprise or GitLab self-hosted? For a homelab the answer is "it's small and works." For an enterprise the answer is more interesting: Forgejo is genuinely small enough to deploy as part of a sovereign foundation, the codebase is auditable in an afternoon, and the copyleft (GPL) licensing means the evolution stays in the community. GitLab self-hosted is a perfectly fine answer for a fleet; Forgejo is the answer when you want the foundation to be something you can fully reason about.

Forgejo also ships an inline container registry, but I run the registry separately on Harbor, because the Harbor feature set (replication, signing, scanning, RBAC) is the part of the supply-chain story that matters most.

Harbor as the registry plus the signing layer

Harbor is a CNCF-graduated registry. It's an OCI registry plus a handful of features that turn it from a place to push images into a supply-chain hub:

  • Replication: pull-based mirrors of upstream registries (docker.io, quay.io, gcr.io, nvcr.io, ghcr.io). You declare which images you want mirrored; Harbor pulls them on a schedule and serves them locally.
  • Vulnerability scanning at push and at scheduled intervals. Trivy or Grype as the backend.
  • Image signing via Cosign. The sign step happens at push; the verify step happens at admission via an OPA / Gatekeeper policy or the Connaisseur admission controller (a minimal signing sketch follows this list).
  • Project-scoped RBAC so the AI team's project can only push to the AI namespace and only pull from approved upstream mirrors.
  • Webhooks so the CI pipeline knows when a new mirrored upstream lands and can re-run the build with the new base.
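
The sign-at-push half is small enough to show. A minimal sketch with a locally held keypair, assuming the harbor.lab hostname, the ai project, and the image name are my conventions (keyless, Fulcio-backed flows work too; this is the simplest shape):

```sh
# Build and push to the local Harbor, then sign what was pushed.
# cosign.key / cosign.pub come from an out-of-band `cosign generate-key-pair`.
docker build -t harbor.lab/ai/vllm-server:0.6.3 .
docker push harbor.lab/ai/vllm-server:0.6.3

# Sign with the private key held by the CI pipeline.
cosign sign --key cosign.key harbor.lab/ai/vllm-server:0.6.3

# Anyone, including the admission controller, verifies against the public key.
cosign verify --key cosign.pub harbor.lab/ai/vllm-server:0.6.3
```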

I run Harbor on engine-01 because that node already has the bandwidth and the storage layout for the registry's blob store. The Synology backs up the Harbor postgres and the blob store nightly. The whole thing is a single Helm chart on the homelab K8s cluster.
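
A trimmed sketch of the kind of values file I hand that chart. The hostname, storage sizes, and password handling are homelab-level choices; the option names follow the upstream goharbor/harbor-helm chart:

```yaml
# values.yaml for the Harbor Helm chart on engine-01 (trimmed).
expose:
  type: ingress
  ingress:
    hosts:
      core: harbor.lab
externalURL: https://harbor.lab

# Fine for a lab; anything bigger should source this from a secret manager.
harborAdminPassword: "changeme"

persistence:
  persistentVolumeClaim:
    registry:
      size: 200Gi        # the blob store, backed up to the Synology nightly
    database:
      size: 10Gi

trivy:
  enabled: true          # scan at push and on the scheduled sweep
```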

The pattern that matters: nothing in the build pipeline pulls from a public registry. The base image FROM line points at harbor.lab/library/ubuntu:22.04, which is a mirror of docker.io/library/ubuntu:22.04. The CUDA layer pulls from harbor.lab/nvidia/cuda:12.4, mirrored from nvcr.io. The vLLM image is built locally from an open-source Dockerfile, signed at push, and pulled from the local Harbor at runtime. The pipeline can be air-gapped and still work.
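
In Dockerfile terms that looks like the sketch below. The harbor.lab project layout and the internal pypi.lab index are my naming, not a standard; the point is that nothing in the file reaches a public registry:

```dockerfile
# Base layer comes from the local mirror of nvcr.io, never from nvcr.io directly.
FROM harbor.lab/nvidia/cuda:12.4.1-runtime-ubuntu22.04

# Python tooling; in a fully mirrored build the apt sources point at an internal mirror too.
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# pip resolves against the internal index instead of pypi.org.
ENV PIP_INDEX_URL=https://pypi.lab/simple
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY app/ /app
WORKDIR /app
CMD ["python3", "-m", "app.serve"]
```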

The model registry question

The model is the hardest part to mirror. A 70B-class model's weights are 130-plus gigabytes. Pulling them down once to mirror them is a real network event. Storing them durably on the homelab NAS is fine; storing them durably on a regulated enterprise's infrastructure is also fine; the in-between case, where you'd rather not carry that storage yourself, is the one nobody's solved cleanly.

Harbor doesn't natively store model weights as a first-class artifact, but it can store them as OCI artifacts via the ORAS tooling. The model becomes an OCI artifact in Harbor, signed and scanned, with a manifest that describes the metadata. KServe and a handful of other serving stacks know how to pull OCI artifacts directly. For workflows that want a richer model store, MLflow or BentoML's model store can run alongside; the pattern is the same: mirrored once at the boundary, pulled internally at runtime, never re-pulled from the public mirror.
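
A minimal sketch of the mirror-once step with the oras CLI. The project path, tag, artifact-type string, and file layout are my conventions, and the model name is only an example:

```sh
# Push the already-downloaded weights into Harbor as an OCI artifact.
oras push harbor.lab/models/mistral-7b-instruct:v0.3 \
  --artifact-type application/vnd.homelab.model \
  config.json tokenizer.json model.safetensors

# The artifact gets the same signing treatment as the container images.
cosign sign --key cosign.key harbor.lab/models/mistral-7b-instruct:v0.3
```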

I keep a single 7B model mirrored on store-01 and pulled by engine-01 at deploy time. It's not a stress test of the pattern, but it confirms the shape.

Egress as the last layer

Mirroring deps is necessary but not sufficient. The runtime pod needs to be unable to reach out to a public registry on its own. The K8s pattern is a NetworkPolicy that allows egress to the local Harbor and denies egress to anything else. The pattern shows up in the Helm values article as a centralized standard: every workload chart inherits the egress NetworkPolicy from the standards library chart; the chart can't render without it.
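
A minimal sketch of that policy, assuming Harbor runs in-cluster in a namespace labelled harbor and the workloads live in an inference namespace; an out-of-cluster Harbor would use an ipBlock instead, and the port numbers depend on how the registry is exposed:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-harbor-only
  namespace: inference
spec:
  podSelector: {}            # every pod in the namespace
  policyTypes:
    - Egress                 # selecting Egress with a short allow-list denies everything else
  egress:
    # DNS, so harbor.lab resolves at all
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # The local Harbor, and nothing else
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: harbor
      ports:
        - protocol: TCP
          port: 443
```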

This is also where the stop-letting-AI-vendors-handle-your-data on-prem case shows back up. The supply-chain version of the on-prem argument is the same argument: if a third party can reach into your runtime, they're part of your trust boundary whether you like it or not. The cure is the same: close the boundary, mirror what you need across it, audit what crosses.

Pin everything

The other half of the supply-chain story is pinning. The 2024 provider-version-pinning piece made this point about Terraform; the same argument applies to container base images, runtime libraries, model versions, tokenizer configs, every dependency in the chain. latest is the worst tag in the registry. Hash-pin everything you depend on. The pipeline should re-pin deliberately, with a PR, with the diff visible.
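
Resolving a tag to a digest is a one-liner, and the digest is what gets committed. A sketch assuming crane is on the path; skopeo or docker buildx imagetools do the same job:

```sh
# Ask the local Harbor what the mirrored tag currently points at.
crane digest harbor.lab/library/ubuntu:22.04
# -> sha256:...   (paste the real digest into the Dockerfile or values file)

# The FROM line then pins the digest, not the tag:
#   FROM harbor.lab/library/ubuntu:22.04@sha256:<digest-from-above>
# Re-pinning is a deliberate PR that changes this one line, with the diff visible.
```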

What I keep coming back to

Sovereignty over your AI supply chain isn't a maximalist position. It's the table stakes for running AI in any environment where "what's actually in the runtime" needs to have an answer. Forgejo and Harbor are the two pieces I'd put at the bottom of that stack at any scale where the homelab pattern wouldn't blow up. Both are mature open-source projects. Both run comfortably on infrastructure most teams already have. Both close gaps that the 2024 attack year demonstrated were real.

I keep the pattern at home because it's the simplest place to test whether the discipline scales down. It does. The bigger version of the pattern is the same shape with more replicas, more scanning policies, tighter RBAC, and a real signing key ceremony. The discipline, not the size, is the contribution.

The supply-chain story for AI infra is the supply-chain story for any infra. Two years ago that was a security-team conversation. Now it's everyone's conversation. Pick the foundation. Stand it up. Mirror your deps. Sign your images. Lock down egress. The cost of the discipline is bounded. The cost of skipping it is the next supply-chain attack.