Self-hosted everything: my 2026 stack

What I actually run in March 2026: four boxes, a NAS, a small set of services, and the open-weights models that do the daily work. Practical and concrete; this is the stack as it sits, not the stack as I'd pitch it.

A few people have asked for the full, current picture of what I'm actually running at home. Not the abstract architecture diagram, the literal list. Hardware, models, services, the wiring between them, what backs up what. This is that list, as of mid-March 2026. The stack has been stable for long enough that writing it down isn't going to age into irrelevance the next morning.

The shape comes out of years of small decisions, most of them documented in earlier posts. The cluster-of-one piece from last October covers the operating principles. The open-source office stack post covers the non-AI services. The on-prem case covers why most of this lives at home in the first place. This is the inventory.

Hardware

Four boxes do all the work. The naming convention is role-NN; the roles are stable enough that I haven't renumbered anything in two years.

core-01. Mac Studio, M4 Max, 64 GB unified memory. This is the inference workhorse. Anything that touches the larger model weights runs here. It sits on a shelf, fans audible only when training is happening, otherwise silent. Wired ethernet to the switch.

node-01. Mac mini, M4, 16 GB. The supporting-cast box. Runs the lighter always-on services that don't justify Studio-class memory. TTS, STT, OCR, embeddings, a few small task-specific models. Each service is a launchd job that auto-restarts on failure. Same shelf as core-01, also wired.

store-01. Synology DS1019+, 36 TB hybrid pool plus an 8 TB SSD pool. The NAS does more than storage: it runs Forgejo, n8n, and a CI runner under DSM Container Manager. Snapshots are configured. Off-site backup is configured. Disk health is monitored. This is the most important single box in the house and the one I think about least, which is exactly the right shape.

laptop-01. MacBook Pro, M4 Pro, 48 GB. The actual workstation. Where editing, writing, and most code happen. Has a local fallback inference setup so that when I'm offline the assistant still works at reduced capability. Two 4 TB SanDisk Pro M.2 drives travel with me, one for working storage, one for shuttling model snapshots between machines without saturating the home network.

That's the whole fleet. No fictional GPU rig in the closet, no rented colo node, no engine-01 that I keep meaning to set up. Four boxes and a NAS. Total capex roughly equivalent to a year of mid-tier GPUaaS for the same workload at the usage I actually run.

Models

The model lineup has settled into a small, opinionated set. I've stopped chasing every release and let the workhorses earn their keep.

Daily-driver chat. A Llama 4 derivative in the 30B-class range, 4-bit quantized via MLX, running on core-01. Fast enough to be invisible, capable enough for the bulk of writing-assist and code-discussion work. This is the model the assistant routes to by default.

Heavier reasoning. DeepSeek R2 distilled down to something that fits in 64 GB at usable context. I keep this loadable but not always loaded; the gateway swaps it in for tasks that explicitly request reasoning depth. The smaller routine model handles 90% of requests; this one handles the rest.

Code-specific. Qwen-coder in a mid-size variant. Lives on core-01, gets pulled in for refactoring and code-walkthrough work where the general model is leaving accuracy on the table.

Embeddings + small classification. A couple of small open-weights embedding models on node-01. The kind of thing where being on the LAN dominates anything else about quality of service. RAG against the personal corpus runs through these.
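
The retrieval step behind that RAG flow is simple enough to show. A minimal sketch in plain Python, assuming the snippets were already embedded by the node-01 models; the corpus and vectors here are toy placeholders, not my real index:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=3):
    # corpus: list of (snippet, vector) pairs, embedded ahead of time
    # by the embedding model; returns the k closest snippets.
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [snippet for snippet, _ in ranked[:k]]
```

Everything interesting happens inside the embedding model; the lookup itself is just a sort.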

STT, TTS, OCR. Whisper-class for STT, a couple of open TTS models for the non-realtime voice work, a small VLM for OCR. All on node-01. None of them glamorous; all of them load-bearing.

I run these via MLX when MLX support is mature for the model family, and Ollama for everything else. The split has been stable since late 2025. MLX is the default for new model adoption; Ollama is the default for anything that already works there and doesn't need to move.
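
For illustration, that split can be written down as a lookup table with an MLX default. The family names and placements below are hypothetical, not a dump of my actual config:

```python
# Hypothetical runtime table; MLX is the default for anything new,
# Ollama only for families that already work there and stay put.
RUNTIME_BY_FAMILY = {
    "llama4-30b-class": "mlx",
    "deepseek-r2-distill": "mlx",
    "qwen-coder-mid": "mlx",
    "whisper": "ollama",
}

def runtime_for(family: str) -> str:
    # New model adoption defaults to MLX.
    return RUNTIME_BY_FAMILY.get(family, "mlx")
```

The point of writing it as data is that migrating a family off Ollama becomes a one-line change.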

Services

Above the inference layer, there's a small set of services that turn the boxes into something I actually use.

Inference gateway. Runs on core-01. Presents an OpenAI-compatible API at a stable LAN URL. Routes requests to the right backend (MLX vs Ollama, core-01 vs node-01) based on workload type. Captures audit logs to the NAS. Applies the policy decisions about what model gets what context. Single point of contact for everything that wants inference.
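
A sketch of the routing decision, assuming the gateway keys on a coarse workload type; the hosts are the real box names from above, but the model aliases and the table itself are placeholders:

```python
# Illustrative routing table: workload type -> (host, model alias).
ROUTES = {
    "chat":      ("core-01", "daily-driver"),
    "reasoning": ("core-01", "deep-reasoner"),  # loadable, not always loaded
    "code":      ("core-01", "coder"),
    "embed":     ("node-01", "embedder"),
    "stt":       ("node-01", "whisper"),
}

def route(workload: str, depth_requested: bool = False):
    # Chat requests that explicitly ask for reasoning depth get the
    # heavier model; everything else stays on the routine daily driver.
    if workload == "chat" and depth_requested:
        workload = "reasoning"
    return ROUTES.get(workload, ROUTES["chat"])
```

The routine model handling 90% of requests falls straight out of the default branch.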

Forgejo, on store-01 in a Container Manager container. Hosts every personal git repo that doesn't need to be on GitHub for collaboration reasons. The blog repo, the homelab repo, scratch projects, drafts. Boring, fast, reliable, mine.

n8n. Also on store-01. Drives the workflow automation: the news-aggregation jobs, the periodic data-pull jobs, the scheduled backups-of-backups, and the routing of incoming webhooks to wherever they need to go. n8n isn't glamorous, and it's exactly the right tool for "I need cron plus webhooks plus a few HTTP calls plus light scripting" without writing a service for each one.

CI runner. A small self-hosted runner on store-01 that picks up jobs from Forgejo. Builds the blog theme, runs the article-validation passes, does the lint-and-test on whatever else is in the repo. Container Manager makes the lifecycle of this trivial.

MCP servers. A handful, all running locally. The filesystem MCP for letting the assistant work with my notes directory. A small Forgejo MCP. An n8n MCP for triggering workflows from chat. The MCP-only pattern I wrote about late last year holds: the assistant has tools, the tools are bounded, and the auth lives on the box, not in some vendor's OAuth screen.
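
The "bounded" part is mostly path discipline. A sketch of the check a filesystem tool can run before touching anything, under the assumption that the tool is confined to one root directory:

```python
import os

def resolve_in_sandbox(root: str, requested: str) -> str:
    # Resolve the requested path and refuse anything that escapes the
    # allowed root: the bounded-tools property in its simplest form.
    root = os.path.realpath(root)
    full = os.path.realpath(os.path.join(root, requested))
    if full != root and not full.startswith(root + os.sep):
        raise PermissionError(f"path escapes sandbox: {requested}")
    return full
```

A request for "../.ssh/id_ed25519" resolves outside the root and raises instead of reading.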

Office stack. The Nextcloud / Mailcow / Vikunja / BookStack stack is still running. Same shape as the late-2025 writeup. Maintenance burden is essentially flat after the initial setup.

Monitoring and observability

A small Grafana instance lives on store-01. It watches the inference endpoint health, the per-model latency distributions, the NAS disk SMART status, the UPS state, the platform service status, and the backup-job completion. Alerts go to a private Discord channel I keep open in the menu bar. Most weeks the channel is empty. When something goes wrong I usually know within minutes.
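
The per-model latency alerting reduces to one small decision. A sketch using a nearest-rank percentile; the threshold is a made-up number, tuned per model in practice:

```python
import math

def p95(samples):
    # Nearest-rank 95th percentile of a list of latency samples (ms).
    ordered = sorted(samples)
    return ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]

def should_alert(samples, threshold_ms=2000.0):
    # threshold_ms is illustrative; a quiet channel means this rarely fires.
    return len(samples) > 0 and p95(samples) > threshold_ms
```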

There's no Prometheus federation, no central log aggregation, no service mesh. For a single-user stack the lightweight monitoring is the right answer. The day the requirements outgrow that shape I'll revisit; today's complexity matches today's scope.

Backup

Three layers, tested quarterly.

Snapshots on store-01. Synology Btrfs snapshots on a schedule: hourly for the working day, daily for a month, weekly beyond. The first line of defense against "I deleted the wrong file."
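
That schedule can be made precise. A sketch of the retention rule; the choice of which daily (midnight) and which weekly (Sunday midnight) survive is my assumption for illustration, not what the Synology UI enforces:

```python
from datetime import datetime, timedelta

def keep_snapshot(taken: datetime, now: datetime):
    # Returns the tier that retains this snapshot, or None to prune:
    # hourly for a day, daily for a month, weekly beyond.
    age = now - taken
    if age <= timedelta(days=1):
        return "hourly"
    if age <= timedelta(days=30):
        return "daily" if taken.hour == 0 else None
    return "weekly" if taken.weekday() == 6 and taken.hour == 0 else None
```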

Off-site replication. The irreplaceable subset of store-01 (personal repos, knowledge base, financial records, archive of writing) replicates to an off-site target nightly. Encrypted at rest, encrypted in transit, and the keys live where I can recover them without depending on any single device.

Workstation backups. Time Machine on laptop-01 to a target on store-01. Periodic full clones of the working SanDisk to its sibling so I can lose either one without losing data.

The quarterly restore exercise is the part that matters. Every backup that hasn't been restored is a hope, not a backup. A weekend per quarter doing a real recovery to a real destination keeps the system honest.
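
The restore check itself doesn't need tooling beyond a checksum walk. A minimal sketch of comparing a restored tree against the original:

```python
import hashlib
import os

def manifest(root: str) -> dict:
    # Relative path -> SHA-256 for every file under root.
    out = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                out[os.path.relpath(path, root)] = hashlib.sha256(fh.read()).hexdigest()
    return out

def restore_matches(original_root: str, restored_root: str) -> bool:
    # A restore counts only if every byte came back.
    return manifest(original_root) == manifest(restored_root)
```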

What's not in the stack

Worth noting what isn't here, because the absences are deliberate.

No Kubernetes. No service mesh. No managed cloud database. No third-party auth provider for the home services. No vendor SaaS in the critical path of anything that needs to keep working without an internet connection. No frontier-API dependency for the daily-driver assistant; those APIs are tools the assistant can reach for, not the foundation it runs on.

The stack is small on purpose. Each piece earns its keep against the alternative of "could I just not run this?" The pieces that didn't earn their keep are gone: there's a small pile of services I tried in 2024 and 2025 that quietly disappeared from the inventory because the maintenance cost outran the value.

Where this is heading

The stack is in maintenance mode more than build mode at this point. The architectural changes I'm watching for in the rest of 2026 are mostly model-side: better small models that let node-01 take on more work, MLX maturity that lets me retire the last few Ollama dependencies, and possibly an upgrade path on core-01 if Apple ships an Ultra that justifies the swap.

The hardware list I expect to be writing about in March 2027 is probably the same hardware list, with one box upgraded and the model lineup refreshed. That's the goal: a stack boring enough to disappear into, capable enough to do the work, owner-friendly enough to maintain in a few hours per quarter.

If you're running a similar setup, I'm curious what's in your inventory. Comparing notes is where the patterns get refined.