A year of running AI workloads at home
Twelve months in, the home setup has changed shape: a Mac mini for the always-on lighter services, a Synology underneath for storage and platform services, fast portable SSDs at the working ends, and since March a Mac Studio for inference.
It's been about a year of running serious AI workloads on hardware I own. It started as a Mac mini doing always-on TTS plus a NAS holding everything, with the laptop carrying most of the rest. The mini gradually absorbed Whisper transcription, OCR, and embedding generation as those workloads found their natural home. The Mac Studio joined in March when the M4-generation Studio shipped, taking over the language-model inference that used to live awkwardly on the laptop or in the cloud. The shape now is what I described in the home-setup piece: each box doing the job it's best at.
A year in, it's worth taking stock of what's actually been useful, what hasn't, what broke, and what the discipline of running this stuff yourself teaches that the cloud version doesn't.
What's actually been useful
A list, roughly in order of how much value I've gotten out of each:
The always-on personal AI assistant. The thing that watches my files, my conversations, my activity, and surfaces relevant context when I'm working on something. This is the workload that justifies the hardware existing. Hard to put a number on the value, but the version where I had to send everything through a cloud API was operationally untenable for privacy reasons; the local version is just a daily-use tool. Earlier in the year this ran cramped on the laptop; since the Studio came in March it runs comfortably with room to spare.
The local OpenAI-compatible inference endpoint. Anything in the office can hit a local URL and get back a model response. My scripts, my notes app, my homegrown automation, all use it. The pattern of "if it could use AI, just point it at the local endpoint" lowered the friction enough that I started building small AI-augmented tools I wouldn't have built if each one needed cloud-API setup.
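To make the pattern concrete, here's roughly what "point it at the local endpoint" looks like from any script in the office. The host, port, and model name are placeholders for whatever your local server exposes, not the specifics of my setup:

```python
# Any tool that speaks the OpenAI API can be aimed at the local server
# by overriding the base URL; the hosted API key becomes irrelevant.
from openai import OpenAI

client = OpenAI(
    base_url="http://studio.local:8080/v1",  # hypothetical local inference server
    api_key="not-needed-locally",            # the client insists on a value; the server ignores it
)

response = client.chat.completions.create(
    model="local-workhorse",  # whatever model the server has loaded
    messages=[{"role": "user", "content": "Summarize today's meeting notes in three bullets."}],
)
print(response.choices[0].message.content)
```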
Whisper transcription on the mini. The Mac mini sits there and transcribes everything I throw at it, meeting recordings, voice notes, the occasional batch of audio files. Apple Silicon is genuinely good at this via MLX or whisper.cpp; the throughput is fine for incremental work and the 16 GB is enough as long as I'm not also running OCR on a large batch at the same time. Free in the marginal-cost sense; private by construction.
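For flavor, a minimal sketch of the kind of batch loop the mini runs, using the mlx-whisper package; the directories and model repo are illustrative, and whisper.cpp works just as well here:

```python
# Sketch of a batch transcription pass: everything dropped into an inbox
# folder gets transcribed and moved aside. Paths and model repo are examples.
from pathlib import Path
import mlx_whisper

INBOX = Path("~/Transcribe/inbox").expanduser()
DONE = Path("~/Transcribe/done").expanduser()
DONE.mkdir(parents=True, exist_ok=True)

for audio in sorted(INBOX.glob("*.m4a")):
    result = mlx_whisper.transcribe(
        str(audio),
        path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
    )
    (DONE / (audio.stem + ".txt")).write_text(result["text"])
    audio.rename(DONE / audio.name)  # move the source so the next pass skips it
```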
OCR for the document pipeline. Document scanning, receipt processing, the occasional batch of PDFs. Runs on the mini next to Whisper. The Apple Vision framework plus a small ML model handles most of what I need; for the harder cases there's a TrOCR-class model that fits comfortably alongside the rest. None of these documents should be leaving the house and they don't have to.
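The Vision path is only a few calls when driven from Python via pyobjc. A rough sketch, with the file path as a placeholder and the TrOCR fallback left out:

```python
# Recognize text in a scanned image with Apple's Vision framework (pyobjc).
# Requires the pyobjc-framework-Vision package; runs entirely on-device.
import Vision
from Foundation import NSURL

def ocr(path: str) -> str:
    handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(
        NSURL.fileURLWithPath_(path), {}
    )
    request = Vision.VNRecognizeTextRequest.alloc().init()
    request.setRecognitionLevel_(Vision.VNRequestTextRecognitionLevelAccurate)
    handler.performRequests_error_([request], None)
    lines = [obs.topCandidates_(1)[0].string() for obs in request.results()]
    return "\n".join(lines)

print(ocr("/path/to/receipt.png"))
```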
Embedding and indexing pipelines. All my notes, all my published writing, a lot of my code, indexed locally with a local embedding model running on the mini. The retrieval surface that lives on top is what makes the personal-AI assistant actually useful. Hosted equivalents would be expensive and would put my data somewhere it shouldn't be.
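Stripped of the plumbing, the indexing side is nothing exotic: embed the notes through the local model, keep a normalized matrix, and do cosine similarity against it. The endpoint, model name, and paths below are placeholders, and real chunking is omitted:

```python
# Flat semantic index over a notes folder, using a local
# OpenAI-compatible /v1/embeddings endpoint and a numpy matrix.
from pathlib import Path
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://mini.local:8080/v1", api_key="unused")

notes = sorted(Path("~/Notes").expanduser().glob("**/*.md"))
texts = [n.read_text()[:8000] for n in notes]  # naive truncation instead of real chunking

resp = client.embeddings.create(model="local-embedder", input=texts)
matrix = np.array([d.embedding for d in resp.data], dtype=np.float32)
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)

def search(query: str, k: int = 5):
    q = np.asarray(
        client.embeddings.create(model="local-embedder", input=[query]).data[0].embedding,
        dtype=np.float32,
    )
    q /= np.linalg.norm(q)
    scores = matrix @ q
    return [(notes[i].name, float(scores[i])) for i in np.argsort(-scores)[:k]]
```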
The platform services on the NAS. Forgejo for the self-hosted git server, n8n for workflow automation, a small CI runner, all running as containers on the Synology via DSM's Container Manager. None of them is exciting; all of them are quietly load-bearing. The NAS is always on anyway, so the marginal cost of running these services there is essentially zero.
Fast iteration on prompts and chains. Building any non-trivial agentic workflow involves a lot of iteration. Doing that iteration on a local model that costs nothing per call is a very different experience from doing it on a metered API where every iteration is a dollar or two. I build things I wouldn't otherwise build because the per-iteration cost is zero.
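In practice that iteration looks like a small harness that runs every prompt variant against a fixed set of inputs and lets me eyeball the differences; at zero marginal cost there's no reason not to rerun it constantly. Names and endpoint below are illustrative:

```python
# Run every prompt variant against the same test inputs on the local endpoint
# and print the outputs side by side; the whole loop is free to rerun.
from openai import OpenAI

client = OpenAI(base_url="http://studio.local:8080/v1", api_key="unused")

variants = {
    "terse": "Summarize in one sentence: {text}",
    "structured": "Extract action items as a numbered list from: {text}",
}
samples = ["Meeting notes: ship the indexer Friday; Dana owns the OCR backlog."]

for name, template in variants.items():
    for text in samples:
        out = client.chat.completions.create(
            model="local-workhorse",
            messages=[{"role": "user", "content": template.format(text=text)}],
        )
        print(f"--- {name} ---\n{out.choices[0].message.content}\n")
```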
What hasn't been useful
Worth being honest about the things that didn't work:
Trying to run the absolute frontier locally. The biggest open-weights models (DeepSeek V3 671B, Llama 4 Maverick at full precision) don't fit in 64 GB. The Studio runs the workhorse-tier open-weights models that actually fit; the giants stay hosted. The closed-frontier models (Claude Opus, GPT-5) are still ahead on the hardest reasoning. The local setup isn't a substitute for those; it's a complement.
Experimenting with multi-modal generation beyond images. Image generation via FLUX runs fine on the Studio. Video generation models exist for local hosting; the quality at the local-feasible parameter counts isn't compelling versus the hosted alternatives. Burned a few weekends on this; mostly went back to cloud for any serious video work.
Trying to replicate cloud-vendor agentic surfaces locally. The framework-and-orchestration layer that ships with Claude Code or Cursor or the equivalent is hard to fully replicate locally. The model side runs fine; the IDE-integration side and the tool-use ergonomics are still better in the cloud-vendor products.
Pi-class boxes for orchestration. Tried adding a Pi 5 for some of the always-on monitoring early on. It was operationally annoying; the Mac mini does the same job better, and maintaining two different platforms wasn't worth the ~$80 saved.
Running the heavy AI services concurrently on the mini. TTS plus Whisper plus OCR plus embeddings is fine when the loads are interleaved; it's not fine when a long Whisper batch overlaps with an OCR batch and an indexing run. The 16 GB on the mini is a real constraint. Lesson: schedule, don't trust that demand will naturally spread.
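Scheduling doesn't have to mean anything fancy. A single shared lock file is enough to keep the heavy batch jobs from overlapping; something like this wrapper, with an arbitrary lock path, is the shape of it:

```python
# Serialize the mini's heavy batch jobs: every Whisper, OCR, or indexing run
# goes through the same exclusive lock, so only one runs at a time.
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/mini-heavy-job.lock"

def run_exclusively(cmd: list[str]) -> int:
    with open(LOCK_PATH, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until any other heavy job finishes
        return subprocess.call(cmd)

if __name__ == "__main__":
    # e.g. python run_exclusively.py python transcribe_batch.py
    sys.exit(run_exclusively(sys.argv[1:]))
```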
What broke and what it taught me
Some war stories from the year:
A model load process exhausted the unified memory on the Studio and the OS panicked. The 64 GB that's plenty for one model isn't enough for two large ones loaded simultaneously. Lost the inference endpoint for an hour. Lesson: always set explicit memory limits on the model server, even when the platform "should" handle it.
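Short of a hard limit, even a crude pre-load check would have caught it: refuse to load another model unless free unified memory comfortably exceeds what the model will take. The size estimate and headroom below are assumptions, not something the model server reports:

```python
# Guard against loading a second model into memory that isn't there.
import psutil

def can_load(model_size_gb: float, headroom_gb: float = 8.0) -> bool:
    free_gb = psutil.virtual_memory().available / 1e9
    return free_gb > model_size_gb + headroom_gb

if not can_load(38.0):  # e.g. roughly a ~70B model at 4-bit
    raise SystemExit("Not enough free memory; unload the other model first.")
```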
A weekend power outage took down everything and the orchestration recovery was incomplete. The boxes came back, the model server didn't auto-restart, the dependent tools failed silently. Lesson: write the actual recovery automation, don't trust that "it'll figure itself out."
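The automation doesn't need to be elaborate either; a probe that cron or launchd runs every few minutes, restarting the model server when the endpoint stops answering, would have covered the failure I actually had. The health URL and launchd label here are placeholders:

```python
# Health check for the local inference endpoint; kick the server if it's down.
import subprocess
import urllib.request

HEALTH_URL = "http://studio.local:8080/v1/models"
RESTART_CMD = ["launchctl", "kickstart", "-k", "gui/501/local.modelserver"]  # hypothetical label

def healthy() -> bool:
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            return resp.status == 200
    except Exception:
        return False

if not healthy():
    subprocess.run(RESTART_CMD, check=False)
```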
A concurrent Whisper + OCR + embedding load on the mini hit the swap. The first time I noticed, the mini felt frozen for several minutes; it turned out to be paging hard against the unified-memory budget. Lesson: the mini's 16 GB is comfortable for the always-on services as long as the heavy loads don't overlap. Schedule them.
A flaky cable on the NAS manifested as random embedding-quality degradation. Took a week to root-cause because the symptom was subtle (occasional bad embeddings) and the network seemed fine. Lesson: monitoring on the network and storage layers matters more than I thought, and the failure modes aren't always "everything stops working."
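One cheap defense against that class of failure is validating vectors before they enter the index: right dimension, no NaNs, sane norm. The expected dimension below is an example, not what my embedding model actually emits:

```python
# Reject obviously corrupt embeddings instead of letting them degrade retrieval.
import numpy as np

EXPECTED_DIM = 768  # set to whatever your embedding model produces

def is_sane(vec: list[float]) -> bool:
    arr = np.asarray(vec, dtype=np.float32)
    return (
        arr.shape == (EXPECTED_DIM,)
        and not np.isnan(arr).any()
        and float(np.linalg.norm(arr)) > 1e-3
    )
```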
An overnight indexing run discovered a content issue in some of the source documents I hadn't checked carefully. The model absorbed and surfaced what was there. Lesson: the same hygiene that applies to production ML applies to weekend experimentation; what you index is what you get back.
The pattern across these: the operational discipline that production systems have accumulated over decades is mostly absent at the home-office scale. The only person enforcing it is me. When I do enforce it, the setup behaves like a small but reliable production system. When I don't, things break in interesting ways.
What the cloud version doesn't teach
Running this stuff yourself teaches things that running it on someone else's infrastructure doesn't:
The actual shape of inference cost. Token billing is an abstraction. Watching a model server spike memory and CPU on a long conversation makes the underlying resource cost visible in a way the API bill doesn't. I'm a better cost-modeler for cloud workloads as a result of having watched the equivalent workloads consume actual hardware locally.
What tool-use latency really is. A multi-step agent run on a hosted API hides the per-step latency in the round trips. The same run locally lets you see exactly where the time goes. Most of the latency is the model inference itself; the network round trips are a small share. The hosted-API conversation about latency obscures this.
How model quality actually varies with quantization and configuration. Running the same model at different quantization levels, different context sizes, different batch sizes, makes the quality-versus-throughput trade-offs concrete. The cloud version of the same model is one fixed configuration that the vendor chose; the local version makes the choice your problem and gives you the information to make it well.
The actual security perimeter of an AI workload. When the model and the data and the inference endpoint all live on machines you control, the security perimeter is visible. When it's hosted, the perimeter is somebody's terms of service. The local version forces you to think about what data goes where in a way the hosted version doesn't.
What I'd recommend after a year
For someone considering whether running AI locally is worth it:
- For privacy-bound use cases, yes. The setup doesn't have to be elaborate. A single Mac Studio with enough memory plus a NAS gets you most of the value.
- For cost-sensitive high-volume workloads, probably. The break-even is real; the ongoing operational cost is real; do the math honestly.
- For pure experimentation and iteration, yes. Fast feedback loops without watching the meter are a genuinely different experience.
- For replacing cloud frontier-tier inference, no. The closed-frontier models are still ahead on the hardest workloads; the local Studio is a complement, not a substitute.
- For operational simplicity, no. The cloud answer remains the right one if you don't want to think about hardware and operations.
Twelve months in, the setup has been worth it, not for any single workload but for the combination. The privacy-bound personal AI assistant alone justifies the setup's existence; everything else is gravy. The discipline of running this stuff yourself has made me better at thinking about the cloud version. Continued improvement in Apple Silicon and in open-weights models means the set of workloads that fits this pattern keeps growing.
I'd do it again. I'd build it slightly differently. I wouldn't recommend it to most people. The people for whom the math works should do it; the rest should keep using cloud and stop apologizing for it.