Notes on building your own AI assistant: start here
Starter notes for the practitioner who wants to build their own personal AI in 2026. The substrate is finally ready. Hardware, model, MCP-or-not, memory, privacy boundaries, the pragmatic shape of 'if I were starting today.'
The question I get most from people who've read me for a while: "Okay, I'm convinced. Where do I actually start?" The answer in 2024 was awkward; the answer in early 2025 was workable-with-caveats; the answer in March 2026 is finally clean enough to write down without the apologetic preamble.
The foundation is ready. Apple Silicon hits the price-performance point, MLX is mature, and the open-weights field has settled into a few defensible defaults. MCP has the integration story sorted, and the local memory tooling is no longer a weekend-coding project. None of this is complete; all of it is workable. The bridge product I keep saying Apple should ship still isn't here. The thing you can build for yourself is.
These are starter notes: practical, opinionated, not thorough. The framing is "if I were starting today." I'll point at choices and say which ones I'd make and why; you'll make different ones for your own reasons, and that's the point.
Hardware: pick the smallest thing that fits
The instinct is to start with the model, then back into the hardware. Don't. Start with the hardware envelope, then choose the model that fits.
The cheap entry point. A Mac mini M4 with 16-24GB of unified memory runs a 7B-class model at usable interactive speed. This is the "I want to see what local AI feels like" tier. It's also genuinely useful: most of my routine assistant work lives on a Mac mini I call node-01, sitting on a shelf, doing inference and small agent tasks while my laptop does whatever else.
The serious tier. A Mac Studio with 64GB-96GB of unified memory runs the 30-70B class comfortably and the 100B+ class with quantization. This is what I'd buy if I were starting today and could justify one machine. My core-01 is an M4 Max 64GB Mac Studio; it does the heavy lifting that the mini can't. The price-per-useful-token-per-second is the best it's been.
The laptop question. The MacBook Pro line with 48GB+ of unified memory makes a credible single-machine setup if you don't want a desktop. My laptop-01 is an M4 Pro with 48GB and it can run the work I need on the road. The trade-off is heat and battery; the upside is one machine. If you travel often, a single capable laptop beats a desktop you can't reach.
The storage question. You will accumulate models, embeddings, transcripts, and memory artifacts faster than you think. Plan for it. I run a Synology DS1019+ (store-01) as the durable tier: model cache, embedding indexes, transcripts of conversations, the personal-knowledge corpus that the assistant draws on. NAS isn't sexy, and it isn't optional once the personal-AI surface grows past one machine.
The honest single recommendation: if you're starting fresh and have the budget, get a Mac Studio with 64GB or more. If you don't, a Mac mini with 24GB gets you 80% of the experience for a quarter of the cost. Don't agonize over the choice; both work.
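A back-of-envelope way to run the "smallest thing that fits" check: weight memory is roughly parameter count times bits per weight, with headroom left over for the KV cache, other processes, and the OS. A minimal sketch; the 30% headroom figure is my rule-of-thumb assumption, not a measured number:

```python
def weight_footprint_gb(params_b: float, bits: int) -> float:
    """Approximate weight memory in GB for a model with params_b
    billion parameters at the given quantization width."""
    return params_b * 1e9 * bits / 8 / 1e9  # bytes -> GB

def fits(params_b: float, bits: int, ram_gb: int, headroom: float = 0.7) -> bool:
    """Leave ~30% of unified memory for the OS, the KV cache, and
    everything else the machine is doing."""
    return weight_footprint_gb(params_b, bits) <= ram_gb * headroom

# A 7B model at 4-bit is ~3.5 GB of weights: comfortable on a 16GB mini.
# A 70B model at 4-bit is ~35 GB: that's the 64GB Studio tier.
```

This is also why 4-bit quantization (discussed below) matters so much: it's the difference between a size class fitting on the machine you have and not fitting at all.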
Model: pick a default, then stop optimizing
The open-weights field in early 2026 is in a much better place than it was a year ago. There are three or four strong defaults at each size class, the licensing is mostly clean, and the quality gap between local and frontier is narrower than the discourse implies, particularly for the kinds of tasks a personal assistant actually does.
My defaults today: a capable 30B-class instruction-tuned model as the everyday default; something larger for the harder tasks (coding, multi-step reasoning, longer-context synthesis); and a small fast model (3-7B) for the latency-sensitive bits (autocomplete, classification, the things where 200ms matters more than quality).
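The three-tier split can be written down as a trivial router. The model names and task labels here are illustrative placeholders, not specific releases:

```python
# Hypothetical task-to-size-class routing; names are placeholders.
ROUTES = {
    "autocomplete": "small-3b",    # latency-sensitive: speed over quality
    "classify":     "small-3b",
    "chat":         "default-30b",  # the everyday default
    "summarize":    "default-30b",
    "code":         "large-70b",    # harder tasks
    "reason":       "large-70b",
}

def pick_model(task: str) -> str:
    """Route a task to a size class; unknown tasks fall back to
    the everyday default rather than the big model."""
    return ROUTES.get(task, "default-30b")
```

The useful property is the fallback: when in doubt, the system reaches for the good-enough default, not the expensive one.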
What I'd avoid: The "pick the bleeding edge weekly" trap. The model release cadence is fast enough that you can spend all your time evaluating and none of your time using. Pick a default that's good enough; commit to it for a quarter; re-evaluate when something genuinely changes the field.
On quantization: 4-bit is fine for almost everything. 8-bit if you have the memory and want a small quality bump. Don't run unquantized unless you have a specific reason; the memory cost rarely buys you a noticeable gain for assistant-class tasks.
On fine-tuning: Don't, at the start. The retrieval-and-context-engineering route gets you 90% of the personalization with none of the training overhead. Fine-tuning is a real lever once you know exactly what you want from it; it's premature on day one.
MCP or not
A year ago this was a real question. Today it's mostly settled: yes, use MCP for tool integrations, with eyes open about which servers you trust.
Why yes. The protocol does the boring integration work that previously had to be hand-rolled per tool. Calendar, email, files, browser, the personal services you use: there's an MCP server for most of them, the servers compose, and the agent doesn't need to know about each one's quirks. This is real leverage.
The trust boundary. MCP servers run with whatever permissions you give them. A poorly-vetted server is a hole in your personal-AI privacy story. I trust the servers I run myself; I'm cautious about third-party servers that touch sensitive surfaces. The principle from the secrets-aware assistant piece applies: assume the model can see whatever the tool can see, and design accordingly.
My pragmatic line. Run the MCP servers you wrote yourself or audited. For third-party servers, pick the ones with active maintainers, transparent permissions, and ideally local-only operation. Be especially careful about anything that touches credentials, payments, or messaging; those are the surfaces where a misbehaving server does the most damage.
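The pragmatic line is concrete enough to write down as a policy check. This is a hypothetical sketch: the fields and the `SENSITIVE` set are my framing of the criteria above, not anything from the MCP spec:

```python
from dataclasses import dataclass

@dataclass
class McpServer:
    name: str
    self_hosted: bool   # you wrote it or audited it
    local_only: bool    # no outbound network access
    touches: set        # surfaces it can reach

# The surfaces where a misbehaving server does the most damage.
SENSITIVE = {"credentials", "payments", "messaging"}

def allowed(server: McpServer) -> bool:
    """Self-hosted servers pass; third-party servers pass only if
    they're local-only and stay clear of the sensitive surfaces."""
    if server.self_hosted:
        return True
    return server.local_only and not (server.touches & SENSITIVE)
```

The point isn't the ten lines of code; it's that "which servers do I trust" becomes an explicit, reviewable artifact instead of a vibe.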
Memory architecture
The piece that sounds boring and turns out to matter most. An assistant without durable memory is a chatbot with extra steps. An assistant with badly-structured memory is worse than one with none.
The shape I'd build today: A two-tier system. A small fast working-memory store (recent conversations, today's context, the things-in-flight) that the assistant reads on every turn. A larger durable knowledge store (embeddings over personal documents, transcripts, notes, project context) that gets retrieved against on demand. The working memory is rewritten frequently; the durable store is append-mostly with periodic curation.
Storage foundation. SQLite for the working memory, a vector store (LanceDB, Qdrant, whatever you prefer) for the embeddings. Both can run on the same machine that does inference, or split off to a small always-on node. Mine sits on node-01 with the durable corpus on store-01.
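The working-memory tier needs almost nothing: stdlib `sqlite3` covers it. A minimal sketch; the table layout is illustrative, not a schema recommendation:

```python
import sqlite3
import time

# Working-memory tier: small, fast, rewritten frequently.
conn = sqlite3.connect(":memory:")  # in practice, a file on the always-on node
conn.execute("""CREATE TABLE IF NOT EXISTS working_memory (
    id INTEGER PRIMARY KEY,
    ts REAL NOT NULL,
    role TEXT NOT NULL,      -- 'user' / 'assistant' / 'note'
    content TEXT NOT NULL
)""")

def remember(role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO working_memory (ts, role, content) VALUES (?, ?, ?)",
        (time.time(), role, content),
    )

def recent(n: int = 20) -> list:
    """What the assistant reads on every turn: the last n items, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM working_memory ORDER BY ts DESC, id DESC LIMIT ?",
        (n,),
    ).fetchall()
    return rows[::-1]
```

The durable tier is the vector store's job; this tier just has to be cheap to read on every single turn.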
Curation matters. The default trap is to log everything and retrieve against the pile. The pile gets noisy fast and retrieval quality drops. Some lightweight curation (periodic summarization, pruning, re-embedding when the embedding model changes) keeps the system useful past month two. Schedule it, automate it, don't skip it.
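The curation pass can be sketched as one scheduled function: summarize rows older than a cutoff into a single durable note, then prune them. `summarize` here is a stand-in for a call to your local model, and the column names assume a working-memory table with `ts`, `role`, and `content`:

```python
import sqlite3
import time

def curate(conn: sqlite3.Connection, summarize, max_age_days: float = 7.0) -> int:
    """Fold working-memory rows older than the cutoff into one summary
    note, delete the originals, and return how many were pruned."""
    cutoff = time.time() - max_age_days * 86400
    old = conn.execute(
        "SELECT content FROM working_memory WHERE ts < ?", (cutoff,)
    ).fetchall()
    if not old:
        return 0
    note = summarize([c for (c,) in old])  # one local-model call in practice
    conn.execute(
        "INSERT INTO working_memory (ts, role, content) VALUES (?, 'note', ?)",
        (time.time(), note),
    )
    conn.execute("DELETE FROM working_memory WHERE ts < ?", (cutoff,))
    return len(old)
```

Run it from cron or launchd. The exact cutoff matters less than the fact that it runs at all.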
The privacy boundary lives here. Memory is the most sensitive surface in the system. It accumulates the things you'd never paste into a hosted chat: personal correspondence, financial detail, half-formed thoughts, the texture of your life. Treat the memory store with the seriousness that implies. Encrypt at rest, control which processes can read it, and never connect it to a tool you don't fully trust.
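One small, concrete piece of "control which processes can read it" is plain file permissions on the store itself. A sketch, assuming a single-user POSIX machine; it's the floor, not the whole answer:

```python
import os
import stat

def lock_down(path: str) -> None:
    """Owner-only read/write on the memory store file (mode 0600)."""
    os.chmod(path, 0o600)

def is_locked_down(path: str) -> bool:
    """Check that no group/other bits are set on the store."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600
```

Encryption at rest (FileVault on the Mac, encrypted volumes on the NAS) layers on top of this; permissions just keep the casual reader out.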
Privacy boundaries
The reason most of us are doing this in the first place. The point of building your own is that the data stays under your control. The architecture has to honor that or the project is theatre.
The default rule. Personal data (memory, documents, conversation history) never leaves the local network unless you explicitly send it. The model runs locally; the embeddings run locally; the memory store is local. If a tool needs to send something out, you know about it, and you decide.
The hosted-model exception. Sometimes a frontier hosted model is the right tool for a specific task. Coding help on a hard problem; one-shot summarization where quality matters more than locality. Use hosted when you choose to; don't make it the default. The discipline is in keeping the boundary clear: this conversation is local, that one isn't, and the assistant knows the difference.
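Keeping the boundary clear works better as code than as convention. A hypothetical sketch; the task kinds are illustrative:

```python
# Personal-data categories that never leave the local network.
LOCAL_ONLY_KINDS = {"memory", "documents", "history"}

def destination(task_kind: str, user_opted_hosted: bool) -> str:
    """Personal data is always local; everything else goes hosted only
    when you explicitly opted in for this conversation."""
    if task_kind in LOCAL_ONLY_KINDS:
        return "local"
    return "hosted" if user_opted_hosted else "local"
```

The property worth keeping from this sketch: hosted use requires an explicit opt-in per conversation, and no flag anywhere can route the memory store off the machine.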
Network posture. The personal-AI surface should not be exposed to the open internet. Tailscale or equivalent for remote access; firewall everything else. The assistant is a private service; it should look like one from the network's perspective.
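One way to enforce that posture in the assistant's own code is to refuse any bind address that isn't loopback (or your tailnet interface). A sketch; the allowed set is an assumption you'd extend with your own Tailscale address:

```python
import socket

# Loopback only by default; add your tailnet address (100.x.y.z) if the
# assistant should be reachable over Tailscale.
ALLOWED_BINDS = {"127.0.0.1", "::1"}

def checked_listener(host: str, port: int) -> socket.socket:
    """Open a listening socket, but only on a private interface."""
    if host not in ALLOWED_BINDS:
        raise ValueError(f"refusing to bind {host}: this is a private service")
    s = socket.socket()
    s.bind((host, port))
    s.listen()
    return s
```

A firewall rule does the same job from the outside; doing it in both places means one mistake doesn't expose the service.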
Backups. The memory store is irreplaceable in a way that the model weights aren't. Back it up. Test the restore. Personal AI is a long-game artifact; losing six months of accumulated context to a disk failure is a real loss.
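If the memory store is SQLite, the stdlib already has a safe copy path: the online backup API (`sqlite3.Connection.backup`, Python 3.7+), which works even while the assistant is writing. A sketch, where "test the restore" is exactly "open the copy and query it":

```python
import sqlite3

def backup_memory(src_path: str, dest_path: str) -> None:
    """Copy the live memory store using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    src.backup(dest)   # page-by-page copy, safe against concurrent writes
    dest.close()
    src.close()
```

Point `dest_path` at the NAS tier, schedule it, and make the restore test part of the schedule, not a thing you do after the disk fails.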
What to build first
The temptation when you have all the pieces is to build the everything-assistant on day one. Don't. Build one workflow, end-to-end, that you actually use. Then add the second one once the first is stable.
My suggestion for a first workflow. A daily briefing assistant. It reads your calendar, pulls relevant context from your memory store and recent documents, drafts a short summary of what's happening today, and sits in your morning routine. It exercises the model, the memory, at least one MCP server, and gives you immediate honest feedback on whether the system is useful. If the daily briefing is good, the system is on the right track. If it isn't, you haven't sunk weeks into something elaborate.
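The briefing pipeline is mostly glue, which is the point. A sketch where `events`, `recall`, and `generate` are stand-ins for your calendar MCP server, memory-store retrieval, and local model call:

```python
import datetime

def daily_briefing(events, recall, generate) -> str:
    """Assemble today's calendar and relevant memory into one prompt,
    then make a single local-model call."""
    today = datetime.date.today().isoformat()
    evs = events(today)                        # e.g. via a calendar MCP server
    context = recall(f"context for {today}")   # retrieval from the memory store
    prompt = (
        f"Draft a short morning briefing for {today}.\n"
        "Calendar:\n" + "\n".join(f"- {e}" for e in evs) + "\n"
        f"Relevant context:\n{context}\n"
    )
    return generate(prompt)
```

Swapping the three stand-ins for real implementations, one at a time, is a reasonable build order in itself: calendar first, then memory, then the model call.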
What not to do first. Don't start with the always-listening voice assistant. Don't start with the autonomous-agent-running-overnight pattern. Don't start with the "replaces my therapist" project. These are real use cases for a system you trust; they aren't the place to begin while you're still learning the foundation's edges.
The honest closing
The reason to build your own personal AI in 2026 isn't that it's better than the hosted assistants on every dimension. It isn't, and pretending otherwise sets you up for disappointment. It's better on the dimensions that matter for the long game: privacy, durability, control, and the accumulating personal context that compounds over years rather than getting reset by a vendor's product decision.
The foundation is ready in a way it wasn't a year ago. The starter path is shorter than it used to be. The remaining work, the bridge product that makes this accessible to people who don't want to think about MLX or memory schemas, still falls to whoever ends up shipping it. Until then, the practitioner path is the one that's available, and it's a perfectly good place to be.
If you've been waiting for the right moment to start, this is the moment. Pick the smallest hardware that fits, pick a default model, pick one workflow, and build it. The rest grows from there.