Sid Smith - Echoes of the machine (Page 13)

AI

Model serving on Kubernetes: KServe, vLLM, and the substrate question

KServe, vLLM, Triton, BentoML: four different answers to the same question. Each layer trades operational complexity for serving features. Worth being concrete about which layer is right for which workload, tested on a single GTX 1080 Ti.

Personal AI

Distributed training without a hyperscaler bill: what's possible in 2025

The cloud bill for serious AI training is still aimed at the hyperscaler tier. The patterns for doing meaningful training without that bill have matured. Worth being explicit about what's gettable on a budget that doesn't require enterprise procurement.

AI in the News

AI in the news: week of October 12, 2025

DevDay week. On Monday alone, OpenAI ships GPT-5 Pro and Sora 2 in the API, launches AgentKit, the Apps SDK, and ChatKit, takes Codex to general availability, and signs a six-gigawatt compute deal with AMD. Then the Sora app starts producing the deepfakes everyone predicted. My read on a heavy week.

AI

Local LLMs and SOC 2 evidence: talking to auditors

Auditors are starting to ask about AI use in SOC 2 cycles. The shops running local LLMs have a different story to tell than the shops running cloud, and the auditors mostly haven't internalized the difference. Worth being explicit about what evidence actually answers the questions.

AI

Atomic-unit architecture for AI workloads (how I think about it)

The atomic unit of an AI workload isn't the model call, isn't the request, isn't the user. It's the conversation. The architecture decisions that follow from that (caching, billing, governance, ops) all get cleaner when you start there.

AI in the News

AI in the news: week of October 5, 2025

California signs SB 53, the first US frontier-AI law. Anthropic ships Sonnet 4.5 with an Agent SDK. OpenAI ships Sora 2 with a biometric-scan social app. The AI-layoff narrative consolidates. My take on a heavy news week.

Personal AI

Cluster of one: building an at-home AI stack worth keeping

Most home AI setups die after the novelty wears off. The ones that survive into year two share a small set of operational properties: boring, durable, owner-friendly. Worth being explicit about what makes a stack worth keeping rather than just worth building.

Personal AI

Federated retrieval: when RAG outgrows the laptop

Retrieval-augmented generation works well when the corpus fits on one machine. The honest version of what to do when the corpus outgrows that machine, without rebuilding the whole stack in the cloud, is more interesting than either the all-local or the all-cloud framing suggests.

AI

Vector databases on Kubernetes: Qdrant, Weaviate, Milvus

Qdrant vs. Weaviate vs. Milvus on K8s, the foundation question for retrieval. StatefulSets, persistent volumes, replication: the operational reality. RAG indexing patterns at homelab scale on engine-01, and the decisions that change shape at fleet scale.

AI

MetaMCP and the rise of MCP routing layers

MCP solved the agent-to-tool plumbing. The next layer up (routing across many MCP servers, scoping access per agent, observing what's happening) is where MetaMCP and a small cluster of similar tools have started showing up. Worth being plain about why the layer exists.

AI

Two years on from the Imprint thesis: what changed, what didn't

Two years past the encoding-a-person framing. The thesis held in the parts I expected and bent in the parts I didn't. Worth being honest about what survived contact with the actual technology and what was just well-aged speculation.

AI

Helm Values as Business Standards: Decisions as Code for Kubernetes

Helm values.yaml is where Decisions as Code lives in Kubernetes. Centralized business decisions, schema-validated, composed via library charts, projected into every workload. The approach is the contribution. The tool changed.