Vector databases on Kubernetes: Qdrant, Weaviate, Milvus
Qdrant vs Weaviate vs Milvus on K8s. The foundation question for retrieval. StatefulSets, persistent volumes, replication, the operational reality. RAG indexing patterns at homelab scale on engine-01, and the decisions that change shape at fleet scale.
I wrote a vector DB shootout earlier in 2025 that compared pgvector, Qdrant, and LanceDB at a single-laptop scale. That piece was about the embedded/single-node end of the spectrum. This piece is about the other end: vector databases as a long-running, multi-node K8s workload, with the operational realities that StatefulSets impose. Three contenders worth getting concrete about: Qdrant, Weaviate, and Milvus. Three different answers to the same foundation question.
I run Qdrant on engine-01 (the Linux box with the 1080 Ti) as the homelab's RAG backend. It's a single-replica StatefulSet for now because the corpus is small and the workload is mine. The fleet-scale shape I'm folding in comes from public reporting, conversations with people who actually use this stuff, and the documented operator behavior of all three under heavier load. The piece is meant to be useful at both scales.
What "on Kubernetes" actually means for a vector DB
A vector DB is a stateful workload by definition. The index is the database; the index is on disk; the disk has to be persistent; the persistence has to survive pod restarts and node loss. That's a different shape of K8s work than serving a stateless inference replica. The decisions you make about persistence and replication at deploy time shape the next two years of operations.
The non-negotiables:
- StatefulSet, not Deployment. Stable network identity, stable persistent volume claims, ordered rollout.
- Persistent volume backed by a storage class with snapshot support. ZFS, Ceph RBD, EBS gp3 with snapshots, Longhorn, NFS with snapshots; pick one that matches your operational story.
- Replication strategy that survives single-node loss. The shape varies by DB; the requirement doesn't.
- A backup loop that ships index snapshots to object storage on a schedule. Velero plus the storage class's snapshotter is the typical pattern.
- Resource requests that match the index. Vector DBs are memory-heavy, and the OOM kill is the most common failure mode.
If any of those five are missing, the deploy will work and the operations will be a story you tell in a postmortem.
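What that looks like as a concrete check, in a minimal sketch using the official kubernetes Python client. The StatefulSet name, namespace, and thresholds are assumptions for illustration; the backup loop and snapshot schedule live outside the StatefulSet and aren't checkable here.

```python
from kubernetes import client, config

# Assumed names for illustration: a "qdrant" StatefulSet in a "vector-db" namespace.
config.load_kube_config()
apps = client.AppsV1Api()
sts = apps.read_namespaced_stateful_set("qdrant", "vector-db")

# Replication that survives single-node loss.
assert (sts.spec.replicas or 0) >= 3, "fewer than 3 replicas"

# A PVC template backed by a storage class that was actually chosen.
claims = sts.spec.volume_claim_templates or []
assert claims, "no volumeClaimTemplates: this is a Deployment in disguise"
assert claims[0].spec.storage_class_name, "storage class not pinned"

# Memory requests set, because the hot index lives in RAM and the OOM kill
# is the most common failure mode.
resources = sts.spec.template.spec.containers[0].resources
requests = (resources.requests or {}) if resources else {}
assert "memory" in requests, "no memory request on the main container"

print("StatefulSet passes the basic non-negotiables")
```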
Qdrant
Qdrant is the contender I keep coming back to for new work. Written in Rust, single-binary deploy, mature K8s operator, clean cluster mode with Raft-based consensus, and an API surface small enough to fit in your head. The query model (payload filtering plus vector search over an HNSW index) covers the median workload for most RAG use cases.
The K8s shape: a StatefulSet with three or more replicas, each with its own PVC, joined into a cluster via the Qdrant operator or via static peer config. Sharding is automatic; replication factor is per-collection. The hot index lives in memory; the on-disk format is the persistence layer. The operator handles rolling upgrades, scale-out, and the occasional shard rebalance.
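The per-collection replication decision is an API call, not a deploy-time flag. A minimal sketch with the qdrant-client package; the service URL, collection name, and vector size are assumptions.

```python
from qdrant_client import QdrantClient, models

# Assumed in-cluster service DNS name; adjust to your namespace and service.
qdrant = QdrantClient(url="http://qdrant.vector-db.svc:6333")

# Two copies of every shard means the collection survives a single node loss;
# shard count is chosen at creation, so size it for where the corpus is going.
qdrant.create_collection(
    collection_name="rag_chunks",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
    shard_number=6,
    replication_factor=2,
    write_consistency_factor=1,
)
```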
Operational story: snapshots are first-class. The Qdrant API exposes a POST /collections/{name}/snapshots endpoint that produces a transferable snapshot file. Wire that to a CronJob that ships the file to S3 or to your local MinIO. Restore is the inverse call. The disaster-recovery story ends up cleaner than it does for most other vector DBs because the snapshot primitive is a first-class API verb.
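The CronJob body can be this small. A sketch of snapshot-then-ship, assuming placeholder in-cluster service names, a MinIO endpoint, and a bucket; the two endpoints are Qdrant's documented snapshot API, but verify the response shape against your version.

```python
import datetime

import boto3
import requests

QDRANT = "http://qdrant.vector-db.svc:6333"   # assumed in-cluster service
COLLECTION = "rag_chunks"
BUCKET = "qdrant-snapshots"

# 1. Ask Qdrant to materialize a snapshot of the collection.
created = requests.post(f"{QDRANT}/collections/{COLLECTION}/snapshots", timeout=600)
created.raise_for_status()
snapshot_name = created.json()["result"]["name"]

# 2. Stream the snapshot file out of the cluster.
snap = requests.get(
    f"{QDRANT}/collections/{COLLECTION}/snapshots/{snapshot_name}",
    stream=True,
    timeout=600,
)
snap.raise_for_status()

# 3. Ship it to object storage (local MinIO here; plain S3 is the same call
#    without the endpoint_url override).
s3 = boto3.client("s3", endpoint_url="http://minio.storage.svc:9000")
key = f"{COLLECTION}/{datetime.date.today().isoformat()}/{snapshot_name}"
s3.upload_fileobj(snap.raw, BUCKET, key)
print(f"shipped {key}")
```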
What it gives up: very large fleets (multi-billion vector) push Qdrant's cluster mode harder than the design wanted. The Milvus contingent will tell you Qdrant tops out around the upper-hundreds-of-millions vector range per cluster; the Qdrant team will tell you that's old data and the 2025 versions handle multi-billion. Practically, if you're under a billion vectors per collection, Qdrant is the easiest answer. If you're materially over that, the foundation question is open again.
Weaviate
Weaviate is the contender that took the modular AI-platform path. The core is a vector DB; the value-add is a layer of modules (text2vec, qna, generative) that wrap embedding models, retrievers, and LLM calls into the query path. If your team wants the database to do more than store and search, Weaviate is the contender leaning hardest into that.
The K8s shape: same StatefulSet pattern, with a slightly more involved Helm chart because the modules are first-class deploys. Cluster metadata is replicated via Raft; data replication is leaderless with tunable consistency. The later 1.x releases made the cluster mode genuinely production-grade; earlier versions had a less polished story.
The operational tradeoff is that the module surface is a feature for some teams and a tax for others. If you want a plain vector store, Weaviate-with-everything-disabled is fine, but Qdrant is simpler. If you want a retrieval-augmented platform with embeddings and generation woven in, Weaviate's the one treating that as a core competency.
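To make the tradeoff concrete: with a text2vec module enabled, the query path accepts raw text and the module produces the query vector inside the database, so the application never calls an embedding endpoint itself. A sketch against the GraphQL API; the class name, properties, and service URL are assumptions.

```python
import requests

WEAVIATE = "http://weaviate.vector-db.svc:8080"   # assumed in-cluster service

# nearText only exists because a text2vec module is enabled on the class;
# the module embeds the query server-side.
graphql = """
{
  Get {
    Document(
      nearText: { concepts: ["persistent volume snapshot strategy"] }
      limit: 5
    ) {
      title
      chunk
      _additional { distance }
    }
  }
}
"""
resp = requests.post(f"{WEAVIATE}/v1/graphql", json={"query": graphql}, timeout=30)
resp.raise_for_status()
for doc in resp.json()["data"]["Get"]["Document"]:
    print(doc["title"], doc["_additional"]["distance"])
```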
The cost story for Weaviate at scale is heavier than Qdrant's, mostly because the modules pull more compute. The benefit is that the integration story for a downstream RAG application is shorter: fewer custom pieces, more out-of-the-box wiring.
Milvus
Milvus is the contender for very large fleets. The architecture is a microservice constellation: query nodes, data nodes, index nodes, root coord, query coord, data coord, mix coord, an etcd cluster for metadata, an object store (S3 or MinIO) for the persistent layer, and a message broker (Pulsar or Kafka) for the write path. It's a lot of moving parts.
The K8s shape: a Helm chart from the Milvus community or the Zilliz operator deploys the constellation. Each component scales independently. Persistent state is in S3 / MinIO; the K8s pods are mostly stateless caches and processors. The cluster scales horizontally in a way Qdrant and Weaviate don't quite match.
The tradeoff is operational complexity. A small Milvus deploy is more YAML than a small Qdrant deploy, and the failure modes are more interesting because there are more components to understand. The break-even is somewhere around "a billion vectors and tens of thousands of QPS". Below that, the simpler contenders are usually the right answer; above that, Milvus's architecture earns its keep.
I've stood up the standalone (single-binary) flavor of Milvus on engine-01 to feel the shape. It's fine. The cluster flavor is a different commitment.
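For what it's worth, the client surface is the same whether the backend is the single binary on engine-01 or the full constellation, which is part of Milvus's pitch: start standalone, move to the cluster later. A sketch with pymilvus; the collection name and dimension are assumptions.

```python
from pymilvus import MilvusClient

# Standalone Milvus listens on 19530 by default; engine-01 is the homelab box.
milvus = MilvusClient(uri="http://engine-01:19530")

# Quick-setup collection: an "id" primary key and a "vector" field are created
# for you; only the dimension is required.
milvus.create_collection(collection_name="rag_chunks", dimension=768)

milvus.insert(
    collection_name="rag_chunks",
    data=[{"id": 1, "vector": [0.1] * 768}],
)

hits = milvus.search(
    collection_name="rag_chunks",
    data=[[0.1] * 768],
    limit=3,
)
print(hits)
```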
RAG indexing patterns
Whichever DB you pick, the indexing pattern for a serious RAG corpus has the same shape:
- Source documents land in object storage with a manifest of metadata (source URL, modified timestamp, classification, owning team).
- An indexing pipeline (Argo Workflows is the easy answer; covered in a follow-up piece) chunks the documents, embeds the chunks via a model serving endpoint, and writes the vectors plus payload metadata into the vector DB. This write path, and the query path below, are sketched after the list.
- A query path in the application takes the user's query, embeds it, queries the DB with payload filters (classification, source, recency), assembles the top-k chunks into the prompt context, and returns the LLM response.
- A re-index loop that detects upstream document changes, re-embeds the affected chunks, and updates the DB. The hardest part of a RAG platform; most teams under-invest in this and live with stale answers.
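A sketch of the write path and the query path from that list, using qdrant-client because that's what engine-01 runs. The embedding endpoint and its request/response shape, the collection name, and the payload fields are stand-ins for whatever your serving layer and manifest actually expose.

```python
import uuid

import requests
from qdrant_client import QdrantClient, models

QDRANT = QdrantClient(url="http://qdrant.vector-db.svc:6333")
EMBED_URL = "http://embedder.serving.svc:8080/embed"   # hypothetical endpoint
COLLECTION = "rag_chunks"


def embed(texts: list[str]) -> list[list[float]]:
    """Call the model-serving endpoint; the payload shape here is assumed."""
    resp = requests.post(EMBED_URL, json={"inputs": texts}, timeout=60)
    resp.raise_for_status()
    return resp.json()["embeddings"]


def index_document(doc_id: str, chunks: list[str], meta: dict) -> None:
    """Write path: chunk -> embed -> upsert, with manifest metadata as payload."""
    vectors = embed(chunks)
    QDRANT.upsert(
        collection_name=COLLECTION,
        points=[
            models.PointStruct(
                # Deterministic IDs make re-indexing an update, not a duplicate.
                id=str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_id}:{i}")),
                vector=vec,
                payload={**meta, "doc_id": doc_id, "chunk_index": i, "text": chunk},
            )
            for i, (chunk, vec) in enumerate(zip(chunks, vectors))
        ],
    )


def retrieve(query: str, classification: str, top_k: int = 5) -> list[str]:
    """Query path: embed the query, filter on payload, return top-k chunk texts."""
    hits = QDRANT.search(
        collection_name=COLLECTION,
        query_vector=embed([query])[0],
        query_filter=models.Filter(
            must=[
                models.FieldCondition(
                    key="classification",
                    match=models.MatchValue(value=classification),
                )
            ]
        ),
        limit=top_k,
    )
    return [hit.payload["text"] for hit in hits]
```

The re-index loop is the same index_document call driven by whatever change detection you trust; the deterministic point IDs are what make that an update instead of an accumulation of stale chunks.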
The vector DB is one component in that loop. Its choice doesn't change the loop's shape, but it does change the operational primitives the loop has available. Qdrant's snapshot API makes the rollback story easier. Milvus's scale story makes the multi-billion-chunk path possible. Weaviate's module story makes the embedding and re-rank steps feel native.
The DaC shape, again
This is Decisions as Code (DaC), the approach behind nearly every self-service and automation system I've designed: pull business decisions out of platform configuration into a small, curated layer (often five real decisions where the raw config exposed eighty-nine) and let the platform absorb the rest through templates and defaults the platform owns. (I called this Property Toolkit during my OneFuse days; the foundation is different, the shape isn't.)
The Helm chart for the vector DB has a values surface that should not be re-invented per deploy. Replicas, replication factor, storage class, snapshot schedule, resource requests, network policy, egress rules: all of those are organizational decisions, not vector-DB-specific opinions.
The pattern from the Helm values article carries over directly: the standard values live in the standards library chart; the vector DB chart consumes them; the per-environment overrides are thin. The deploy decision becomes "which DB"; everything around it is the same chassis.
Define the decisions once. Project them onto every consuming chart. Validate at the boundary. Pair with policy at admission. The vector DB is just another consumer.
What I keep coming back to
For most teams in 2025, Qdrant is the easiest answer that's still production-grade. Weaviate is the answer when the database should do more than store and search. Milvus is the answer when the corpus is materially larger than the simpler contenders comfortably handle. The decisions are not close calls if you're honest about the workload. They look close because the marketing material from all three is convergent; the operational shapes are not.
The foundation question for retrieval is the same kind of question as the foundation question for serving. Pick the contender that matches the operational story you can tell. Don't pick the contender whose architecture diagram looks coolest in the conference talk. The vector DB is going to be the most stateful component in your AI platform. The decision deserves the foundation-question framing.