Vector Databases for RAG: Qdrant vs Milvus vs Weaviate

Choosing the right vector database determines RAG latency, cost, and reliability. This guide compares Qdrant, Milvus, and Weaviate across production benchmarks—latency, throughput, filtering, and total cost of ownership.

The Vector Database Landscape (2025)

Three open-source leaders dominate production RAG deployments: Qdrant (Rust-based, low overhead), Milvus (billion-scale proven), and Weaviate (hybrid search + knowledge graphs). Your choice determines query latency, infrastructure costs, and operational complexity.

Qdrant: Performance-First with Advanced Filtering

  • Language: Rust (low memory overhead, high concurrency)
  • GitHub Stars: ~9K (April 2025)
  • Strengths: Sophisticated payload filtering (see the sketch after this list), low latency, compact footprint
  • Best For: High-QPS workloads, complex metadata filters, tight budgets
  • Production Scale: Optimized for 1M-100M vectors with sub-10ms p95 latency
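To make the filtering claim concrete, here is a minimal sketch of a filtered vector search with Qdrant's Python client. The collection name, the payload keys (user_id, year), and query_embedding are hypothetical stand-ins, not values from any benchmark:

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

client = QdrantClient(host="localhost", port=6333)
hits = client.search(
    collection_name="docs",
    query_vector=query_embedding,  # hypothetical 1536-dim query vector
    query_filter=Filter(must=[
        FieldCondition(key="user_id", match=MatchValue(value="u-42")),  # exact match
        FieldCondition(key="year", range=Range(gte=2024)),              # range filter
    ]),
    limit=5,
)

Qdrant applies such filters during the HNSW graph traversal rather than as a post-filter pass, which is why filtered queries stay close to unfiltered latency.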

Milvus: Billion-Vector Battle-Tested

  • Language: Go + C++ (production-hardened since 2019)
  • GitHub Stars: ~25K (April 2025, highest)
  • Strengths: Proven at billion-vector scale, richest feature set, fastest indexing
  • Best For: Massive datasets (100M+ vectors), heavy data engineering teams
  • Production Scale: Industrial deployments with 1B+ vectors, strong community

Weaviate: Hybrid Search Specialist

  • Language: Go (modularity-first architecture)
  • Docker Pulls: >1M/month (April 2025, highest adoption)
  • Strengths: Hybrid search (vector + BM25; see the sketch after this list), knowledge graph integration, strong GraphQL API
  • Best For: Hybrid search requirements, relationship-aware retrieval, rapid prototyping
  • Caveat: Graph features can slow complex queries with multiple relationship traversals
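As a sketch of what hybrid search looks like in practice, assuming the v3 Python client and a hypothetical Document class with a text property; the query string and alpha value are illustrative:

import weaviate

client = weaviate.Client("http://localhost:8080")
results = (
    client.query
    .get("Document", ["text"])
    .with_hybrid(query="vector database benchmarks", alpha=0.5)  # alpha=1.0 pure vector, 0.0 pure BM25
    .with_limit(5)
    .do()
)

The alpha parameter blends the vector-similarity score against the BM25 keyword score, so one query can serve both semantic and exact-term matching.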

Performance Benchmarks (2025)

Latency (p95) @ 100K Vectors, 1536-dim embeddings

Database    Query Latency (p95)    Indexing Time    Memory Footprint
Qdrant      <10 ms                 Moderate         Lowest
Milvus      <15 ms                 Fastest          Moderate
Weaviate    <20 ms                 Moderate         Higher (graph overhead)

Throughput (QPS) vs. Recall Target

  • Milvus: Highest throughput at recall targets below 0.95; its lead narrows as recall rises
  • Qdrant: Consistently high QPS across recall levels (70-90% utilization)
  • Weaviate: Moderate QPS; benefits from hybrid search caching

Benchmark Context Matters

Official benchmarks often use synthetic workloads. For production RAG:

  • Test with YOUR embedding model (OpenAI ada-002, Cohere, custom)
  • Include metadata filtering (date ranges, user permissions)
  • Measure at target concurrency (the winner at 10 concurrent queries may not win at 100); see the harness sketch after this list
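A minimal harness for the concurrency point, assuming a search_fn callable that wraps whichever client you are testing; search_fn and queries are placeholders, not any vendor's API:

import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def measure_p95(search_fn, queries, concurrency):
    """Run all queries at a fixed concurrency; return p95 latency in ms."""
    def timed(query):
        start = time.perf_counter()
        search_fn(query)
        return (time.perf_counter() - start) * 1000
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, queries))
    return statistics.quantiles(latencies, n=20)[18]  # 19th of 19 cut points = p95

# Same workload, two concurrency levels; compare per database
# p95_low  = measure_p95(search_fn, queries, concurrency=10)
# p95_high = measure_p95(search_fn, queries, concurrency=100)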

Decision Matrix

Choose Qdrant if:

  • Need advanced payload filtering (date ranges, nested JSON, geo-queries)
  • Tight budget (low memory overhead = smaller instances)
  • High-QPS workloads requiring sub-10ms p95 latency
  • Dataset size: 1M-100M vectors (sweet spot)
  • Team comfortable with Rust ecosystem (optional customization)

Choose Milvus if:

  • Massive scale (100M-1B+ vectors) with proven reliability
  • Need richest feature set (multiple index types, GPU support)
  • Heavy data engineering team (Kafka, Spark integrations)
  • Fastest indexing time critical (real-time data ingestion)

Choose Weaviate if:

  • Hybrid search required (combine vector + keyword BM25)
  • Knowledge graph relationships important (entities + connections)
  • Rapid prototyping (GraphQL API, strong modularity)
  • Moderate scale (1M-50M vectors) with flexible schema

Deployment Quick Start

Qdrant (Docker)

docker run -p 6333:6333 qdrant/qdrant

# Python client
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient(host="localhost", port=6333)
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
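To round out the quick start, upserting a document vector with a payload might look like this; the id, the embedding variable, and the payload values are illustrative:

from qdrant_client.models import PointStruct

client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=embedding, payload={"source": "faq.md"})],  # embedding: 1536-dim list of floats
)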

Milvus (Docker Compose)

wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
docker compose up -d

# Python client
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="localhost", port="19530")
schema = CollectionSchema([
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
])
collection = Collection("docs", schema)  # creates the collection if it does not exist
collection.insert([embeddings])          # embeddings: list of 1536-dim float vectors
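A Milvus collection is only searchable after an index is built and the collection is loaded into memory. A sketch continuing from the block above, using HNSW; the M and efConstruction values are illustrative, not tuned:

collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "params": {"M": 16, "efConstruction": 200},  # illustrative graph parameters
    },
)
collection.load()  # load the indexed collection into memory for querying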

Weaviate (Docker)

docker run -p 8080:8080 semitechnologies/weaviate:latest

# Python client (weaviate-client v3 API)
import weaviate

client = weaviate.Client("http://localhost:8080")
client.schema.create_class({"class": "Document", "vectorizer": "none"})
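With the vectorizer set to none, you supply embeddings yourself at insert time. A sketch using the v3 data_object API; the text and embedding values are placeholders:

client.data_object.create(
    data_object={"text": "Qdrant vs Milvus vs Weaviate"},
    class_name="Document",
    vector=embedding,  # your own 1536-dim embedding
)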
