The Vector Database Landscape (2025)
Three open-source leaders dominate production RAG deployments: Qdrant (Rust-based, low overhead), Milvus (billion-scale proven), and Weaviate (hybrid search + knowledge graphs). Your choice determines query latency, infrastructure costs, and operational complexity.
Qdrant: Performance-First with Advanced Filtering
- Language: Rust (low memory overhead, high concurrency)
- GitHub Stars: ~9K (April 2025)
- Strengths: Sophisticated payload filtering, low latency, compact footprint
- Best For: High-QPS workloads, complex metadata filters, tight budgets
- Production Scale: Optimized for 1M-100M vectors with sub-10ms p95 latency
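Payload filters compose with vector search in a single call, which is what makes Qdrant's metadata filtering practical at query time. A minimal sketch with the official qdrant-client, assuming a "docs" collection and a hypothetical user_id payload field:

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(host="localhost", port=6333)
hits = client.search(
    collection_name="docs",
    query_vector=[0.1] * 1536,  # placeholder query embedding
    query_filter=Filter(must=[
        # hypothetical payload field for per-user permission filtering
        FieldCondition(key="user_id", match=MatchValue(value="user-42")),
    ]),
    limit=5,
)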
Milvus: Billion-Vector Battle-Tested
- Language: Go + C++ (production-hardened since 2019)
- GitHub Stars: ~25K (April 2025, highest)
- Strengths: Proven at billion-vector scale, richest feature set, fastest indexing
- Best For: Massive datasets (100M+ vectors), heavy data engineering teams
- Production Scale: Industrial deployments with 1B+ vectors, strong community
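Milvus makes index selection explicit, which is where its indexing speed and breadth of index types show. A sketch of building an HNSW index on an existing collection; the "docs" collection and "embedding" field names are assumptions (they match the quick start below), and the parameters are illustrative rather than tuned:

from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
collection = Collection("docs")
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "params": {"M": 16, "efConstruction": 200},  # illustrative values
    },
)
collection.load()  # a collection must be loaded into memory before searching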
Weaviate: Hybrid Search Specialist
- Language: Go (modularity-first architecture)
- Docker Pulls: >1M/month (April 2025, highest adoption)
- Strengths: Hybrid search (vector + BM25), knowledge graph integration, strong GraphQL API
- Best For: Hybrid search requirements, relationship-aware retrieval, rapid prototyping
- Caveat: Graph features can slow complex queries that traverse multiple relationships
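Hybrid search blends vector similarity with BM25 keyword scores through a single alpha weight. A sketch using the v3 Python client, assuming a "Document" class with a hypothetical text property:

import weaviate

client = weaviate.Client("http://localhost:8080")
result = (
    client.query
    .get("Document", ["text"])
    # alpha=1.0 is pure vector search, alpha=0.0 is pure BM25
    .with_hybrid(query="vector database benchmarks", alpha=0.5)
    .with_limit(5)
    .do()
)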
Performance Benchmarks (2025)
Latency (p95) @ 100K Vectors, 1536-dim embeddings
| Database | p95 Query Latency | Indexing Time | Memory Footprint |
|---|---|---|---|
| Qdrant | <10 ms | Moderate | Lowest |
| Milvus | <15 ms | Fastest | Moderate |
| Weaviate | <20 ms | Moderate | Higher (graph overhead) |
Throughput (QPS) @ Recall 0.95
- Milvus: Highest throughput at recall below 0.95; the lead narrows at higher recall targets
- Qdrant: Consistent, high QPS across recall levels (70-90% resource utilization)
- Weaviate: Moderate QPS; benefits from hybrid-search caching
Benchmark Context Matters
Official benchmarks often use synthetic workloads. For production RAG:
- Test with YOUR embedding model (OpenAI ada-002, Cohere, custom)
- Include metadata filtering (date ranges, user permissions)
- Measure at target concurrency (10 vs. 100 concurrent queries can produce a different winner); a minimal harness is sketched below
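A minimal harness sketch for that last point, assuming search_fn wraps whichever client call you are benchmarking (all names here are illustrative):

import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def measure_p95(search_fn, queries, concurrency=10):
    """Run queries at a fixed concurrency and return the p95 latency in seconds."""
    latencies = []

    def timed(query):
        start = time.perf_counter()
        search_fn(query)
        latencies.append(time.perf_counter() - start)  # list.append is thread-safe in CPython

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed, queries))
    return statistics.quantiles(latencies, n=20)[18]  # 19 cut points; index 18 is the 95th percentile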
Decision Matrix
Choose Qdrant if:
- Need advanced payload filtering (date ranges, nested JSON, geo-queries)
- Tight budget (low memory overhead = smaller instances)
- High-QPS workloads requiring sub-10ms p95 latency
- Dataset size: 1M-100M vectors (sweet spot)
- Team comfortable with the Rust ecosystem (optional customization)
Choose Milvus if:
- Massive scale (100M-1B+ vectors) with proven reliability
- Need the richest feature set (multiple index types, GPU support)
- Heavy data engineering team (Kafka, Spark integrations)
- Fastest indexing time is critical (real-time data ingestion)
Choose Weaviate if:
- Hybrid search required (combine vector + keyword BM25)
- Knowledge graph relationships important (entities + connections)
- Rapid prototyping (GraphQL API, strong modularity)
- Moderate scale (1M-50M vectors) with flexible schema
Deployment Quick Start
Qdrant (Docker)
docker run -p 6333:6333 qdrant/qdrant
# Python client
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(host="localhost", port=6333)
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
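Continuing from the snippet above, a quick end-to-end check (IDs, vectors, and payload values are illustrative):

from qdrant_client.models import PointStruct

client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=[0.1] * 1536, payload={"source": "demo"})],
)
hits = client.search(collection_name="docs", query_vector=[0.1] * 1536, limit=3)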
Milvus (Docker Compose)
wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
docker compose up -d
# Python client
from pymilvus import Collection, CollectionSchema, connections, DataType, FieldSchema

connections.connect(host="localhost", port="19530")

# Create the collection with an explicit schema before inserting
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=1536),
])
collection = Collection("docs", schema)
collection.insert([embeddings])  # embeddings: list of 1536-dim float vectors
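Searching continues from the snippet above and assumes an index has been built and the collection loaded (see the HNSW sketch earlier); the ef value is illustrative:

results = collection.search(
    data=[[0.1] * 1536],  # one placeholder query embedding
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=3,
)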
Weaviate (Docker)
docker run -p 8080:8080 semitechnologies/weaviate:latest
# Python client
import weaviate
client = weaviate.Client("http://localhost:8080")  # weaviate-client v3 API
client.schema.create_class({"class": "Document", "vectorizer": "none"})  # vectors supplied client-side
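With "vectorizer": "none", objects must arrive with precomputed vectors. A follow-up sketch continuing from the snippet above (the text property and vector are illustrative):

client.data_object.create(
    data_object={"text": "hello world"},
    class_name="Document",
    vector=[0.1] * 1536,  # precomputed embedding supplied by the application
)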