Vector Databases for RAG: Qdrant vs Milvus vs Weaviate

Choosing the right vector database determines RAG latency, cost, and reliability. This guide compares Qdrant, Milvus, and Weaviate across production benchmarks—latency, throughput, filtering, and total cost of ownership.

The Vector Database Landscape (2025)

Three open-source leaders dominate production RAG deployments: Qdrant (Rust-based, low overhead), Milvus (billion-scale proven), and Weaviate (hybrid search + knowledge graphs). Your choice determines query latency, infrastructure costs, and operational complexity.

Qdrant: Performance-First with Advanced Filtering

  • Language: Rust (low memory overhead, high concurrency)
  • GitHub Stars: ~9K (April 2025)
  • Strengths: Sophisticated payload filtering (see the sketch after this list), low latency, compact footprint
  • Best For: High-QPS workloads, complex metadata filters, tight budgets
  • Production Scale: Optimized for 1M-100M vectors with sub-10ms p95 latency
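To make the filtering claim concrete, here is a minimal sketch of a filtered vector search with Qdrant's Python client. The collection name, the payload keys (user_id, year), and query_embedding are hypothetical stand-ins, not values from any benchmark:

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

client = QdrantClient(host="localhost", port=6333)
hits = client.search(
    collection_name="docs",
    query_vector=query_embedding,  # hypothetical 1536-dim query vector
    query_filter=Filter(must=[
        FieldCondition(key="user_id", match=MatchValue(value="u-42")),  # exact match
        FieldCondition(key="year", range=Range(gte=2024)),              # range filter
    ]),
    limit=5,
)

Qdrant applies such filters during the HNSW graph traversal rather than as a post-filter pass, which is why filtered queries stay close to unfiltered latency.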

Milvus: Billion-Vector Battle-Tested

  • Language: Go + C++ (production-hardened since 2019)
  • GitHub Stars: ~25K (April 2025, highest)
  • Strengths: Proven at billion-vector scale, richest feature set, fastest indexing
  • Best For: Massive datasets (100M+ vectors), heavy data engineering teams
  • Production Scale: Industrial deployments with 1B+ vectors, strong community

Weaviate: Hybrid Search Specialist

  • Language: Go (modularity-first architecture)
  • Docker Pulls: >1M/month (April 2025, highest adoption)
  • Strengths: Hybrid search (vector + BM25; see the sketch after this list), knowledge graph integration, strong GraphQL API
  • Best For: Hybrid search requirements, relationship-aware retrieval, rapid prototyping
  • Caveat: Graph features can slow complex queries with multiple relationship traversals
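As a sketch of what hybrid search looks like in practice, assuming the v3 Python client and a hypothetical Document class with a text property; the query string and alpha value are illustrative:

import weaviate

client = weaviate.Client("http://localhost:8080")
results = (
    client.query
    .get("Document", ["text"])
    .with_hybrid(query="vector database benchmarks", alpha=0.5)  # alpha=1.0 pure vector, 0.0 pure BM25
    .with_limit(5)
    .do()
)

The alpha parameter blends the vector-similarity score against the BM25 keyword score, so one query can serve both semantic and exact-term matching.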

Performance Benchmarks (2025)

Latency (p95) @ 100K Vectors, 1536-dim embeddings

Database    Query Latency (p95)    Indexing Time    Memory Footprint
Qdrant      <10 ms                 Moderate         Lowest
Milvus      <15 ms                 Fastest          Moderate
Weaviate    <20 ms                 Moderate         Higher (graph overhead)

Throughput (QPS) vs. Recall Target

  • Milvus: Highest throughput at recall targets below 0.95; its lead narrows as recall rises
  • Qdrant: Consistently high QPS across recall levels (70-90% utilization)
  • Weaviate: Moderate QPS; benefits from hybrid search caching

Benchmark Context Matters

Official benchmarks often use synthetic workloads. For production RAG:

  • Test with YOUR embedding model (OpenAI ada-002, Cohere, custom)
  • Include metadata filtering (date ranges, user permissions)
  • Measure at target concurrency (the winner at 10 concurrent queries may not win at 100); see the harness sketch after this list
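A minimal harness for the concurrency point, assuming a search_fn callable that wraps whichever client you are testing; search_fn and queries are placeholders, not any vendor's API:

import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def measure_p95(search_fn, queries, concurrency):
    """Run all queries at a fixed concurrency; return p95 latency in ms."""
    def timed(query):
        start = time.perf_counter()
        search_fn(query)
        return (time.perf_counter() - start) * 1000
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, queries))
    return statistics.quantiles(latencies, n=20)[18]  # 19th of 19 cut points = p95

# Same workload, two concurrency levels; compare per database
# p95_low  = measure_p95(search_fn, queries, concurrency=10)
# p95_high = measure_p95(search_fn, queries, concurrency=100)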

Decision Matrix

Choose Qdrant if:

  • Need advanced payload filtering (date ranges, nested JSON, geo-queries)
  • Tight budget (low memory overhead = smaller instances)
  • High-QPS workloads requiring sub-10ms p95 latency
  • Dataset size: 1M-100M vectors (sweet spot)
  • Team comfortable with Rust ecosystem (optional customization)

Choose Milvus if:

  • Massive scale (100M-1B+ vectors) with proven reliability
  • Need richest feature set (multiple index types, GPU support)
  • Heavy data engineering team (Kafka, Spark integrations)
  • Fastest indexing time critical (real-time data ingestion)

Choose Weaviate if:

  • Hybrid search required (combine vector + keyword BM25)
  • Knowledge graph relationships important (entities + connections)
  • Rapid prototyping (GraphQL API, strong modularity)
  • Moderate scale (1M-50M vectors) with flexible schema

Deployment Quick Start

Qdrant (Docker)

docker run -p 6333:6333 qdrant/qdrant

# Python client
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient(host="localhost", port=6333)
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
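To round out the quick start, upserting a document vector with a payload might look like this; the id, the embedding variable, and the payload values are illustrative:

from qdrant_client.models import PointStruct

client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=embedding, payload={"source": "faq.md"})],  # embedding: 1536-dim list of floats
)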

Milvus (Docker Compose)

wget https://github.com/milvus-io/milvus/releases/download/v2.3.0/milvus-standalone-docker-compose.yml -O docker-compose.yml
docker compose up -d

# Python client
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="localhost", port="19530")
schema = CollectionSchema([
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
])
collection = Collection("docs", schema)  # creates the collection if it does not exist
collection.insert([embeddings])          # embeddings: list of 1536-dim float vectors
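A Milvus collection is only searchable after an index is built and the collection is loaded into memory. A sketch continuing from the block above, using HNSW; the M and efConstruction values are illustrative, not tuned:

collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "params": {"M": 16, "efConstruction": 200},  # illustrative graph parameters
    },
)
collection.load()  # load the indexed collection into memory for querying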

Weaviate (Docker)

docker run -p 8080:8080 semitechnologies/weaviate:latest

# Python client (weaviate-client v3 API)
import weaviate

client = weaviate.Client("http://localhost:8080")
client.schema.create_class({"class": "Document", "vectorizer": "none"})
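With the vectorizer set to none, you supply embeddings yourself at insert time. A sketch using the v3 data_object API; the text and embedding values are placeholders:

client.data_object.create(
    data_object={"text": "Qdrant vs Milvus vs Weaviate"},
    class_name="Document",
    vector=embedding,  # your own 1536-dim embedding
)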
