Qdrant Tutorial — Vector Search with Python in 10 Minutes (2026)
This Qdrant Python tutorial takes you from pip install to running semantic search queries in 10 minutes. You will set up Qdrant with Docker, create a collection, upsert vectors with metadata, run filtered queries, and build a minimal RAG retrieval layer. Every code example is runnable as-is.
Who this is for:
- Beginners: You want a fast, hands-on introduction to vector databases using Qdrant
- RAG builders: You need a production-grade vector store for your retrieval-augmented generation pipeline
- Engineers evaluating options: You are comparing Qdrant against Pinecone, Weaviate, or Chroma
1. Why Qdrant for Vector Search
Qdrant is a Rust-based open-source vector database built for speed. Where other vector databases are written in Go or Python, Qdrant’s Rust core delivers consistent sub-millisecond query latency at million-vector scale.
What Makes Qdrant Different
- Rust performance — No garbage collector pauses. Query latency stays consistent under load, unlike Go-based alternatives that pause during GC cycles.
- Rich payload filtering — Filter on nested JSON fields, ranges, geo-points, and keywords during search — not after. Filters are applied inside the HNSW graph traversal.
- Open source (Apache 2.0) — Self-host with Docker, no license fees, no vector limits, no query caps. Your data stays on your infrastructure.
- gRPC + REST APIs — The Python client wraps both. gRPC gives you 2-3x faster batch operations compared to REST.
Qdrant handles 1M+ vectors with p99 query latency under 5ms on a single 4-core VM. For comparison, that same workload on an unoptimized Postgres + pgvector setup takes 50-200ms.
2. When to Use Qdrant — Real-World Use Cases
Qdrant fits anywhere you need fast similarity search over high-dimensional data. Here are the five most common production use cases.
| Use Case | What You Store | Why Qdrant Fits |
|---|---|---|
| Semantic search | Document embeddings (1536-dim) | Sub-ms queries + payload filters for faceted search |
| RAG pipelines | Chunk embeddings + source metadata | Filter by document source, date, or category during retrieval |
| Recommendation systems | User/item embeddings | Real-time nearest-neighbor lookup with business rule filters |
| Image similarity | CLIP or ResNet embeddings (512-2048 dim) | Handles high-dimensional vectors with configurable distance metrics |
| Anomaly detection | Sensor/log embeddings | Distance thresholds flag outliers; payload filters scope by device or region |
For RAG pipelines specifically, Qdrant’s payload filtering is a standout feature. You can scope retrieval to specific document sources, date ranges, or content categories without post-filtering — the filter runs inside the HNSW index traversal.
3. How Qdrant Works — Core Concepts
Qdrant organizes data into collections, points, vectors, and payloads. Understanding these four concepts is all you need to start building.
The Four Building Blocks
- Collection — A named group of vectors sharing the same dimensionality and distance metric. Equivalent to a table in a relational database. Each collection has its own HNSW index.
- Point — A single record in a collection. Every point has a unique ID, a vector, and an optional payload (metadata). Points are what you upsert and query.
- Vector — A fixed-length array of floats representing your data in embedding space. Common dimensions: 384 (MiniLM), 1536 (OpenAI text-embedding-3-small), 3072 (text-embedding-3-large).
- Payload — Arbitrary JSON metadata attached to a point. You use payloads for filtering during search. Example: `{"source": "arxiv", "year": 2026, "category": "transformers"}`.
HNSW Index — How Qdrant Finds Neighbors Fast
Qdrant uses HNSW (Hierarchical Navigable Small World) for approximate nearest neighbor search. HNSW builds a multi-layer graph:
- Top layers have fewer nodes and long-range connections (coarse navigation)
- Bottom layers have all nodes with short-range connections (fine-grained search)
- A query enters at the top, navigates down, and converges on the nearest neighbors
The result: 95-99% recall with sub-millisecond latency, even at millions of vectors.
Qdrant Data Flow
[Diagram: Qdrant Vector Search — Data Flow — from raw data to query results in three stages]
4. Qdrant Tutorial — Setup to First Query
This section walks you through the complete setup in 5 steps. You will have a running Qdrant instance with data you can query by the end.
Step 1: Start Qdrant with Docker
```shell
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
```

Port 6333 serves the REST API. Port 6334 serves gRPC. The dashboard is available at http://localhost:6333/dashboard.
Step 2: Install the Python Client
```shell
pip install qdrant-client
```

The qdrant-client package supports both REST and gRPC. gRPC is faster for batch operations; REST is simpler for debugging.
Step 3: Create a Collection
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(
        size=1536,                 # Match your embedding model's output dimension
        distance=Distance.COSINE,  # Cosine similarity (most common for text)
    ),
)
```

Three distance metrics are available: COSINE (normalized text embeddings), EUCLID (spatial data), and DOT (when vectors are not normalized). For OpenAI embeddings, use COSINE.
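The three metrics are simple formulas, and a standalone sketch makes the relationship between them concrete: for unit-normalized vectors, cosine similarity equals the dot product, which is why normalized text embeddings work with either metric.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product scaled by both vector lengths
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

# For unit vectors the length terms are 1, so cosine == dot
print(round(cosine(a, b), 9) == round(dot(a, b), 9))  # True
```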
Step 4: Upsert Vectors with Payloads
```python
import random

from qdrant_client.models import PointStruct

# In production, generate these with an embedding model.
# Here we use placeholder vectors for demonstration.
random.seed(42)

points = [
    PointStruct(
        id=1,
        vector=[random.uniform(-1, 1) for _ in range(1536)],
        payload={"title": "Intro to RAG", "source": "blog", "year": 2026},
    ),
    PointStruct(
        id=2,
        vector=[random.uniform(-1, 1) for _ in range(1536)],
        payload={"title": "HNSW Explained", "source": "arxiv", "year": 2025},
    ),
    PointStruct(
        id=3,
        vector=[random.uniform(-1, 1) for _ in range(1536)],
        payload={"title": "Vector DB Benchmarks", "source": "blog", "year": 2026},
    ),
]

client.upsert(collection_name="articles", points=points)
```

Each point needs a unique ID (integer or UUID), a vector matching the collection’s dimensionality, and an optional payload. The upsert operation inserts new points or updates existing ones by ID.
Step 5: Query for Similar Vectors
Section titled “Step 5: Query for Similar Vectors”query_vector = [random.uniform(-1, 1) for _ in range(1536)]
results = client.query_points( collection_name="articles", query=query_vector, limit=3,)
for point in results.points: print(f"ID: {point.id}, Score: {point.score:.4f}, Title: {point.payload['title']}")That is the complete flow: create a collection, upsert points, query. You now have a working vector search system.
5. Qdrant Architecture — Vector Search Stack
The full Qdrant stack has six layers, from your application code down to persistent storage.
Qdrant Architecture Layers
[Diagram: Qdrant Vector Search Stack — from application code to persistent storage]
Key architectural decisions:
- Segments — Qdrant splits collections into segments for parallel search. Each segment has its own HNSW index. This enables concurrent reads and background optimization.
- WAL (Write-Ahead Log) — Every write goes to the WAL first, ensuring durability. If Qdrant crashes mid-write, it recovers from the WAL on restart.
- mmap storage — Vectors can be memory-mapped from disk instead of loaded into RAM. This trades some query speed for dramatically lower memory usage at large scale.
- Quantization — Qdrant supports scalar and product quantization to compress vectors by 4-32x, reducing memory and speeding up distance calculations with minimal recall loss.
6. Qdrant Python Code Examples
These three examples cover the patterns you will use most: basic search, filtered search, and batch upsert with real embeddings.
Example 1: Basic Vector Search
```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Search for the 5 most similar vectors
results = client.query_points(
    collection_name="articles",
    query=query_vector,   # Your embedded query (list of floats)
    limit=5,
    with_payload=True,    # Return metadata with results
)

for point in results.points:
    print(f"{point.payload['title']} — score: {point.score:.4f}")
```

Example 2: Filtered Search with Payload Conditions
```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# Find similar vectors, but only from 2026 blog posts
results = client.query_points(
    collection_name="articles",
    query=query_vector,
    query_filter=Filter(
        must=[
            FieldCondition(key="source", match=MatchValue(value="blog")),
            FieldCondition(key="year", range=Range(gte=2026)),
        ]
    ),
    limit=5,
)

for point in results.points:
    print(f"{point.payload['title']} ({point.payload['year']}) — {point.score:.4f}")
```

Filters run inside the HNSW traversal, not as a post-processing step. This means filtered queries are nearly as fast as unfiltered ones — Qdrant does not scan all vectors and then discard non-matches.
Example 3: Batch Upsert with OpenAI Embeddings
```python
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

openai_client = OpenAI()
qdrant_client = QdrantClient(url="http://localhost:6333")

documents = [
    {"id": 10, "text": "RAG combines retrieval with generation for grounded answers.", "source": "tutorial"},
    {"id": 11, "text": "HNSW is a graph-based approximate nearest neighbor algorithm.", "source": "paper"},
    {"id": 12, "text": "Qdrant supports payload filtering inside HNSW traversal.", "source": "docs"},
]

# Batch embed all documents in one API call
texts = [doc["text"] for doc in documents]
response = openai_client.embeddings.create(input=texts, model="text-embedding-3-small")

# Build points with embeddings + metadata
points = [
    PointStruct(
        id=doc["id"],
        vector=response.data[i].embedding,
        payload={"text": doc["text"], "source": doc["source"]},
    )
    for i, doc in enumerate(documents)
]

# Upsert batch
qdrant_client.upsert(collection_name="articles", points=points)
print(f"Upserted {len(points)} points")
```

For large datasets (100K+ documents), use qdrant_client.upload_points(), which handles batching and parallelism automatically. The default batch size is 64 points per request.
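A sketch of `upload_points` with a generator, so the full dataset never has to sit in memory at once. The collection name and sizes are illustrative, and the example runs against the client’s local in-memory mode so it works without a server:

```python
import random

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

random.seed(0)
client = QdrantClient(":memory:")
client.create_collection(
    collection_name="bulk",
    vectors_config=VectorParams(size=128, distance=Distance.COSINE),
)

def point_stream(n):
    # Generator: points are built lazily, one at a time
    for i in range(n):
        yield PointStruct(
            id=i,
            vector=[random.uniform(-1, 1) for _ in range(128)],
            payload={"batch": i // 64},
        )

# upload_points consumes the stream in batches (64 points/request by default)
client.upload_points(collection_name="bulk", points=point_stream(500), batch_size=64)
print(client.count(collection_name="bulk").count)  # 500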
7. Qdrant Trade-offs — When Not to Use It
Qdrant is a strong choice for most vector search workloads, but it has real limitations you should know before committing.
Qdrant vs Pinecone vs Weaviate vs Chroma
| Factor | Qdrant | Pinecone | Weaviate | Chroma |
|---|---|---|---|---|
| Hosting | Self-hosted + cloud | Managed only | Self-hosted + cloud | Embedded / self-hosted |
| Language | Rust | Proprietary | Go | Python |
| Hybrid search | Sparse vectors (beta) | Sparse-dense | Native BM25 fusion | No |
| Payload filtering | Rich (nested, geo, range) | Basic metadata | Module-based | Basic metadata |
| Pricing | Free (self-hosted) | Pay per operation | Free (self-hosted) | Free |
| Best for | Production self-hosted | Zero-ops teams | Hybrid search needs | Local prototyping |
For the full landscape, see the vector database comparison. For a deep dive on the managed vs self-hosted trade-off, see Pinecone vs Weaviate.
Other Gotchas
- Memory planning — The HNSW index lives in RAM. 1M vectors at 1536 dimensions needs 6-8 GB. Underestimate this and Qdrant gets OOM-killed with no warning.
- No native hybrid search — Qdrant supports sparse vectors (beta), but does not have Weaviate-style BM25 + vector fusion. For technical content with exact-match keywords (API names, error codes), this matters.
- Single-node bottleneck — Qdrant supports distributed mode, but most teams start single-node. Plan your sharding strategy before you hit 10M vectors.
- Cold start on mmap — Memory-mapped vectors are slower on first access. If your workload has bursty traffic patterns, pre-warm the mmap cache or keep vectors in memory.
8. Qdrant Interview Questions and Answers
These questions cover the Qdrant concepts that come up in GenAI engineering interviews, from architecture to production trade-offs.
Q1: “How does HNSW work in Qdrant?”
What they are testing: Do you understand the indexing algorithm, not just the API?
Strong answer: “HNSW builds a multi-layer graph where the top layers have fewer nodes with long-range connections for coarse navigation, and the bottom layers have all nodes with short-range connections for fine-grained search. A query enters at the top layer, greedily navigates to the nearest node, drops down a layer, and repeats until it reaches the bottom. Qdrant tunes this with m (max connections per node) and ef_construct (search width during build). Higher values increase recall but slow down indexing.”
Q2: “When would you choose Qdrant over Pinecone?”
Strong answer: “Three situations: (1) data residency — if data must stay in our VPC, Pinecone is cloud-only, (2) cost at scale — self-hosted Qdrant is free; Pinecone charges per operation, and at 5M+ vectors the cost difference is 5-10x, (3) payload filtering — Qdrant supports nested conditions, geo filters, and range queries that Pinecone cannot match. I would choose Pinecone if we have no DevOps capacity and our dataset is under 1M vectors.”
Q3: “How would you build a RAG retrieval layer with Qdrant?”
Strong answer: “Chunk documents into 200-500 token segments, embed each chunk with a model like text-embedding-3-small, and upsert into a Qdrant collection with payload metadata (source, page number, date). At query time, embed the user’s question, run a filtered similarity search scoped to relevant sources, retrieve the top 3-5 chunks, and inject them as context into the LLM prompt. I would add a reranking step with a cross-encoder for production accuracy.”
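The chunking step in that answer is plain string processing. A minimal word-window chunker gives the idea; a real pipeline would count tokens with the embedding model’s tokenizer rather than splitting on whitespace, and the sizes here are illustrative:

```python
def chunk_words(text, chunk_size=300, overlap=50):
    """Split text into overlapping word-window chunks."""
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the tail
    return chunks

doc = "vector " * 1000  # stand-in for a real document (1000 words)
chunks = chunk_words(doc, chunk_size=300, overlap=50)
print(len(chunks))  # 4
```

The overlap means neighboring chunks share context, so a sentence cut at a window boundary still appears intact in at least one chunk.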
Q4: “What happens when Qdrant runs out of memory?”
Strong answer: “The HNSW index is loaded in RAM by default. If Qdrant exceeds available memory, the OS OOM-killer terminates the process. To prevent this: enable mmap storage for vectors (keeps vectors on disk, graph in RAM), enable scalar quantization to compress vectors by 4x, or shard across multiple nodes. Monitoring RSS memory usage and setting alerts at 80% capacity is essential for production.”
9. Qdrant in Production — Scaling and Cost
Running Qdrant in production requires decisions about deployment topology, memory allocation, and cost trade-offs.
Deployment Options
| Deployment | Best For | Operational Burden |
|---|---|---|
| Docker (single node) | Development, small datasets (<1M vectors) | Low — docker-compose up |
| Docker Compose (replicated) | Staging, read-heavy workloads | Medium — configure replicas |
| Kubernetes (distributed) | Production, multi-million vectors | High — Helm chart, monitoring, backups |
| Qdrant Cloud | Teams without DevOps capacity | None — fully managed |
Memory and Cost Planning
| Scale | RAM Needed (1536-dim) | VM Cost (AWS) | Qdrant Cloud Cost |
|---|---|---|---|
| 100K vectors | ~1 GB | ~$15/mo (t3.small) | Free tier |
| 1M vectors | ~8 GB | ~$60/mo (r6g.large) | ~$65/mo |
| 5M vectors | ~40 GB | ~$200/mo (r6g.xlarge) | ~$300/mo |
| 20M vectors | ~160 GB | ~$800/mo (r6g.4xlarge) | Contact sales |
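The RAM column follows from simple arithmetic: raw float32 vectors cost dims × 4 bytes each, and the HNSW graph adds roughly m × 2 links × 4 bytes per point. The overhead factor below is a rough rule of thumb for payloads and segment bookkeeping, not a Qdrant-published figure:

```python
def estimate_ram_gb(n_vectors, dims, m=16, overhead=1.15):
    """Back-of-envelope RAM estimate for an in-memory HNSW collection."""
    vector_bytes = n_vectors * dims * 4  # float32 = 4 bytes per component
    graph_bytes = n_vectors * m * 2 * 4  # ~2m links per node, 4-byte ids
    return (vector_bytes + graph_bytes) * overhead / 1e9

for n in (100_000, 1_000_000, 5_000_000):
    print(f"{n:>9,} x 1536-dim: ~{estimate_ram_gb(n, 1536):.1f} GB")
```

For 1M × 1536-dim vectors this lands around 7 GB, consistent with the 6-8 GB range in the table; the table rounds up to leave headroom.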
Cost optimization strategies:
- Scalar quantization — Compresses each float32 to int8, reducing memory by 4x with <1% recall loss. Enable with one config flag.
- mmap storage — Keeps vectors on NVMe SSD instead of RAM. Adds ~1-2ms latency but cuts RAM needs by 60-80%.
- Collection aliases — Use aliases for zero-downtime reindexing. Build the new collection, swap the alias, delete the old one.
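Scalar quantization is conceptually a min-max mapping from float32 down to int8. A toy pure-Python sketch of the idea (Qdrant’s actual implementation differs in detail, but the 4x size reduction and the bounded reconstruction error are the essence):

```python
def quantize(vec):
    """Map float values to int8 range [-127, 127] via min-max scaling."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 254 or 1.0  # guard against a constant vector
    q = [round((x - lo) / scale) - 127 for x in vec]
    return q, lo, scale

def dequantize(q, lo, scale):
    return [(v + 127) * scale + lo for v in q]

vec = [0.12, -0.53, 0.98, 0.07, -0.91]
q, lo, scale = quantize(vec)
restored = dequantize(q, lo, scale)

# int8 needs 1 byte per value vs 4 bytes for float32: a 4x reduction
max_err = max(abs(a - b) for a, b in zip(vec, restored))
print(q)
print(f"max reconstruction error: {max_err:.4f}")
```

The error is bounded by half a quantization step, which is why recall loss stays small: relative distances between vectors are almost unchanged.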
Docker Compose for Development
```yaml
version: "3.8"

services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant_data:/qdrant/storage
    environment:
      - QDRANT__SERVICE__GRPC_PORT=6334

volumes:
  qdrant_data:
```

For Python development environments, this Docker Compose setup gives you persistent storage across container restarts. Your vectors survive docker-compose down and come back on docker-compose up.
10. Summary and Qdrant Key Takeaways
- Qdrant is a Rust-based vector database — sub-millisecond queries at million-vector scale with no GC pauses
- Setup takes under 2 minutes — docker run + pip install qdrant-client and you are searching vectors
- Collections store points — each point is a vector + payload (metadata) pair with a unique ID
- Payload filtering runs inside HNSW — filtered queries are nearly as fast as unfiltered, unlike post-filter approaches
- Self-hosted is free — Apache 2.0 license, no vector limits, no query caps
- Memory planning is critical — budget 6-8 GB RAM per million 1536-dim vectors; use quantization and mmap to reduce
- Production needs Kubernetes — single-node Docker works for development, but distributed mode is required for high availability at scale
Related
- Vector Database Comparison — Qdrant vs Pinecone vs Weaviate vs Chroma vs pgvector
- Pinecone vs Weaviate — Managed simplicity vs self-hosted control
- Chroma vs FAISS — Lightweight alternatives for prototyping
- RAG Architecture — How vector databases fit into retrieval-augmented generation
- RAG Chunking Strategies — Optimize what you store in Qdrant
- GenAI Interview Questions — Practice system design and architecture questions
Frequently Asked Questions
What is Qdrant and what is it used for?
Qdrant is an open-source vector database written in Rust, designed for high-performance similarity search. You use it to store, index, and query high-dimensional vectors with optional metadata (payloads). Common use cases include semantic search, RAG pipelines, recommendation systems, and image similarity search.
How do I install Qdrant with Python?
Run docker pull qdrant/qdrant and docker run -p 6333:6333 qdrant/qdrant to start the server. Then pip install qdrant-client to install the Python client. Connect with QdrantClient(url='http://localhost:6333'). The entire setup takes under 2 minutes.
What is a collection in Qdrant?
A collection in Qdrant is a named group of vectors that share the same dimensionality and distance metric. Think of it like a table in a relational database. Each collection has its own HNSW index configuration and stores points (vector + payload pairs). You create a collection by specifying the vector size and distance function (Cosine, Euclid, or Dot).
How does Qdrant compare to Pinecone?
Qdrant is open-source and self-hosted (or cloud-hosted), giving you full control over infrastructure and data residency. Pinecone is fully managed with zero ops. Qdrant supports rich payload filtering with nested conditions, while Pinecone uses simpler metadata filters. Qdrant is free to self-host; Pinecone charges per operation. Choose Qdrant for control and cost savings, Pinecone for zero operational overhead.
What is HNSW and why does Qdrant use it?
HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest neighbor algorithm. Qdrant uses HNSW because it provides sub-millisecond query latency at million-vector scale with high recall (typically 95-99%). HNSW builds a multi-layer graph where each layer has fewer nodes, enabling fast navigation from coarse to fine search results.
Can I use Qdrant for RAG pipelines?
Yes. Qdrant is commonly used as the vector store in RAG (Retrieval-Augmented Generation) pipelines. You embed your documents into vectors, store them in Qdrant with metadata payloads, and query with the user's embedded question to retrieve the most relevant chunks. Qdrant's payload filtering lets you scope retrieval by source, date, or category.
How do I filter queries in Qdrant?
Qdrant supports payload-based filtering using must, should, and must_not conditions. You can filter on exact matches, ranges, keyword contains, and nested fields. Filters are applied during the HNSW search, not after, so filtered queries remain fast. Example: Filter(must=[FieldCondition(key='category', match=MatchValue(value='python'))]).
Is Qdrant free to use?
Yes, Qdrant is open-source under the Apache 2.0 license. You can self-host it for free using Docker or Kubernetes. Qdrant also offers a managed cloud service (Qdrant Cloud) with a free tier for small workloads and paid plans for production use. Self-hosted Qdrant has no vector limits, no query limits, and no license fees.
How much memory does Qdrant need?
The HNSW index is the primary memory consumer. Roughly, 1 million vectors at 1536 dimensions requires 6-8 GB of RAM for the index. Qdrant supports memory-mapped storage (mmap) to reduce RAM usage by keeping vectors on disk while keeping the graph in memory. For production, plan 8-16 GB RAM per million 1536-dimensional vectors with comfortable headroom.
What is the difference between Qdrant and Chroma?
Chroma is an in-process embedded database ideal for prototyping and small datasets. Qdrant is a standalone server built for production workloads at scale. Chroma stores everything in memory or SQLite; Qdrant uses HNSW with configurable storage backends. Choose Chroma for quick local experiments, Qdrant when you need persistence, filtering, replication, and performance beyond 100K vectors.
Last updated: March 2026 | Qdrant v1.12+ / Python 3.10+ / qdrant-client v1.12+