Weaviate Tutorial — Hybrid Vector Search with Python (2026)
This Weaviate Python tutorial takes you from Docker setup to hybrid search queries in 10 minutes. You will install Weaviate locally, connect with the Python client v4, create collections, batch-import data, and run vector, hybrid, and filtered searches — all with runnable code.
Who this is for:
- Beginners: You want your first vector database running locally with working Python code
- RAG builders: You need hybrid search (BM25 + vector) for a retrieval-augmented generation pipeline
- Self-hosters: You want full control over your data without vendor lock-in
1. Why Weaviate for Vector Search
Weaviate is an open-source vector database that combines vector search with keyword search in a single query — and that hybrid capability is what sets it apart from most alternatives.
The Problem Weaviate Solves
Pure vector search works well for general semantic similarity. But when your documents contain specific technical terms — API names, error codes, model IDs, version numbers — vector search alone misses exact matches. You search for text-embedding-3-small and get results about “small text embeddings” instead.
Weaviate solves this with hybrid search: it runs BM25 keyword search and HNSW vector search simultaneously, then fuses the results. You get semantic understanding and exact term matching in one query.
| Capability | Pure Vector DB | Weaviate Hybrid |
|---|---|---|
| Semantic similarity | Yes | Yes |
| Exact keyword match | No | Yes (BM25) |
| API name / error code search | Poor | Strong |
| Blend control | N/A | Alpha parameter (0-1) |
| Self-hosted option | Varies | Yes (Docker/K8s) |
Weaviate is free to self-host. You pay only for your infrastructure; per the cost table later in this guide, a small VM in the $50-100/month range handles up to about a million vectors. For the managed option, Weaviate Cloud offers usage-based pricing.
2. When to Use Weaviate — Use Cases
Weaviate fits specific use cases better than alternatives. Here is where it shines and where you should look elsewhere.
Best Use Cases for Weaviate
| Use Case | Why Weaviate Wins |
|---|---|
| RAG with technical content | Hybrid search catches exact terms that pure vector misses |
| Multi-tenant SaaS | Native tenant isolation — each customer’s data is separated at the database level |
| Data residency requirements | Self-host in your VPC, on-prem, or air-gapped environment |
| Cost-sensitive at scale | Self-hosted Weaviate at 5M+ vectors costs 60-80% less than managed alternatives |
| Auto-vectorization pipelines | Send raw text — Weaviate calls the embedding model for you |
When NOT to Use Weaviate
- Zero-ops requirement — If you have no one to manage Docker or Kubernetes, use Pinecone instead. Weaviate Cloud exists but Pinecone’s managed experience is more mature.
- Simple prototype — For a weekend project, Chroma or FAISS runs in-process with zero setup.
- Purely semantic search — If you never need keyword matching and want the simplest API possible, a managed vector database saves you operational effort.
3. How Weaviate Works — Core Concepts
Weaviate organizes data into collections (formerly called “classes” in v3). Each collection holds objects with properties and their vector embeddings.
Collections, Properties, and Vectorizers
A collection is like a database table. It has a name, a set of properties (typed fields like text, int, boolean), and a vectorizer configuration that tells Weaviate how to generate embeddings.
When you insert an object, Weaviate stores the properties and either generates a vector using the configured vectorizer module (text2vec-openai, text2vec-cohere, etc.) or accepts a vector you provide manually.
The Two Search Engines Inside Weaviate
Weaviate runs two search engines in parallel:
- HNSW vector index — Approximate nearest neighbor search for semantic similarity
- BM25 inverted index — Classical keyword search with term frequency scoring
Hybrid search queries both indexes and merges the results using reciprocal rank fusion. The alpha parameter controls the blend: 0.0 = pure keyword, 1.0 = pure vector, 0.5 = equal weight.
Weaviate Hybrid Search Flow
[Figure: Weaviate Hybrid Search — Two Paths, One Result. BM25 keyword and HNSW vector results fuse into a single ranked list.]
After both paths return results, Weaviate applies reciprocal rank fusion to merge them into a single ranked list. Objects that score high on both paths rank highest.
4. Weaviate Tutorial — Docker to First Query
This step-by-step guide gets you from zero to a working hybrid search in 10 minutes. Every command is runnable.
Step 1: Start Weaviate with Docker Compose
Create a docker-compose.yml:
```yaml
version: '3.4'
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.2
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai'
      OPENAI_APIKEY: ${OPENAI_APIKEY}
      CLUSTER_HOSTNAME: 'node1'
    volumes:
      - weaviate_data:/var/lib/weaviate
volumes:
  weaviate_data:
```

```bash
export OPENAI_APIKEY="..."  # paste from platform.openai.com
docker compose up -d
```

Weaviate is now running on http://localhost:8080. The gRPC port (50051) enables faster batch operations.
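Before moving on, you can confirm the server is up over the REST API. This assumes the default port from the compose file above and a running container:

```shell
# Returns HTTP 200 once the server is ready to accept requests
curl http://localhost:8080/v1/.well-known/ready

# Shows the server version and which modules are enabled
curl http://localhost:8080/v1/meta
```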
Step 2: Install the Python Client
```bash
pip install weaviate-client
```

The v4 client requires Python 3.9+. It is a complete rewrite — do not follow v3 tutorials that use weaviate.Client().
Step 3: Connect and Create a Collection
```python
import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()

# Create a collection with typed properties
client.collections.create(
    name="Article",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),
    properties=[
        wc.Property(name="title", data_type=wc.DataType.TEXT),
        wc.Property(name="content", data_type=wc.DataType.TEXT),
        wc.Property(name="category", data_type=wc.DataType.TEXT),
        wc.Property(name="year", data_type=wc.DataType.INT),
    ],
)
print("Collection created.")
```

Step 4: Batch Import Data
```python
articles = client.collections.get("Article")

with articles.batch.dynamic() as batch:
    batch.add_object(properties={
        "title": "Attention Is All You Need",
        "content": "The Transformer architecture uses self-attention mechanisms to process sequences in parallel, replacing recurrence entirely.",
        "category": "architecture",
        "year": 2017,
    })
    batch.add_object(properties={
        "title": "BERT Pre-training",
        "content": "BERT uses masked language modeling and next sentence prediction to learn bidirectional representations from unlabeled text.",
        "category": "pre-training",
        "year": 2018,
    })
    batch.add_object(properties={
        "title": "RAG for Knowledge-Intensive NLP",
        "content": "Retrieval-augmented generation combines a retriever with a generator to access external knowledge without retraining the model.",
        "category": "rag",
        "year": 2020,
    })
    batch.add_object(properties={
        "title": "HNSW Approximate Nearest Neighbors",
        "content": "Hierarchical Navigable Small World graphs provide sub-millisecond approximate nearest neighbor search with high recall at billion-scale datasets.",
        "category": "search",
        "year": 2018,
    })
    batch.add_object(properties={
        "title": "BM25 Term Frequency Scoring",
        "content": "BM25 scores documents by term frequency and inverse document frequency, handling exact keyword matching for specific API names and error codes.",
        "category": "search",
        "year": 1994,
    })

# The batch collects failures instead of raising mid-import — check them
if articles.batch.failed_objects:
    print(f"{len(articles.batch.failed_objects)} objects failed to import")

# A collection handle does not support len(); ask the server for a count
total = articles.aggregate.over_all(total_count=True).total_count
print(f"Imported {total} objects.")
```

Weaviate auto-vectorizes each object using the OpenAI embedding model you configured. No separate embedding API call needed.
Step 5: Run Your First Hybrid Search
```python
response = articles.query.hybrid(
    query="how do transformers process sequences",
    alpha=0.7,  # 70% vector, 30% keyword
    limit=3,
)

for obj in response.objects:
    print(f"- {obj.properties['title']} ({obj.properties['year']})")
```

That is it — you have a working Weaviate hybrid search. The alpha=0.7 blend gives you semantic understanding with keyword grounding.
5. Weaviate Architecture — Hybrid Search Stack
The full Weaviate stack runs as a single binary (or container) with all search engines, APIs, and storage built in.
Weaviate Stack Layers
[Figure: Weaviate Architecture — Full Stack. From your application code to the storage engine.]
The key architectural advantage: both the HNSW vector index and the BM25 inverted index live in the same process. There is no network hop between the two search engines. This makes hybrid queries fast — typically <50ms at collections under 1M objects.
6. Weaviate Python Code Examples
These three patterns cover the most common Weaviate operations: pure vector search, hybrid search with alpha tuning, and filtered queries.
Example 1: Pure Vector Search (nearText)
Use near_text when you want semantic similarity only — no keyword component:
```python
articles = client.collections.get("Article")

response = articles.query.near_text(
    query="neural network attention mechanism",
    limit=3,
    return_metadata=weaviate.classes.query.MetadataQuery(distance=True),
)

for obj in response.objects:
    print(f"{obj.properties['title']} — distance: {obj.metadata.distance:.4f}")
```

near_text sends your query through the vectorizer, then searches the HNSW index. The distance value tells you how similar each result is (lower = more similar).
Example 2: Hybrid Search with Alpha Tuning
Hybrid search is Weaviate’s primary differentiator. The alpha parameter is the single most important tuning knob:
```python
# Heavy on keywords — good for exact term matching
response = articles.query.hybrid(
    query="BM25 term frequency",
    alpha=0.3,  # 30% vector, 70% keyword
    limit=3,
)

# Heavy on semantics — good for conceptual queries
response = articles.query.hybrid(
    query="how do models learn from text",
    alpha=0.8,  # 80% vector, 20% keyword
    limit=3,
)

# Balanced — the default starting point
response = articles.query.hybrid(
    query="retrieval augmented generation architecture",
    alpha=0.5,  # 50/50 blend
    limit=3,
)
```

Rule of thumb: Start with alpha=0.5. If users search with specific terms (API names, error codes), lower alpha toward 0.3. If users ask conceptual questions, raise alpha toward 0.8.
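To pick alpha empirically rather than by feel, benchmark recall against a small labeled test set. The sketch below is hypothetical: run_hybrid is a stub standing in for a real articles.query.hybrid(...) call, and the canned results and labels are invented for illustration:

```python
def recall_at_k(retrieved, relevant, k=3):
    """Fraction of relevant doc ids that appear in the top-k results."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

# Hypothetical stub — in practice this would run a real hybrid query
def run_hybrid(query, alpha):
    canned = {
        0.3: ["bm25", "hnsw", "bert"],
        0.5: ["bm25", "rag", "transformer"],
        0.8: ["rag", "transformer", "bert"],
    }
    return canned[alpha]

# Invented labels: query -> the doc ids a human marked as relevant
labeled = {"term frequency scoring": ["bm25", "hnsw"]}

for alpha in (0.3, 0.5, 0.8):
    recalls = [
        recall_at_k(run_hybrid(q, alpha), relevant)
        for q, relevant in labeled.items()
    ]
    print(f"alpha={alpha}: mean recall@3 = {sum(recalls) / len(recalls):.2f}")
```

With real data, run each labeled query at several alpha values and keep the one with the best mean recall; a few dozen labeled queries is usually enough to see a clear trend.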
Example 3: Filtered Search with Where Clauses
Filters narrow results before search runs, making them efficient even on large collections:
```python
from weaviate.classes.query import Filter

# Filter by category, then hybrid search within that subset
response = articles.query.hybrid(
    query="neural network architecture",
    alpha=0.7,
    limit=3,
    filters=Filter.by_property("category").equal("architecture"),
)

# Filter by year range
response = articles.query.hybrid(
    query="pre-training techniques",
    alpha=0.6,
    limit=5,
    filters=(
        Filter.by_property("year").greater_or_equal(2018)
        & Filter.by_property("year").less_or_equal(2024)
    ),
)

for obj in response.objects:
    print(f"{obj.properties['title']} ({obj.properties['year']})")
```

Filters compose with & (AND) and | (OR). They run on the inverted index before vector search, so adding filters generally does not slow down queries.
7. Weaviate Trade-offs — When Not to Use It
Weaviate is powerful but not the right choice for every project. Here are the real trade-offs you should know before committing.
Self-Hosting Complexity
Running Weaviate in production means you own:
- Docker/Kubernetes operations — container health, resource limits, restarts
- Backups — Weaviate does not back up automatically; you schedule it
- Upgrades — Version bumps require planning; schema migrations can be tricky
- Monitoring — Disk space, memory (HNSW loads into RAM), query latency
For a comparison of the operational trade-offs, see Pinecone vs Weaviate.
Learning Curve
The v4 Python client is well-designed, but Weaviate has more concepts than simpler databases: collections, properties, vectorizers, modules, multi-tenancy, replication factor, HNSW parameters. Expect 2-3 days to become productive.
Memory Requirements at Scale
Weaviate loads HNSW indexes into RAM. For 10M vectors at 1536 dimensions, plan for 60-80GB of RAM for the HNSW graph alone, plus memory for the inverted index and query processing. This is the biggest infrastructure surprise for teams scaling up.
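The 60-80GB figure follows from straightforward arithmetic. The sketch below assumes float32 vectors; the 64-connections-per-node and 8-bytes-per-link numbers for the graph overhead are illustrative assumptions, and quantization (which Weaviate supports) can shrink the vector portion substantially:

```python
# Back-of-envelope HNSW memory estimate (float32, no quantization)
n_vectors = 10_000_000
dims = 1536
bytes_per_float = 4

raw_vectors_gb = n_vectors * dims * bytes_per_float / 1e9
# Graph links: assume ~64 connections per node at ~8 bytes each (rough)
graph_links_gb = n_vectors * 64 * 8 / 1e9

print(f"raw vectors: {raw_vectors_gb:.1f} GB")   # ~61.4 GB
print(f"graph links: ~{graph_links_gb:.1f} GB")  # ~5.1 GB
```

Add the inverted index, query buffers, and OS headroom on top of these two terms and you land in the 60-80GB range the text quotes.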
8. Weaviate Interview Questions
These questions test whether you understand Weaviate’s hybrid search model and can reason about self-hosting trade-offs — both common topics in GenAI system design interviews.
Q1: “What is hybrid search and when does it beat pure vector search?”
Strong answer: “Hybrid search combines BM25 keyword matching with vector similarity and fuses the results. It beats pure vector search when documents contain specific terms that must match exactly — API names like text-embedding-3-small, error codes like HTTP 429, or medical codes like ICD-10. Pure vector search maps these to nearby semantic neighbors instead of exact matches. The alpha parameter controls the blend: lower alpha for more keyword weight, higher for more semantic weight.”
Q2: “You are designing a RAG system. Why might you choose Weaviate over Pinecone?”
Strong answer: “Three reasons drive the choice: hybrid search, data residency, and cost. If our documents have technical terms that need exact matching, Weaviate’s BM25 + vector fusion catches them. If the data cannot leave our VPC — healthcare, finance, government — self-hosted Weaviate runs anywhere. And at scale past 5M vectors, self-hosting is 60-80% cheaper than managed pricing.”
Q3: “What are the risks of self-hosting Weaviate?”
Strong answer: “The main risks are operational: HNSW indexes load into memory so OOM kills cause downtime, schema changes require data migration since properties are immutable after creation, and you own backups — Weaviate does not replicate to a remote store automatically. You need monitoring for memory, disk, and query latency, plus a runbook for common failures.”
Q4: “How would you tune Weaviate hybrid search for a codebase search tool?”
Strong answer: “I would start with alpha around 0.3 — heavy on keywords — because code search needs exact function names, class names, and import paths. Pure vector search would return semantically similar but wrong code. I would index function signatures and docstrings as separate properties, use filtered search to scope by language or repository, and benchmark recall at different alpha values against a labeled test set.”
9. Weaviate in Production — Scaling and Cost
Running Weaviate in production requires different infrastructure depending on your scale. Here is the progression from local development to production deployment.
Development: Docker Compose
The Docker Compose setup from Section 4 works for development and small datasets (<100K objects). Single-node, single-container, no replication.
Staging: Docker with Persistent Volumes
Add resource limits and named volumes for data safety:
```yaml
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.2
    deploy:
      resources:
        limits:
          memory: 4G
    volumes:
      - weaviate_data:/var/lib/weaviate
```

Production: Kubernetes with Replication
For production workloads (>1M objects, high availability), deploy a Weaviate cluster with the official Helm chart. A 3-node cluster with replication factor 2 provides fault tolerance.
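A minimal sketch of the Helm flow. The repo URL and value names below are assumptions — verify both against the current Weaviate Helm chart documentation before running:

```shell
# Add the Weaviate Helm repo and install a 3-node cluster
# (repo URL and value names assumed — check the official chart docs)
helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm repo update
helm install weaviate weaviate/weaviate \
  --namespace weaviate --create-namespace \
  --set replicas=3
```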
Cost Comparison at Scale
| Scale | Weaviate Self-Hosted | Weaviate Cloud | Pinecone Serverless |
|---|---|---|---|
| <100K objects | ~$50/mo (small VM) | Free tier | Free tier |
| 1M objects | ~$100/mo (2 vCPU, 8GB) | ~$150/mo | ~$150/mo |
| 5M objects | ~$200/mo (4 vCPU, 16GB) | ~$400/mo | ~$600/mo |
| 20M objects | ~$400/mo (8 vCPU, 32GB) | ~$800/mo | ~$2,500/mo |
Self-hosted costs are infrastructure only — you also invest engineering time in operations. Factor in 2-4 hours per week for monitoring, upgrades, and incident response at production scale.
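The savings percentages implied by the table above are easy to sanity-check:

```python
# Self-hosted vs Pinecone Serverless, $/month, from the cost table
table = {
    "5M": (200, 600),
    "20M": (400, 2500),
}

for scale, (self_hosted, managed) in table.items():
    savings = 1 - self_hosted / managed
    print(f"{scale} objects: {savings:.0%} cheaper self-hosted")
# 5M lands at ~67% and 20M at ~84%, bracketing the "60-80%" claim
```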
Performance Benchmarks
Typical latencies on a 4-vCPU, 16GB RAM instance with 1M objects (1536 dimensions):
- Vector search (near_text): 5-15ms p99
- Hybrid search: 10-25ms p99
- Filtered hybrid search: 15-40ms p99
- Batch import throughput: 3,000-5,000 objects/second with auto-vectorization
These numbers assume SSD storage and a warm HNSW index. First queries after restart are slower while the index loads into memory.
10. Summary and Key Takeaways
- Hybrid search is Weaviate’s core advantage — BM25 + vector in a single query catches both semantic matches and exact terms
- The v4 Python client is a full rewrite — use weaviate.connect_to_local() and the collections API, not the old weaviate.Client() pattern
- Alpha parameter is the tuning knob — start at 0.5, go lower for keyword-heavy queries, higher for conceptual queries
- Self-hosting gives control and cost savings — 60-80% cheaper at 5M+ vectors, but you own operations
- Schema design matters — properties are immutable after creation; plan your collection schema before bulk import
- Docker Compose gets you started in 5 minutes — production needs Kubernetes with replication for high availability
- Weaviate Cloud exists if you want the feature set without the operational burden of self-hosting
Related
- Vector Database Comparison — Weaviate vs Pinecone vs Qdrant vs Chroma vs pgvector
- Pinecone vs Weaviate — Deep-dive comparison with pricing crossover analysis
- Chroma vs FAISS — Simpler alternatives for prototyping
- RAG Architecture — How vector databases fit into retrieval-augmented generation
- Embeddings Guide — Choosing the right embedding model for your vectors
- RAG Chunking Strategies — How to split documents before importing into Weaviate
- Advanced RAG Patterns — Reranking, query expansion, and hybrid retrieval
- Python for GenAI — Python fundamentals for building AI applications
Frequently Asked Questions
What is Weaviate and what is it used for?
Weaviate is an open-source vector database that stores objects and their vector embeddings for similarity search. It supports hybrid search (combining BM25 keyword search with vector similarity), auto-vectorization via built-in modules, and self-hosted deployment via Docker or Kubernetes. It is commonly used for RAG pipelines, semantic search, and recommendation systems.
How do I install Weaviate locally?
Run Weaviate locally with Docker Compose. Create a docker-compose.yml with the Weaviate image and your vectorizer module (like text2vec-openai), then run docker compose up -d. The server starts on port 8080. Install the Python client with pip install weaviate-client and connect using weaviate.connect_to_local().
What is hybrid search in Weaviate?
Hybrid search in Weaviate combines BM25 keyword search with vector similarity search in a single query. It uses reciprocal rank fusion to merge both result sets. You control the blend with the alpha parameter: 0 means pure keyword, 1 means pure vector, and 0.5 is an equal mix. Hybrid search outperforms pure vector search for technical content with specific terms like API names and error codes.
What is the Weaviate Python client v4 API?
The Weaviate Python client v4 is a complete rewrite of the client library with a collections-based API. Instead of raw GraphQL queries, you use typed methods like client.collections.create(), collection.data.insert(), and collection.query.hybrid(). The v4 client uses context managers for connection handling and provides IDE autocompletion for all operations.
How do I create a collection in Weaviate?
Use client.collections.create() with a name, vectorizer config, and property definitions. Each property has a name and data type (TEXT, INT, BOOLEAN, etc.). The vectorizer config tells Weaviate which module to use for auto-vectorization. After creation, get a reference with client.collections.get('CollectionName') to insert and query data.
How does Weaviate compare to Pinecone?
Weaviate is open-source and can be self-hosted, while Pinecone is fully managed and cloud-only. Weaviate has first-class hybrid search (BM25 + vector fusion), auto-vectorization, and native multi-tenancy. Pinecone offers zero operational overhead. At scale (5M+ vectors), self-hosted Weaviate can be 60-80% cheaper. Choose Weaviate for hybrid search, data residency, or cost control; choose Pinecone for zero-ops simplicity.
Can Weaviate auto-vectorize my text?
Yes. Weaviate has built-in vectorizer modules (text2vec-openai, text2vec-cohere, text2vec-huggingface, and others). When you configure a vectorizer on a collection, Weaviate automatically generates embeddings when you insert text. You do not need to call an embedding API separately. This simplifies your code and reduces the chance of embedding model mismatches.
How do I run filtered queries in Weaviate?
Use the Filter class from weaviate.classes.query to build where clauses. You can filter by any property before running vector or hybrid search. For example, Filter.by_property('category').equal('tutorial') restricts results to objects where category equals tutorial. Filters are applied before the vector search, making them efficient even on large collections.
What is HNSW in Weaviate?
HNSW (Hierarchical Navigable Small World) is the approximate nearest neighbor algorithm Weaviate uses for vector search. It builds a multi-layer graph where each layer is a navigable small world network. HNSW provides sub-millisecond query latency at high recall rates. Weaviate lets you tune HNSW parameters like ef, efConstruction, and maxConnections to optimize the recall-latency trade-off for your workload.
Is Weaviate good for RAG applications?
Yes. Weaviate is one of the top choices for RAG because hybrid search handles both semantic similarity and exact keyword matching. Technical documentation, code snippets, and API references contain specific terms that pure vector search can miss. Weaviate's hybrid search catches these while still understanding semantic meaning. It also supports multi-tenancy for SaaS RAG products and self-hosting for data residency requirements.
Last updated: March 2026 | Weaviate v1.28+ / Python client v4 / Python 3.9+