
Weaviate Tutorial — Hybrid Vector Search with Python (2026)

This Weaviate Python tutorial takes you from Docker setup to hybrid search queries in 10 minutes. You will install Weaviate locally, connect with the Python client v4, create collections, batch-import data, and run vector, hybrid, and filtered searches — all with runnable code.

Weaviate is an open-source vector database that combines vector search with keyword search in a single query — and that hybrid capability is what sets it apart from most alternatives.

Pure vector search works well for general semantic similarity. But when your documents contain specific technical terms — API names, error codes, model IDs, version numbers — vector search alone misses exact matches. You search for text-embedding-3-small and get results about “small text embeddings” instead.

Weaviate solves this with hybrid search: it runs BM25 keyword search and HNSW vector search simultaneously, then fuses the results. You get semantic understanding and exact term matching in one query.

| Capability | Pure Vector DB | Weaviate Hybrid |
| --- | --- | --- |
| Semantic similarity | Yes | Yes |
| Exact keyword match | No | Yes (BM25) |
| API name / error code search | Poor | Strong |
| Blend control | N/A | Alpha parameter (0-1) |
| Self-hosted option | Varies | Yes (Docker/K8s) |

Weaviate is free to self-host. You pay only for your infrastructure (a $50/month VM handles millions of vectors). For the managed option, Weaviate Cloud offers usage-based pricing.


Weaviate fits specific use cases better than alternatives. Here is where it shines and where you should look elsewhere.

| Use Case | Why Weaviate Wins |
| --- | --- |
| RAG with technical content | Hybrid search catches exact terms that pure vector misses |
| Multi-tenant SaaS | Native tenant isolation — each customer’s data is separated at the database level |
| Data residency requirements | Self-host in your VPC, on-prem, or air-gapped environment |
| Cost-sensitive at scale | Self-hosted Weaviate at 5M+ vectors costs 60-80% less than managed alternatives |
| Auto-vectorization pipelines | Send raw text — Weaviate calls the embedding model for you |

Where to look elsewhere:

  • Zero-ops requirement — If you have no one to manage Docker or Kubernetes, use Pinecone instead. Weaviate Cloud exists but Pinecone’s managed experience is more mature.
  • Simple prototype — For a weekend project, Chroma or FAISS runs in-process with zero setup.
  • Purely semantic search — If you never need keyword matching and want the simplest API possible, a managed vector database saves you operational effort.

Weaviate organizes data into collections (formerly called “classes” in v3). Each collection holds objects with properties and their vector embeddings.

A collection is like a database table. It has a name, a set of properties (typed fields like text, int, boolean), and a vectorizer configuration that tells Weaviate how to generate embeddings.

When you insert an object, Weaviate stores the properties and either generates a vector using the configured vectorizer module (text2vec-openai, text2vec-cohere, etc.) or accepts a vector you provide manually.
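The bring-your-own-vectors path can be sketched as follows. This is a hedged illustration: the collection name `ManualArticle` and the toy 3-dimensional vector are invented for the example, and real embeddings would match your model's dimensionality.

```python
import weaviate
import weaviate.classes.config as wc

def insert_with_manual_vector(client: weaviate.WeaviateClient) -> None:
    """Create a collection with no vectorizer, then insert a pre-computed vector."""
    client.collections.create(
        name="ManualArticle",  # hypothetical collection for illustration
        vectorizer_config=wc.Configure.Vectorizer.none(),  # bring your own vectors
        properties=[wc.Property(name="title", data_type=wc.DataType.TEXT)],
    )
    manual = client.collections.get("ManualArticle")
    manual.data.insert(
        properties={"title": "Hand-embedded article"},
        vector=[0.12, -0.07, 0.33],  # toy 3-dim embedding, not a real model output
    )

# Usage (requires a running local Weaviate instance):
# with weaviate.connect_to_local() as client:
#     insert_with_manual_vector(client)
```

With `Vectorizer.none()`, Weaviate never calls an embedding API; you are responsible for generating query vectors too (via `near_vector` rather than `near_text`).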

Weaviate runs two search engines in parallel:

  1. HNSW vector index — Approximate nearest neighbor search for semantic similarity
  2. BM25 inverted index — Classical keyword search with term frequency scoring

Hybrid search queries both indexes and merges the results with a fusion algorithm (relative score fusion is the default in recent versions; reciprocal rank fusion remains available). The alpha parameter controls the blend: 0.0 = pure keyword, 1.0 = pure vector, 0.5 = equal weight.

Weaviate Hybrid Search — Two Paths, One Result

BM25 keyword and HNSW vector results fuse into a single ranked list:

  • Vector path (semantic similarity via HNSW): the query text is vectorized into an embedding, HNSW runs a nearest-neighbor search, and the top-K results are ranked by cosine similarity.
  • Keyword path (BM25 term matching): the same query text is tokenized into terms, looked up in the BM25 inverted index, and the top-K results are ranked by term frequency score.

After both paths return results, Weaviate applies the configured fusion algorithm to merge them into a single ranked list. Objects that score high on both paths rank highest.
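Rank-based fusion is small enough to sketch in plain Python. This is a toy illustration of the reciprocal-rank-fusion scoring idea, not Weaviate's internal code:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Score each doc as the sum of 1/(k + rank) over every list it appears in."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # from the HNSW path
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # from the BM25 path
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Note how `doc_b` wins: it places well on both paths, which is exactly the behavior described above.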


4. Weaviate Tutorial — Docker to First Query


This step-by-step guide gets you from zero to a working hybrid search in 10 minutes. Every command is runnable.

Step 1: Start Weaviate with Docker Compose


Create a docker-compose.yml:

version: '3.4'
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.2
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai'
      OPENAI_APIKEY: ${OPENAI_APIKEY}
      CLUSTER_HOSTNAME: 'node1'
    volumes:
      - weaviate_data:/var/lib/weaviate
volumes:
  weaviate_data:
export OPENAI_APIKEY="..." # paste from platform.openai.com
docker compose up -d

Weaviate is now running on http://localhost:8080. The gRPC port (50051) enables faster batch operations.

pip install weaviate-client

The v4 client requires Python 3.9+. It is a complete rewrite — do not follow v3 tutorials that use weaviate.Client().

import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()

# Create a collection with typed properties
client.collections.create(
    name="Article",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),
    properties=[
        wc.Property(name="title", data_type=wc.DataType.TEXT),
        wc.Property(name="content", data_type=wc.DataType.TEXT),
        wc.Property(name="category", data_type=wc.DataType.TEXT),
        wc.Property(name="year", data_type=wc.DataType.INT),
    ],
)
print("Collection created.")
articles = client.collections.get("Article")

with articles.batch.dynamic() as batch:
    batch.add_object(properties={
        "title": "Attention Is All You Need",
        "content": "The Transformer architecture uses self-attention mechanisms to process sequences in parallel, replacing recurrence entirely.",
        "category": "architecture",
        "year": 2017,
    })
    batch.add_object(properties={
        "title": "BERT Pre-training",
        "content": "BERT uses masked language modeling and next sentence prediction to learn bidirectional representations from unlabeled text.",
        "category": "pre-training",
        "year": 2018,
    })
    batch.add_object(properties={
        "title": "RAG for Knowledge-Intensive NLP",
        "content": "Retrieval-augmented generation combines a retriever with a generator to access external knowledge without retraining the model.",
        "category": "rag",
        "year": 2020,
    })
    batch.add_object(properties={
        "title": "HNSW Approximate Nearest Neighbors",
        "content": "Hierarchical Navigable Small World graphs provide sub-millisecond approximate nearest neighbor search with high recall at billion-scale datasets.",
        "category": "search",
        "year": 2018,
    })
    batch.add_object(properties={
        "title": "BM25 Term Frequency Scoring",
        "content": "BM25 scores documents by term frequency and inverse document frequency, handling exact keyword matching for specific API names and error codes.",
        "category": "search",
        "year": 1994,
    })

# Count objects via the aggregate API
count = articles.aggregate.over_all(total_count=True).total_count
print(f"Imported {count} objects.")

Weaviate auto-vectorizes each object using the OpenAI embedding model you configured. No separate embedding API call needed.

response = articles.query.hybrid(
    query="how do transformers process sequences",
    alpha=0.7,  # 70% vector, 30% keyword
    limit=3,
)
for obj in response.objects:
    print(f"- {obj.properties['title']} ({obj.properties['year']})")

client.close()  # release the connection when the script is done

That is it — you have a working Weaviate hybrid search. The alpha=0.7 blend gives you semantic understanding with keyword grounding.


5. Weaviate Architecture — Hybrid Search Stack


The full Weaviate stack runs as a single binary (or container) with all search engines, APIs, and storage built in.

Weaviate Architecture — Full Stack

From your application code to the storage engine:

  • Your application: RAG pipeline, search UI, recommendation engine
  • Weaviate Python client v4: typed collections API with IDE autocompletion
  • REST + GraphQL + gRPC APIs: HTTP for queries, gRPC for fast batch imports
  • HNSW vector index + BM25 inverted index: dual search engines merged by result fusion
  • Vectorizer modules: text2vec-openai, text2vec-cohere, text2vec-huggingface
  • Storage engine (Docker / Kubernetes): persistent volumes, data replication, backups

The key architectural advantage: both the HNSW vector index and the BM25 inverted index live in the same process. There is no network hop between the two search engines. This makes hybrid queries fast — typically under 50ms on collections below 1M objects.


These three patterns cover the most common Weaviate operations: pure vector search, hybrid search with alpha tuning, and filtered queries.

Use near_text when you want semantic similarity only — no keyword component:

from weaviate.classes.query import MetadataQuery

articles = client.collections.get("Article")
response = articles.query.near_text(
    query="neural network attention mechanism",
    limit=3,
    return_metadata=MetadataQuery(distance=True),
)
for obj in response.objects:
    print(f"{obj.properties['title']} — distance: {obj.metadata.distance:.4f}")

near_text sends your query through the vectorizer, then searches the HNSW index. The distance value tells you how similar each result is (lower = more similar).
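For collections using the default cosine metric, that distance is 1 minus cosine similarity. The arithmetic is easy to check standalone (this is a plain calculation, not a Weaviate API call):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 = same direction, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # → 0.0 (identical direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # → 1.0 (orthogonal)
```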

Example 2: Hybrid Search with Alpha Tuning


Hybrid search is Weaviate’s primary differentiator. The alpha parameter is the single most important tuning knob:

# Heavy on keywords — good for exact term matching
response = articles.query.hybrid(
    query="BM25 term frequency",
    alpha=0.3,  # 30% vector, 70% keyword
    limit=3,
)

# Heavy on semantics — good for conceptual queries
response = articles.query.hybrid(
    query="how do models learn from text",
    alpha=0.8,  # 80% vector, 20% keyword
    limit=3,
)

# Balanced — the default starting point
response = articles.query.hybrid(
    query="retrieval augmented generation architecture",
    alpha=0.5,  # 50/50 blend
    limit=3,
)

Rule of thumb: Start with alpha=0.5. If users search with specific terms (API names, error codes), lower alpha toward 0.3. If users ask conceptual questions, raise alpha toward 0.8.
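One way to pick alpha empirically is to measure recall@k over a small labeled query set at several alpha values. A sketch of such a harness, with a stand-in `fake_search` function and invented doc IDs; in practice `search` would wrap `articles.query.hybrid(...)` and return result IDs:

```python
def recall_at_k(search, labeled_queries, alpha, k=3):
    """Fraction of labeled queries whose expected doc appears in the top-k results."""
    hits = 0
    for query, expected_id in labeled_queries.items():
        results = search(query, alpha=alpha, limit=k)
        if expected_id in results:
            hits += 1
    return hits / len(labeled_queries)

# Stand-in search function for illustration; swap in a real hybrid query.
def fake_search(query, alpha, limit):
    return ["doc_1", "doc_2", "doc_3"][:limit]

labeled = {"how do transformers work": "doc_1", "BM25 scoring": "doc_9"}
for alpha in (0.3, 0.5, 0.8):
    print(f"alpha={alpha}: recall@3 = {recall_at_k(fake_search, labeled, alpha):.2f}")
# each line prints recall@3 = 0.50 with this stub
```

Run the same loop against your real collection and pick the alpha with the best recall on your own queries.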

Example 3: Filtered Search with Where Clauses


Filters narrow results before search runs, making them efficient even on large collections:

from weaviate.classes.query import Filter

# Filter by category, then hybrid search within that subset
response = articles.query.hybrid(
    query="neural network architecture",
    alpha=0.7,
    limit=3,
    filters=Filter.by_property("category").equal("architecture"),
)

# Filter by year range
response = articles.query.hybrid(
    query="pre-training techniques",
    alpha=0.6,
    limit=5,
    filters=(
        Filter.by_property("year").greater_or_equal(2018)
        & Filter.by_property("year").less_or_equal(2024)
    ),
)
for obj in response.objects:
    print(f"{obj.properties['title']} ({obj.properties['year']})")

Filters compose with & (AND) and | (OR). They run on the inverted index before vector search, so adding a filter typically adds little query overhead.


7. Weaviate Trade-offs — When Not to Use It


Weaviate is powerful but not the right choice for every project. Here are the real trade-offs you should know before committing.

Running Weaviate in production means you own:

  • Docker/Kubernetes operations — container health, resource limits, restarts
  • Backups — Weaviate does not back up automatically; you schedule it
  • Upgrades — Version bumps require planning; schema migrations can be tricky
  • Monitoring — Disk space, memory (HNSW loads into RAM), query latency

For a comparison of the operational trade-offs, see Pinecone vs Weaviate.

The v4 Python client is well-designed, but Weaviate has more concepts than simpler databases: collections, properties, vectorizers, modules, multi-tenancy, replication factor, HNSW parameters. Expect 2-3 days to become productive.

Weaviate loads HNSW indexes into RAM. For 10M vectors at 1536 dimensions, plan for 60-80GB of RAM for the HNSW graph alone, plus memory for the inverted index and query processing. This is the biggest infrastructure surprise for teams scaling up.
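The back-of-envelope arithmetic behind that range, assuming float32 vectors (graph links and runtime overhead are extra, which is why budgeting roughly 1.5-2x the raw figure is a common rule of thumb):

```python
# Raw float32 vector storage for an in-memory HNSW index
vectors = 10_000_000
dims = 1536
bytes_per_float = 4  # float32

raw_gb = vectors * dims * bytes_per_float / 1e9
print(f"Raw vectors alone: {raw_gb:.1f} GB")  # → Raw vectors alone: 61.4 GB
```

Sixty-plus gigabytes before the graph structure is even counted, which is how a 10M-vector collection lands in the 60-80GB territory quoted above.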


These questions test whether you understand Weaviate’s hybrid search model and can reason about self-hosting trade-offs — both common topics in GenAI system design interviews.

Q1: “What is hybrid search and when does it beat pure vector search?”

Strong answer: “Hybrid search combines BM25 keyword matching with vector similarity and fuses the results. It beats pure vector search when documents contain specific terms that must match exactly — API names like text-embedding-3-small, error codes like HTTP 429, or medical codes like ICD-10. Pure vector search maps these to nearby semantic neighbors instead of exact matches. The alpha parameter controls the blend: lower alpha for more keyword weight, higher for more semantic weight.”

Q2: “You are designing a RAG system. Why might you choose Weaviate over Pinecone?”


Strong answer: “Three reasons drive the choice: hybrid search, data residency, and cost. If our documents have technical terms that need exact matching, Weaviate’s BM25 + vector fusion catches them. If the data cannot leave our VPC — healthcare, finance, government — self-hosted Weaviate runs anywhere. And at scale past 5M vectors, self-hosting is 60-80% cheaper than managed pricing.”

Q3: “What are the risks of self-hosting Weaviate?”


Strong answer: “The main risks are operational: HNSW indexes load into memory so OOM kills cause downtime, schema changes require data migration since properties are immutable after creation, and you own backups — Weaviate does not replicate to a remote store automatically. You need monitoring for memory, disk, and query latency, plus a runbook for common failures.”

Q4: “How would you tune Weaviate hybrid search for a codebase search tool?”


Strong answer: “I would start with alpha around 0.3 — heavy on keywords — because code search needs exact function names, class names, and import paths. Pure vector search would return semantically similar but wrong code. I would index function signatures and docstrings as separate properties, use filtered search to scope by language or repository, and benchmark recall at different alpha values against a labeled test set.”


9. Weaviate in Production — Scaling and Cost


Running Weaviate in production requires different infrastructure depending on your scale. Here is the progression from local development to production deployment.

The Docker Compose setup from Section 4 works for development and small datasets (<100K objects). Single-node, single-container, no replication.

Add resource limits and named volumes for data safety:

services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.2
    deploy:
      resources:
        limits:
          memory: 4G
    volumes:
      - weaviate_data:/var/lib/weaviate

For production workloads (>1M objects, high availability), deploy a Weaviate cluster with the official Helm chart. A 3-node cluster with replication factor 2 provides fault tolerance.

| Scale | Weaviate Self-Hosted | Weaviate Cloud | Pinecone Serverless |
| --- | --- | --- | --- |
| <100K objects | ~$50/mo (small VM) | Free tier | Free tier |
| 1M objects | ~$100/mo (2 vCPU, 8GB) | ~$150/mo | ~$150/mo |
| 5M objects | ~$200/mo (4 vCPU, 16GB) | ~$400/mo | ~$600/mo |
| 20M objects | ~$400/mo (8 vCPU, 32GB) | ~$800/mo | ~$2,500/mo |

Self-hosted costs are infrastructure only — you also invest engineering time in operations. Factor in 2-4 hours per week for monitoring, upgrades, and incident response at production scale.

Typical latencies on a 4-vCPU, 16GB RAM instance with 1M objects (1536 dimensions):

  • Vector search (near_text): 5-15ms p99
  • Hybrid search: 10-25ms p99
  • Filtered hybrid search: 15-40ms p99
  • Batch import throughput: 3,000-5,000 objects/second with auto-vectorization

These numbers assume SSD storage and a warm HNSW index. First queries after restart are slower while the index loads into memory.
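To reproduce numbers like these on your own hardware, time repeated queries and take a percentile. A sketch with a stand-in workload; in practice you would pass something like `lambda: articles.query.hybrid(query="...", limit=3)`:

```python
import time
import statistics

def p99_latency_ms(run_query, n=200):
    """Run a query n times and return the 99th-percentile latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.quantiles(samples, n=100)[98]  # cut point 99 of 100

# Stand-in workload for illustration; replace with a real Weaviate query.
print(f"p99: {p99_latency_ms(lambda: sum(range(10_000))):.2f} ms")
```

Measure after a warm-up batch so the HNSW index is resident in memory, or the cold-start queries will dominate the percentile.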


  • Hybrid search is Weaviate’s core advantage — BM25 + vector in a single query catches both semantic matches and exact terms
  • The v4 Python client is a full rewrite — use weaviate.connect_to_local() and the collections API, not the old weaviate.Client() pattern
  • Alpha parameter is the tuning knob — start at 0.5, go lower for keyword-heavy queries, higher for conceptual queries
  • Self-hosting gives control and cost savings — 60-80% cheaper at 5M+ vectors, but you own operations
  • Schema design matters — properties are immutable after creation; plan your collection schema before bulk import
  • Docker Compose gets you started in 5 minutes — production needs Kubernetes with replication for high availability
  • Weaviate Cloud exists if you want the feature set without the operational burden of self-hosting

Frequently Asked Questions

What is Weaviate and what is it used for?

Weaviate is an open-source vector database that stores objects and their vector embeddings for similarity search. It supports hybrid search (combining BM25 keyword search with vector similarity), auto-vectorization via built-in modules, and self-hosted deployment via Docker or Kubernetes. It is commonly used for RAG pipelines, semantic search, and recommendation systems.

How do I install Weaviate locally?

Run Weaviate locally with Docker Compose. Create a docker-compose.yml with the Weaviate image and your vectorizer module (like text2vec-openai), then run docker compose up -d. The server starts on port 8080. Install the Python client with pip install weaviate-client and connect using weaviate.connect_to_local().

What is hybrid search in Weaviate?

Hybrid search in Weaviate combines BM25 keyword search with vector similarity search in a single query. It uses reciprocal rank fusion to merge both result sets. You control the blend with the alpha parameter: 0 means pure keyword, 1 means pure vector, and 0.5 is an equal mix. Hybrid search outperforms pure vector search for technical content with specific terms like API names and error codes.

What is the Weaviate Python client v4 API?

The Weaviate Python client v4 is a complete rewrite of the client library with a collections-based API. Instead of raw GraphQL queries, you use typed methods like client.collections.create(), collection.data.insert(), and collection.query.hybrid(). The v4 client uses context managers for connection handling and provides IDE autocompletion for all operations.

How do I create a collection in Weaviate?

Use client.collections.create() with a name, vectorizer config, and property definitions. Each property has a name and data type (TEXT, INT, BOOLEAN, etc.). The vectorizer config tells Weaviate which module to use for auto-vectorization. After creation, get a reference with client.collections.get('CollectionName') to insert and query data.

How does Weaviate compare to Pinecone?

Weaviate is open-source and can be self-hosted, while Pinecone is fully managed and cloud-only. Weaviate has first-class hybrid search (BM25 + vector fusion), auto-vectorization, and native multi-tenancy. Pinecone offers zero operational overhead. At scale (5M+ vectors), self-hosted Weaviate can be 60-80% cheaper. Choose Weaviate for hybrid search, data residency, or cost control; choose Pinecone for zero-ops simplicity.

Can Weaviate auto-vectorize my text?

Yes. Weaviate has built-in vectorizer modules (text2vec-openai, text2vec-cohere, text2vec-huggingface, and others). When you configure a vectorizer on a collection, Weaviate automatically generates embeddings when you insert text. You do not need to call an embedding API separately. This simplifies your code and reduces the chance of embedding model mismatches.

How do I run filtered queries in Weaviate?

Use the Filter class from weaviate.classes.query to build where clauses. You can filter by any property before running vector or hybrid search. For example, Filter.by_property('category').equal('tutorial') restricts results to objects where category equals tutorial. Filters are applied before the vector search, making them efficient even on large collections.

What is HNSW in Weaviate?

HNSW (Hierarchical Navigable Small World) is the approximate nearest neighbor algorithm Weaviate uses for vector search. It builds a multi-layer graph where each layer is a navigable small world network. HNSW provides sub-millisecond query latency at high recall rates. Weaviate lets you tune HNSW parameters like ef, efConstruction, and maxConnections to optimize the recall-latency trade-off for your workload.

Is Weaviate good for RAG applications?

Yes. Weaviate is one of the top choices for RAG because hybrid search handles both semantic similarity and exact keyword matching. Technical documentation, code snippets, and API references contain specific terms that pure vector search can miss. Weaviate's hybrid search catches these while still understanding semantic meaning. It also supports multi-tenancy for SaaS RAG products and self-hosting for data residency requirements.

Last updated: March 2026 | Weaviate v1.28+ / Python client v4 / Python 3.9+