
How to Connect LLM to Database — Text-to-SQL & RAG Patterns (2026)

Every business runs on structured data. Orders live in Postgres. Metrics live in Snowflake. Customer records live in MySQL. For decades, the only way to get answers from that data was SQL — a language that takes years to learn well and that most business users never master.

LLMs change this equation. A product manager can now type “which customers churned last quarter and what was their average contract value?” and receive a SQL query, executed against the production warehouse, with a plain-English summary of the results — in under two seconds. No SQL knowledge required. No waiting for an analyst.

This is the promise of LLM-to-database integration. But the implementation is not as simple as passing a question to an LLM. Production systems require safety layers to prevent destructive queries, schema management to give the LLM accurate context, error recovery to handle the inevitable hallucinated column names, and architectural decisions about when to use text-to-SQL versus RAG versus a hybrid approach.

This guide builds all of it — from the basic text-to-SQL pattern through production-grade database agents.


The gap between data and decisions is expensive. Typical analytics workflows look like this: a business user has a question, submits a request to the data team, waits two to five days for a response, reviews the answer, realizes the question was slightly wrong, and repeats the cycle. At scale, this bottleneck limits how quickly organizations can act on data.

LLM-to-database integration compresses that cycle to seconds. The practical applications are significant:

  • Internal analytics tools: Business users query data warehouses in natural language without writing SQL
  • Customer-facing search: Products surface relevant records from databases using semantic queries instead of keyword filters
  • Automated reporting: Agents pull live data, run computations, and generate narrative summaries
  • Data exploration: Engineers explore unfamiliar schemas by asking questions instead of reading table definitions

The technical approaches divide into two families. Text-to-SQL works by generating a SQL query from the user’s question and executing it. This is optimal for aggregations, filters, and computations on structured data. RAG over structured data works by embedding database content into a vector store and retrieving relevant rows semantically. This is optimal when records contain free-text fields or when the question is conceptual rather than computational.

Most production systems use both. Understanding when to use each — and how to combine them — is the core competency this guide builds.

If you are new to Python patterns for LLM applications, start with Python for GenAI Engineers for the async and Pydantic foundations that underpin everything in this guide.


Text-to-SQL converts a natural language question into an executable SQL query. The minimal implementation is three steps: provide the LLM with the database schema, ask it to generate SQL, execute the query.

The LLM needs to know what tables and columns exist before it can write correct SQL. The most reliable approach is to introspect the live schema at runtime:

# Requires: sqlalchemy>=2.0.0, openai>=1.0.0, psycopg2-binary>=2.9.0
import sqlalchemy as sa
from openai import OpenAI

def get_schema_description(engine: sa.Engine) -> str:
    """Introspect the database schema and return a formatted description."""
    inspector = sa.inspect(engine)
    lines = []
    for table_name in inspector.get_table_names():
        columns = inspector.get_columns(table_name)
        col_descriptions = ", ".join(
            f"{col['name']} ({col['type']})" for col in columns
        )
        lines.append(f"Table: {table_name}\n  Columns: {col_descriptions}")
    return "\n\n".join(lines)

# Build the engine (read-only user — see Safety section)
engine = sa.create_engine(
    "postgresql+psycopg2://readonly_user:password@localhost/mydb"
)
schema_description = get_schema_description(engine)
print(schema_description)
# Table: orders
#   Columns: id (INTEGER), customer_id (INTEGER), total (NUMERIC), created_at (TIMESTAMP)
#
# Table: customers
#   Columns: id (INTEGER), email (VARCHAR), plan (VARCHAR), churned_at (TIMESTAMP)

With the schema in hand, pass it to the LLM alongside the user’s question:

client = OpenAI()

def text_to_sql(question: str, schema: str, engine: sa.Engine) -> dict:
    """
    Convert a natural language question to SQL and execute it.
    Returns the query and results, or an error message.
    """
    system_prompt = f"""You are a SQL expert. Given a database schema and a question,
write a single valid PostgreSQL SELECT query that answers the question.
Return ONLY the SQL query — no explanation, no markdown, no code fences.
Never use INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, or TRUNCATE.
Always add LIMIT 1000 unless the question explicitly asks for all records.

Schema:
{schema}"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        temperature=0,  # deterministic SQL generation
    )
    sql_query = response.choices[0].message.content.strip()

    # Validate before executing — see Safety section for full validation
    if not sql_query.upper().startswith("SELECT"):
        return {"error": "LLM generated non-SELECT query", "query": sql_query}

    with engine.connect() as conn:
        result = conn.execute(sa.text(sql_query))
        rows = result.fetchall()
        columns = result.keys()

    return {
        "query": sql_query,
        "columns": list(columns),
        "rows": [dict(zip(columns, row)) for row in rows],
        "row_count": len(rows),
    }

# Usage
result = text_to_sql(
    question="What is the total revenue per customer plan for the last 30 days?",
    schema=schema_description,
    engine=engine,
)
print(result["query"])
# SELECT c.plan, SUM(o.total) AS total_revenue
# FROM orders o JOIN customers c ON o.customer_id = c.id
# WHERE o.created_at >= NOW() - INTERVAL '30 days'
# GROUP BY c.plan ORDER BY total_revenue DESC LIMIT 1000
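The pipeline usually ends with a narration step that the function above leaves out: feed the rows back to the model and ask for a plain-English answer. A minimal sketch — the `complete` parameter is a hypothetical stand-in for your chat-completion call, injected so the prompt assembly can be exercised without a live model:

```python
def summarize_results(question: str, result: dict, complete) -> str:
    """Narrate text_to_sql output in plain English.

    `complete` is the LLM call (injected stand-in, not a fixed API);
    only a preview of the rows is sent to keep the context small.
    """
    preview = result["rows"][:50]  # cap the rows fed to the model
    prompt = (
        f"Question: {question}\n"
        f"SQL used: {result['query']}\n"
        f"Rows returned ({result['row_count']} total, first {len(preview)} shown): {preview}\n"
        "Summarize the answer in one or two sentences of plain English."
    )
    return complete(prompt)
```

In practice `complete` would wrap `client.chat.completions.create` with a low temperature, mirroring the generation call above.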

LLM-generated SQL occasionally references wrong column names or uses incorrect syntax. A simple retry loop with the error feedback dramatically improves reliability:

def text_to_sql_with_retry(
    question: str, schema: str, engine: sa.Engine, max_retries: int = 2
) -> dict:
    """Text-to-SQL with automatic error correction."""
    messages = [
        {"role": "system", "content": build_system_prompt(schema)},
        {"role": "user", "content": question},
    ]
    for attempt in range(max_retries + 1):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, temperature=0
        )
        sql_query = response.choices[0].message.content.strip()
        messages.append({"role": "assistant", "content": sql_query})
        try:
            with engine.connect() as conn:
                result = conn.execute(sa.text(sql_query))
                rows = result.fetchall()
                columns = list(result.keys())
            return {"query": sql_query, "columns": columns, "rows": rows}
        except Exception as e:
            if attempt == max_retries:
                return {"error": str(e), "query": sql_query}
            # Feed the error back to the LLM for self-correction
            messages.append({
                "role": "user",
                "content": f"That query failed with error: {e}\nPlease fix the SQL and try again.",
            })
    return {"error": "Max retries exceeded"}
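The retry loop calls a `build_system_prompt` helper that is not shown above. A plausible definition, mirroring the constraints of the single-shot system prompt:

```python
def build_system_prompt(schema: str) -> str:
    """Assemble the system prompt for SQL generation.

    A sketch of the helper referenced by text_to_sql_with_retry;
    it restates the same constraints as the single-shot version.
    """
    return (
        "You are a SQL expert. Given a database schema and a question, "
        "write a single valid PostgreSQL SELECT query that answers the question.\n"
        "Return ONLY the SQL query — no explanation, no markdown, no code fences.\n"
        "Never use INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, or TRUNCATE.\n"
        "Always add LIMIT 1000 unless the question explicitly asks for all records.\n\n"
        f"Schema:\n{schema}"
    )
```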

4. Text-to-SQL Pipeline — Visual Architecture


The diagram below shows the full end-to-end pipeline — from natural language input through schema assembly, LLM SQL generation, safety validation, execution, and grounded answer output.

Text-to-SQL Pipeline — End to End

Natural language enters. A validated SQL query runs against the database. A grounded, structured answer exits. Each stage has a failure mode that requires explicit handling.

Stage 1 — Input & Schema (context assembly):
User Question (natural language) → Schema Introspection (tables + columns) → Filtered Schema (relevant tables only) → System Prompt Assembly (schema + constraints)

Stage 2 — LLM SQL Generation (query synthesis):
LLM Call at temperature 0 (deterministic output) → SQL Extraction (strip markdown/prose) → Safety Validation (SELECT-only allowlist) → Syntax Check (parse before execute)

Stage 3 — Execution (safe query run):
Read-Only Connection (restricted DB user) → Query Execution (with row + time limits) → Error Retry Loop (feed error back to LLM) → Result Serialization (rows → dicts)

Stage 4 — Answer Generation (grounded response):
Result Summarization (LLM narrates the data) → Chart Suggestion (optional visualization) → Structured Response (query + data + prose) → Audit Logging (query + user + timestamp)

5. Safety Patterns: Read-Only, Sandboxing, Schema Validation


Text-to-SQL without safety controls is a security liability. An LLM can be prompted — accidentally or maliciously — to generate destructive SQL. The following layers are non-negotiable in production.

Create a dedicated database user for LLM queries with SELECT-only privileges:

-- PostgreSQL: create a restricted user for LLM queries
CREATE USER llm_readonly WITH PASSWORD 'strong_password';

-- Grant read access to specific schemas only
GRANT CONNECT ON DATABASE mydb TO llm_readonly;
GRANT USAGE ON SCHEMA public TO llm_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO llm_readonly;

-- Ensure future tables are also read-only for this user
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO llm_readonly;

-- Explicitly deny write operations (redundant but explicit)
REVOKE INSERT, UPDATE, DELETE, TRUNCATE ON ALL TABLES IN SCHEMA public FROM llm_readonly;

Layer 2: SQL Allowlisting and Injection Prevention


Validate the generated SQL before executing it. Never rely solely on the LLM’s instruction-following:

# Requires: sqlparse>=0.5.0
import re
import sqlparse

BLOCKED_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "CREATE", "ALTER",
    "TRUNCATE", "GRANT", "REVOKE", "EXECUTE", "EXEC",
    "XP_CMDSHELL", "SP_EXECUTESQL",  # uppercase: tokens are compared uppercased
}

def validate_sql(sql: str) -> tuple[bool, str]:
    """
    Validate that SQL is safe to execute.
    Returns (True, validated_sql) or (False, reason_if_unsafe).
    """
    # Strip comments first
    sql_clean = sqlparse.format(sql, strip_comments=True).strip()

    # Must start with SELECT
    if not sql_clean.upper().startswith("SELECT"):
        return False, "Query must start with SELECT"

    # Check for blocked keywords (case-insensitive, word-boundary match)
    tokens = re.findall(r'\b[A-Z_]+\b', sql_clean.upper())
    for token in tokens:
        if token in BLOCKED_KEYWORDS:
            return False, f"Blocked keyword found: {token}"

    # No semicolons except a single trailing one (prevents statement stacking)
    if sql_clean.rstrip(";").count(";") > 0:
        return False, "Multiple statements detected"

    # Ensure LIMIT is present (prevent unbounded scans)
    if "LIMIT" not in sql_clean.upper():
        sql_clean = sql_clean.rstrip(";") + " LIMIT 1000"
    return True, sql_clean

# Usage
is_safe, result = validate_sql("SELECT * FROM customers; DROP TABLE customers;")
# is_safe = False, result = "Blocked keyword found: DROP"
# (the keyword check fires before the statement-stacking check)

is_safe, validated_sql = validate_sql("SELECT email FROM customers WHERE plan = 'pro'")
# is_safe = True, validated_sql = "SELECT email FROM customers WHERE plan = 'pro' LIMIT 1000"

Do not expose the entire database schema to the LLM. Sensitive tables (payment details, PII, audit logs) should be excluded. Provide only the tables relevant to the user’s domain:

# Define which tables are safe to expose to the LLM
ALLOWED_TABLES = {
    "orders", "products", "customers", "order_items", "categories"
}

# Tables that must never appear in the schema description
# (kept as explicit documentation alongside the allowlist)
EXCLUDED_TABLES = {
    "user_credentials", "payment_methods", "audit_logs",
    "admin_sessions", "api_keys"
}

def get_filtered_schema(engine: sa.Engine, allowed: set[str]) -> str:
    inspector = sa.inspect(engine)
    lines = []
    for table_name in inspector.get_table_names():
        if table_name not in allowed:
            continue
        columns = inspector.get_columns(table_name)
        col_descriptions = ", ".join(
            f"{col['name']} ({col['type']})" for col in columns
        )
        lines.append(f"Table: {table_name}\n  Columns: {col_descriptions}")
    return "\n\n".join(lines)

Set server-side limits to prevent runaway queries:

# Requires: sqlalchemy>=2.0.0
from contextlib import contextmanager

@contextmanager
def safe_db_connection(engine: sa.Engine, timeout_ms: int = 5000):
    """Database connection with query timeout and resource limits."""
    with engine.connect() as conn:
        # PostgreSQL: kill the query if it runs longer than timeout_ms
        conn.execute(sa.text(f"SET statement_timeout = {timeout_ms}"))
        # Limit working memory to prevent large sort operations
        conn.execute(sa.text("SET work_mem = '64MB'"))
        yield conn

Text-to-SQL handles analytical queries well — aggregations, filters, joins. But many databases contain text-heavy records: product descriptions, support tickets, knowledge articles, contract clauses. For these, semantic search outperforms exact SQL filters.

RAG over structured data embeds database records into a vector store and retrieves them by semantic similarity before generating a response. See RAG Architecture for the full retrieval pipeline.

Query Type                                        | Best Approach | Why
--------------------------------------------------|---------------|----------------------
"Total revenue last month"                        | Text-to-SQL   | Exact aggregation
"Customers who churned"                           | Text-to-SQL   | Structured filter
"Products similar to noise-cancelling headphones" | RAG           | Semantic similarity
"Support tickets about billing errors"            | RAG           | Free-text search
"What does our refund policy say about…"          | RAG           | Unstructured content
"Top 5 products by category for Q3"               | Text-to-SQL   | Aggregation + filter
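A routing layer can put this decision into code. The sketch below uses keyword heuristics as a cheap first pass — the hint sets are illustrative, and production routers typically fall back to a small LLM classification prompt for ambiguous questions:

```python
# Illustrative hint sets — tune these to your own query logs
AGGREGATION_HINTS = {"total", "sum", "count", "average", "top", "revenue", "per"}
SEMANTIC_HINTS = {"similar", "about", "like", "related", "recommend", "say"}

def route_query(question: str) -> str:
    """Return 'text_to_sql', 'rag', or 'classifier' (defer to an LLM).

    A heuristic first pass only; anything without a clear signal is
    handed to a model-based classifier rather than guessed.
    """
    words = set(question.lower().replace("?", "").split())
    if words & SEMANTIC_HINTS and not words & AGGREGATION_HINTS:
        return "rag"
    if words & AGGREGATION_HINTS:
        return "text_to_sql"
    return "classifier"  # ambiguous: let a model-based router decide
```

The cheap path handles the obvious cases for free; only ambiguous questions pay the latency and cost of an extra LLM call.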

The core technique is to serialize each row into a text representation, embed it, and store the vector alongside the row’s primary key:

# Requires: openai>=1.0.0, chromadb>=0.5.0, sqlalchemy>=2.0.0
from openai import OpenAI
import chromadb

client = OpenAI()
chroma = chromadb.Client()
collection = chroma.get_or_create_collection("products")

def embed_database_records(engine: sa.Engine) -> None:
    """
    Embed product descriptions into a vector store.
    Stores the primary key for retrieval linkage.
    """
    with engine.connect() as conn:
        rows = conn.execute(
            sa.text("SELECT id, name, description, category, price FROM products")
        ).fetchall()

    for row in rows:
        # Serialize the row to a rich text representation
        text = (
            f"Product: {row.name}\n"
            f"Category: {row.category}\n"
            f"Price: ${row.price:.2f}\n"
            f"Description: {row.description}"
        )
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=text,
        )
        embedding = response.data[0].embedding
        collection.upsert(
            ids=[str(row.id)],
            embeddings=[embedding],
            metadatas=[{
                "name": row.name,
                "category": row.category,
                "price": float(row.price),
            }],
            documents=[text],
        )
    print(f"Embedded {len(rows)} products into vector store")

def semantic_product_search(query: str, top_k: int = 5) -> list[dict]:
    """Retrieve products semantically similar to the query."""
    response = client.embeddings.create(
        model="text-embedding-3-small", input=query
    )
    query_embedding = response.data[0].embedding
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
        include=["documents", "metadatas", "distances"],
    )
    return [
        {
            "document": doc,
            "metadata": meta,
            "similarity": 1 - dist,  # distance → similarity
        }
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        )
    ]
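Retrieval alone returns records, not answers. The generation half of the RAG loop grounds the model in the retrieved documents — a sketch, with the chat-completion call injected as `complete` (a hypothetical stand-in) so the pattern can be stubbed in tests:

```python
def answer_from_retrieval(question: str, hits: list[dict], complete) -> str:
    """Ground the LLM in records returned by semantic_product_search.

    `complete` is the chat call (injected, not a fixed API). Instructing
    the model to answer only from the supplied records limits hallucination.
    """
    context = "\n\n".join(h["document"] for h in hits)
    prompt = (
        "Answer the question using ONLY the records below. "
        "If they do not contain the answer, say so.\n\n"
        f"Records:\n{context}\n\nQuestion: {question}"
    )
    return complete(prompt)
```

Typical wiring: `answer_from_retrieval(q, semantic_product_search(q), complete)` where `complete` wraps a low-temperature chat completion.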

For async patterns in embedding pipelines, see Async Python for GenAI.


SQLAlchemy is the standard Python ORM for relational databases. It integrates cleanly with LLM applications because it supports multiple database backends, provides a Python-native query API alongside raw SQL execution, and handles connection pooling automatically.

# Requires: sqlalchemy>=2.0.0, psycopg2-binary>=2.9.0
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# Production connection with pooling
engine = create_engine(
    "postgresql+psycopg2://readonly_user:password@localhost:5432/analytics",
    poolclass=QueuePool,
    pool_size=5,          # Concurrent connections for LLM query workers
    max_overflow=10,      # Burst capacity above pool_size
    pool_timeout=30,      # Wait time before "pool exhausted" error
    pool_pre_ping=True,   # Verify connections before use (handles stale connections)
    connect_args={
        "connect_timeout": 10,
        "application_name": "llm-query-agent",  # Visible in pg_stat_activity
    },
)

Async SQLAlchemy for High-Throughput Systems


When your database agent handles multiple concurrent user queries, async SQLAlchemy prevents connection starvation:

# Requires: sqlalchemy>=2.0.0, asyncpg>=0.29.0
import asyncio

import sqlalchemy as sa
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

async_engine = create_async_engine(
    "postgresql+asyncpg://readonly_user:password@localhost/mydb",
    pool_size=10,
    max_overflow=20,
)
AsyncSessionLocal = async_sessionmaker(
    bind=async_engine,
    class_=AsyncSession,
    expire_on_commit=False,
)

async def run_query_async(sql: str) -> list[dict]:
    """Execute a validated SQL query asynchronously."""
    async with AsyncSessionLocal() as session:
        result = await session.execute(sa.text(sql))
        rows = result.fetchall()
        columns = list(result.keys())
        return [dict(zip(columns, row)) for row in rows]

# Handle multiple user queries concurrently
async def handle_multiple_queries(questions: list[str], schema: str) -> list[dict]:
    """Process multiple text-to-SQL questions in parallel.
    Assumes run_text_to_sql_async, an async variant of the text-to-SQL
    pipeline built on run_query_async."""
    tasks = [
        run_text_to_sql_async(q, schema)
        for q in questions
    ]
    return await asyncio.gather(*tasks, return_exceptions=True)
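`handle_multiple_queries` relies on a `run_text_to_sql_async` coroutine that is not defined above. One way to sketch it keeps the LLM and database calls injectable, then binds them to recover the two-argument form used in the gather loop — the parameter names and the `partial` binding are assumptions, not a fixed API:

```python
import asyncio
from functools import partial  # used when binding real dependencies, see below

async def text_to_sql_async(question: str, schema: str, generate_sql, execute) -> dict:
    """Async text-to-SQL pipeline (sketch).

    `generate_sql(question, schema)` is the async LLM call and
    `execute(sql)` runs the query (e.g. run_query_async); both are
    injected so the control flow is testable without a live model or DB.
    """
    sql = (await generate_sql(question, schema)).strip()
    if not sql.upper().startswith("SELECT"):
        return {"error": "LLM generated non-SELECT query", "query": sql}
    rows = await execute(sql)
    return {"query": sql, "rows": rows, "row_count": len(rows)}

# Bind the real dependencies to recover the two-argument signature, e.g.:
# run_text_to_sql_async = partial(
#     text_to_sql_async, generate_sql=generate_sql_via_llm, execute=run_query_async
# )
```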

8. Production Architecture: Database Agents


Production LLM-database systems evolve beyond single-turn text-to-SQL into multi-step database agents. An agent can plan queries, correct errors, join results across multiple steps, and synthesize findings into a coherent answer.

The agent architecture connects naturally to the patterns described in AI Agents and the tool-calling mechanisms in Tool Calling for LLMs.

Define database operations as tools the LLM can invoke:

# Requires: openai>=1.0.0, sqlalchemy>=2.0.0
import json

from openai import OpenAI

client = OpenAI()

DATABASE_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "run_sql_query",
            "description": (
                "Execute a SELECT SQL query against the analytics database. "
                "Use for aggregations, counts, filters, and joins on structured data. "
                "Always returns rows as a list of dicts with column names as keys."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "A valid PostgreSQL SELECT query. Must not contain INSERT, UPDATE, DELETE, or DDL.",
                    },
                    "description": {
                        "type": "string",
                        "description": "What this query is trying to answer (for logging and error context).",
                    },
                },
                "required": ["query", "description"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "semantic_search",
            "description": (
                "Search product catalog or knowledge base by semantic similarity. "
                "Use for conceptual questions, product recommendations, or free-text search."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Natural language search query"},
                    "top_k": {"type": "integer", "description": "Number of results to return (default 5)", "default": 5},
                },
                "required": ["query"],
            },
        },
    },
]

def run_database_agent(user_question: str, schema: str, max_steps: int = 5) -> str:
    """
    Multi-step database agent that can plan, query, correct, and synthesize.
    Returns a natural language answer grounded in live database data.
    """
    messages = [
        {
            "role": "system",
            "content": (
                f"You are a data analyst with access to a PostgreSQL database.\n\n"
                f"Database schema:\n{schema}\n\n"
                "Use run_sql_query for structured data. Use semantic_search for conceptual queries. "
                "Always explain your reasoning. Synthesize a final answer after gathering data."
            ),
        },
        {"role": "user", "content": user_question},
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=DATABASE_TOOLS,
            tool_choice="auto",
        )
        message = response.choices[0].message
        messages.append(message)

        # No tool call — the agent has finished
        if not message.tool_calls:
            return message.content

        # Execute each tool call and feed results back
        for tool_call in message.tool_calls:
            fn_name = tool_call.function.name
            fn_args = json.loads(tool_call.function.arguments)

            if fn_name == "run_sql_query":
                is_safe, validated_sql = validate_sql(fn_args["query"])
                if not is_safe:
                    result = {"error": f"Query blocked by safety validator: {validated_sql}"}
                else:
                    try:
                        with safe_db_connection(engine) as conn:
                            rows = conn.execute(sa.text(validated_sql)).fetchall()
                        # SQLAlchemy 2.0 Rows need ._mapping for dict conversion
                        result = {"rows": [dict(r._mapping) for r in rows], "count": len(rows)}
                    except Exception as e:
                        result = {"error": str(e)}
            elif fn_name == "semantic_search":
                result = semantic_product_search(
                    fn_args["query"], fn_args.get("top_k", 5)
                )

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })

    return "Agent reached maximum steps without a final answer."

In production, cache common queries to reduce latency and log every LLM-generated query for compliance:

# Requires: sqlalchemy>=2.0.0, openai>=1.0.0
import hashlib
import json
from datetime import datetime, timezone

query_cache: dict[str, dict] = {}  # Replace with Redis in production

def cached_text_to_sql(question: str, schema: str, engine: sa.Engine) -> dict:
    """Text-to-SQL with caching and audit logging."""
    cache_key = hashlib.sha256(f"{question}:{schema}".encode()).hexdigest()[:16]
    if cache_key in query_cache:
        cached = query_cache[cache_key]
        log_query_audit(question, cached["query"], from_cache=True)
        return cached

    result = text_to_sql_with_retry(question, schema, engine)
    query_cache[cache_key] = result
    log_query_audit(question, result.get("query", ""), from_cache=False)
    return result

def log_query_audit(question: str, sql: str, from_cache: bool) -> None:
    """Write every query to an audit log for compliance and debugging."""
    audit_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "generated_sql": sql,
        "from_cache": from_cache,
    }
    # In production: write to an append-only audit table or log aggregator
    print(f"AUDIT: {json.dumps(audit_entry)}")

9. Interview Prep: LLM Database Integration


LLM-to-database questions appear in senior GenAI and AI infrastructure interviews. Interviewers are testing whether you understand production safety requirements, performance constraints, and architectural trade-offs — not just whether you have seen the basic text-to-SQL pattern.

Q1: How would you prevent a text-to-SQL system from executing destructive queries?

Defense in depth: four independent layers, any one of which catches what the others miss.

First, the database connection uses a read-only user with SELECT-only privileges at the database level — no INSERT, UPDATE, DELETE, or DDL rights regardless of what SQL is sent.

Second, a software-layer validator parses the generated SQL before execution. It checks that the query starts with SELECT, scans for blocked keywords (DROP, INSERT, EXECUTE), and rejects queries with multiple semicolons that could stack statements.

Third, schema filtering ensures the LLM never sees sensitive tables (payment methods, audit logs, credentials). Tables not in the allowlist are invisible to the LLM.

Fourth, server-side timeouts and row limits (statement_timeout, LIMIT 1000) prevent runaway queries from consuming database resources even if a query passes the first three layers.

Never rely solely on the LLM following instructions — the validation layers must be code, not prompts.

Q2: A text-to-SQL query returns wrong results. Walk through your debugging process.

Start by isolating whether the failure is in SQL generation or SQL execution. Log the generated SQL alongside the user question and the actual results in every production query.

If the SQL is syntactically correct but semantically wrong, the problem is usually in the schema context: the LLM did not have enough information to write the right JOIN, or column names are ambiguous. Fix by improving schema descriptions — add comments to columns, provide example values, or rename ambiguous columns in your filtered schema view.

If the SQL returns no results or clearly wrong aggregations, check for timezone mismatches, NULL handling differences, or implicit type casting. These are common failure modes when the LLM writes WHERE created_at = '2026-01-01' without accounting for timestamp precision.

Add an evaluation layer that runs known question-answer pairs against the system and alerts when results drift. See LLM Evaluation for evaluation frameworks applicable to this pattern.
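Such an evaluation layer can start as a simple golden-set runner. A sketch — the example case is hypothetical, and `ask` (your text-to-SQL entry point) is injected so the suite can run against a stub in tests:

```python
def run_regression_suite(cases: list[dict], ask) -> list[dict]:
    """Replay known question/answer pairs through the text-to-SQL
    entry point (`ask`, injected) and collect mismatches for alerting.
    """
    failures = []
    for case in cases:
        result = ask(case["question"])
        if result.get("rows") != case["expected_rows"]:
            failures.append({"question": case["question"], "got": result})
    return failures

# Hypothetical golden cases — replace with verified pairs from your schema
GOLDEN_CASES = [
    {"question": "How many customers are on the pro plan?",
     "expected_rows": [{"count": 42}]},
]
```

Run the suite on a schedule (or on every prompt change) and alert when the failure list is non-empty.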

Q3: When would you choose RAG over structured data instead of text-to-SQL?

Three clear indicators for RAG:

The database contains free-text columns — product descriptions, support tickets, notes, articles. These fields cannot be searched meaningfully with SQL LIKE or ILIKE, but semantic similarity retrieval handles them well.

The user’s question is conceptual rather than analytical. “Show me products similar to X” or “find support tickets about payment failures” are semantic queries. “Total revenue per region last month” is an analytical query. The first is a RAG problem; the second is a text-to-SQL problem.

Exact matching fails in practice. Users phrase queries differently than stored data. SQL requires exact string matches unless you use full-text search, which has its own limitations. Vector similarity handles paraphrase and synonym variation naturally.

In practice, hybrid systems win. Use text-to-SQL for aggregations and filters; use RAG for semantic search; combine them when a question requires both (e.g., find products matching a description AND filter by price range).

Q4: How would you scale a text-to-SQL system to handle 1,000 queries per day?

Five requirements at that scale:

Connection pooling: Use SQLAlchemy’s QueuePool with enough connections for concurrent LLM workers. At 1,000 queries/day with <2s average query time, peak concurrency is manageable with 10-20 connections.

Query caching: A significant fraction of business queries are repeated (daily revenue, active users, churn rate). Cache results with a TTL matching data freshness requirements — often 5-15 minutes for analytics dashboards. Redis with a SHA-256 hash of the question + schema as the cache key.

Async execution: Use async SQLAlchemy and async LLM clients so concurrent queries do not block each other. At 1,000 queries/day (roughly 1 per 90 seconds average), synchronous execution works, but traffic is bursty — design for concurrent peak, not average.

Query cost monitoring: At scale, LLM-generated queries can include expensive table scans. Log query execution time and rows scanned, and alert on queries that exceed thresholds. Running EXPLAIN before execution can catch problematic plans — use plain EXPLAIN, not EXPLAIN ANALYZE, since ANALYZE actually executes the query.

Audit trail: Every generated query, the user who asked it, and the execution result must be logged. This is both a compliance requirement and a debugging tool when results are questioned.
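The cost check above can be a pre-flight gate. Plain `EXPLAIN (FORMAT JSON)` in PostgreSQL returns the estimated plan without executing the query; the budget value below is an assumption to tune per database:

```python
def plan_exceeds_budget(plan_json: list, max_cost: float = 10_000.0) -> bool:
    """Flag PostgreSQL plans whose estimated total cost exceeds a budget.

    plan_json is the parsed output of EXPLAIN (FORMAT JSON) — obtainable
    with something like:
        conn.execute(sa.text("EXPLAIN (FORMAT JSON) " + sql)).scalar()
    Plain EXPLAIN only plans the query; it does not run it.
    """
    total_cost = plan_json[0]["Plan"]["Total Cost"]
    return total_cost > max_cost
```

Reject or flag queries whose plan exceeds the budget before they ever touch the tables.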


LLM-to-database integration is one of the most practical and immediately deployable patterns in the GenAI engineer’s toolkit. The core insight is that it is not a single technique — it is an architecture that combines text-to-SQL generation, safety validation, optional vector retrieval, and agentic orchestration based on what the query requires.

What to apply from this guide:

  • Use text-to-SQL for structured, analytical queries. Always validate the generated SQL in code — never rely on the LLM following safety instructions alone.
  • Use a read-only database user as your first and most important safety layer. This is a database-level control that no prompt injection can override.
  • Use RAG over structured data when records contain free-text fields or when semantic similarity outperforms exact SQL filters.
  • Combine both patterns in production: route analytical questions to text-to-SQL, semantic questions to vector retrieval, and complex multi-step questions to a database agent.
  • Log every generated query for audit and debugging. At scale, this log becomes your primary tool for understanding system failures and improving SQL generation quality.

For the async patterns needed to run concurrent database queries efficiently, see Async Python for GenAI. For the broader agent architectures that database agents fit into, see AI Agents.

Frequently Asked Questions

How do you connect an LLM to a database?

You connect an LLM to a database using two main patterns: text-to-SQL (the LLM generates a SQL query from a natural language question, which is executed against the database) or RAG over structured data (database rows are embedded and stored in a vector database for semantic retrieval). Text-to-SQL works best for analytical queries over known schemas. RAG works best when the database contains free-text content or when semantic search is more useful than exact filtering.

How do you make text-to-SQL safe in production?

Production text-to-SQL safety requires four layers: (1) read-only database users — the LLM connection should never have INSERT, UPDATE, DELETE, or DDL privileges; (2) SQL allowlisting — only allow SELECT statements, reject anything containing semicolons, DROP, INSERT, or EXECUTE; (3) schema validation — provide the LLM with only the tables and columns it needs, not the entire schema; (4) query sandboxing — set row limits (LIMIT 1000), query timeouts, and cost limits before executing any LLM-generated SQL.

When should you use an LLM instead of a traditional ORM?

Use a traditional ORM (SQLAlchemy, Django ORM) when queries are known, predictable, and performance-critical. Use LLM-generated SQL when the question space is open-ended and users need to ask questions the developers did not anticipate — analytics dashboards, internal data tools, customer-facing search. Many production systems use both: ORM for application logic, LLM for ad-hoc analytical queries.

What is RAG over a database and when should you use it?

RAG over a database means embedding database rows into a vector store and using semantic similarity search to find relevant records before generating an LLM response. Use it when your database contains free-text fields (product descriptions, support tickets, knowledge articles) or when users ask conceptual questions that cannot be expressed as exact SQL filters. RAG over databases complements text-to-SQL rather than replacing it.

How does schema introspection work for text-to-SQL?

Schema introspection uses SQLAlchemy's inspect() to read table names and column definitions from the live database at runtime. The schema description is formatted as text listing each table with its columns and types, then passed to the LLM in the system prompt. Filtering the schema to expose only allowed tables prevents the LLM from accessing sensitive data like payment methods or credentials.

How do you handle errors when LLM-generated SQL fails?

Implement a retry loop that feeds the database error message back to the LLM as a follow-up prompt, asking it to fix the SQL. This self-correction pattern typically resolves wrong column names or incorrect syntax within 1-2 retries. Log the generated SQL alongside the user question and error for debugging. If retries are exhausted, return a clear error rather than a silent failure.

How do you embed database records into a vector store for RAG?

Serialize each database row into a rich text representation combining key fields (name, category, description, price), then generate an embedding vector using a model like text-embedding-3-small. Store the vector in a vector database (ChromaDB, Pinecone) alongside the row's primary key and metadata. At query time, embed the user's question and retrieve the most semantically similar records.

What is a database agent and how does it differ from basic text-to-SQL?

A database agent is a multi-step LLM system that can plan queries, correct errors, join results across multiple steps, and synthesize findings into a coherent answer. Unlike basic text-to-SQL which is single-turn, agents use tool-calling to invoke SQL execution and semantic search as needed, iterating until they have enough data to produce a complete response.

How do you scale a text-to-SQL system for high query volumes?

Five requirements at scale: connection pooling with SQLAlchemy QueuePool for concurrent workers, query caching with a TTL matching data freshness needs, async execution with async SQLAlchemy to handle concurrent queries without blocking, query cost monitoring to alert on expensive table scans, and audit logging of every generated query for compliance and debugging.

Why should you use a read-only database user for LLM queries?

A read-only database user with SELECT-only privileges is a database-level control that no prompt injection can override. Even if an LLM is manipulated into generating a DROP TABLE or DELETE statement, the database rejects it at the permission level. This is the first and most important safety layer, supplemented by application-level SQL validation and schema filtering.