Structured Outputs from LLMs — JSON, Pydantic & Schema Enforcement (2026)
Structured outputs let you force an LLM to return valid, schema-compliant JSON instead of free-form text. This is the bridge between natural language generation and typed software pipelines — without it, every LLM call requires fragile regex parsing and manual error handling. This guide covers the three enforcement methods (prompt-based, schema-constrained, grammar-based), working Python code for OpenAI, Anthropic, and Instructor, and the production patterns that make structured outputs reliable at scale.
Updated March 2026 — Covers OpenAI’s response_format: json_schema, Anthropic’s tool-use-as-schema pattern, Instructor v1.x with Pydantic v2, and open-source constrained decoding with Outlines and llama.cpp GBNF grammars.
1. Why Structured Outputs Matter for AI Engineers
LLMs generate free text by default. Production pipelines need typed, validated data. This mismatch is the root cause of the most common integration failures in GenAI systems.
- Typed pipelines require typed data — When an LLM feeds into a database insert, API call, or downstream function, the output must match an exact schema. Free text cannot be reliably parsed with regex across edge cases.
- Agent tool calls depend on valid JSON — Every tool calling system requires the LLM to produce structured arguments. If the JSON is malformed, the tool never executes and the agent loop breaks.
- Validation eliminates silent failures — Without schema enforcement, an LLM might return a plausible-looking response that omits required fields or uses wrong types. These failures propagate silently through your pipeline until a user reports bad results.
2. When You Need Structured Outputs — Use Cases
Not every LLM call needs structured output. The decision depends on what consumes the response.
| Use Case | Output Format | Why Structured |
|---|---|---|
| API response generation | JSON matching OpenAPI spec | Downstream services expect exact field names and types |
| Entity extraction | List of typed objects | NER results feed into databases or knowledge graphs |
| Agent tool calls | Function name + argument JSON | Agent loops parse tool calls programmatically |
| Form filling | Key-value pairs with validation | User-facing forms require specific field types |
| Classification | Enum label + confidence score | Routing logic branches on exact label values |
| Data transformation | Source schema to target schema | ETL pipelines require deterministic field mapping |
When free text is the right choice
Use free text when the output is consumed by humans, not code: summaries, explanations, creative writing, conversational responses. Forcing JSON on these use cases degrades quality without adding value.
3. How Structured Outputs Work — Enforcement Methods
There are three fundamentally different approaches to getting structured data from an LLM. They differ in reliability, latency, and provider support.
Three Approaches to Structured LLM Output
From least reliable (prompt-based) to most reliable (constrained decoding) — each trades flexibility for guarantees.
Prompt-based (least reliable)
You instruct the model to return JSON in the system prompt. This works most of the time but fails unpredictably — the model might add markdown fences, include trailing text, or omit required fields. Every call needs try/except parsing with retry logic.
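If you are stuck with prompt-based JSON, a defensive parser is the minimum safety net. A minimal sketch — `extract_json` and `FENCE` are illustrative helpers written for this guide, not a library API:

```python
import json
import re

FENCE = chr(96) * 3  # three backticks, i.e. a markdown code-fence marker

def extract_json(raw: str) -> dict:
    """Best-effort parse of a prompt-based JSON response.

    Strips markdown fences and surrounding prose before parsing;
    raises ValueError if no JSON object can be recovered.
    """
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, raw, re.DOTALL)
    candidate = fenced.group(1) if fenced else raw
    # Fall back to the outermost {...} span to drop leading/trailing text
    match = re.search(r"\{.*\}", candidate, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

reply = f'Sure! {FENCE}json\n{{"day": "Friday", "time": "15:00"}}\n{FENCE} Anything else?'
print(extract_json(reply))
# {'day': 'Friday', 'time': '15:00'}
```

In production this helper would sit inside the try/except retry loop described above — and every branch it handles is a failure mode that constrained decoding eliminates outright.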
Schema-constrained (provider-native)
OpenAI’s response_format: { type: "json_schema" } constrains the decoding process so the model can only produce tokens that result in valid JSON matching your schema. The guarantee is at the token level — not a post-hoc check.
Grammar-based (open-source)
llama.cpp uses GBNF (GGML BNF) grammars. Outlines uses finite-state machine constraints. Both modify the token sampling mask at each generation step so only schema-valid continuations are possible.
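To make the grammar approach concrete, here is an illustrative GBNF fragment (the rule names are made up for this example) that constrains output to a single-field JSON object whose label must be one of three enum values:

```
# Illustrative GBNF: output must be {"label": <enum>} and nothing else
root  ::= "{" ws "\"label\"" ws ":" ws label ws "}"
label ::= "\"billing\"" | "\"technical\"" | "\"other\""
ws    ::= [ \t\n]*
```

At each decoding step, only tokens that keep the output consistent with some path through this grammar remain sampleable, so a trailing comma or an out-of-enum label can never be generated.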
4. Structured Outputs Tutorial — From Prompt to Typed Response
Three working patterns, from provider-native to library-based.
Pattern 1: OpenAI Structured Outputs (native)
OpenAI’s response_format parameter with json_schema type gives you guaranteed schema compliance.
```python
import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the meeting details from the user's message."},
        {"role": "user", "content": "Let's meet at 3pm on Friday at the downtown office to discuss Q2 planning."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "meeting_details",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "time": {"type": "string", "description": "Meeting time in HH:MM format"},
                    "day": {"type": "string", "description": "Day of the week"},
                    "location": {"type": "string", "description": "Meeting location"},
                    "topic": {"type": "string", "description": "Meeting topic or agenda"},
                },
                "required": ["time", "day", "location", "topic"],
                "additionalProperties": False,
            },
        },
    },
)

meeting = json.loads(response.choices[0].message.content)
# {"time": "15:00", "day": "Friday", "location": "downtown office", "topic": "Q2 planning"}
```

Key detail: `"strict": True` enables constrained decoding. Without it, the model uses best-effort JSON generation that can still fail.
Pattern 2: Anthropic Tool-Use-as-Schema
Anthropic does not have a native json_mode. Instead, define a single tool whose input_schema matches your desired output shape. The model “calls” the tool, and you extract the structured data from the tool_use block.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "extract_meeting",
        "description": "Extract structured meeting details from text.",
        "input_schema": {
            "type": "object",
            "properties": {
                "time": {"type": "string", "description": "Meeting time in HH:MM format"},
                "day": {"type": "string", "description": "Day of the week"},
                "location": {"type": "string", "description": "Meeting location"},
                "topic": {"type": "string", "description": "Meeting topic or agenda"},
            },
            "required": ["time", "day", "location", "topic"],
        },
    }],
    tool_choice={"type": "tool", "name": "extract_meeting"},
    messages=[
        {"role": "user", "content": "Let's meet at 3pm on Friday at the downtown office to discuss Q2 planning."},
    ],
)

# Extract from the tool_use content block
tool_block = next(b for b in response.content if b.type == "tool_use")
meeting = tool_block.input
# {"time": "15:00", "day": "Friday", "location": "downtown office", "topic": "Q2 planning"}
```

`tool_choice={"type": "tool", "name": "extract_meeting"}` forces the model to use the tool, guaranteeing structured output instead of a free-text response.
Pattern 3: Instructor + Pydantic (cross-provider)
The Instructor library lets you define output schemas as Pydantic models and handles provider-specific details automatically.
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

class MeetingDetails(BaseModel):
    time: str = Field(description="Meeting time in HH:MM format")
    day: str = Field(description="Day of the week")
    location: str = Field(description="Meeting location")
    topic: str = Field(description="Meeting topic or agenda")

client = instructor.from_openai(OpenAI())

meeting = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_model=MeetingDetails,
    messages=[
        {"role": "user", "content": "Let's meet at 3pm on Friday at the downtown office to discuss Q2 planning."},
    ],
)

# meeting is a MeetingDetails instance — fully typed and validated
print(meeting.time)      # "15:00"
print(meeting.location)  # "downtown office"
```

Instructor patches the client to convert Pydantic models to JSON Schema, send the appropriate API format (OpenAI’s json_schema or Anthropic’s tool_use), parse the response, and retry with validation feedback if parsing fails. It works with OpenAI, Anthropic, Google, Mistral, and any OpenAI-compatible endpoint.
5. Schema Enforcement Architecture
A production structured output pipeline has six layers, from the application request down to the validated typed object.
Structured Output Pipeline
Each layer adds a guarantee — from raw text generation to fully typed, validated objects.
Why each layer matters
Schema Definition catches design errors early. If your schema requires a price field as a float but the text contains “$29.99”, the schema tells the model to extract the number, not the string.
Constrained Decoding eliminates JSON syntax errors entirely. The model cannot produce {"name": "Alice",} (trailing comma) or unclosed brackets.
Validation Layer catches semantic errors that syntax-level constraints miss. A confidence field constrained to 0.0-1.0 in your Pydantic model will reject confidence: 95.0 even though it is valid JSON.
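A minimal sketch of that validation layer, assuming Pydantic v2 (the model and field names here are illustrative):

```python
from pydantic import BaseModel, Field, ValidationError

class Classification(BaseModel):
    label: str
    confidence: float = Field(ge=0.0, le=1.0)  # semantic constraint, not JSON syntax

# Syntactically valid JSON, semantically wrong: 95.0 is outside [0, 1]
try:
    Classification(label="billing", confidence=95.0)
except ValidationError:
    print("rejected: confidence must be <= 1.0")
```

Constrained decoding would happily emit `confidence: 95.0` if the schema only says "number"; the Pydantic range constraint is what catches it.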
6. Structured Output Code Examples
Three production-grade patterns that go beyond basic extraction.
Example 1: Entity extraction with typed models
Extract named entities from unstructured text into a typed Pydantic model.
```python
from enum import Enum

from pydantic import BaseModel, Field

class EntityType(str, Enum):
    PERSON = "person"
    ORGANIZATION = "organization"
    LOCATION = "location"
    DATE = "date"
    MONETARY = "monetary"

class Entity(BaseModel):
    text: str = Field(description="The entity text as it appears in the source")
    entity_type: EntityType = Field(description="Classification of the entity")
    confidence: float = Field(ge=0.0, le=1.0, description="Extraction confidence score")

class ExtractionResult(BaseModel):
    entities: list[Entity] = Field(description="All extracted entities")
    source_text: str = Field(description="Original text that was analyzed")

# Use with Instructor
result = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_model=ExtractionResult,
    messages=[{
        "role": "user",
        "content": "Apple Inc. announced a $3 billion investment in Austin, Texas on January 15, 2026.",
    }],
)

for entity in result.entities:
    print(f"{entity.text} -> {entity.entity_type.value} ({entity.confidence:.0%})")
# Apple Inc. -> organization (95%)
# $3 billion -> monetary (98%)
# Austin, Texas -> location (97%)
# January 15, 2026 -> date (99%)
```

Example 2: Classification with confidence scores
Classify support tickets with a typed enum and a confidence threshold.
```python
class TicketCategory(str, Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    ACCOUNT = "account"
    FEATURE_REQUEST = "feature_request"
    OTHER = "other"

class TicketClassification(BaseModel):
    category: TicketCategory
    confidence: float = Field(ge=0.0, le=1.0)
    reasoning: str = Field(description="One-sentence explanation for the classification")

classification = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_model=TicketClassification,
    messages=[{
        "role": "user",
        "content": "I was charged twice for my subscription this month. Can you fix this?",
    }],
)
# category=billing, confidence=0.96, reasoning="User reports duplicate charge on subscription"

# Route based on confidence threshold
if classification.confidence < 0.8:
    route_to_human_review(classification)
else:
    route_to_handler(classification.category)
```

Example 3: Multi-step agent output with nested schemas
Define a structured plan that an agent must follow, with nested steps and dependencies.
```python
class ToolCall(BaseModel):
    tool_name: str = Field(description="Name of the tool to invoke")
    arguments: dict = Field(description="Arguments to pass to the tool")
    expected_output: str = Field(description="What this tool call should return")

class PlanStep(BaseModel):
    step_number: int
    description: str
    tool_calls: list[ToolCall] = Field(default_factory=list)
    depends_on: list[int] = Field(
        default_factory=list,
        description="Step numbers that must complete before this step",
    )

class AgentPlan(BaseModel):
    goal: str = Field(description="The user's objective restated clearly")
    steps: list[PlanStep] = Field(description="Ordered execution plan")
    estimated_tool_calls: int = Field(description="Total number of tool invocations")

plan = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_model=AgentPlan,
    messages=[{
        "role": "user",
        "content": "Find the top 3 competitors for Stripe in the payment processing space and compare their pricing.",
    }],
)

# plan.steps is a list of PlanStep objects with typed ToolCall children
for step in plan.steps:
    print(f"Step {step.step_number}: {step.description}")
    for tc in step.tool_calls:
        print(f"  -> {tc.tool_name}({tc.arguments})")
```

7. Structured Outputs — Provider Comparison
Two fundamentally different approaches: provider-native enforcement vs library-based enforcement.
Native JSON Mode vs Library-Based Enforcement
OpenAI native json_schema — strengths:
- Token-level constraint — model cannot produce invalid JSON
- Zero retry overhead — first response is always valid
- Supports nested objects, arrays, enums, and unions
OpenAI native json_schema — limitations:
- First request with a new schema has compilation latency (~1-2s)
- Locked to OpenAI models only
- All fields must be required (no optional fields with strict mode)
- No custom Pydantic validators at the API level
Library-based (Instructor) — strengths:
- Works with OpenAI, Anthropic, Google, Mistral, and local models
- Full Pydantic v2 support including custom validators
- Automatic retry with validation error feedback on failure
- Outlines provides token-level guarantees for local models
Library-based (Instructor) — limitations:
- Prompt-based mode has 85-95% first-attempt success rate
- Retry loops add latency when validation fails
- Additional dependency in your stack
8. Structured Output Interview Questions
These questions come up in system design rounds when candidates describe GenAI pipelines that consume LLM output programmatically.
Q: How do you guarantee an LLM returns valid JSON in production?
You use constrained decoding, not prompt engineering. OpenAI’s response_format: json_schema with strict: True constrains the token sampling so only schema-valid JSON can be generated. For Anthropic, use the tool-use-as-schema pattern with tool_choice forcing the specific tool. For open-source models, use Outlines or llama.cpp GBNF grammars. Prompt-based approaches (“please return JSON”) fail 5-15% of the time at scale and require retry loops.
Q: What happens when a structured output violates business rules that JSON Schema cannot express?
JSON Schema validates structure (types, required fields, enums). Business rules (e.g., end_date must be after start_date, total must equal sum(line_items)) require a validation layer on top. Use Pydantic model validators for this. Instructor’s retry mechanism feeds the validation error back to the model, so the second attempt knows exactly what constraint it violated. Design your pipeline with two validation stages: schema validation (guaranteed by constrained decoding) and business rule validation (Pydantic validators with retry).
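The date-ordering rule from this answer can be sketched with a Pydantic v2 model validator (the model name is illustrative):

```python
from datetime import date

from pydantic import BaseModel, ValidationError, model_validator

class DateRange(BaseModel):
    start_date: date
    end_date: date

    @model_validator(mode="after")
    def end_after_start(self):
        # A business rule JSON Schema cannot express
        if self.end_date <= self.start_date:
            raise ValueError("end_date must be after start_date")
        return self

try:
    DateRange(start_date=date(2026, 3, 10), end_date=date(2026, 3, 1))
except ValidationError as err:
    print("rejected:", err.errors()[0]["msg"])
```

When this model is passed to Instructor as the response_model, a violation of the validator triggers the retry-with-feedback loop described above.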
Q: How do you handle structured outputs for streaming responses?
Streaming and structured outputs are partially compatible. OpenAI streams structured output tokens — you receive partial JSON as it generates. You cannot parse until the stream completes, but you can show progress. For real-time UX, stream the raw tokens for display while accumulating the full response for parsing at the end. Anthropic’s tool-use blocks stream the input field incrementally. Instructor supports streaming with create_partial which yields progressively more complete Pydantic objects.
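The accumulate-then-parse pattern described above is provider-agnostic; a minimal sketch with simulated stream chunks:

```python
import json

def accumulate_and_parse(chunks):
    """Display tokens as they arrive; parse only once the stream ends."""
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)  # a real UI would also render each chunk here
    return json.loads("".join(buffer))

# Simulated partial-JSON deltas, as a streaming API might emit them
deltas = ['{"day": "Fri', 'day", "time"', ': "15:00"}']
print(accumulate_and_parse(deltas))
# {'day': 'Friday', 'time': '15:00'}
```

Note that no individual chunk is valid JSON on its own — which is exactly why parsing must wait for the end of the stream.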
Q: When would you choose prompt-based JSON over constrained decoding?
Two scenarios: (1) when you need optional fields — OpenAI’s strict mode requires all fields, so optional fields need prompt-based generation or a workaround using nullable types; (2) when you need the model to decide whether to return structured data or free text — constrained decoding always produces the schema, even when the input does not match any expected case. Hybrid approaches use constrained decoding for the outer structure and allow free-text fields within it.
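The nullable-type workaround from scenario (1) looks like this in a schema fragment — every field stays in `required`, but the "optional" one admits null (field names are illustrative):

```python
# Strict-mode-compatible schema: "room" is effectively optional via null
schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "room": {"type": ["string", "null"]},  # model may emit null instead of omitting
    },
    "required": ["location", "room"],  # strict mode: every field must be listed
    "additionalProperties": False,
}
```

Downstream code then treats null as "absent", which restores optional-field semantics without leaving strict mode.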
9. Structured Outputs in Production — Reliability
Schema enforcement alone does not make a production system reliable. You need retry strategies, fallback patterns, and monitoring.
Failure modes by enforcement method
| Method | Can produce invalid JSON? | Can violate schema? | Can violate business rules? |
|---|---|---|---|
| Prompt-based | Yes (5-15% failure rate) | Yes | Yes |
| OpenAI strict mode | No | No | Yes |
| GBNF / Outlines | No | No | Yes |
| Instructor (auto-retry) | Rare (retries fix it) | Rare (retries fix it) | Depends on validators |
Retry strategy with Instructor
Instructor retries include the validation error in the next prompt, giving the model targeted feedback.
```python
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

result = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_model=ExtractionResult,
    max_retries=3,  # Retry up to 3 times with validation error feedback
    messages=[{"role": "user", "content": text_to_extract}],
)
```

On each retry, Instructor appends the Pydantic ValidationError message to the conversation so the model sees exactly which field failed and why.
Fallback patterns
- Schema simplification — If the full schema fails after retries, try a simpler schema with fewer required fields
- Model escalation — If a smaller model cannot produce valid output, escalate to a larger model for that specific request
- Graceful degradation — Return a partial result with a flag indicating which fields could not be extracted, rather than failing the entire request
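The graceful-degradation pattern can be sketched as a small helper (the `PartialResult` shape and `degrade` function are illustrative, not from any library):

```python
from dataclasses import dataclass, field

@dataclass
class PartialResult:
    data: dict
    missing_fields: list[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        return not self.missing_fields

def degrade(raw: dict, required: list[str]) -> PartialResult:
    """Return whatever was extracted, flagging fields that are absent or null."""
    missing = [name for name in required if raw.get(name) is None]
    return PartialResult(
        data={name: raw.get(name) for name in required},
        missing_fields=missing,
    )

result = degrade({"time": "15:00", "day": None}, ["time", "day", "location"])
print(result.complete, result.missing_fields)
# False ['day', 'location']
```

The caller can then decide per-field whether a partial answer is usable, instead of discarding the whole response.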
Monitoring structured output quality
Track three metrics in production:
- First-attempt success rate — percentage of requests that produce valid output without retries (target: >95% with constrained decoding, >85% with prompt-based)
- Retry rate — percentage of requests needing at least one retry (alert if >10%)
- Schema violation rate after retries — requests that exhaust all retries and still fail (alert if >1%)
Log every validation failure with the raw output, the schema, and the error message. These logs are the training data for improving your schemas and prompts.
10. Summary and Key Takeaways
Structured outputs are the interface layer between LLMs and typed software. The choice of enforcement method determines your reliability ceiling.
What to remember:
- Use constrained decoding when available — OpenAI’s `json_schema` mode with `strict: True` guarantees valid output at the token level. No retries needed.
- Use Instructor for cross-provider workflows — It abstracts provider differences and adds Pydantic validation with automatic retry.
- Anthropic uses tool-use-as-schema — Define a tool whose `input_schema` matches your desired output. Force it with `tool_choice`.
- Schema enforcement does not replace business validation — JSON Schema catches structural errors. Pydantic validators catch semantic errors. You need both layers.
- Monitor first-attempt success rate — This is the leading indicator of structured output health. A drop means your prompts, schemas, or model version changed in a way that breaks extraction.
Related
- LLM Tool Calling — Tool calling is structured outputs in action: the LLM produces typed function arguments
- AI Agents — Agents chain structured outputs across multi-step reasoning loops
- LLMOps — Production infrastructure for monitoring and scaling structured output pipelines
- LLM Evaluation — How to measure the quality of structured extraction at scale
- Python for GenAI — Pydantic, type hints, and the Python ecosystem for GenAI development
Frequently Asked Questions
What are structured outputs from LLMs?
Structured outputs force an LLM to return data in a specific format — typically JSON matching a predefined schema — instead of free-form text. This enables downstream code to parse the response reliably without regex extraction or error-prone string manipulation. OpenAI, Anthropic, and open-source models each provide different mechanisms for schema enforcement.
How does OpenAI's structured output mode work?
OpenAI's structured output mode uses the response_format parameter with type json_schema. You provide a JSON Schema definition, and the model's decoding process is constrained to only produce tokens that result in valid JSON matching that schema. This guarantees schema compliance by construction — the remaining failure modes are truncation at the max-token limit and explicit refusals, not malformed output.
What is the Instructor library for structured outputs?
Instructor is a Python library that patches LLM client libraries (OpenAI, Anthropic, etc.) to accept Pydantic models as the desired output schema. It handles schema conversion, API calls, response parsing, and automatic retry with validation error feedback. You define a Pydantic model and call client.chat.completions.create with response_model=YourModel.
How do you get structured outputs from Anthropic Claude?
Anthropic does not have a native JSON mode equivalent to OpenAI's. Instead, you use the tool_use feature as a schema enforcement mechanism — define a single tool whose input_schema matches your desired output shape, then extract the structured data from the tool_use response block. The Instructor library automates this pattern.
What is constrained decoding for structured outputs?
Constrained decoding modifies the token sampling process so the model can only generate tokens that keep the output valid according to a grammar or schema. OpenAI's structured output mode and llama.cpp's GBNF grammars both use this approach. Unlike prompt-based methods, constrained decoding guarantees schema compliance at the token level.
When should you use structured outputs vs free text?
Use structured outputs when downstream code must parse the response programmatically — API responses, database inserts, agent tool calls, classification labels, or data extraction. Use free text when the output is meant for human consumption (summaries, explanations, creative writing) where rigid formatting would reduce quality.
How do you handle nested schemas in structured outputs?
Nested schemas work by defining Pydantic models that reference other models. For example, an Article model containing a list of Section models, each with a list of Citation models. OpenAI's structured output mode supports arbitrarily nested JSON Schemas. Instructor and Outlines both handle nested Pydantic models natively.
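Following the Article/Section/Citation example from this answer, a sketch assuming Pydantic v2:

```python
from pydantic import BaseModel

class Citation(BaseModel):
    source: str

class Section(BaseModel):
    heading: str
    citations: list[Citation] = []

class Article(BaseModel):
    title: str
    sections: list[Section]

# Nested JSON validates into nested typed objects in one step
article = Article.model_validate({
    "title": "Structured Outputs",
    "sections": [
        {"heading": "Intro", "citations": [{"source": "OpenAI docs"}]},
    ],
})
print(article.sections[0].citations[0].source)
# OpenAI docs
```

`Article.model_json_schema()` emits the corresponding nested JSON Schema, which is exactly what Instructor sends to the provider.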
What happens when structured output validation fails?
With constrained decoding (OpenAI structured outputs, GBNF grammars), schema-level validation failure is eliminated by construction — the output is guaranteed valid unless the response is truncated or refused. With prompt-based approaches, validation failures require retry logic. Instructor implements automatic retries that include the validation error message in the next attempt, giving the model specific feedback on what to fix.
What is the performance cost of structured outputs?
Constrained decoding adds minimal latency — typically under 5ms per request for schema compilation. The first request with a new schema may take longer as the provider compiles the constraint grammar. Prompt-based approaches add no latency but require retry loops that can double or triple total latency when validation fails.
Can you use structured outputs with open-source models?
Yes. llama.cpp supports GBNF grammars that constrain output to match a formal grammar (including JSON schemas). The Outlines library provides structured generation for any Hugging Face model using finite-state machine constraints. vLLM supports guided decoding with JSON Schema or regex patterns.