Structured Outputs from LLMs — JSON, Pydantic & Schema Enforcement (2026)
Structured outputs let you force an LLM to return valid, schema-compliant JSON instead of free-form text. This is the bridge between natural language generation and typed software pipelines — without it, every LLM call requires fragile regex parsing and manual error handling. This guide covers the three enforcement methods (prompt-based, schema-constrained, grammar-based), working Python code for OpenAI, Anthropic, and Instructor, and the production patterns that make structured outputs reliable at scale.
Updated March 2026 — Covers OpenAI’s response_format: json_schema, Anthropic’s tool-use-as-schema pattern, Instructor v1.x with Pydantic v2, and open-source constrained decoding with Outlines and llama.cpp GBNF grammars.
1. Why Structured Outputs Matter for AI Engineers
LLMs generate free text by default. Production pipelines need typed, validated data. This mismatch is the root cause of the most common integration failures in GenAI systems.
- Typed pipelines require typed data — When an LLM feeds into a database insert, API call, or downstream function, the output must match an exact schema. Free text cannot be reliably parsed with regex across edge cases.
- Agent tool calls depend on valid JSON — Every tool calling system requires the LLM to produce structured arguments. If the JSON is malformed, the tool never executes and the agent loop breaks.
- Validation eliminates silent failures — Without schema enforcement, an LLM might return a plausible-looking response that omits required fields or uses wrong types. These failures propagate silently through your pipeline until a user reports bad results.
2. When You Need Structured Outputs — Use Cases
Not every LLM call needs structured output. The decision depends on what consumes the response.
| Use Case | Output Format | Why Structured |
|---|---|---|
| API response generation | JSON matching OpenAPI spec | Downstream services expect exact field names and types |
| Entity extraction | List of typed objects | NER results feed into databases or knowledge graphs |
| Agent tool calls | Function name + argument JSON | Agent loops parse tool calls programmatically |
| Form filling | Key-value pairs with validation | User-facing forms require specific field types |
| Classification | Enum label + confidence score | Routing logic branches on exact label values |
| Data transformation | Source schema to target schema | ETL pipelines require deterministic field mapping |
When free text is the right choice
Use free text when the output is consumed by humans, not code: summaries, explanations, creative writing, conversational responses. Forcing JSON on these use cases degrades quality without adding value.
3. How Structured Outputs Work — Enforcement Methods
There are three fundamentally different approaches to getting structured data from an LLM. They differ in reliability, latency, and provider support.
Three Approaches to Structured LLM Output
From least reliable (prompt-based) to most reliable (constrained decoding) — each trades flexibility for guarantees.
Prompt-based (least reliable)
You instruct the model to return JSON in the system prompt. This works most of the time but fails unpredictably — the model might add markdown fences, include trailing text, or omit required fields. Every call needs try/except parsing with retry logic.
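If you are stuck with prompt-based JSON, a defensive parser is the minimum safety net. A minimal sketch — `extract_json` and `FENCE` are illustrative helpers written for this guide, not a library API:

```python
import json
import re

FENCE = chr(96) * 3  # three backticks, i.e. a markdown code-fence marker

def extract_json(raw: str) -> dict:
    """Best-effort parse of a prompt-based JSON response.

    Strips markdown fences and surrounding prose before parsing;
    raises ValueError if no JSON object can be recovered.
    """
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, raw, re.DOTALL)
    candidate = fenced.group(1) if fenced else raw
    # Fall back to the outermost {...} span to drop leading/trailing text
    match = re.search(r"\{.*\}", candidate, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

reply = f'Sure! {FENCE}json\n{{"day": "Friday", "time": "15:00"}}\n{FENCE} Anything else?'
print(extract_json(reply))
# {'day': 'Friday', 'time': '15:00'}
```

In production this helper would sit inside the try/except retry loop described above — and every branch it handles is a failure mode that constrained decoding eliminates outright.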
Schema-constrained (provider-native)
OpenAI’s response_format: { type: "json_schema" } constrains the decoding process so the model can only produce tokens that result in valid JSON matching your schema. The guarantee is at the token level — not a post-hoc check.
Grammar-based (open-source)
llama.cpp uses GBNF (GGML BNF) grammars. Outlines uses finite-state machine constraints. Both modify the token sampling mask at each generation step so only schema-valid continuations are possible.
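To make the grammar approach concrete, here is an illustrative GBNF fragment (the rule names are made up for this example) that constrains output to a single-field JSON object whose label must be one of three enum values:

```
# Illustrative GBNF: output must be {"label": <enum>} and nothing else
root  ::= "{" ws "\"label\"" ws ":" ws label ws "}"
label ::= "\"billing\"" | "\"technical\"" | "\"other\""
ws    ::= [ \t\n]*
```

At each decoding step, only tokens that keep the output consistent with some path through this grammar remain sampleable, so a trailing comma or an out-of-enum label can never be generated.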
4. Structured Outputs Tutorial — From Prompt to Typed Response
Three working patterns, from provider-native to library-based.
Pattern 1: OpenAI Structured Outputs (native)
OpenAI’s response_format parameter with json_schema type gives you guaranteed schema compliance.
```python
import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the meeting details from the user's message."},
        {"role": "user", "content": "Let's meet at 3pm on Friday at the downtown office to discuss Q2 planning."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "meeting_details",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "time": {"type": "string", "description": "Meeting time in HH:MM format"},
                    "day": {"type": "string", "description": "Day of the week"},
                    "location": {"type": "string", "description": "Meeting location"},
                    "topic": {"type": "string", "description": "Meeting topic or agenda"},
                },
                "required": ["time", "day", "location", "topic"],
                "additionalProperties": False,
            },
        },
    },
)

meeting = json.loads(response.choices[0].message.content)
# {"time": "15:00", "day": "Friday", "location": "downtown office", "topic": "Q2 planning"}
```

Key detail: `"strict": True` enables constrained decoding. Without it, the model uses best-effort JSON generation that can still fail.
Pattern 2: Anthropic Tool-Use-as-Schema
Anthropic does not have a native json_mode. Instead, define a single tool whose input_schema matches your desired output shape. The model “calls” the tool, and you extract the structured data from the tool_use block.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "extract_meeting",
        "description": "Extract structured meeting details from text.",
        "input_schema": {
            "type": "object",
            "properties": {
                "time": {"type": "string", "description": "Meeting time in HH:MM format"},
                "day": {"type": "string", "description": "Day of the week"},
                "location": {"type": "string", "description": "Meeting location"},
                "topic": {"type": "string", "description": "Meeting topic or agenda"},
            },
            "required": ["time", "day", "location", "topic"],
        },
    }],
    tool_choice={"type": "tool", "name": "extract_meeting"},
    messages=[
        {"role": "user", "content": "Let's meet at 3pm on Friday at the downtown office to discuss Q2 planning."},
    ],
)

# Extract from the tool_use content block
tool_block = next(b for b in response.content if b.type == "tool_use")
meeting = tool_block.input
# {"time": "15:00", "day": "Friday", "location": "downtown office", "topic": "Q2 planning"}
```

`tool_choice={"type": "tool", "name": "extract_meeting"}` forces the model to use the tool, guaranteeing structured output instead of a free-text response.
Pattern 3: Instructor + Pydantic (cross-provider)
The Instructor library lets you define output schemas as Pydantic models and handles provider-specific details automatically.
```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

class MeetingDetails(BaseModel):
    time: str = Field(description="Meeting time in HH:MM format")
    day: str = Field(description="Day of the week")
    location: str = Field(description="Meeting location")
    topic: str = Field(description="Meeting topic or agenda")

client = instructor.from_openai(OpenAI())

meeting = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_model=MeetingDetails,
    messages=[
        {"role": "user", "content": "Let's meet at 3pm on Friday at the downtown office to discuss Q2 planning."},
    ],
)

# meeting is a MeetingDetails instance — fully typed and validated
print(meeting.time)      # "15:00"
print(meeting.location)  # "downtown office"
```

Instructor patches the client to convert Pydantic models to JSON Schema, send the appropriate API format (OpenAI’s json_schema or Anthropic’s tool_use), parse the response, and retry with validation feedback if parsing fails. It works with OpenAI, Anthropic, Google, Mistral, and any OpenAI-compatible endpoint.
5. Schema Enforcement Architecture
A production structured output pipeline has six layers, from the application request down to the validated typed object.
Structured Output Pipeline
Each layer adds a guarantee — from raw text generation to fully typed, validated objects.
Why each layer matters
Schema Definition catches design errors early. If your schema requires a price field as a float but the text contains “$29.99”, the schema tells the model to extract the number, not the string.
Constrained Decoding eliminates JSON syntax errors entirely. The model cannot produce {"name": "Alice",} (trailing comma) or unclosed brackets.
Validation Layer catches semantic errors that syntax-level constraints miss. A confidence field constrained to 0.0-1.0 in your Pydantic model will reject confidence: 95.0 even though it is valid JSON.
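A minimal sketch of that validation layer, assuming Pydantic v2 (the model and field names here are illustrative):

```python
from pydantic import BaseModel, Field, ValidationError

class Classification(BaseModel):
    label: str
    confidence: float = Field(ge=0.0, le=1.0)  # semantic constraint, not JSON syntax

# Syntactically valid JSON, semantically wrong: 95.0 is outside [0, 1]
try:
    Classification(label="billing", confidence=95.0)
except ValidationError:
    print("rejected: confidence must be <= 1.0")
```

Constrained decoding would happily emit `confidence: 95.0` if the schema only says "number"; the Pydantic range constraint is what catches it.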
6. Structured Output Code Examples
Three production-grade patterns that go beyond basic extraction.
Example 1: Entity extraction with typed models
Extract named entities from unstructured text into a typed Pydantic model.
```python
from enum import Enum

from pydantic import BaseModel, Field

class EntityType(str, Enum):
    PERSON = "person"
    ORGANIZATION = "organization"
    LOCATION = "location"
    DATE = "date"
    MONETARY = "monetary"

class Entity(BaseModel):
    text: str = Field(description="The entity text as it appears in the source")
    entity_type: EntityType = Field(description="Classification of the entity")
    confidence: float = Field(ge=0.0, le=1.0, description="Extraction confidence score")

class ExtractionResult(BaseModel):
    entities: list[Entity] = Field(description="All extracted entities")
    source_text: str = Field(description="Original text that was analyzed")

# Use with Instructor
result = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_model=ExtractionResult,
    messages=[{
        "role": "user",
        "content": "Apple Inc. announced a $3 billion investment in Austin, Texas on January 15, 2026.",
    }],
)

for entity in result.entities:
    print(f"{entity.text} -> {entity.entity_type.value} ({entity.confidence:.0%})")
# Apple Inc. -> organization (95%)
# $3 billion -> monetary (98%)
# Austin, Texas -> location (97%)
# January 15, 2026 -> date (99%)
```

Example 2: Classification with confidence scores
Classify support tickets with a typed enum and a confidence threshold.
```python
class TicketCategory(str, Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    ACCOUNT = "account"
    FEATURE_REQUEST = "feature_request"
    OTHER = "other"

class TicketClassification(BaseModel):
    category: TicketCategory
    confidence: float = Field(ge=0.0, le=1.0)
    reasoning: str = Field(description="One-sentence explanation for the classification")

classification = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_model=TicketClassification,
    messages=[{
        "role": "user",
        "content": "I was charged twice for my subscription this month. Can you fix this?",
    }],
)
# category=billing, confidence=0.96, reasoning="User reports duplicate charge on subscription"

# Route based on confidence threshold
if classification.confidence < 0.8:
    route_to_human_review(classification)
else:
    route_to_handler(classification.category)
```

Example 3: Multi-step agent output with nested schemas
Define a structured plan that an agent must follow, with nested steps and dependencies.
```python
class ToolCall(BaseModel):
    tool_name: str = Field(description="Name of the tool to invoke")
    arguments: dict = Field(description="Arguments to pass to the tool")
    expected_output: str = Field(description="What this tool call should return")

class PlanStep(BaseModel):
    step_number: int
    description: str
    tool_calls: list[ToolCall] = Field(default_factory=list)
    depends_on: list[int] = Field(
        default_factory=list,
        description="Step numbers that must complete before this step",
    )

class AgentPlan(BaseModel):
    goal: str = Field(description="The user's objective restated clearly")
    steps: list[PlanStep] = Field(description="Ordered execution plan")
    estimated_tool_calls: int = Field(description="Total number of tool invocations")

plan = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_model=AgentPlan,
    messages=[{
        "role": "user",
        "content": "Find the top 3 competitors for Stripe in the payment processing space and compare their pricing.",
    }],
)

# plan.steps is a list of PlanStep objects with typed ToolCall children
for step in plan.steps:
    print(f"Step {step.step_number}: {step.description}")
    for tc in step.tool_calls:
        print(f"  -> {tc.tool_name}({tc.arguments})")
```

7. Structured Outputs — Provider Comparison
Two fundamentally different approaches: provider-native enforcement vs library-based enforcement.
Native JSON Mode vs Library-Based Enforcement
OpenAI native json_schema — strengths:
- Token-level constraint — model cannot produce invalid JSON
- Zero retry overhead — first response is always valid
- Supports nested objects, arrays, enums, and unions
OpenAI native json_schema — limitations:
- First request with a new schema has compilation latency (~1-2s)
- Locked to OpenAI models only
- All fields must be required (no optional fields with strict mode)
- No custom Pydantic validators at the API level
Library-based (Instructor) — strengths:
- Works with OpenAI, Anthropic, Google, Mistral, and local models
- Full Pydantic v2 support including custom validators
- Automatic retry with validation error feedback on failure
- Outlines provides token-level guarantees for local models
Library-based (Instructor) — limitations:
- Prompt-based mode has 85-95% first-attempt success rate
- Retry loops add latency when validation fails
- Additional dependency in your stack
8. Structured Output Interview Questions
These questions come up in system design rounds when candidates describe GenAI pipelines that consume LLM output programmatically.
Q: How do you guarantee an LLM returns valid JSON in production?
You use constrained decoding, not prompt engineering. OpenAI’s response_format: json_schema with strict: True constrains the token sampling so only schema-valid JSON can be generated. For Anthropic, use the tool-use-as-schema pattern with tool_choice forcing the specific tool. For open-source models, use Outlines or llama.cpp GBNF grammars. Prompt-based approaches (“please return JSON”) fail 5-15% of the time at scale and require retry loops.
Q: What happens when a structured output violates business rules that JSON Schema cannot express?
JSON Schema validates structure (types, required fields, enums). Business rules (e.g., end_date must be after start_date, total must equal sum(line_items)) require a validation layer on top. Use Pydantic model validators for this. Instructor’s retry mechanism feeds the validation error back to the model, so the second attempt knows exactly what constraint it violated. Design your pipeline with two validation stages: schema validation (guaranteed by constrained decoding) and business rule validation (Pydantic validators with retry).
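The date-ordering rule from this answer can be sketched with a Pydantic v2 model validator (the model name is illustrative):

```python
from datetime import date

from pydantic import BaseModel, ValidationError, model_validator

class DateRange(BaseModel):
    start_date: date
    end_date: date

    @model_validator(mode="after")
    def end_after_start(self):
        # A business rule JSON Schema cannot express
        if self.end_date <= self.start_date:
            raise ValueError("end_date must be after start_date")
        return self

try:
    DateRange(start_date=date(2026, 3, 10), end_date=date(2026, 3, 1))
except ValidationError as err:
    print("rejected:", err.errors()[0]["msg"])
```

When this model is passed to Instructor as the response_model, a violation of the validator triggers the retry-with-feedback loop described above.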
Q: How do you handle structured outputs for streaming responses?
Streaming and structured outputs are partially compatible. OpenAI streams structured output tokens — you receive partial JSON as it generates. You cannot parse until the stream completes, but you can show progress. For real-time UX, stream the raw tokens for display while accumulating the full response for parsing at the end. Anthropic’s tool-use blocks stream the input field incrementally. Instructor supports streaming with create_partial which yields progressively more complete Pydantic objects.
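The accumulate-then-parse pattern described above is provider-agnostic; a minimal sketch with simulated stream chunks:

```python
import json

def accumulate_and_parse(chunks):
    """Display tokens as they arrive; parse only once the stream ends."""
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)  # a real UI would also render each chunk here
    return json.loads("".join(buffer))

# Simulated partial-JSON deltas, as a streaming API might emit them
deltas = ['{"day": "Fri', 'day", "time"', ': "15:00"}']
print(accumulate_and_parse(deltas))
# {'day': 'Friday', 'time': '15:00'}
```

Note that no individual chunk is valid JSON on its own — which is exactly why parsing must wait for the end of the stream.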
Q: When would you choose prompt-based JSON over constrained decoding?
Two scenarios: (1) when you need optional fields — OpenAI’s strict mode requires all fields, so optional fields need prompt-based generation or a workaround using nullable types; (2) when you need the model to decide whether to return structured data or free text — constrained decoding always produces the schema, even when the input does not match any expected case. Hybrid approaches use constrained decoding for the outer structure and allow free-text fields within it.
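The nullable-type workaround from scenario (1) looks like this in a schema fragment — every field stays in `required`, but the "optional" one admits null (field names are illustrative):

```python
# Strict-mode-compatible schema: "room" is effectively optional via null
schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "room": {"type": ["string", "null"]},  # model may emit null instead of omitting
    },
    "required": ["location", "room"],  # strict mode: every field must be listed
    "additionalProperties": False,
}
```

Downstream code then treats null as "absent", which restores optional-field semantics without leaving strict mode.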
9. Structured Outputs in Production — Reliability
Schema enforcement alone does not make a production system reliable. You need retry strategies, fallback patterns, and monitoring.
Failure modes by enforcement method
| Method | Can produce invalid JSON? | Can violate schema? | Can violate business rules? |
|---|---|---|---|
| Prompt-based | Yes (5-15% failure rate) | Yes | Yes |
| OpenAI strict mode | No | No | Yes |
| GBNF / Outlines | No | No | Yes |
| Instructor (auto-retry) | Rare (retries fix it) | Rare (retries fix it) | Depends on validators |
Retry strategy with Instructor
Instructor retries include the validation error in the next prompt, giving the model targeted feedback.
```python
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

result = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    response_model=ExtractionResult,
    max_retries=3,  # Retry up to 3 times with validation error feedback
    messages=[{"role": "user", "content": text_to_extract}],
)
```

On each retry, Instructor appends the Pydantic ValidationError message to the conversation so the model sees exactly which field failed and why.
Fallback patterns
- Schema simplification — If the full schema fails after retries, try a simpler schema with fewer required fields
- Model escalation — If a smaller model cannot produce valid output, escalate to a larger model for that specific request
- Graceful degradation — Return a partial result with a flag indicating which fields could not be extracted, rather than failing the entire request
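The graceful-degradation pattern can be sketched as a small helper (the `PartialResult` shape and `degrade` function are illustrative, not from any library):

```python
from dataclasses import dataclass, field

@dataclass
class PartialResult:
    data: dict
    missing_fields: list[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        return not self.missing_fields

def degrade(raw: dict, required: list[str]) -> PartialResult:
    """Return whatever was extracted, flagging fields that are absent or null."""
    missing = [name for name in required if raw.get(name) is None]
    return PartialResult(
        data={name: raw.get(name) for name in required},
        missing_fields=missing,
    )

result = degrade({"time": "15:00", "day": None}, ["time", "day", "location"])
print(result.complete, result.missing_fields)
# False ['day', 'location']
```

The caller can then decide per-field whether a partial answer is usable, instead of discarding the whole response.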
Monitoring structured output quality
Track three metrics in production:
- First-attempt success rate — percentage of requests that produce valid output without retries (target: >95% with constrained decoding, >85% with prompt-based)
- Retry rate — percentage of requests needing at least one retry (alert if >10%)
- Schema violation rate after retries — requests that exhaust all retries and still fail (alert if >1%)
Log every validation failure with the raw output, the schema, and the error message. These logs are the training data for improving your schemas and prompts.
10. Summary and Key Takeaways
Structured outputs are the interface layer between LLMs and typed software. The choice of enforcement method determines your reliability ceiling.
What to remember:
- Use constrained decoding when available — OpenAI’s `json_schema` mode with `strict: True` guarantees valid output at the token level. No retries needed.
- Use Instructor for cross-provider workflows — It abstracts provider differences and adds Pydantic validation with automatic retry.
- Anthropic uses tool-use-as-schema — Define a tool whose `input_schema` matches your desired output. Force it with `tool_choice`.
- Schema enforcement does not replace business validation — JSON Schema catches structural errors. Pydantic validators catch semantic errors. You need both layers.
- Monitor first-attempt success rate — This is the leading indicator of structured output health. A drop means your prompts, schemas, or model version changed in a way that breaks extraction.
Related
- LLM Tool Calling — Tool calling is structured outputs in action: the LLM produces typed function arguments
- AI Agents — Agents chain structured outputs across multi-step reasoning loops
- LLMOps — Production infrastructure for monitoring and scaling structured output pipelines
- LLM Evaluation — How to measure the quality of structured extraction at scale
- Python for GenAI — Pydantic, type hints, and the Python ecosystem for GenAI development
Frequently Asked Questions
What are structured outputs from LLMs?
Structured outputs force an LLM to return data in a specific format — typically JSON matching a predefined schema — instead of free-form text. This enables downstream code to parse the response reliably without regex extraction or error-prone string manipulation. OpenAI, Anthropic, and open-source models each provide different mechanisms for schema enforcement.
How does OpenAI's structured output mode work?
OpenAI's structured output mode uses the response_format parameter with type json_schema. You provide a JSON Schema definition, and the model's decoding process is constrained to only produce tokens that result in valid JSON matching that schema. This guarantees schema compliance by construction — the remaining failure modes are truncation at the max-token limit and explicit refusals, not malformed output.
What is the Instructor library for structured outputs?
Instructor is a Python library that patches LLM client libraries (OpenAI, Anthropic, etc.) to accept Pydantic models as the desired output schema. It handles schema conversion, API calls, response parsing, and automatic retry with validation error feedback. You define a Pydantic model and call client.chat.completions.create with response_model=YourModel.
How do you get structured outputs from Anthropic Claude?
Anthropic does not have a native JSON mode equivalent to OpenAI's. Instead, you use the tool_use feature as a schema enforcement mechanism — define a single tool whose input_schema matches your desired output shape, then extract the structured data from the tool_use response block. The Instructor library automates this pattern.
What is constrained decoding for structured outputs?
Constrained decoding modifies the token sampling process so the model can only generate tokens that keep the output valid according to a grammar or schema. OpenAI's structured output mode and llama.cpp's GBNF grammars both use this approach. Unlike prompt-based methods, constrained decoding guarantees schema compliance at the token level.
When should you use structured outputs vs free text?
Use structured outputs when downstream code must parse the response programmatically — API responses, database inserts, agent tool calls, classification labels, or data extraction. Use free text when the output is meant for human consumption (summaries, explanations, creative writing) where rigid formatting would reduce quality.
How do you handle nested schemas in structured outputs?
Nested schemas work by defining Pydantic models that reference other models. For example, an Article model containing a list of Section models, each with a list of Citation models. OpenAI's structured output mode supports arbitrarily nested JSON Schemas. Instructor and Outlines both handle nested Pydantic models natively.
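Following the Article/Section/Citation example from this answer, a sketch assuming Pydantic v2:

```python
from pydantic import BaseModel

class Citation(BaseModel):
    source: str

class Section(BaseModel):
    heading: str
    citations: list[Citation] = []

class Article(BaseModel):
    title: str
    sections: list[Section]

# Nested JSON validates into nested typed objects in one step
article = Article.model_validate({
    "title": "Structured Outputs",
    "sections": [
        {"heading": "Intro", "citations": [{"source": "OpenAI docs"}]},
    ],
})
print(article.sections[0].citations[0].source)
# OpenAI docs
```

`Article.model_json_schema()` emits the corresponding nested JSON Schema, which is exactly what Instructor sends to the provider.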
What happens when structured output validation fails?
With constrained decoding (OpenAI structured outputs, GBNF grammars), schema-level validation failure is eliminated by construction — the output is guaranteed valid unless the response is truncated or refused. With prompt-based approaches, validation failures require retry logic. Instructor implements automatic retries that include the validation error message in the next attempt, giving the model specific feedback on what to fix.
What is the performance cost of structured outputs?
Constrained decoding adds minimal latency — typically under 5ms per request for schema compilation. The first request with a new schema may take longer as the provider compiles the constraint grammar. Prompt-based approaches add no latency but require retry loops that can double or triple total latency when validation fails.
Can you use structured outputs with open-source models?
Yes. llama.cpp supports GBNF grammars that constrain output to match a formal grammar (including JSON schemas). The Outlines library provides structured generation for any Hugging Face model using finite-state machine constraints. vLLM supports guided decoding with JSON Schema or regex patterns.