Python Type Hints for AI Engineers — Typed LLM Pipelines (2026)
1. Why Type Hints Matter for AI Engineers
Python type hints are a primary defense against the most common class of production bugs in LLM pipelines: data shape mismatches. An LLM pipeline chains multiple stages — embedding, retrieval, prompt building, generation, validation. At every boundary, the data must match the expected shape. Without type hints, these mismatches surface as runtime KeyError, TypeError, or AttributeError exceptions in production.
Type hints move these errors from runtime to development time. When you annotate a function that accepts list[DocumentChunk] and returns list[float], your IDE and mypy verify every call site. If a refactor changes the return type of the retrieval stage, mypy flags every downstream function that now receives the wrong type — before you run the code.
Three specific benefits for AI engineers:
- Catch bugs at development time. A misspelled key in a TypedDict, a swapped argument order, or a missing field in a structured output — mypy catches all of these statically.
- IDE support for complex data. LLM API responses are deeply nested dictionaries. Type hints give you autocompletion on `response["choices"][0]["message"]["content"]` instead of guessing key names.
- Self-documenting pipelines. A signature like `def retrieve(query: EmbeddingVector, top_k: int) -> list[ScoredDocument]` communicates the interface to every engineer on the team.
2. When to Use Type Hints in AI Code
Type hints belong in all production AI code. The question is where they deliver the highest value first.
Always type these first: LLM response structures (eliminates dictionary key guessing), pipeline interfaces (makes data flow visible), tool definitions (validates what the LLM sends and receives), and configuration objects (constrains model names, temperature ranges, token limits).
Where type hints have the highest impact — at integration boundaries:
```python
# Without types — what does this return?
def process_query(query, context, model): ...

# With types — the contract is explicit
def process_query(
    query: str,
    context: list[DocumentChunk],
    model: Literal["gpt-4o", "claude-sonnet-4-20250514"],
) -> GenerationResult: ...
```

The typed version communicates the full contract. The caller knows exactly what to pass and what to expect.
When types are less critical: Throwaway scripts, notebook exploration, and one-off data analysis. But any code that enters a shared repository or runs in production should be typed.
3. Type System Architecture for AI
The type system in an AI pipeline follows a clear progression. Raw, untyped data enters from external sources. Each stage refines the type until the final result is fully typed and validated.
Type Flow in an AI Pipeline
Each layer serves a distinct purpose. The raw output layer acknowledges that external data is uncontrolled. The TypedDict/Pydantic layer imposes structure. The validated type layer applies domain rules. The pipeline stage layer preserves types through generic processing. The typed result layer guarantees consumers receive exactly what they expect. Skipping layers creates gaps where shape mismatches can reach production.
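A minimal sketch of this progression, with hypothetical names (`ParsedOutput`, `impose_structure`, `apply_domain_rules`) standing in for the structured and validated layers:

```python
from typing import TypedDict

class ParsedOutput(TypedDict):
    answer: str
    confidence: float

def impose_structure(raw: dict[str, object]) -> ParsedOutput:
    # Raw layer -> structured layer: verify keys and value types explicitly.
    answer = raw.get("answer")
    confidence = raw.get("confidence")
    if not isinstance(answer, str) or not isinstance(confidence, float):
        raise ValueError("malformed LLM output")
    return {"answer": answer, "confidence": confidence}

def apply_domain_rules(parsed: ParsedOutput) -> ParsedOutput:
    # Structured layer -> validated layer: enforce domain constraints.
    if not 0.0 <= parsed["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return parsed

# Untyped data enters from an external source...
raw: dict[str, object] = {"answer": "Use RAG.", "confidence": 0.92}
# ...and exits fully typed and validated.
result = apply_domain_rules(impose_structure(raw))
```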
4. Type Hints Tutorial for AI
Five core typing patterns that AI engineers use most frequently.
Pattern 1: TypedDict for LLM Responses
```python
from typing import TypedDict, NotRequired

class ChatMessage(TypedDict):
    role: str
    content: str

class UsageInfo(TypedDict):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class LLMResponse(TypedDict):
    id: str
    model: str
    choices: list[dict[str, ChatMessage]]
    usage: UsageInfo
    system_fingerprint: NotRequired[str]
```

With this definition, `response["usage"]["prompt_tokens"]` is known to be `int`. Your IDE autocompletes the keys. mypy catches `response["usages"]` as a typo.
Pattern 2: Generic Pipelines
```python
from typing import TypeVar, Generic, Callable, Awaitable

T = TypeVar("T")
R = TypeVar("R")

class PipelineStage(Generic[T, R]):
    def __init__(self, name: str, processor: Callable[[T], Awaitable[R]]) -> None:
        self.name = name
        self.processor = processor

    async def execute(self, input_data: T) -> R:
        return await self.processor(input_data)

embed_stage: PipelineStage[str, list[float]] = PipelineStage(
    name="embed",
    processor=embed_text,
)
# mypy knows: embed_stage.execute("query") returns list[float]
```

Pattern 3: Protocol for Tool Interfaces
```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Tool(Protocol):
    @property
    def name(self) -> str: ...
    @property
    def description(self) -> str: ...
    async def execute(self, input_text: str) -> str: ...

# Any class with these methods satisfies Tool — no inheritance needed
class WebSearchTool:
    @property
    def name(self) -> str:
        return "web_search"

    @property
    def description(self) -> str:
        return "Search the web for current information"

    async def execute(self, input_text: str) -> str:
        return await search(input_text)

def register_tool(registry: dict[str, Tool], tool: Tool) -> None:
    registry[tool.name] = tool
```

Pattern 4: Literal for Model Names
```python
from typing import Literal

ModelName = Literal["gpt-4o", "gpt-4o-mini", "claude-sonnet-4-20250514"]
Role = Literal["system", "user", "assistant"]

def create_completion(model: ModelName, messages: list[dict[str, str]]) -> LLMResponse: ...

create_completion(model="gpt4o", messages=[])  # mypy error: "gpt4o" not in Literal
```

Pattern 5: Union for Multi-Modal Inputs
```python
from typing import TypedDict, Union, Literal

class TextInput(TypedDict):
    modality: Literal["text"]
    content: str

class ImageInput(TypedDict):
    modality: Literal["image"]
    url: str
    detail: Literal["low", "high", "auto"]

MultiModalInput = Union[TextInput, ImageInput]

def process_input(input_data: MultiModalInput) -> str:
    if input_data["modality"] == "text":
        return input_data["content"]
    return f"[Image: {input_data['url']}]"
```

5. Type Safety Layers
A production AI application has multiple layers of type safety. Each layer catches a different category of error.
Type Safety Stack for AI Applications
The top layers define what data looks like — checked statically by mypy with zero runtime cost. The middle layers ensure data flows correctly between stages, preserving type information through transformations. The bottom layers operate at system boundaries where external data enters and types cannot be guaranteed statically. Together, static checking catches structural errors during development while runtime validation catches data errors in production.
6. AI Type Hint Examples
Three complete examples for real AI engineering problems.
Example 1: Typed RAG Pipeline
```python
from typing import TypedDict

class DocumentChunk(TypedDict):
    id: str
    text: str
    metadata: dict[str, str]
    score: float

class RetrievalResult(TypedDict):
    query: str
    chunks: list[DocumentChunk]
    total_found: int

class GenerationResult(TypedDict):
    answer: str
    sources: list[str]
    tokens_used: int

async def embed_query(query: str) -> list[float]: ...
async def retrieve(embedding: list[float], top_k: int = 5) -> RetrievalResult: ...
async def generate(query: str, context: RetrievalResult) -> GenerationResult: ...

async def rag_pipeline(query: str) -> GenerationResult:
    embedding = await embed_query(query)
    retrieval = await retrieve(embedding)
    return await generate(query, retrieval)
```

Example 2: Generic Retry Wrapper
```python
import asyncio
from typing import TypeVar, Callable, Awaitable

T = TypeVar("T")

async def with_retry(
    fn: Callable[..., Awaitable[T]],
    *args: object,
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> T:
    last_exception: Exception | None = None
    for attempt in range(max_retries):
        try:
            return await fn(*args)
        except Exception as e:
            last_exception = e
            if attempt < max_retries - 1:
                await asyncio.sleep(base_delay * (2 ** attempt))
    raise last_exception  # type: ignore[misc]

# mypy preserves the return type through the wrapper
result: GenerationResult = await with_retry(generate, "What is RAG?", context)
```

Example 3: Protocol-Based Tool Registry
```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class AgentTool(Protocol):
    @property
    def name(self) -> str: ...
    @property
    def parameters_schema(self) -> dict[str, object]: ...
    async def execute(self, **kwargs: object) -> str: ...

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, AgentTool] = {}

    def register(self, tool: AgentTool) -> None:
        self._tools[tool.name] = tool

    def list_schemas(self) -> list[dict[str, object]]:
        return [
            {"name": t.name, "parameters": t.parameters_schema}
            for t in self._tools.values()
        ]
```

7. Static vs Runtime Type Checking
Python type hints support two complementary checking strategies. Understanding when to use each is critical for production AI code.
Static vs Runtime Type Checking
Use mypy for: internal function signatures, pipeline stage connections, configuration objects, module-level type correctness.
Use Pydantic for: LLM API response parsing, user input validation, database query results, any data crossing a system boundary.
The optimal pattern — define TypedDict for internal data shapes (zero overhead), use Pydantic models at boundaries where external data enters:
```python
from pydantic import BaseModel, Field

class StructuredLLMOutput(BaseModel):
    answer: str = Field(min_length=1)
    confidence: float = Field(ge=0.0, le=1.0)
    sources: list[str] = Field(min_length=1)

# Validate at boundary, then flow as TypedDict internally
validated = StructuredLLMOutput.model_validate_json(raw_json)
```

8. Interview Questions
Section titled “8. Interview Questions”Q1: How would you type a function that accepts either a single prompt or a batch?
Use @overload to define separate signatures. generate("hello") returns GenerationResult. generate(["a", "b"]) returns list[GenerationResult]. mypy gives callers precise return types based on the input type.
Q2: What is the difference between Protocol and ABC for tool interfaces?
ABC uses nominal typing — a class must explicitly inherit from it. Protocol uses structural typing — any class with the required methods satisfies the Protocol without inheritance. For AI tool systems, Protocol is preferred because third-party tools do not need to import your interface.
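A compact illustration, with hypothetical class names:

```python
from abc import ABC, abstractmethod
from typing import Protocol

class AbcTool(ABC):  # nominal: implementors must inherit
    @abstractmethod
    def run(self, text: str) -> str: ...

class ProtoTool(Protocol):  # structural: matching methods suffice
    def run(self, text: str) -> str: ...

class Echo:  # inherits from neither interface
    def run(self, text: str) -> str:
        return text

def call(tool: ProtoTool) -> str:
    return tool.run("ping")

# call(Echo()) type-checks: Echo structurally satisfies ProtoTool.
# A parameter typed as AbcTool would reject Echo without inheritance.
```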
Q3: Why should you avoid Any in AI pipeline code?
Any disables type checking at that point. If a pipeline stage returns Any, mypy cannot verify downstream stages receive the correct type. Prefer object over Any — it still requires explicit type narrowing before use.
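Illustrated with hypothetical functions:

```python
from typing import Any

def unchecked(payload: Any) -> int:
    # Any silences mypy entirely; a bad payload fails only at runtime.
    return payload["tokens"]

def checked(payload: object) -> int:
    # object forces explicit narrowing before any key access.
    if isinstance(payload, dict) and isinstance(payload.get("tokens"), int):
        return payload["tokens"]
    raise TypeError("payload missing integer 'tokens' field")
```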
Q4: How do you handle LLM API responses with optional fields?
Use NotRequired (Python 3.11+) for TypedDict fields that may be absent. This is different from Optional[str], which means the key exists but the value may be None. NotRequired means the key itself may not exist in the dictionary.
9. Type Hints in Production AI Code
mypy Configuration
```toml
[tool.mypy]
python_version = "3.12"
strict = true
warn_return_any = true
disallow_untyped_defs = true

[[tool.mypy.overrides]]
module = ["langchain.*", "chromadb.*"]
ignore_missing_imports = true
```

CI Integration
Add mypy to your CI pipeline so type errors block merges. Type checking runs in seconds, even for large codebases.
Gradual Typing Strategy
- Start at the boundaries. Type all public function signatures in pipeline modules.
- Type new code strictly. Every new module gets full strictness. No exceptions.
- Work inward. After boundaries are typed, add types to internal helpers.
- Eliminate Any. Track `Any` count as a code quality metric and reduce it each sprint.
Common Pitfalls
- Using
`dict` instead of `TypedDict`. Returning `dict[str, Any]` gives callers no information about keys or value types.
- Overly broad Union types. `Union[str, int, float, list, dict, None]` is effectively untyped. Narrow to actual types.
- Ignoring generic variance. `list[Animal]` is not a supertype of `list[Dog]`. Use `Sequence[Animal]` for read-only covariant access.
10. Summary and Related Resources
Python type hints transform AI codebases from fragile dictionary-juggling into verified, self-documenting pipelines. The five core patterns — TypedDict, Generics, Protocol, Literal, and Union — address the specific challenges of typing LLM responses, building reusable pipeline stages, defining tool interfaces, constraining model parameters, and handling multi-modal inputs.
Related Guides
- Python for GenAI Engineers — Async patterns, Pydantic, and production Python for AI
- Async Python Guide — asyncio fundamentals for LLM API calls and parallel pipelines
- Structured Outputs — LLM structured output techniques using typed schemas
- LLMOps — Operationalizing LLM pipelines with monitoring, versioning, and deployment
- Pydantic AI — Building type-safe AI agents with Pydantic AI framework
Frequently Asked Questions
Why are Python type hints important for AI engineering?
Type hints catch data shape mismatches at development time — before they become runtime crashes in production LLM pipelines. They enable IDE autocompletion for complex nested structures like LLM API responses, make pipeline interfaces self-documenting, and allow static analysis tools like mypy to verify that every function in your chain receives and returns the correct types.
What is the difference between TypedDict and Pydantic BaseModel for AI code?
TypedDict provides static type checking at development time with zero runtime cost — mypy verifies correct keys and value types. Pydantic BaseModel provides runtime validation, checking and coercing data when objects are created. Use TypedDict for internal data structures. Use Pydantic for external boundaries where data arrives from LLM APIs or user inputs.
How do you type LLM API responses in Python?
Use TypedDict to define the expected shape of LLM responses, including nested structures for choices, messages, and usage metadata. For structured outputs, define a TypedDict or Pydantic model for the parsed content. For streaming responses, type the generator with AsyncIterator parameterized by the chunk type.
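A sketch of the streaming case; `stream_completion` is a hypothetical stand-in that yields one typed chunk per word instead of calling an API:

```python
import asyncio
from typing import AsyncIterator, TypedDict

class StreamChunk(TypedDict):
    delta: str

async def stream_completion(prompt: str) -> AsyncIterator[StreamChunk]:
    # Stand-in for a streaming LLM call: yields typed chunks.
    for word in prompt.split():
        yield {"delta": word}

async def collect(prompt: str) -> str:
    parts: list[str] = []
    async for chunk in stream_completion(prompt):  # chunk is StreamChunk
        parts.append(chunk["delta"])
    return " ".join(parts)

result = asyncio.run(collect("typed streaming works"))
```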
How do Generic types improve AI pipeline code?
Generic types let you write pipeline components that work with any data type while preserving type information through the chain. A generic retry wrapper preserves the return type of the wrapped function. A generic Pipeline class ensures that chaining stages together is type-safe — connecting incompatible stages is caught before runtime.
What is the Protocol pattern for AI tool interfaces?
Protocol defines structural subtyping — any class that implements the required methods satisfies the Protocol without needing explicit inheritance. For AI tool registries, you define a Protocol with execute and description methods. Any tool class implementing these methods can be registered. Third-party tools work without modification.
How do you configure mypy for an AI Python project?
Start with strict mode in pyproject.toml: strict = true, warn_return_any = true, disallow_untyped_defs = true. Use per-module overrides for third-party AI libraries lacking type stubs. Add mypy to CI so type errors block merges.
When should you use Literal types in AI code?
Use Literal for model name parameters so mypy catches typos, for role fields in chat messages, and for configuration enums like embedding dimensions or similarity metrics. This prevents invalid values from reaching API calls where they would cause runtime errors.
How do you handle Union types for multi-modal AI inputs?
Define each modality as a distinct TypedDict with a discriminator field, then use Union to combine them. In processing functions, use isinstance checks or match statements to narrow the type. mypy verifies that you handle every variant in the Union.