
LangGraph Multi-Agent — Supervisor Pattern Guide (2026)

This LangGraph multi-agent guide walks through the supervisor pattern — where a central agent routes tasks to specialized sub-agents for research, code generation, and review. You will build a complete working system with state management, message routing, error handling, and production deployment patterns.

Multi-agent systems decompose complex tasks across specialized sub-agents so each one focuses on what it does best — producing higher quality results than a single LLM call attempting everything at once.

A single LLM call handles simple tasks well. Ask it to summarize a document, generate a function, or answer a factual question — one prompt, one response, done.

Real-world problems are rarely that simple. Consider a user request like “Research the latest pricing for AWS Bedrock, write a cost comparison script, and review it for accuracy.” This requires three distinct capabilities: web research, code generation, and code review. A single prompt attempting all three produces mediocre results across the board.

Multi-agent systems solve this by decomposing complex tasks into specialized sub-tasks. Each agent focuses on what it does best, with a coordination layer managing the workflow. The result: higher quality outputs, clearer reasoning traces, and the ability to assign different tools and models to different tasks.

For background on agent architectures, see AI Agents and Agentic Design Patterns.

LangGraph provides the graph-based execution model that multi-agent systems need. Unlike simple chain-based frameworks, LangGraph gives you:

  • Explicit state management — shared state flows between nodes with type safety
  • Conditional routing — the supervisor decides which agent runs next based on the current state
  • Cycles and loops — agents can re-route tasks, retry on failure, or request human approval
  • Built-in persistence — checkpoint state for long-running workflows and recovery

If you are new to LangGraph, start with LangGraph Tutorial before continuing. For a comparison with LangChain’s simpler chain model, see LangChain vs LangGraph.


Development — Impact

  • LangGraph 0.3+ Command API — The Command object replaces manual edge routing. Agents return Command(goto="next_node") instead of relying on conditional edge functions
  • Prebuilt create_react_agent — Sub-agents can be created with a single function call: model, tools, and system prompt in one line
  • Functional API — @task and @entrypoint decorators enable Python-native graph definitions without subclassing
  • LangGraph Platform — Managed deployment with built-in persistence, streaming, and cron jobs
  • Supervisor library — The langgraph-supervisor package provides a prebuilt supervisor pattern out of the box
  • Swarm library — langgraph-swarm enables peer-to-peer agent handoffs without a central coordinator

The supervisor pattern is the most common multi-agent topology: one coordinator routes tasks to specialized sub-agents, then collects and merges their results.

[Diagram] LangGraph Multi-Agent — Supervisor Pattern: the supervisor receives requests, routes to specialized sub-agents, and merges results.

  1. Supervisor — routes tasks based on intent analysis (receive request → analyze intent → select agent → route task)
  2. Research Agent — web search + document retrieval (search APIs → extract facts → cite sources → return findings)
  3. Code Agent — code generation + execution (parse requirements → generate code → run tests → return code)
  4. Review Agent — quality check + validation (read outputs → check accuracy → flag issues → approve or reject)

The supervisor pattern uses a hub-and-spoke topology. The supervisor node sits at the center and decides which sub-agent handles each part of a request. Sub-agents execute independently and return results to the supervisor, which can then route to another agent or deliver the final response.

This differs from a sequential chain (A then B then C) because the supervisor can skip agents, re-invoke agents, or fan out to multiple agents based on the task requirements.


The supervisor uses structured output to classify each incoming task and route it to exactly one sub-agent, preventing ambiguous or overlapping execution.

The supervisor agent needs to make one critical decision: which sub-agent should handle the current task? This is implemented as structured output from the supervisor LLM call using a Pydantic model:

class RouteDecision(BaseModel):
    next_agent: Literal["research", "code", "review", "FINISH"]
    reason: str  # Why this agent was selected
    task_description: str  # What the agent should do

The supervisor inspects the conversation history, identifies what needs to happen next, and returns a RouteDecision. LangGraph uses llm.with_structured_output(RouteDecision) to guarantee valid JSON from the LLM, and the graph routes execution to the correct sub-agent node based on the next_agent field.
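Stripped of the LLM and Pydantic machinery, the dispatch on next_agent reduces to a small lookup. The sketch below is illustrative only (LangGraph performs this routing internally via Command); the dispatch function and its dict input are hypothetical stand-ins for the RouteDecision model:

```python
# Illustrative sketch of the dispatch the graph performs on next_agent.
# In the real system, LangGraph handles this via Command routing.
def dispatch(decision: dict) -> str:
    """Map a routing decision to the next graph node."""
    allowed = {"research", "code", "review", "FINISH"}
    agent = decision["next_agent"]
    if agent not in allowed:
        raise ValueError(f"invalid route: {agent}")
    # FINISH maps to the graph's terminal node
    return "__end__" if agent == "FINISH" else agent

print(dispatch({"next_agent": "code"}))    # → code
print(dispatch({"next_agent": "FINISH"}))  # → __end__
```

Structured output matters here because the Literal type constrains the LLM to exactly these four values, so the dispatch can never receive an unknown agent name.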

Scenario — Supervisor pattern? — Why

  • Tasks requiring 2-4 distinct capabilities — Yes: each capability maps to a sub-agent
  • Sequential pipeline (always A then B then C) — No: use a simple chain instead
  • Peer agents with equal authority — No: use the swarm pattern instead
  • Dynamic routing based on user intent — Yes: the supervisor excels at intent classification
  • Human-in-the-loop approval workflows — Yes: the supervisor can pause for human review

The implementation below uses LangGraph 0.3+ with create_react_agent for each sub-agent and the Command API for type-safe routing between nodes.

Complete Python Code — Supervisor + 3 Sub-Agents


This implementation uses LangGraph 0.3+ with the Command API and create_react_agent for sub-agents.

import operator
from typing import Annotated, Literal

from typing_extensions import TypedDict
from pydantic import BaseModel
from langchain_core.messages import HumanMessage, BaseMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import create_react_agent
from langgraph.types import Command

# ── Tools for each agent ──────────────────────────
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.tools import tool

search_tool = TavilySearchResults(max_results=3)

@tool
def execute_python(code: str) -> str:
    """Execute Python code and return the output."""
    import subprocess
    result = subprocess.run(
        ["python", "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    if result.returncode != 0:
        return f"Error:\n{result.stderr}"
    return result.stdout or "Code executed successfully (no output)."

@tool
def review_code(code: str) -> str:
    """Review code for bugs, security issues, and best practices."""
    # In production, this would run static analysis tools
    return f"Reviewed {len(code.splitlines())} lines of code."

# ── Routing schema ────────────────────────────────
class RouteDecision(BaseModel):
    """Supervisor's routing decision."""
    next_agent: Literal["research", "code", "review", "FINISH"]
    reason: str
    task_description: str

# ── Shared state ──────────────────────────────────
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]
    next_agent: str

# ── Sub-agents ────────────────────────────────────
llm = ChatOpenAI(model="gpt-4o", temperature=0)

research_agent = create_react_agent(
    llm,
    tools=[search_tool],
    prompt="You are a research specialist. Search for accurate, "
           "up-to-date information. Always cite your sources. "
           "Return findings in a structured format.",
)

code_agent = create_react_agent(
    llm,
    tools=[execute_python],
    prompt="You are a senior Python developer. Write clean, "
           "well-documented code. Include error handling and "
           "type hints. Test your code before returning it.",
)

review_agent = create_react_agent(
    llm,
    tools=[review_code],
    prompt="You are a code reviewer and fact-checker. Verify "
           "accuracy of research findings and review code for "
           "bugs, security issues, and best practices. Be specific "
           "about any issues found.",
)

# ── Sub-agent node wrappers ───────────────────────
def research_node(state: AgentState) -> Command[Literal["supervisor"]]:
    result = research_agent.invoke({"messages": state["messages"]})
    # Append only the agent's final reply — returning the full list
    # would duplicate the history under the operator.add reducer
    return Command(
        update={"messages": [result["messages"][-1]]},
        goto="supervisor",
    )

def code_node(state: AgentState) -> Command[Literal["supervisor"]]:
    result = code_agent.invoke({"messages": state["messages"]})
    return Command(
        update={"messages": [result["messages"][-1]]},
        goto="supervisor",
    )

def review_node(state: AgentState) -> Command[Literal["supervisor"]]:
    result = review_agent.invoke({"messages": state["messages"]})
    return Command(
        update={"messages": [result["messages"][-1]]},
        goto="supervisor",
    )

# ── Supervisor ────────────────────────────────────
SUPERVISOR_PROMPT = """You are a supervisor managing three agents:
- research: searches the web for information
- code: writes and executes Python code
- review: reviews code and verifies research accuracy

Given the conversation so far, decide which agent should act
next, or respond FINISH if the task is complete.
Respond with the agent name: research, code, review, or FINISH."""

def supervisor_node(state: AgentState) -> Command[
    Literal["research", "code", "review", "__end__"]
]:
    messages = [{"role": "system", "content": SUPERVISOR_PROMPT}] + state["messages"]
    response = llm.with_structured_output(RouteDecision).invoke(messages)
    if response.next_agent == "FINISH":
        return Command(goto=END, update={"next_agent": "FINISH"})
    return Command(
        goto=response.next_agent,
        update={
            "messages": [HumanMessage(
                content=f"[Supervisor → {response.next_agent}]: {response.task_description}"
            )],
            "next_agent": response.next_agent,
        },
    )

# ── Wire the graph ────────────────────────────────
graph = StateGraph(AgentState)
graph.add_node("supervisor", supervisor_node)
graph.add_node("research", research_node)
graph.add_node("code", code_node)
graph.add_node("review", review_node)
graph.add_edge(START, "supervisor")
app = graph.compile()

# ── Run ───────────────────────────────────────────
result = app.invoke({
    "messages": [HumanMessage(
        content="Research the current pricing for OpenAI's GPT-4o API, "
                "write a Python cost calculator function, and review "
                "the code for correctness."
    )]
})
for msg in result["messages"]:
    print(f"{msg.type}: {msg.content[:200]}")
  1. Three sub-agents are created with create_react_agent, each with its own tools and system prompt
  2. Node wrappers invoke the sub-agent and return a Command that routes back to the supervisor
  3. The supervisor uses structured output to decide which agent acts next or whether to finish
  4. The graph starts at the supervisor, which fans out to sub-agents and collects results

LangGraph’s typed state schema is the shared memory that flows between all nodes, with reducer functions controlling how each field accumulates across routing steps.

LangGraph state is the backbone of multi-agent communication. Every node reads from and writes to the shared state.

import operator
from typing import Annotated

from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    # Messages accumulate using operator.add (append-only)
    messages: Annotated[list[BaseMessage], operator.add]
    # Track which agent was last active
    next_agent: str
    # Optional: track iteration count for loop limits
    iteration_count: int

The Annotated[list[BaseMessage], operator.add] pattern is critical. It tells LangGraph to append new messages rather than replace the entire list. Without this, each node would overwrite the conversation history.
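The reducer semantics can be sketched without LangGraph at all: for each state key, the framework computes the new value as reducer(old, update) when a reducer is declared, and plain replacement otherwise. The merge_state helper below is a hypothetical stand-in for that internal merge step, using plain strings in place of BaseMessage objects:

```python
import operator

def merge_state(old: dict, update: dict, reducers: dict) -> dict:
    """Mimic how LangGraph merges a node's state update:
    reducer(old, new) for annotated keys, replacement otherwise."""
    merged = dict(old)
    for key, value in update.items():
        reducer = reducers.get(key)
        if reducer and key in merged:
            merged[key] = reducer(merged[key], value)  # e.g. list append
        else:
            merged[key] = value                        # plain overwrite
    return merged

# "messages" behaves like Annotated[list, operator.add]
reducers = {"messages": operator.add}
state = {"messages": ["user: hi"], "next_agent": ""}
state = merge_state(state, {"messages": ["research: found it"],
                            "next_agent": "research"}, reducers)
print(state["messages"])    # → ['user: hi', 'research: found it']
print(state["next_agent"])  # → research (replaced, no reducer)
```

This is why dropping the annotation silently breaks multi-agent graphs: every node's update would overwrite the list instead of appending to it.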

Pattern 1: Fan-out / Fan-in — Supervisor sends tasks to multiple agents, then merges results.

from langgraph.types import Command, Send

def supervisor_fanout(state: AgentState) -> Command:
    """Route to multiple agents in parallel via Send objects."""
    return Command(goto=[
        Send("research", {"messages": [
            HumanMessage(content="Research the pricing data.")
        ]}),
        Send("code", {"messages": [
            HumanMessage(content="Write the calculator function.")
        ]}),
    ])

Pattern 2: Sequential with gates — Supervisor enforces ordering constraints.

def supervisor_sequential(state: AgentState) -> Command:
    """Enforce: research first, then code, then review."""
    last = state.get("next_agent", "")
    if last == "":
        return Command(goto="research")
    elif last == "research":
        return Command(goto="code")
    elif last == "code":
        return Command(goto="review")
    return Command(goto=END)

Pattern 3: Loop with exit condition — Supervisor re-routes until quality threshold is met.

def supervisor_with_retry(state: AgentState) -> Command:
    """Re-route to code agent if review finds issues."""
    if state.get("iteration_count", 0) > 3:
        return Command(goto=END)  # Safety limit
    last_message = state["messages"][-1].content
    if "APPROVED" in last_message:
        return Command(goto=END)
    return Command(goto="code")

Multi-agent systems can loop forever if the supervisor keeps routing between agents without converging. Always implement at least one of these safeguards:

  • Iteration counter — hard limit on total routing decisions (e.g., max 10)
  • Recursion limit — LangGraph’s built-in recursion_limit, passed via the config dict at invoke time
  • Timeout — wall-clock time limit on the entire graph execution
  • Convergence check — detect repeated routing patterns and force termination
app = graph.compile()

# Built-in recursion limit prevents infinite loops
result = app.invoke(
    {"messages": [HumanMessage(content="...")]},
    config={"recursion_limit": 25},
)

The supervisor pattern is one of several multi-agent architectures. Choose based on your coordination needs.

A supervisor delegates to mid-level supervisors, which manage their own sub-agents. Useful when the problem has natural sub-domains (e.g., a “backend team” supervisor and a “frontend team” supervisor, each with specialized agents).

Top Supervisor → Backend Supervisor  → [DB Agent, API Agent]
               → Frontend Supervisor → [UI Agent, Test Agent]
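The two-level delegation above can be sketched as a plain routing table. This is illustrative only — a real system would compile each team as a nested LangGraph subgraph — and the team, task, and agent names are hypothetical:

```python
# Hypothetical two-level routing table mirroring the hierarchy above.
TEAMS = {
    "backend": {"database": "db_agent", "api": "api_agent"},
    "frontend": {"ui": "ui_agent", "testing": "test_agent"},
}

def route(team: str, task: str) -> str:
    """Top supervisor picks a team; that team's supervisor picks an agent."""
    team_table = TEAMS[team]   # top-level decision
    return team_table[task]    # mid-level decision

print(route("backend", "api"))     # → api_agent
print(route("frontend", "ui"))     # → ui_agent
```

The value of the hierarchy is that the top supervisor only reasons over domains, not individual agents, which keeps its prompt small as the agent count grows.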

Agents hand off directly to each other without a central coordinator. Each agent decides who should act next. LangGraph’s langgraph-swarm package implements this pattern.

from langgraph_swarm import create_handoff_tool, create_swarm

research = create_react_agent(
    llm, [search_tool, create_handoff_tool(agent_name="code")], name="research"
)
code = create_react_agent(
    llm, [execute_python, create_handoff_tool(agent_name="research")], name="code"
)
swarm = create_swarm([research, code], default_active_agent="research").compile()

Best for: systems where agents are peers with equal authority and routing decisions depend on local context rather than global coordination.

Every agent can communicate with every other agent. No fixed routing structure — the graph dynamically determines paths. This is the most flexible but hardest to debug.

Pattern — Coordination — Complexity — Best for

  • Supervisor — centralized — medium — 2-5 agents with clear role separation
  • Hierarchical — multi-level — high — large teams with sub-domains
  • Swarm — decentralized — low-medium — peer agents with handoff logic
  • Network — dynamic — very high — research, highly adaptive systems

For a broader comparison of agent framework options, see Agentic Frameworks Comparison.


Multi-agent systems introduce compounding failure surfaces — routing confusion, token budget explosion, and infinite loops — each requiring explicit mitigation at design time.

Routing confusion: The supervisor sends tasks to the wrong agent. This happens when agent role descriptions overlap or the supervisor prompt is ambiguous. Fix: make role boundaries explicit and non-overlapping in the supervisor prompt.

Message pollution: Sub-agents receive irrelevant messages from other agents’ conversations. The shared message list grows with every agent interaction, and later agents process all previous messages — including tool calls they do not need. Fix: filter messages per agent or use separate message channels.
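The per-agent filtering fix can be sketched in plain Python. The message shape here is a simplified dict with a "source" tag standing in for real BaseMessage objects, and the filtering policy (keep the user's request, supervisor instructions, and the agent's own prior outputs) is one reasonable choice, not the only one:

```python
# Sketch of per-agent message filtering, using simplified dicts in
# place of BaseMessage objects.
def filter_for_agent(messages: list[dict], agent: str) -> list[dict]:
    """Keep the user's request, supervisor instructions, and this
    agent's own previous outputs; drop other agents' chatter."""
    keep = {"user", "supervisor", agent}
    return [m for m in messages if m["source"] in keep]

history = [
    {"source": "user", "text": "Compare AWS Bedrock pricing."},
    {"source": "supervisor", "text": "[-> research]: find current prices"},
    {"source": "research", "text": "Found pricing tables ..."},
    {"source": "code", "text": "def cost(): ..."},
]
# The code agent's message is dropped from the research agent's view
print(filter_for_agent(history, "research"))
```

In a real LangGraph node wrapper, the same filter would run on state["messages"] before calling sub_agent.invoke().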

Token budget explosion: Each sub-agent call includes the full conversation history. With 3 agents and 10 routing steps, the supervisor processes 10x the original message volume. Fix: summarize intermediate results, trim tool call messages, or use shorter context windows for sub-agents.
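A minimal sketch of the trimming fix, assuming a crude 4-characters-per-token estimate (in practice you would use a real tokenizer such as tiktoken, or LangChain's message-trimming utilities):

```python
def trim_to_budget(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit the token budget.
    Uses a rough 4-chars-per-token estimate — an assumption; swap in
    a real tokenizer for production."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):          # newest first
        tokens = max(1, len(msg) // 4)
        if total + tokens > max_tokens:
            break                           # budget exhausted
        kept.append(msg)
        total += tokens
    return list(reversed(kept))             # restore chronological order

history = ["long research dump " * 50,
           "supervisor: write the code",
           "code: def f(): ..."]
trimmed = trim_to_budget(history, max_tokens=20)
print(len(trimmed))  # the oversized research dump is dropped
```

Trimming newest-first preserves the messages the sub-agent most likely needs; a summarization step can compress the dropped prefix instead of discarding it outright.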

Cascading failures: if the research agent returns bad data, the code agent builds on it and writes incorrect code, and the review agent may not catch the error because it trusts the research. Fix: implement independent verification — the review agent should cross-check facts, not just review code syntax.

Infinite retry loops: The supervisor routes to the code agent, the review agent rejects the code, the supervisor routes back to the code agent, and the cycle repeats. The code agent may produce the same output each time. Fix: include the rejection reason in the re-routing message and enforce a retry limit.
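Both parts of that fix fit in one routing decision. The sketch below uses a plain dict for state (in LangGraph it would be AgentState) and hypothetical field names (approved, retries, rejection_reason):

```python
# Sketch of retry routing with a rejection reason and a hard limit.
MAX_RETRIES = 3

def route_after_review(state: dict) -> tuple[str, str]:
    """Return (next_node, instruction) after a review verdict."""
    if state["approved"]:
        return "END", "done"
    if state["retries"] >= MAX_RETRIES:
        return "END", "retry limit reached; escalate to a human"
    # Feed the concrete rejection reason back so the code agent
    # does not regenerate the same output
    return "code", f"Fix and resubmit: {state['rejection_reason']}"

print(route_after_review({"approved": False, "retries": 1,
                          "rejection_reason": "missing input validation"}))
```

Passing the rejection reason in the re-routing message is what breaks the cycle: without it, the code agent sees the same prompt and tends to produce the same output.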

Each routing step involves at least one LLM call (the supervisor decision) plus the sub-agent’s LLM calls. A 3-agent system with one routing cycle per agent requires a minimum of 7 LLM calls (1 initial supervisor + 3 sub-agents + 3 supervisor re-evaluations). At GPT-4o pricing, a complex multi-agent interaction can cost $0.10-$0.50 per request.

Latency compounds similarly. If each LLM call takes 2 seconds, a 7-call workflow takes 14+ seconds sequentially. Parallel fan-out helps but adds complexity.
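The call-count arithmetic above can be written as a back-of-envelope estimator. The per-call cost and latency figures are placeholders, not real prices — substitute your model's current rates:

```python
def estimate(n_agents: int, cycles: int,
             cost_per_call: float, secs_per_call: float):
    """Back-of-envelope cost/latency: 1 initial supervisor call, then
    per cycle one sub-agent call plus one supervisor re-evaluation
    for each agent. Per-call figures are placeholder assumptions."""
    calls = 1 + cycles * 2 * n_agents
    return calls, calls * cost_per_call, calls * secs_per_call

calls, cost, latency = estimate(n_agents=3, cycles=1,
                                cost_per_call=0.03, secs_per_call=2.0)
print(calls)    # 7 calls, matching the worked example above
print(latency)  # 14.0 seconds if fully sequential
```

Running the same estimator with cycles=2 shows why retry loops dominate the bill: the call count grows linearly with every additional routing cycle.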


9. LangGraph Multi-Agent Interview Questions


Multi-agent questions test architectural thinking — interviewers want to see decomposition into agents, clear role boundaries, and anticipation of failure modes.


Q: “Design a multi-agent system for automated code review.”

Weak: “I would use multiple LLM calls — one for checking bugs, one for style, one for security.”

Strong: “I would use the supervisor pattern with three specialized agents. The supervisor receives the PR diff and routes to a security agent (checks for hardcoded secrets, SQL injection, dependency vulnerabilities), a logic agent (checks for bugs, edge cases, race conditions), and a style agent (checks naming conventions, documentation, complexity metrics). Each agent has specific tools — the security agent runs Bandit and Snyk, the logic agent can execute test cases, the style agent runs Ruff. The supervisor merges the three reviews into a single report with severity rankings. I would set a recursion limit of 5 to prevent the supervisor from endlessly re-routing, and I would filter messages so each agent only sees the PR diff and its own previous outputs, not the other agents’ reviews.”

  • Compare the supervisor pattern vs swarm pattern — when would you use each?
  • How do you prevent infinite loops in a multi-agent graph?
  • Design a multi-agent system for customer support with escalation
  • What is the cost and latency overhead of multi-agent vs single-agent?
  • How does LangGraph’s state management differ from passing data between functions?
  • How would you test a multi-agent system? What failure modes would you check for?

Production multi-agent deployments require choosing between managed infrastructure (LangGraph Platform) and self-hosted persistence, plus streaming and observability from day one.

Pattern 1: LangGraph Platform (Managed)

Client → LangGraph Cloud API → Supervisor Graph → Sub-agents → Response

Best for: teams that want managed infrastructure with built-in persistence, streaming, and monitoring. LangGraph Platform handles checkpointing, retries, and horizontal scaling.

Pattern 2: Self-Hosted with Persistence

Client → FastAPI → LangGraph (SQLite/Postgres checkpointer) → Sub-agents → Response

Best for: teams with specific infrastructure requirements or data residency constraints.

from langgraph.checkpoint.postgres import PostgresSaver

checkpointer = PostgresSaver.from_conn_string("postgresql://...")
app = graph.compile(checkpointer=checkpointer)

# Resume from a previous checkpoint
config = {"configurable": {"thread_id": "user-123"}}
result = app.invoke({"messages": [HumanMessage(content="Continue...")]}, config=config)

Pattern 3: Async with Streaming

async for event in app.astream_events(
    {"messages": [HumanMessage(content="...")]},
    version="v2",
):
    if event["event"] == "on_chat_model_stream":
        print(event["data"]["chunk"].content, end="", flush=True)

Multi-agent systems are hard to debug without proper tracing. Every routing decision, sub-agent invocation, and tool call should be logged.

  • LangSmith — native integration with LangGraph for trace visualization
  • Langfuse — open-source alternative with session-level tracing
  • Custom logging — add metadata to each node for structured logging

For a comparison of observability tools, see LangSmith vs Langfuse.

  • Routing accuracy: track how often the supervisor picks the correct agent (use human evaluation samples)
  • Sub-agent success rate: percentage of sub-agent calls that produce usable outputs
  • End-to-end latency: p50 and p95 across the full graph execution
  • Token consumption: total tokens per request — compare against single-agent baseline
  • Error rate: graph failures (timeouts, infinite loops, tool errors) per 1000 requests
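The latency metric above is straightforward to compute from per-request samples with the standard library alone. The sample values here are hypothetical:

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> tuple[float, float]:
    """Compute p50/p95 end-to-end latency from per-request samples."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points
    return cuts[49], cuts[94]                       # p50, p95

# Hypothetical per-request graph latencies in milliseconds
samples = [1200, 1500, 1800, 2100, 9000, 1600, 1700, 1400, 2000, 1900]
p50, p95 = latency_percentiles(samples)
print(f"p50={p50:.0f}ms p95={p95:.0f}ms")
```

Tracking p95 alongside p50 matters for multi-agent graphs specifically: a single extra routing cycle on a minority of requests shows up in the tail long before it moves the median.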

Use these reference tables to quickly recall the key decisions: when to use multi-agent, which topology, how many agents, and what framework.

Question — Answer

  • When to use multi-agent? — When a task requires 2+ distinct capabilities that a single agent handles poorly
  • Which pattern? — Supervisor for most cases; swarm for peer handoffs; hierarchical for large agent teams
  • How many agents? — Start with 2-3; each additional agent adds latency and cost
  • Biggest risk? — Token budget explosion and infinite routing loops
  • Framework? — LangGraph 0.3+ with the Command API and create_react_agent

Last updated: March 2026. LangGraph’s API evolves quickly; verify current patterns against the official documentation.

Frequently Asked Questions

What is the supervisor pattern in multi-agent systems?

The supervisor pattern uses a central agent that routes tasks to specialized sub-agents for research, code generation, review, and other capabilities. The supervisor decides which agent runs next based on the current state, manages message routing between agents, and handles error recovery. This creates higher quality outputs because each agent focuses on what it does best.

Why use multi-agent systems instead of a single agent?

A single LLM call attempting multiple capabilities (research, coding, review) produces mediocre results across the board. Multi-agent systems decompose complex tasks into specialized sub-tasks where each agent focuses on one capability. This yields higher quality outputs, clearer reasoning traces, and the ability to assign different tools and models to different tasks.

How does LangGraph handle multi-agent state management?

LangGraph provides explicit state management where shared state flows between nodes with type safety. The supervisor node reads the current state to decide which agent runs next. Each sub-agent receives relevant state, performs its work, and returns state updates. Conditional routing enables re-routing on failure, and built-in persistence via checkpointers enables recovery for long-running workflows.

When should I use a multi-agent system vs a single agent?

Use multi-agent systems when the task requires multiple distinct capabilities (research plus coding plus review), when different sub-tasks benefit from different tools or models, or when you need clear separation of concerns for debugging and testing. A single agent is sufficient for focused tasks that require only one capability.

What is the Command API in LangGraph 0.3+?

The Command API replaces manual edge routing in LangGraph 0.3+. Agents return Command(goto='next_node') instead of relying on conditional edge functions, providing type-safe routing between nodes. Sub-agent node wrappers use Command to route back to the supervisor after execution.

How do you prevent infinite loops in a multi-agent graph?

Implement at least one safeguard: an iteration counter with a hard limit on total routing decisions, LangGraph's built-in recursion_limit parameter in graph.compile(), a wall-clock timeout on the entire graph execution, or a convergence check that detects repeated routing patterns and forces termination.

What tools does each sub-agent need in a supervisor system?

Each sub-agent is assigned tools matching its specialized role. A research agent uses search tools like TavilySearchResults for web research. A code agent uses code execution tools. A review agent uses code review and static analysis tools. Tools are bound to agents via create_react_agent, ensuring each agent only accesses capabilities relevant to its role.

What is the difference between the supervisor pattern and swarm pattern?

The supervisor pattern uses a central coordinator that routes tasks to specialized sub-agents based on intent classification. The swarm pattern uses peer-to-peer agent handoffs without a central coordinator — each agent decides who should act next. Use supervisor for 2-5 agents with clear role separation; use swarm for peer agents with equal authority.

How does fan-out and fan-in routing work in LangGraph?

Fan-out routing sends tasks to multiple agents in parallel by returning multiple Command objects from the supervisor node. Fan-in collects results from all parallel agents back at the supervisor, which merges them and decides whether additional routing is needed or the task is complete. This pattern reduces latency for tasks with independent sub-components.

What are the cost and latency implications of multi-agent systems?

Each routing step involves at least one LLM call for the supervisor decision plus the sub-agent's LLM calls. A 3-agent system with one routing cycle per agent requires a minimum of 7 LLM calls. At GPT-4o pricing, a complex multi-agent interaction can cost $0.10-$0.50 per request. Latency compounds similarly — a 7-call workflow takes 14+ seconds sequentially.