
LangGraph Multi-Agent — Supervisor Pattern Guide (2026)

This LangGraph multi-agent guide walks through the supervisor pattern — where a central agent routes tasks to specialized sub-agents for research, code generation, and review. You will build a complete working system with state management, message routing, error handling, and production deployment patterns.

Multi-agent systems decompose complex tasks across specialized sub-agents so each one focuses on what it does best — producing higher quality results than a single LLM call attempting everything at once.

A single LLM call handles simple tasks well. Ask it to summarize a document, generate a function, or answer a factual question — one prompt, one response, done.

Real-world problems are rarely that simple. Consider a user request like “Research the latest pricing for AWS Bedrock, write a cost comparison script, and review it for accuracy.” This requires three distinct capabilities: web research, code generation, and code review. A single prompt attempting all three produces mediocre results across the board.

Multi-agent systems solve this by decomposing complex tasks into specialized sub-tasks. Each agent focuses on what it does best, with a coordination layer managing the workflow. The result: higher quality outputs, clearer reasoning traces, and the ability to assign different tools and models to different tasks.

For background on agent architectures, see AI Agents and Agentic Design Patterns.

LangGraph provides the graph-based execution model that multi-agent systems need. Unlike simple chain-based frameworks, LangGraph gives you:

  • Explicit state management — shared state flows between nodes with type safety
  • Conditional routing — the supervisor decides which agent runs next based on the current state
  • Cycles and loops — agents can re-route tasks, retry on failure, or request human approval
  • Built-in persistence — checkpoint state for long-running workflows and recovery

If you are new to LangGraph, start with LangGraph Tutorial before continuing. For a comparison with LangChain’s simpler chain model, see LangChain vs LangGraph.


Development — Impact

  • LangGraph 0.3+ Command API — The Command object replaces manual edge routing. Agents return Command(goto="next_node") instead of relying on conditional edge functions
  • Prebuilt create_react_agent — Sub-agents can be created with a single function call: model, tools, and system prompt in one line
  • Functional API — @task and @entrypoint decorators enable Python-native graph definitions without subclassing
  • LangGraph Platform — Managed deployment with built-in persistence, streaming, and cron jobs
  • Supervisor library — The langgraph-supervisor package provides a prebuilt supervisor pattern out of the box
  • Swarm library — langgraph-swarm enables peer-to-peer agent handoffs without a central coordinator

The supervisor pattern is the most common multi-agent topology: one coordinator routes tasks to specialized sub-agents, then collects and merges their results.

[Diagram] LangGraph Multi-Agent — Supervisor Pattern: the supervisor receives requests, routes to specialized sub-agents, and merges results.

  1. Supervisor — routes tasks based on intent analysis (receive request → analyze intent → select agent → route task)
  2. Research Agent — web search + document retrieval (search APIs → extract facts → cite sources → return findings)
  3. Code Agent — code generation + execution (parse requirements → generate code → run tests → return code)
  4. Review Agent — quality check + validation (read outputs → check accuracy → flag issues → approve or reject)

The supervisor pattern uses a hub-and-spoke topology. The supervisor node sits at the center and decides which sub-agent handles each part of a request. Sub-agents execute independently and return results to the supervisor, which can then route to another agent or deliver the final response.

This differs from a sequential chain (A then B then C) because the supervisor can skip agents, re-invoke agents, or fan out to multiple agents based on the task requirements.


The supervisor uses structured output to classify each incoming task and route it to exactly one sub-agent, preventing ambiguous or overlapping execution.

The supervisor agent needs to make one critical decision: which sub-agent should handle the current task? This is implemented as structured output from the supervisor LLM call using a Pydantic model:

class RouteDecision(BaseModel):
    next_agent: Literal["research", "code", "review", "FINISH"]
    reason: str  # Why this agent was selected
    task_description: str  # What the agent should do

The supervisor inspects the conversation history, identifies what needs to happen next, and returns a RouteDecision. LangGraph uses llm.with_structured_output(RouteDecision) to guarantee valid JSON from the LLM, and the graph routes execution to the correct sub-agent node based on the next_agent field.
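Stripped of the LLM and Pydantic machinery, the dispatch on next_agent reduces to a small lookup. The sketch below is illustrative only (LangGraph performs this routing internally via Command); the dispatch function and its dict input are hypothetical stand-ins for the RouteDecision model:

```python
# Illustrative sketch of the dispatch the graph performs on next_agent.
# In the real system, LangGraph handles this via Command routing.
def dispatch(decision: dict) -> str:
    """Map a routing decision to the next graph node."""
    allowed = {"research", "code", "review", "FINISH"}
    agent = decision["next_agent"]
    if agent not in allowed:
        raise ValueError(f"invalid route: {agent}")
    # FINISH maps to the graph's terminal node
    return "__end__" if agent == "FINISH" else agent

print(dispatch({"next_agent": "code"}))    # → code
print(dispatch({"next_agent": "FINISH"}))  # → __end__
```

Structured output matters here because the Literal type constrains the LLM to exactly these four values, so the dispatch can never receive an unknown agent name.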

Scenario — Supervisor pattern? — Why

  • Tasks requiring 2-4 distinct capabilities — Yes: each capability maps to a sub-agent
  • Sequential pipeline (always A then B then C) — No: use a simple chain instead
  • Peer agents with equal authority — No: use the swarm pattern instead
  • Dynamic routing based on user intent — Yes: the supervisor excels at intent classification
  • Human-in-the-loop approval workflows — Yes: the supervisor can pause for human review

The implementation below uses LangGraph 0.3+ with create_react_agent for each sub-agent and the Command API for type-safe routing between nodes.

Complete Python Code — Supervisor + 3 Sub-Agents


This implementation uses LangGraph 0.3+ with the Command API and create_react_agent for sub-agents.

import operator
from typing import Annotated, Literal

from typing_extensions import TypedDict
from pydantic import BaseModel
from langchain_core.messages import HumanMessage, BaseMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import create_react_agent
from langgraph.types import Command

# ── Tools for each agent ──────────────────────────
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.tools import tool

search_tool = TavilySearchResults(max_results=3)

@tool
def execute_python(code: str) -> str:
    """Execute Python code and return the output."""
    import subprocess
    result = subprocess.run(
        ["python", "-c", code],
        capture_output=True, text=True, timeout=30,
    )
    if result.returncode != 0:
        return f"Error:\n{result.stderr}"
    return result.stdout or "Code executed successfully (no output)."

@tool
def review_code(code: str) -> str:
    """Review code for bugs, security issues, and best practices."""
    # In production, this would run static analysis tools
    return f"Reviewed {len(code.splitlines())} lines of code."

# ── Routing schema ────────────────────────────────
class RouteDecision(BaseModel):
    """Supervisor's routing decision."""
    next_agent: Literal["research", "code", "review", "FINISH"]
    reason: str
    task_description: str

# ── Shared state ──────────────────────────────────
class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]
    next_agent: str

# ── Sub-agents ────────────────────────────────────
llm = ChatOpenAI(model="gpt-4o", temperature=0)

research_agent = create_react_agent(
    llm,
    tools=[search_tool],
    prompt="You are a research specialist. Search for accurate, "
           "up-to-date information. Always cite your sources. "
           "Return findings in a structured format.",
)

code_agent = create_react_agent(
    llm,
    tools=[execute_python],
    prompt="You are a senior Python developer. Write clean, "
           "well-documented code. Include error handling and "
           "type hints. Test your code before returning it.",
)

review_agent = create_react_agent(
    llm,
    tools=[review_code],
    prompt="You are a code reviewer and fact-checker. Verify "
           "accuracy of research findings and review code for "
           "bugs, security issues, and best practices. Be specific "
           "about any issues found.",
)

# ── Sub-agent node wrappers ───────────────────────
def research_node(state: AgentState) -> Command[Literal["supervisor"]]:
    result = research_agent.invoke({"messages": state["messages"]})
    # Append only the agent's final reply — returning the full list
    # would duplicate the history under the operator.add reducer
    return Command(
        update={"messages": [result["messages"][-1]]},
        goto="supervisor",
    )

def code_node(state: AgentState) -> Command[Literal["supervisor"]]:
    result = code_agent.invoke({"messages": state["messages"]})
    return Command(
        update={"messages": [result["messages"][-1]]},
        goto="supervisor",
    )

def review_node(state: AgentState) -> Command[Literal["supervisor"]]:
    result = review_agent.invoke({"messages": state["messages"]})
    return Command(
        update={"messages": [result["messages"][-1]]},
        goto="supervisor",
    )

# ── Supervisor ────────────────────────────────────
SUPERVISOR_PROMPT = """You are a supervisor managing three agents:
- research: searches the web for information
- code: writes and executes Python code
- review: reviews code and verifies research accuracy

Given the conversation so far, decide which agent should act
next, or respond FINISH if the task is complete.
Respond with the agent name: research, code, review, or FINISH."""

def supervisor_node(state: AgentState) -> Command[
    Literal["research", "code", "review", "__end__"]
]:
    messages = [{"role": "system", "content": SUPERVISOR_PROMPT}] + state["messages"]
    response = llm.with_structured_output(RouteDecision).invoke(messages)
    if response.next_agent == "FINISH":
        return Command(goto=END, update={"next_agent": "FINISH"})
    return Command(
        goto=response.next_agent,
        update={
            "messages": [HumanMessage(
                content=f"[Supervisor → {response.next_agent}]: {response.task_description}"
            )],
            "next_agent": response.next_agent,
        },
    )

# ── Wire the graph ────────────────────────────────
graph = StateGraph(AgentState)
graph.add_node("supervisor", supervisor_node)
graph.add_node("research", research_node)
graph.add_node("code", code_node)
graph.add_node("review", review_node)
graph.add_edge(START, "supervisor")
app = graph.compile()

# ── Run ───────────────────────────────────────────
result = app.invoke({
    "messages": [HumanMessage(
        content="Research the current pricing for OpenAI's GPT-4o API, "
                "write a Python cost calculator function, and review "
                "the code for correctness."
    )]
})
for msg in result["messages"]:
    print(f"{msg.type}: {msg.content[:200]}")
  1. Three sub-agents are created with create_react_agent, each with its own tools and system prompt
  2. Node wrappers invoke the sub-agent and return a Command that routes back to the supervisor
  3. The supervisor uses structured output to decide which agent acts next or whether to finish
  4. The graph starts at the supervisor, which fans out to sub-agents and collects results

LangGraph’s typed state schema is the shared memory that flows between all nodes, with reducer functions controlling how each field accumulates across routing steps.

LangGraph state is the backbone of multi-agent communication. Every node reads from and writes to the shared state.

import operator
from typing import Annotated

from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    # Messages accumulate using operator.add (append-only)
    messages: Annotated[list[BaseMessage], operator.add]
    # Track which agent was last active
    next_agent: str
    # Optional: track iteration count for loop limits
    iteration_count: int

The Annotated[list[BaseMessage], operator.add] pattern is critical. It tells LangGraph to append new messages rather than replace the entire list. Without this, each node would overwrite the conversation history.
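The reducer semantics can be sketched without LangGraph at all: for each state key, the framework computes the new value as reducer(old, update) when a reducer is declared, and plain replacement otherwise. The merge_state helper below is a hypothetical stand-in for that internal merge step, using plain strings in place of BaseMessage objects:

```python
import operator

def merge_state(old: dict, update: dict, reducers: dict) -> dict:
    """Mimic how LangGraph merges a node's state update:
    reducer(old, new) for annotated keys, replacement otherwise."""
    merged = dict(old)
    for key, value in update.items():
        reducer = reducers.get(key)
        if reducer and key in merged:
            merged[key] = reducer(merged[key], value)  # e.g. list append
        else:
            merged[key] = value                        # plain overwrite
    return merged

# "messages" behaves like Annotated[list, operator.add]
reducers = {"messages": operator.add}
state = {"messages": ["user: hi"], "next_agent": ""}
state = merge_state(state, {"messages": ["research: found it"],
                            "next_agent": "research"}, reducers)
print(state["messages"])    # → ['user: hi', 'research: found it']
print(state["next_agent"])  # → research (replaced, no reducer)
```

This is why dropping the annotation silently breaks multi-agent graphs: every node's update would overwrite the list instead of appending to it.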

Pattern 1: Fan-out / Fan-in — Supervisor sends tasks to multiple agents, then merges results.

from langgraph.types import Command, Send

def supervisor_fanout(state: AgentState) -> Command:
    """Route to multiple agents in parallel via Send objects."""
    return Command(goto=[
        Send("research", {"messages": [
            HumanMessage(content="Research the pricing data.")
        ]}),
        Send("code", {"messages": [
            HumanMessage(content="Write the calculator function.")
        ]}),
    ])

Pattern 2: Sequential with gates — Supervisor enforces ordering constraints.

def supervisor_sequential(state: AgentState) -> Command:
    """Enforce: research first, then code, then review."""
    last = state.get("next_agent", "")
    if last == "":
        return Command(goto="research")
    elif last == "research":
        return Command(goto="code")
    elif last == "code":
        return Command(goto="review")
    return Command(goto=END)

Pattern 3: Loop with exit condition — Supervisor re-routes until quality threshold is met.

def supervisor_with_retry(state: AgentState) -> Command:
    """Re-route to code agent if review finds issues."""
    if state.get("iteration_count", 0) > 3:
        return Command(goto=END)  # Safety limit
    last_message = state["messages"][-1].content
    if "APPROVED" in last_message:
        return Command(goto=END)
    return Command(goto="code")

Multi-agent systems can loop forever if the supervisor keeps routing between agents without converging. Always implement at least one of these safeguards:

  • Iteration counter — hard limit on total routing decisions (e.g., max 10)
  • Recursion limit — LangGraph’s built-in recursion_limit, passed via the config dict at invoke time
  • Timeout — wall-clock time limit on the entire graph execution
  • Convergence check — detect repeated routing patterns and force termination
app = graph.compile()

# Built-in recursion limit prevents infinite loops
result = app.invoke(
    {"messages": [HumanMessage(content="...")]},
    config={"recursion_limit": 25},
)

The supervisor pattern is one of several multi-agent architectures. Choose based on your coordination needs.

A supervisor delegates to mid-level supervisors, which manage their own sub-agents. Useful when the problem has natural sub-domains (e.g., a “backend team” supervisor and a “frontend team” supervisor, each with specialized agents).

Top Supervisor → Backend Supervisor  → [DB Agent, API Agent]
               → Frontend Supervisor → [UI Agent, Test Agent]
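The two-level delegation above can be sketched as a plain routing table. This is illustrative only — a real system would compile each team as a nested LangGraph subgraph — and the team, task, and agent names are hypothetical:

```python
# Hypothetical two-level routing table mirroring the hierarchy above.
TEAMS = {
    "backend": {"database": "db_agent", "api": "api_agent"},
    "frontend": {"ui": "ui_agent", "testing": "test_agent"},
}

def route(team: str, task: str) -> str:
    """Top supervisor picks a team; that team's supervisor picks an agent."""
    team_table = TEAMS[team]   # top-level decision
    return team_table[task]    # mid-level decision

print(route("backend", "api"))     # → api_agent
print(route("frontend", "ui"))     # → ui_agent
```

The value of the hierarchy is that the top supervisor only reasons over domains, not individual agents, which keeps its prompt small as the agent count grows.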

Agents hand off directly to each other without a central coordinator. Each agent decides who should act next. LangGraph’s langgraph-swarm package implements this pattern.

from langgraph_swarm import create_handoff_tool, create_swarm

research = create_react_agent(
    llm, [search_tool, create_handoff_tool(agent_name="code")], name="research"
)
code = create_react_agent(
    llm, [execute_python, create_handoff_tool(agent_name="research")], name="code"
)
swarm = create_swarm([research, code], default_active_agent="research").compile()

Best for: systems where agents are peers with equal authority and routing decisions depend on local context rather than global coordination.

Every agent can communicate with every other agent. No fixed routing structure — the graph dynamically determines paths. This is the most flexible but hardest to debug.

Pattern — Coordination — Complexity — Best for

  • Supervisor — centralized — medium — 2-5 agents with clear role separation
  • Hierarchical — multi-level — high — large teams with sub-domains
  • Swarm — decentralized — low-medium — peer agents with handoff logic
  • Network — dynamic — very high — research, highly adaptive systems

For a broader comparison of agent framework options, see Agentic Frameworks Comparison.


Multi-agent systems introduce compounding failure surfaces — routing confusion, token budget explosion, and infinite loops — each requiring explicit mitigation at design time.

Routing confusion: The supervisor sends tasks to the wrong agent. This happens when agent role descriptions overlap or the supervisor prompt is ambiguous. Fix: make role boundaries explicit and non-overlapping in the supervisor prompt.

Message pollution: Sub-agents receive irrelevant messages from other agents’ conversations. The shared message list grows with every agent interaction, and later agents process all previous messages — including tool calls they do not need. Fix: filter messages per agent or use separate message channels.
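The per-agent filtering fix can be sketched in plain Python. The message shape here is a simplified dict with a "source" tag standing in for real BaseMessage objects, and the filtering policy (keep the user's request, supervisor instructions, and the agent's own prior outputs) is one reasonable choice, not the only one:

```python
# Sketch of per-agent message filtering, using simplified dicts in
# place of BaseMessage objects.
def filter_for_agent(messages: list[dict], agent: str) -> list[dict]:
    """Keep the user's request, supervisor instructions, and this
    agent's own previous outputs; drop other agents' chatter."""
    keep = {"user", "supervisor", agent}
    return [m for m in messages if m["source"] in keep]

history = [
    {"source": "user", "text": "Compare AWS Bedrock pricing."},
    {"source": "supervisor", "text": "[-> research]: find current prices"},
    {"source": "research", "text": "Found pricing tables ..."},
    {"source": "code", "text": "def cost(): ..."},
]
# The code agent's message is dropped from the research agent's view
print(filter_for_agent(history, "research"))
```

In a real LangGraph node wrapper, the same filter would run on state["messages"] before calling sub_agent.invoke().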

Token budget explosion: Each sub-agent call includes the full conversation history. With 3 agents and 10 routing steps, the supervisor processes 10x the original message volume. Fix: summarize intermediate results, trim tool call messages, or use shorter context windows for sub-agents.
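A minimal sketch of the trimming fix, assuming a crude 4-characters-per-token estimate (in practice you would use a real tokenizer such as tiktoken, or LangChain's message-trimming utilities):

```python
def trim_to_budget(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit the token budget.
    Uses a rough 4-chars-per-token estimate — an assumption; swap in
    a real tokenizer for production."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):          # newest first
        tokens = max(1, len(msg) // 4)
        if total + tokens > max_tokens:
            break                           # budget exhausted
        kept.append(msg)
        total += tokens
    return list(reversed(kept))             # restore chronological order

history = ["long research dump " * 50,
           "supervisor: write the code",
           "code: def f(): ..."]
trimmed = trim_to_budget(history, max_tokens=20)
print(len(trimmed))  # the oversized research dump is dropped
```

Trimming newest-first preserves the messages the sub-agent most likely needs; a summarization step can compress the dropped prefix instead of discarding it outright.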

Cascading failures: if the research agent returns bad data, the code agent builds on it and writes incorrect code, and the review agent may not catch the error because it trusts the research. Fix: implement independent verification — the review agent should cross-check facts, not just review code syntax.

Infinite retry loops: The supervisor routes to the code agent, the review agent rejects the code, the supervisor routes back to the code agent, and the cycle repeats. The code agent may produce the same output each time. Fix: include the rejection reason in the re-routing message and enforce a retry limit.
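Both parts of that fix fit in one routing decision. The sketch below uses a plain dict for state (in LangGraph it would be AgentState) and hypothetical field names (approved, retries, rejection_reason):

```python
# Sketch of retry routing with a rejection reason and a hard limit.
MAX_RETRIES = 3

def route_after_review(state: dict) -> tuple[str, str]:
    """Return (next_node, instruction) after a review verdict."""
    if state["approved"]:
        return "END", "done"
    if state["retries"] >= MAX_RETRIES:
        return "END", "retry limit reached; escalate to a human"
    # Feed the concrete rejection reason back so the code agent
    # does not regenerate the same output
    return "code", f"Fix and resubmit: {state['rejection_reason']}"

print(route_after_review({"approved": False, "retries": 1,
                          "rejection_reason": "missing input validation"}))
```

Passing the rejection reason in the re-routing message is what breaks the cycle: without it, the code agent sees the same prompt and tends to produce the same output.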

Each routing step involves at least one LLM call (the supervisor decision) plus the sub-agent’s LLM calls. A 3-agent system with one routing cycle per agent requires a minimum of 7 LLM calls (1 initial supervisor + 3 sub-agents + 3 supervisor re-evaluations). At GPT-4o pricing, a complex multi-agent interaction can cost $0.10-$0.50 per request.

Latency compounds similarly. If each LLM call takes 2 seconds, a 7-call workflow takes 14+ seconds sequentially. Parallel fan-out helps but adds complexity.
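The call-count arithmetic above can be written as a back-of-envelope estimator. The per-call cost and latency figures are placeholders, not real prices — substitute your model's current rates:

```python
def estimate(n_agents: int, cycles: int,
             cost_per_call: float, secs_per_call: float):
    """Back-of-envelope cost/latency: 1 initial supervisor call, then
    per cycle one sub-agent call plus one supervisor re-evaluation
    for each agent. Per-call figures are placeholder assumptions."""
    calls = 1 + cycles * 2 * n_agents
    return calls, calls * cost_per_call, calls * secs_per_call

calls, cost, latency = estimate(n_agents=3, cycles=1,
                                cost_per_call=0.03, secs_per_call=2.0)
print(calls)    # 7 calls, matching the worked example above
print(latency)  # 14.0 seconds if fully sequential
```

Running the same estimator with cycles=2 shows why retry loops dominate the bill: the call count grows linearly with every additional routing cycle.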


9. LangGraph Multi-Agent Interview Questions


Multi-agent questions test architectural thinking — interviewers want to see decomposition into agents, clear role boundaries, and anticipation of failure modes.


Q: “Design a multi-agent system for automated code review.”

Weak: “I would use multiple LLM calls — one for checking bugs, one for style, one for security.”

Strong: “I would use the supervisor pattern with three specialized agents. The supervisor receives the PR diff and routes to a security agent (checks for hardcoded secrets, SQL injection, dependency vulnerabilities), a logic agent (checks for bugs, edge cases, race conditions), and a style agent (checks naming conventions, documentation, complexity metrics). Each agent has specific tools — the security agent runs Bandit and Snyk, the logic agent can execute test cases, the style agent runs Ruff. The supervisor merges the three reviews into a single report with severity rankings. I would set a recursion limit of 5 to prevent the supervisor from endlessly re-routing, and I would filter messages so each agent only sees the PR diff and its own previous outputs, not the other agents’ reviews.”

  • Compare the supervisor pattern vs swarm pattern — when would you use each?
  • How do you prevent infinite loops in a multi-agent graph?
  • Design a multi-agent system for customer support with escalation
  • What is the cost and latency overhead of multi-agent vs single-agent?
  • How does LangGraph’s state management differ from passing data between functions?
  • How would you test a multi-agent system? What failure modes would you check for?

Production multi-agent deployments require choosing between managed infrastructure (LangGraph Platform) and self-hosted persistence, plus streaming and observability from day one.

Pattern 1: LangGraph Platform (Managed)

Client → LangGraph Cloud API → Supervisor Graph → Sub-agents → Response

Best for: teams that want managed infrastructure with built-in persistence, streaming, and monitoring. LangGraph Platform handles checkpointing, retries, and horizontal scaling.

Pattern 2: Self-Hosted with Persistence

Client → FastAPI → LangGraph (SQLite/Postgres checkpointer) → Sub-agents → Response

Best for: teams with specific infrastructure requirements or data residency constraints.

from langgraph.checkpoint.postgres import PostgresSaver

checkpointer = PostgresSaver.from_conn_string("postgresql://...")
app = graph.compile(checkpointer=checkpointer)

# Resume from a previous checkpoint
config = {"configurable": {"thread_id": "user-123"}}
result = app.invoke({"messages": [HumanMessage(content="Continue...")]}, config=config)

Pattern 3: Async with Streaming

async for event in app.astream_events(
    {"messages": [HumanMessage(content="...")]},
    version="v2",
):
    if event["event"] == "on_chat_model_stream":
        print(event["data"]["chunk"].content, end="", flush=True)

Multi-agent systems are hard to debug without proper tracing. Every routing decision, sub-agent invocation, and tool call should be logged.

  • LangSmith — native integration with LangGraph for trace visualization
  • Langfuse — open-source alternative with session-level tracing
  • Custom logging — add metadata to each node for structured logging

For a comparison of observability tools, see LangSmith vs Langfuse.

  • Routing accuracy: track how often the supervisor picks the correct agent (use human evaluation samples)
  • Sub-agent success rate: percentage of sub-agent calls that produce usable outputs
  • End-to-end latency: p50 and p95 across the full graph execution
  • Token consumption: total tokens per request — compare against single-agent baseline
  • Error rate: graph failures (timeouts, infinite loops, tool errors) per 1000 requests
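The latency metric above is straightforward to compute from per-request samples with the standard library alone. The sample values here are hypothetical:

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> tuple[float, float]:
    """Compute p50/p95 end-to-end latency from per-request samples."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points
    return cuts[49], cuts[94]                       # p50, p95

# Hypothetical per-request graph latencies in milliseconds
samples = [1200, 1500, 1800, 2100, 9000, 1600, 1700, 1400, 2000, 1900]
p50, p95 = latency_percentiles(samples)
print(f"p50={p50:.0f}ms p95={p95:.0f}ms")
```

Tracking p95 alongside p50 matters for multi-agent graphs specifically: a single extra routing cycle on a minority of requests shows up in the tail long before it moves the median.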

Use these reference tables to quickly recall the key decisions: when to use multi-agent, which topology, how many agents, and what framework.

Question — Answer

  • When to use multi-agent? — When a task requires 2+ distinct capabilities that a single agent handles poorly
  • Which pattern? — Supervisor for most cases; swarm for peer handoffs; hierarchical for large agent teams
  • How many agents? — Start with 2-3; each additional agent adds latency and cost
  • Biggest risk? — Token budget explosion and infinite routing loops
  • Framework? — LangGraph 0.3+ with the Command API and create_react_agent

Last updated: March 2026. LangGraph’s API evolves quickly; verify current patterns against the official documentation.

Frequently Asked Questions

What is the supervisor pattern in multi-agent systems?

The supervisor pattern uses a central agent that routes tasks to specialized sub-agents for research, code generation, review, and other capabilities. The supervisor decides which agent runs next based on the current state, manages message routing between agents, and handles error recovery. This creates higher quality outputs because each agent focuses on what it does best.

Why use multi-agent systems instead of a single agent?

A single LLM call attempting multiple capabilities (research, coding, review) produces mediocre results across the board. Multi-agent systems decompose complex tasks into specialized sub-tasks where each agent focuses on one capability. This yields higher quality outputs, clearer reasoning traces, and the ability to assign different tools and models to different tasks.

How does LangGraph handle multi-agent state management?

LangGraph provides explicit state management where shared state flows between nodes with type safety. The supervisor node reads the current state to decide which agent runs next. Each sub-agent receives relevant state, performs its work, and returns state updates. Conditional routing enables re-routing on failure, and built-in persistence via checkpointers enables recovery for long-running workflows.

When should I use a multi-agent system vs a single agent?

Use multi-agent systems when the task requires multiple distinct capabilities (research plus coding plus review), when different sub-tasks benefit from different tools or models, or when you need clear separation of concerns for debugging and testing. A single agent is sufficient for focused tasks that require only one capability.

What is the Command API in LangGraph 0.3+?

The Command API replaces manual edge routing in LangGraph 0.3+. Agents return Command(goto='next_node') instead of relying on conditional edge functions, providing type-safe routing between nodes. Sub-agent node wrappers use Command to route back to the supervisor after execution.

How do you prevent infinite loops in a multi-agent graph?

Implement at least one safeguard: an iteration counter with a hard limit on total routing decisions, LangGraph's built-in recursion_limit parameter in graph.compile(), a wall-clock timeout on the entire graph execution, or a convergence check that detects repeated routing patterns and forces termination.

What tools does each sub-agent need in a supervisor system?

Each sub-agent is assigned tools matching its specialized role. A research agent uses search tools like TavilySearchResults for web research. A code agent uses code execution tools. A review agent uses code review and static analysis tools. Tools are bound to agents via create_react_agent, ensuring each agent only accesses capabilities relevant to its role.

What is the difference between the supervisor pattern and swarm pattern?

The supervisor pattern uses a central coordinator that routes tasks to specialized sub-agents based on intent classification. The swarm pattern uses peer-to-peer agent handoffs without a central coordinator — each agent decides who should act next. Use supervisor for 2-5 agents with clear role separation; use swarm for peer agents with equal authority.

How does fan-out and fan-in routing work in LangGraph?

Fan-out routing sends tasks to multiple agents in parallel by returning multiple Command objects from the supervisor node. Fan-in collects results from all parallel agents back at the supervisor, which merges them and decides whether additional routing is needed or the task is complete. This pattern reduces latency for tasks with independent sub-components.

What are the cost and latency implications of multi-agent systems?

Each routing step involves at least one LLM call for the supervisor decision plus the sub-agent's LLM calls. A 3-agent system with one routing cycle per agent requires a minimum of 7 LLM calls. At GPT-4o pricing, a complex multi-agent interaction can cost $0.10-$0.50 per request. Latency compounds similarly — a 7-call workflow takes 14+ seconds sequentially.