CrewAI Tutorial — Build Multi-Agent Systems in Python (2026)
CrewAI lets you build multi-agent AI systems where specialized agents collaborate on complex tasks — using plain Python. Instead of one monolithic prompt doing everything, you define agents with distinct roles, assign them tasks, and let CrewAI orchestrate the execution. This tutorial takes you from zero to a working multi-agent crew in under 20 minutes.
Who this is for:
- Junior engineers: You want to build your first multi-agent system beyond a single LLM call
- Senior engineers: You need a fast way to prototype role-based agent workflows before deciding on a production framework
Why CrewAI Matters
Single-agent systems break down when the task requires multiple areas of expertise. Cramming research, analysis, and writing into one prompt produces mediocre results because the LLM cannot specialize.
CrewAI solves this with role-based multi-agent coordination:
| Challenge | Single Agent | With CrewAI |
|---|---|---|
| Research + analysis + writing in one prompt | Context overload, unfocused output | Three specialized agents, each with a clear role |
| Agent needs 10+ tools | Reasoning degrades with tool sprawl | Each agent gets only the tools it needs |
| Output quality varies unpredictably | No review step — one-shot generation | Reviewer agent checks quality before final output |
| Changing one part of the workflow | Rewrite the entire prompt | Swap one agent or task definition |
CrewAI’s core insight: agent specialization through role assignment produces better results than general-purpose prompts. The framework handles orchestration, context passing, memory, and tool dispatch — you focus on defining who does what.
When to Use CrewAI
CrewAI fits workflows that map naturally to a team of specialists working together. If you can describe your workflow as “Agent A does X, passes the result to Agent B who does Y,” CrewAI is the right tool.
Strong use cases:
- Research crews — A researcher gathers sources, an analyst extracts key findings, a writer produces the final report
- Content pipelines — An SEO analyst identifies keywords, a writer drafts the article, an editor reviews tone and accuracy
- Data analysis teams — A data collector pulls from APIs, a statistician runs analysis, a reporter summarizes findings
- Customer support triage — A classifier categorizes tickets, a domain expert drafts responses, a QA agent reviews accuracy
When to reach for something else:
- Workflows with loops and retries — LangGraph handles cycles natively. CrewAI’s sequential model does not loop.
- Human-in-the-loop approval gates — CrewAI has a basic `human_input` flag on tasks, but LangGraph's `interrupt()` is more robust for production approval workflows.
- Simple single-agent tasks — If one LLM call with a good prompt solves your problem, CrewAI's overhead is unnecessary.
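For context on that trade-off, this is roughly what CrewAI's human approval gate looks like. A minimal sketch, assuming the standard `Agent`/`Task` constructors; the role text and task wording are illustrative:

```python
from crewai import Agent, Task

# Sketch of a human approval gate. With human_input=True, kickoff()
# pauses after this task and prompts the operator in the terminal to
# approve or give feedback before the crew continues.
writer = Agent(
    role="Outreach Writer",
    goal="Draft concise, accurate outreach emails",
    backstory="Careful writer who never overstates claims.",
)

draft_task = Task(
    description="Draft the outreach email for the enterprise prospect.",
    expected_output="A ready-to-send email under 200 words.",
    agent=writer,
    human_input=True,  # pause for operator review before the next task
)
```

This works for occasional terminal-based review; for programmatic approval gates in a service, LangGraph's checkpoint-based interrupts are the safer bet.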
How CrewAI Works — Architecture
CrewAI follows a straightforward execution model: you define agents, assign them tasks, organize tasks into a crew, and kick off execution.
CrewAI Execution Pipeline
The Four Building Blocks
- Agents — LLM-powered workers defined by a `role`, `goal`, and `backstory`. The backstory shapes the agent's personality and approach. Each agent can have its own tools and LLM configuration.
- Tasks — Units of work assigned to agents. Each task has a `description`, `expected_output`, and an assigned `agent`. Tasks can reference other tasks via `context` to receive their outputs.
- Crews — Collections of agents and tasks with an execution strategy. `Process.sequential` runs tasks in order. `Process.hierarchical` adds a manager agent that delegates and reviews.
- Tools — Python functions that give agents external capabilities like web search, file access, API calls, or code execution. Assigned per-agent to enforce role boundaries.
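Custom tools are ordinary Python functions. A minimal sketch, assuming the `tool` decorator is importable from `crewai.tools` (the tool name and function body are illustrative):

```python
from crewai.tools import tool

@tool("Word Counter")
def count_words(text: str) -> int:
    """Count the number of whitespace-separated words in the given text.

    The docstring matters: CrewAI passes it to the LLM so the agent
    knows when and how to call the tool.
    """
    return len(text.split())

# Assign per-agent to keep role boundaries tight, e.g.:
# editor = Agent(role="Editor", ..., tools=[count_words])
```

CrewAI also ships prebuilt tools (web search, scraping) in `crewai-tools`, shown in the examples below.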
CrewAI Tutorial — Build Your First Crew
You will build a research crew with three agents: a researcher who gathers information, a writer who drafts a report, and a reviewer who checks quality.
Step 1: Install CrewAI
```bash
pip install crewai crewai-tools
```

CrewAI requires Python 3.10+ and works with OpenAI, Anthropic, and Google models. Set your API key:

```bash
export OPENAI_API_KEY="your-api-key-here"
```

Step 2: Define Your Agents
Each agent needs a role (job title), goal (what success looks like), and backstory (personality and expertise).
```python
from crewai import Agent

researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive, accurate information on the given topic",
    backstory=(
        "You are an experienced research analyst who excels at "
        "finding reliable sources, cross-referencing data points, "
        "and identifying the most important trends. You always "
        "cite your sources and flag uncertainty."
    ),
    verbose=True,
    allow_delegation=False,
)

writer = Agent(
    role="Technical Content Writer",
    goal="Transform research findings into clear, engaging content",
    backstory=(
        "You are a skilled technical writer who turns complex "
        "research into accessible, well-structured articles. "
        "You prioritize clarity over jargon and always include "
        "concrete examples."
    ),
    verbose=True,
    allow_delegation=False,
)

reviewer = Agent(
    role="Quality Assurance Editor",
    goal="Ensure content is accurate, complete, and well-structured",
    backstory=(
        "You are a meticulous editor who catches factual errors, "
        "unclear explanations, and structural problems. You provide "
        "specific, actionable feedback rather than vague suggestions."
    ),
    verbose=True,
    allow_delegation=False,
)
```

Step 3: Define Your Tasks
Each task describes what needs to be done, what the output should look like, and which agent owns it.
```python
from crewai import Task

research_task = Task(
    description=(
        "Research the current state of AI agents in enterprise "
        "software. Cover: key frameworks, adoption trends, "
        "common use cases, and challenges. Focus on data from "
        "2025-2026."
    ),
    expected_output=(
        "A structured research brief with 5 key findings, "
        "each supported by specific data points or examples. "
        "Include source references."
    ),
    agent=researcher,
)

writing_task = Task(
    description=(
        "Write a 500-word article based on the research findings. "
        "Use clear headings, concrete examples, and a professional "
        "but accessible tone."
    ),
    expected_output=(
        "A polished article in markdown format with an introduction, "
        "3-4 sections with headings, and a conclusion."
    ),
    agent=writer,
    context=[research_task],  # Writer receives researcher's output
)

review_task = Task(
    description=(
        "Review the article for factual accuracy, clarity, and "
        "completeness. Check that all research findings are "
        "accurately represented and the writing is engaging."
    ),
    expected_output=(
        "The final article with any corrections applied, plus "
        "a brief editorial note listing changes made."
    ),
    agent=reviewer,
    context=[research_task, writing_task],  # Reviewer sees both
)
```

Step 4: Create and Run the Crew
```python
from crewai import Crew, Process

crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[research_task, writing_task, review_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print(result.raw)
```

What happens when you run this:
- The researcher executes first, producing a structured research brief
- The writer receives the research brief via `context` and drafts the article
- The reviewer receives both the research and the article, then produces the final output
- `crew.kickoff()` returns a `CrewOutput` with `.raw` (string), `.tasks_output` (per-task results), and optional structured data
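A sketch of inspecting the result object after a run. Attribute names follow the `CrewOutput` description above; the per-task fields are assumed from the same API:

```python
# Assumes `crew` is the Crew built in Step 4.
result = crew.kickoff()

print(result.raw)  # the final task's output as a string

# Per-task results, in execution order (researcher, writer, reviewer):
for task_output in result.tasks_output:
    print(task_output.raw[:200])  # preview each task's raw output
```

Logging `tasks_output` is also the easiest way to see where quality degraded when the final output is bad.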
CrewAI Architecture Deep Dive
Understanding the full stack helps you debug issues and optimize performance.
CrewAI Architecture Stack
Key Architecture Details
Memory system: Three layers — short-term (current run), long-term (across runs), and entity memory (tracks people, companies, concepts). Enable with memory=True on the Crew.
LLM flexibility: Each agent can use a different LLM. Set llm="gpt-4o" or llm="anthropic/claude-sonnet-4-20250514" on individual agents to mix models based on task requirements.
Tool isolation: Tools are assigned per-agent, not globally. This prevents a writer agent from calling a database deletion tool that only the admin agent should access.
Delegation: When allow_delegation=True, an agent can ask another agent in the crew for help. The manager agent in hierarchical mode uses this to route subtasks dynamically.
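Putting those knobs together, a configuration sketch (not executed here; the model identifiers, roles, and backstories are examples only):

```python
from crewai import Agent

# Mix models by task difficulty: a cheap model for mechanical work,
# a stronger model for reasoning. Identifiers are illustrative.
formatter = Agent(
    role="Formatter",
    goal="Normalize report formatting and fix markdown issues",
    backstory="Detail-oriented copy editor.",
    llm="gpt-4o-mini",       # cheap model for mechanical work
    allow_delegation=False,  # no access to other agents
)

analyst = Agent(
    role="Analyst",
    goal="Draw defensible conclusions from the research data",
    backstory="Senior analyst who quantifies uncertainty.",
    llm="gpt-4o",            # stronger model for reasoning
    allow_delegation=True,   # may hand subtasks to the formatter
)

# On the Crew itself, memory=True switches on all three memory layers:
# crew = Crew(agents=[formatter, analyst], tasks=[...], memory=True)
```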
CrewAI Advanced Examples
Section titled “CrewAI Advanced Examples”Example 1: Research Crew with Web Search
```python
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, ScrapeWebsiteTool

search_tool = SerperDevTool()
scrape_tool = ScrapeWebsiteTool()

researcher = Agent(
    role="Web Research Specialist",
    goal="Find the most relevant and recent information on the topic",
    backstory="Expert at web research who verifies facts across sources",
    tools=[search_tool, scrape_tool],
    verbose=True,
)

analyst = Agent(
    role="Data Analyst",
    goal="Extract actionable insights from raw research data",
    backstory="Analytical thinker who spots patterns and trends in data",
    verbose=True,
)

research_task = Task(
    description="Search for the latest developments in {topic}. Find at least 5 sources.",
    expected_output="A list of 5 key findings with source URLs",
    agent=researcher,
)

analysis_task = Task(
    description="Analyze the research findings and produce 3 actionable recommendations",
    expected_output="3 prioritized recommendations with supporting evidence",
    agent=analyst,
    context=[research_task],
)

crew = Crew(
    agents=[researcher, analyst],
    tasks=[research_task, analysis_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff(inputs={"topic": "AI agent frameworks in 2026"})
```

Example 2: Code Review Crew with Structured Output
```python
from crewai import Agent, Task, Crew, Process
from pydantic import BaseModel

class ReviewResult(BaseModel):
    issues_found: list[str]
    suggestions: list[str]
    approval_status: str

security_reviewer = Agent(
    role="Security Auditor",
    goal="Identify security vulnerabilities and unsafe patterns",
    backstory="Cybersecurity expert focused on code-level vulnerabilities",
    verbose=True,
)

style_reviewer = Agent(
    role="Code Style Reviewer",
    goal="Ensure code follows best practices and is maintainable",
    backstory="Senior engineer who maintains coding standards for the team",
    verbose=True,
)

security_task = Task(
    description="Review this code for security issues: {code_snippet}",
    expected_output="List of security vulnerabilities with severity ratings",
    agent=security_reviewer,
)

summary_task = Task(
    description="Combine security review with style analysis into a final verdict",
    expected_output="Structured review result with approval status",
    agent=style_reviewer,
    context=[security_task],
    output_pydantic=ReviewResult,  # Enforces structured output
)

review_crew = Crew(
    agents=[security_reviewer, style_reviewer],
    tasks=[security_task, summary_task],
    process=Process.sequential,
    verbose=True,
)

result = review_crew.kickoff(inputs={
    "code_snippet": "def process_user_input(data): return eval(data)"
})
print(result.pydantic)  # ReviewResult instance
```

Example 3: Hierarchical Data Analysis Pipeline
```python
from crewai import Agent, Task, Crew, Process

collector = Agent(
    role="Data Collection Specialist",
    goal="Gather and clean raw data from multiple sources",
    backstory="Data engineer skilled at ETL and data quality validation",
    verbose=True,
)

statistician = Agent(
    role="Statistical Analyst",
    goal="Apply statistical methods to extract meaningful patterns",
    backstory="Statistician who translates numbers into business insights",
    verbose=True,
)

collect_task = Task(
    description="Collect Q4 2025 sales data. Identify missing values and outliers.",
    expected_output="Clean dataset summary with data quality report",
    agent=collector,
)

analyze_task = Task(
    description="Perform trend analysis and write an executive summary",
    expected_output="One-page summary with 3 key trends and recommendations",
    agent=statistician,
    context=[collect_task],
    output_file="executive_summary.md",
)

pipeline = Crew(
    agents=[collector, statistician],
    tasks=[collect_task, analyze_task],
    process=Process.hierarchical,  # Manager agent coordinates
    manager_llm="gpt-4o",
    memory=True,
    verbose=True,
)

result = pipeline.kickoff()
```

CrewAI vs LangGraph Agents
The two most popular multi-agent frameworks in 2026 take fundamentally different approaches to agent coordination.
CrewAI vs LangGraph for Multi-Agent Systems
CrewAI strengths:
- Intuitive role/goal/backstory agent design
- Working crew in under 50 lines of Python
- Built-in memory (short-term, long-term, entity)
- Structured output via Pydantic on tasks

CrewAI limitations:
- No native cycles or retry loops
- No checkpoint-based state persistence
- Basic human-in-the-loop (human_input flag)

LangGraph strengths:
- Explicit execution graphs with conditional routing
- Built-in checkpointing (SQLite, PostgreSQL, Redis)
- First-class human-in-the-loop via interrupt()
- Cycles enable retry loops and iterative refinement

LangGraph limitations:
- More boilerplate — state schema + edge definitions
- Steeper learning curve for simple tasks
- No role-based abstractions — agents are just nodes
Decision framework: If your workflow looks like an org chart (clear roles, sequential handoffs), start with CrewAI. If it looks like a flowchart (branches, loops, conditional routing), start with LangGraph. For a detailed breakdown, see LangGraph vs CrewAI.
Interview Questions
These four questions cover the multi-agent architecture concepts that come up in GenAI engineering interviews when discussing CrewAI and role-based agent coordination.
Q1: “What is CrewAI and how does it differ from single-agent systems?”
What they are testing: Do you understand why multi-agent architectures exist?
Strong answer: “CrewAI is a role-based multi-agent framework where you define agents with distinct roles, goals, and backstories, then assign them tasks in a crew. Unlike single-agent systems where one LLM handles everything, CrewAI splits work across specialized agents — a researcher, a writer, a reviewer — each focused on what it does best. This produces higher quality output because each agent’s context window is focused on its specific task rather than overloaded with the entire workflow.”
Weak answer: “CrewAI lets you run multiple LLMs at once.” (Misses the specialization and orchestration point)
Q2: “When would you choose CrewAI over LangGraph?”
What they are testing: Framework selection judgment — can you match the tool to the problem?
Strong answer: “I choose CrewAI when the workflow maps to a team of specialists with clear handoffs — like a content pipeline where a researcher, writer, and editor work sequentially. I choose LangGraph when the workflow has cycles, needs durable checkpointing, or requires human-in-the-loop approval gates. CrewAI gets me a working prototype faster; LangGraph gives me more control over execution flow.”
Q3: “How does context flow between tasks in CrewAI?”
What they are testing: Implementation-level understanding of the framework.
Strong answer: “Each task can reference other tasks via the context parameter. When a writing task sets context=[research_task], the writer agent receives the researcher’s output as additional context in its prompt. CrewAI also supports crew-level memory — short-term for the current run, long-term across runs, and entity memory for tracking specific subjects. The combination of explicit context passing and implicit memory gives agents both structured and ambient awareness.”
Q4: “What are the risks of multi-agent systems?”
What they are testing: Production maturity — do you think beyond the happy path?
Strong answer: “Three main risks: cost multiplication (each agent makes its own LLM calls, so a 3-agent crew costs roughly 3x a single agent), error propagation (a bad output from the first agent poisons all downstream tasks), and non-determinism (agent outputs vary between runs). I mitigate these with structured outputs via Pydantic to enforce consistency, verbose=True for debugging, cost monitoring per agent, and guardrails on critical tasks.”
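The cost-multiplication point is easy to sanity-check with back-of-envelope arithmetic. A small sketch; the per-token prices below are placeholders, not any provider's actual rates:

```python
# Rough cost model for a sequential crew: each agent makes its own LLM
# calls, so spend scales roughly linearly with the number of agents.
# Prices are PLACEHOLDERS for illustration; check your provider's rates.
PRICE_PER_1K_INPUT = 0.0025   # assumed $ per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.0100  # assumed $ per 1K output tokens

def estimate_run_cost(num_agents: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate dollars per crew run, assuming similar token usage per agent."""
    per_agent = (
        input_tokens / 1000 * PRICE_PER_1K_INPUT
        + output_tokens / 1000 * PRICE_PER_1K_OUTPUT
    )
    return num_agents * per_agent
```

Under these assumed prices, a 3-agent crew at 4,000 input and 1,500 output tokens per agent comes to about $0.075 per run, which is why routing cheap subtasks to smaller models matters at scale.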
CrewAI in Production
Moving from prototype to production requires attention to cost, reliability, and observability.
Cost management: A 3-agent crew with GPT-4o can cost $0.10-0.50 per run depending on task complexity. Use cheaper models (GPT-4o-mini, Claude Haiku) for formatting or classification tasks, and reserve expensive models for complex reasoning.
Structured outputs: Always use output_pydantic or output_json on tasks that feed into downstream code. Intermediate task outputs should be structured so the next agent can parse them reliably.
Error handling: Set max_retry_limit on the Crew for automatic retries. Wrap tool functions in try/except blocks and return descriptive error messages — the agent can adapt its approach when it gets a clear error instead of a stack trace.
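The tool-level half of that advice looks like this in plain Python. A sketch of a defensively written fetch function (function name is illustrative; in a real crew you would register it as a CrewAI tool):

```python
import urllib.request
import urllib.error

def fetch_page(url: str) -> str:
    """Fetch a URL and return its text, or a descriptive error message.

    Returning a readable error string (instead of raising) lets the
    agent see what went wrong and adapt, e.g. try a different URL.
    """
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")[:2000]
    except (urllib.error.URLError, ValueError) as exc:
        return f"Error fetching {url}: {exc}. Try a different URL or check the address."
```

A malformed URL yields a message like `Error fetching not-a-url: unknown url type ...`, which the agent can reason about, rather than a traceback that kills the task.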
Observability: Enable verbose=True during development. For production, CrewAI integrates with LangSmith. Log result.tasks_output to track per-task execution times and output quality.
Scaling: Use crew.kickoff_async() for concurrent execution or crew.kickoff_for_each(inputs=[...]) for batch processing. Pin crewai and crewai-tools versions in requirements.txt.
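A batch-processing sketch using `kickoff_for_each` as named above (not executed here; assumes `crew` is built as in the tutorial, and each input dict's keys match the `{placeholders}` in your task descriptions):

```python
# Run the same crew once per input dict; each run gets its own
# interpolated {topic}. Topics below are examples only.
batch_inputs = [
    {"topic": "AI agent frameworks in 2026"},
    {"topic": "LLM evaluation tooling"},
    {"topic": "Retrieval-augmented generation"},
]

results = crew.kickoff_for_each(inputs=batch_inputs)
for res in results:
    print(res.raw[:200])  # preview each run's final output
```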
Summary and Key Takeaways
- CrewAI models multi-agent workflows as teams — agents with roles, goals, and backstories collaborate on tasks
- Four building blocks: Agents (who), Tasks (what), Crews (how), and Tools (with what)
- Sequential process runs tasks in order; hierarchical process adds a manager agent for delegation and quality control
- Context passing via the `context` parameter chains task outputs — the writer receives the researcher's findings automatically
- Structured outputs with `output_pydantic` enforce predictable data formats for production reliability
- Start small — 2-3 agents with clear role boundaries. Expand only when you identify a genuine need for additional specialization
- Know the limits — CrewAI does not support cycles or checkpoint-based persistence. For those, use LangGraph
Related
- LangGraph vs CrewAI — Detailed comparison of graph-based vs role-based orchestration
- AI Agents — Agent architectures and when to use multi-agent systems
- Agentic Frameworks Compared — CrewAI vs LangGraph vs AutoGen
- Agentic Design Patterns — ReAct, Plan-and-Execute, and delegation patterns
- Agent Debugging — Techniques for debugging multi-agent workflows
Frequently Asked Questions
What is CrewAI and what is it used for?
CrewAI is a Python framework for building multi-agent AI systems using role-based orchestration. You define agents with specific roles, goals, and backstories, assign them tasks, and organize them into crews that execute sequentially or hierarchically. CrewAI is used for research automation, content pipelines, data analysis teams, and any workflow where multiple specialized AI agents need to collaborate.
How do I install CrewAI?
Install CrewAI with pip: pip install crewai crewai-tools. This installs the core framework and the official tool library. CrewAI requires Python 3.10 or higher and works with OpenAI, Anthropic, Google, and other LLM providers out of the box.
What is the difference between sequential and hierarchical process in CrewAI?
Sequential process executes tasks in the order you define them — Task 1 completes, its output feeds into Task 2, and so on. Hierarchical process adds a manager agent that coordinates task delegation, reviews outputs, and decides when work meets quality standards. Use sequential for simple linear workflows and hierarchical when tasks require quality gates or dynamic delegation.
How do I add custom tools to a CrewAI agent?
Define tools using the @tool decorator (imported from crewai.tools), then pass them to the Agent constructor via the tools parameter. Each tool is a Python function with a descriptive name and docstring. CrewAI also integrates with LangChain tools and provides built-in tools like SerperDevTool for web search and ScrapeWebsiteTool for web scraping.
How does CrewAI compare to LangGraph?
CrewAI uses role-based orchestration where agents have roles, goals, and backstories — ideal for workflows that map to team structures. LangGraph uses graph-based state machines with explicit nodes, edges, and conditional routing — ideal for complex workflows with cycles, retries, and human-in-the-loop. CrewAI is faster to prototype; LangGraph gives more fine-grained execution control.
Can CrewAI agents share context between tasks?
Yes. Use the context parameter on a Task to pass outputs from previous tasks. When you set context=[research_task] on a writing task, the writer agent receives the researcher's output as additional context. CrewAI also supports crew-level memory (short-term, long-term, and entity memory) that persists information across all tasks in the crew run.
How do I get structured output from CrewAI?
Use the output_pydantic or output_json parameter on a Task. Define a Pydantic model with the fields you need, then set output_pydantic=YourModel on the task. CrewAI instructs the agent to return data matching that schema, which is essential for production pipelines where downstream code needs predictable data structures.
What are common mistakes when building CrewAI crews?
Common mistakes include writing vague agent backstories that do not constrain behavior, omitting expected_output on tasks so agents produce inconsistent formats, not using the context parameter to chain task outputs, giving agents too many tools which degrades reasoning quality, and skipping verbose=True during development which makes debugging nearly impossible.
Is CrewAI free and open source?
Yes. CrewAI is MIT-licensed and free to use. The core framework and tools library are open source on GitHub. CrewAI also offers CrewAI+ as a paid enterprise tier with managed deployment and monitoring dashboards, but the open-source version is fully functional for production use.
How do I handle errors and retries in CrewAI?
Set max_retry_limit on the Crew to allow automatic retries when tasks fail. For tool-level errors, wrap tool functions in try/except blocks and return descriptive error messages so the agent can adapt its approach. Use verbose=True to monitor agent reasoning during execution, and implement task callbacks to log outcomes and trigger alerts on failures.
Last updated: March 2026 | CrewAI 0.100+ / Python 3.10+