Build an AI Agent in Python — From Scratch, No Framework (2026)
This tutorial walks you through building a working AI agent in Python from scratch — no LangChain, no CrewAI, no framework. You will implement tool calling, conversation memory, a ReAct planning loop, and error recovery using only the OpenAI Python SDK. By the end, you will have a fully functional agent you can run, extend, and use as a foundation for production systems.
1. Why Build an Agent from Scratch
Frameworks abstract away the agent loop, which is a problem when you need to debug, optimize, or explain what your agent is doing.
Understanding the Internals
Every agent framework — LangChain, LangGraph, CrewAI, OpenAI Agents SDK — implements the same core pattern: a loop that sends messages to an LLM, checks for tool calls, executes those tools, and repeats until the model produces a final answer. When you use a framework without understanding this loop, you cannot debug why your agent is stuck in an infinite cycle, why it is calling the wrong tool, or why it is ignoring context from three turns ago.
Building from scratch forces you to confront every decision the framework normally hides: how tool definitions are structured, how tool results are formatted, how conversation history accumulates, and how the loop terminates. This understanding transfers directly to any framework you use later — because every framework is a wrapper around the same loop.
What This Tutorial Covers
You will build a complete agent in four incremental steps:
- A basic ReAct loop that reasons and acts using OpenAI tool calling
- A tool registry with three practical tools (web search, calculator, file reader)
- Conversation memory that persists across turns
- Error recovery with retry logic and maximum iteration limits
Every code block is copy-paste ready. The complete agent is under 200 lines of Python.
2. What You’ll Build
The agent you build in this tutorial handles multi-step tasks that require reasoning, tool use, and iterative refinement — the same capabilities that production agents need.
Agent Capabilities
Tool calling — The agent receives a set of tool definitions and decides when to call them. It parses the LLM’s structured tool call response, executes the corresponding Python function, and feeds the result back into the conversation.
Conversation memory — Every message (user, assistant, tool result) is stored in a message history list. The agent passes this full history on every LLM call, giving it context about everything that has happened in the session.
Planning loop (ReAct) — The agent does not just answer in one shot. It loops: reason about the current state, select a tool, execute it, observe the result, and decide whether to continue or return a final answer. This loop is what makes it an agent rather than a single LLM call.
Error recovery — Tool calls can fail. The LLM can produce malformed arguments. The agent catches these errors, reports them back to the LLM as tool results, and lets the model retry or adjust its approach. A maximum iteration limit prevents infinite loops.
Prerequisites
- Python 3.10+
- An OpenAI API key (set as the OPENAI_API_KEY environment variable)
- The openai package: pip install openai
3. Agent Architecture
The agent follows a standard ReAct (Reasoning + Acting) loop. The LLM reasons about what to do and selects a tool, the agent executes it, and the result feeds back into the next reasoning step.
AI Agent ReAct Loop
The core execution cycle: reason, act, observe, repeat
The key insight: the LLM itself decides when to use tools and when to stop. When finish_reason is "tool_calls", the agent executes tools and loops. When finish_reason is "stop", the agent returns the final answer. This is the entire control flow.
4. Build the Agent Step by Step
This is the core of the tutorial. Each step builds on the previous one, and every code block runs as-is.
Step 1: Basic ReAct Loop
The minimal agent is a while loop that calls the OpenAI API, checks for tool calls, and either executes them or returns the response.
import json
from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

def run_agent(
    user_message: str,
    tools: list,
    tool_functions: dict,
    system_prompt: str = "You are a helpful assistant.",
    max_iterations: int = 10,
) -> str:
    """Run a ReAct agent loop until the model produces a final answer."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )

        choice = response.choices[0]
        assistant_message = choice.message

        # Append the assistant's response to history
        messages.append(assistant_message)

        # If no tool calls, the model is done — return the answer
        if choice.finish_reason == "stop":
            return assistant_message.content

        # Process each tool call
        if assistant_message.tool_calls:
            for tool_call in assistant_message.tool_calls:
                func_name = tool_call.function.name
                func_args = json.loads(tool_call.function.arguments)

                # Execute the tool
                result = tool_functions[func_name](**func_args)

                # Append the tool result to history
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result),
                })

    return "Agent reached maximum iterations without a final answer."

This is the complete agent loop. Everything else builds on this function.
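The dispatch-and-reply contract inside the loop can be exercised without any API call by stubbing the tool call object. In this sketch, the SimpleNamespace stand-in, the call id, and the trivial calculator entry are all invented for illustration; the real SDK object has the same attribute shape.

```python
import json
from types import SimpleNamespace

# A stand-in for the SDK's tool_call object (hypothetical, for illustration only)
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(
        name="calculator",
        arguments='{"expression": "2 + 2"}',
    ),
)

# A trivial registry entry matching the fake call
tool_functions = {
    "calculator": lambda expression: {"result": eval(expression, {"__builtins__": {}}, {})},
}

# Decode the arguments and dispatch, exactly as the loop does
func_args = json.loads(fake_call.function.arguments)
result = tool_functions[fake_call.function.name](**func_args)

# The result goes back to the model as a role="tool" message with the matching id
tool_message = {
    "role": "tool",
    "tool_call_id": fake_call.id,
    "content": json.dumps(result),
}
```

The matching tool_call_id is what lets the model pair each result with the call that produced it when several tools run in one iteration.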
Step 2: Add Tool Definitions
Tools are defined as JSON objects that the OpenAI API understands. Each tool has a name, description, and parameter schema. The description is critical — it tells the LLM when to use the tool.
import math

# --- Tool implementations ---

def web_search(query: str) -> dict:
    """Simulate a web search. Replace with a real API (SerpAPI, Tavily, etc.)."""
    # In production, call an actual search API here
    return {
        "results": [
            {"title": f"Result for: {query}", "snippet": f"Information about {query}..."},
        ],
        "source": "web_search",
    }

def calculator(expression: str) -> dict:
    """Evaluate a mathematical expression safely."""
    allowed_names = {
        "abs": abs, "round": round, "min": min, "max": max,
        "pow": pow, "sqrt": math.sqrt, "pi": math.pi, "e": math.e,
    }
    try:
        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return {"result": result, "expression": expression}
    except Exception as e:
        return {"error": str(e), "expression": expression}

def read_file(filepath: str) -> dict:
    """Read the contents of a text file."""
    try:
        with open(filepath, "r") as f:
            content = f.read(10000)  # Limit to 10K chars
        return {"content": content, "filepath": filepath}
    except FileNotFoundError:
        return {"error": f"File not found: {filepath}"}
    except Exception as e:
        return {"error": str(e)}

# --- Tool definitions for the OpenAI API ---

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information. Use when the user asks about recent events, facts you are unsure about, or anything requiring up-to-date data.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query string"}
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a mathematical expression. Use for arithmetic, unit conversions, or any calculation. Pass a Python math expression as a string.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "A Python math expression, e.g. '(25 * 4) + 10' or 'sqrt(144)'"}
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a local text file. Use when the user asks about a specific file or wants to analyze file contents.",
            "parameters": {
                "type": "object",
                "properties": {
                    "filepath": {"type": "string", "description": "Path to the file to read"}
                },
                "required": ["filepath"],
            },
        },
    },
]

# Map function names to actual Python functions
tool_functions = {
    "web_search": web_search,
    "calculator": calculator,
    "read_file": read_file,
}

Step 3: Add Conversation Memory
The agent already has within-session memory (the messages list). To support multi-turn conversations, extract the message history so it persists across calls.
class Agent:
    """AI agent with persistent conversation memory and tool execution."""

    def __init__(
        self,
        system_prompt: str,
        tools: list,
        tool_functions: dict,
        model: str = "gpt-4o",
        max_iterations: int = 10,
        memory_limit: int = 50,
    ):
        self.client = OpenAI()
        self.model = model
        self.tools = tools
        self.tool_functions = tool_functions
        self.max_iterations = max_iterations
        self.memory_limit = memory_limit
        self.messages = [{"role": "system", "content": system_prompt}]

    def _trim_memory(self):
        """Keep the system prompt and the most recent messages."""
        if len(self.messages) > self.memory_limit:
            system_msg = self.messages[0]
            recent = self.messages[-(self.memory_limit - 1):]
            self.messages = [system_msg] + recent

    def run(self, user_message: str) -> str:
        """Process a user message through the ReAct loop."""
        self.messages.append({"role": "user", "content": user_message})
        self._trim_memory()

        for iteration in range(self.max_iterations):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                tools=self.tools,
            )

            choice = response.choices[0]
            assistant_message = choice.message
            self.messages.append(assistant_message)

            if choice.finish_reason == "stop":
                return assistant_message.content

            if assistant_message.tool_calls:
                for tool_call in assistant_message.tool_calls:
                    result = self._execute_tool(tool_call)
                    self.messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": json.dumps(result),
                    })

        return "Reached maximum iterations. Partial results may be in the conversation history."

    def _execute_tool(self, tool_call) -> dict:
        """Execute a single tool call and return the result."""
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)
        return self.tool_functions[func_name](**func_args)

Now the agent remembers previous turns. Ask it a question, get an answer, then ask a follow-up — it has the full context.
agent = Agent(
    system_prompt="You are a research assistant with access to web search, a calculator, and file reading. Break complex tasks into steps. Use tools when you need external information or computation.",
    tools=tools,
    tool_functions=tool_functions,
)

# Multi-turn conversation
print(agent.run("What is the population of Tokyo?"))
print(agent.run("How does that compare to New York City?"))  # Remembers Tokyo
print(agent.run("What is the ratio of the two?"))            # Uses calculator

Step 4: Add Error Recovery and Retry
Production agents need to handle failures at every tool boundary. The LLM can produce malformed JSON arguments. Tools can throw exceptions. Network calls can time out.
class ProductionAgent(Agent):
    """Agent with error recovery, retry logic, and execution tracking."""

    def __init__(self, *args, max_retries: int = 2, **kwargs):
        super().__init__(*args, **kwargs)
        self.max_retries = max_retries
        self.execution_log = []

    def _execute_tool(self, tool_call) -> dict:
        """Execute a tool with error handling and retry logic."""
        func_name = tool_call.function.name

        # Check if the tool exists
        if func_name not in self.tool_functions:
            error_msg = f"Unknown tool: {func_name}. Available: {list(self.tool_functions.keys())}"
            self._log("error", func_name, error_msg)
            return {"error": error_msg}

        # Parse arguments with error handling
        try:
            func_args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError as e:
            error_msg = f"Invalid JSON arguments: {e}"
            self._log("error", func_name, error_msg)
            return {"error": error_msg}

        # Execute with retry
        for attempt in range(self.max_retries + 1):
            try:
                result = self.tool_functions[func_name](**func_args)
                self._log("success", func_name, result, attempt=attempt)
                return result
            except TypeError as e:
                error_msg = f"Wrong arguments for {func_name}: {e}"
                self._log("error", func_name, error_msg, attempt=attempt)
                return {"error": error_msg}
            except Exception as e:
                if attempt < self.max_retries:
                    self._log("retry", func_name, str(e), attempt=attempt)
                    continue
                error_msg = f"Tool {func_name} failed after {self.max_retries + 1} attempts: {e}"
                self._log("error", func_name, error_msg, attempt=attempt)
                return {"error": error_msg}

        return {"error": "Unexpected execution path"}

    def _log(self, status: str, tool: str, detail, attempt: int = 0):
        """Record tool execution for debugging and observability."""
        self.execution_log.append({
            "status": status,
            "tool": tool,
            "detail": str(detail)[:200],
            "attempt": attempt,
        })

    def get_execution_summary(self) -> dict:
        """Return a summary of all tool executions in this session."""
        total = len(self.execution_log)
        successes = sum(1 for e in self.execution_log if e["status"] == "success")
        errors = sum(1 for e in self.execution_log if e["status"] == "error")
        retries = sum(1 for e in self.execution_log if e["status"] == "retry")
        return {"total": total, "successes": successes, "errors": errors, "retries": retries}

This is the complete agent. Under 200 lines total, it handles tool calling, memory, error recovery, and execution tracking — the same core capabilities that frameworks provide.
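The retry path is worth verifying in isolation. This is a standalone sketch of the same retry shape, fed a deliberately flaky function; all names here (execute_with_retry, flaky_tool) are illustrative, not part of the agent above.

```python
def execute_with_retry(func, args: dict, max_retries: int = 2):
    """Retry on failure, giving up after max_retries + 1 attempts."""
    for attempt in range(max_retries + 1):
        try:
            return func(**args)
        except Exception as e:
            if attempt < max_retries:
                continue  # transient failure: try again
            return {"error": f"failed after {max_retries + 1} attempts: {e}"}

# A deliberately flaky tool: fails twice, then succeeds on the third call
calls = {"count": 0}

def flaky_tool(x: int) -> dict:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return {"result": x * 2}

outcome = execute_with_retry(flaky_tool, {"x": 21})
```

With max_retries=2 the flaky tool gets three attempts, so the transient failures are absorbed and the caller sees only the successful result.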
# Full usage example
agent = ProductionAgent(
    system_prompt="You are a research assistant. Use tools to answer questions accurately. If a tool fails, explain what went wrong and try an alternative approach.",
    tools=tools,
    tool_functions=tool_functions,
    max_iterations=10,
    max_retries=2,
    memory_limit=50,
)

answer = agent.run("What is 15% of 2,847, and is that more or less than 400?")
print(answer)
print(agent.get_execution_summary())

5. Agent Component Stack
Every agent — whether built from scratch or with a framework — contains the same six layers. The code above implements each one.
AI Agent Component Stack
Six layers present in every agent implementation, from scratch or framework
When you switch to a framework like LangGraph, these same six layers exist — they are just spread across the framework’s abstractions. The controller becomes a StateGraph. The tool registry becomes ToolNode. Memory becomes a checkpointer. Understanding the raw layers makes the framework patterns immediately recognizable.
6. Agent Enhancement Examples
Once the base agent works, extending it is straightforward because you control every layer.
Adding a New Tool
To add a tool, write a Python function, create a JSON definition, and register it. The agent picks it up automatically.
import datetime

def get_current_time(timezone: str = "UTC") -> dict:
    """Get the current date and time."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {"datetime": now.isoformat(), "timezone": timezone}

# Add the tool definition
tools.append({
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Get the current date and time. Use when the user asks about the current time or needs a timestamp.",
        "parameters": {
            "type": "object",
            "properties": {
                "timezone": {"type": "string", "description": "Timezone name (default: UTC)"}
            },
            "required": [],
        },
    },
})

# Register the function
tool_functions["get_current_time"] = get_current_time

That is the entire process. No framework configuration, no chain rebuilding, no graph recompilation.
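One easy mistake with this two-step registration is drift: a tool defined for the API but never registered as a function (or the reverse) only fails at runtime. A quick consistency check catches it; the definitions below are trimmed mirrors of the registries above, with schemas omitted for brevity.

```python
# Trimmed mirror of the two registries; descriptions and schemas omitted
tools = [
    {"type": "function", "function": {"name": "web_search"}},
    {"type": "function", "function": {"name": "calculator"}},
    {"type": "function", "function": {"name": "read_file"}},
    {"type": "function", "function": {"name": "get_current_time"}},
]
tool_functions = {
    "web_search": lambda **kw: {},
    "calculator": lambda **kw: {},
    "read_file": lambda **kw: {},
    "get_current_time": lambda **kw: {},
}

# Symmetric difference is empty only when every defined tool is registered
# and every registered function has a definition
defined = {t["function"]["name"] for t in tools}
registered = set(tool_functions)
drift = defined ^ registered
```

Running this check at agent startup (and asserting `drift` is empty) turns a confusing mid-conversation failure into an immediate, obvious one.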
Implementing Multi-Step Planning
For complex tasks, add a planning step before the ReAct loop. The model creates an explicit plan, then executes it step by step.
def run_with_planning(self, user_message: str) -> str:
    """Plan first, then execute. Useful for complex multi-step tasks."""
    # Step 1: Ask the LLM to create a plan
    planning_prompt = f"""Break this task into numbered steps. For each step, note which tool (if any) you will need.

Task: {user_message}"""

    self.messages.append({"role": "user", "content": planning_prompt})

    plan_response = self.client.chat.completions.create(
        model=self.model,
        messages=self.messages,
    )
    # Keep the plan in history so the execution step can see it
    self.messages.append(plan_response.choices[0].message)

    # Step 2: Execute the plan using the standard ReAct loop
    execution_prompt = f"Now execute the plan you created above. Use tools as needed for each step. Original task: {user_message}"
    return self.run(execution_prompt)

Adding Streaming Output
For real-time output, switch from create() to create(stream=True) and yield tokens as they arrive.
def run_streaming(self, user_message: str):
    """Stream the final response token by token."""
    self.messages.append({"role": "user", "content": user_message})

    # Run tool calls in non-streaming mode (tool calls need the full response)
    for iteration in range(self.max_iterations):
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            tools=self.tools,
        )

        choice = response.choices[0]

        if choice.finish_reason == "stop":
            # Final answer is ready. Do not append it — stream a fresh
            # completion below so the answer arrives token by token instead
            # of being generated twice.
            break

        self.messages.append(choice.message)

        if choice.message.tool_calls:
            for tool_call in choice.message.tool_calls:
                result = self._execute_tool(tool_call)
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result),
                })

    # Stream the final response
    stream = self.client.chat.completions.create(
        model=self.model,
        messages=self.messages,
        stream=True,
    )
    chunks = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            chunks.append(delta)
            yield delta

    # Record the streamed answer so later turns keep full context
    self.messages.append({"role": "assistant", "content": "".join(chunks)})

7. Framework vs From-Scratch
Building from scratch and using a framework solve different problems. The right choice depends on your stage and requirements.
Framework vs From-Scratch Agent

From scratch:
- Complete visibility into every LLM call and tool execution
- No dependency lock-in — swap providers or models freely
- Easier to debug because you wrote every line
- Under 200 lines for a fully functional agent
- Multi-agent orchestration requires significant custom code
- No built-in checkpointing, persistence, or human-in-the-loop

With a framework:
- Built-in state management and checkpointing for long-running agents
- Multi-agent orchestration with supervisor patterns out of the box
- Human-in-the-loop breakpoints and approval workflows included
- Community ecosystem of pre-built tools and integrations
- Abstraction layers make debugging harder when things fail
- Framework updates can break existing agent implementations
The most effective path: build from scratch first to understand every layer, then adopt a framework when the complexity of your use case justifies it. Engineers who understand the raw loop can diagnose framework issues that stump engineers who only know the abstraction.
8. Interview Questions
These questions test whether you understand agent internals — not just framework APIs.
Q: How does the OpenAI tool calling API work at the protocol level?
You define tools as JSON objects with type: "function", each containing a name, description, and parameters schema following JSON Schema format. You pass the tools list to client.chat.completions.create(). When the model decides to use a tool, the response has finish_reason: "tool_calls" and message.tool_calls contains an array of tool call objects. Each has an id, function.name, and function.arguments (a JSON string). You parse the arguments with json.loads(), execute the function, and send the result back as a message with role: "tool" and the matching tool_call_id. The model uses this result in its next reasoning step.
Q: What happens if a tool call fails? How should the agent handle it?
Never crash the agent loop on a tool failure. Catch the exception, format a clear error message, and return it as the tool result — the LLM will see the error and can adjust its approach. Implement retry logic for transient failures (network timeouts, rate limits) with a maximum retry count. For permanent failures (unknown tool name, invalid arguments), return the error immediately so the model does not waste iterations retrying the same broken call. Always log every tool execution with status, arguments, and result for post-hoc debugging.
Q: How do you prevent an agent from looping forever?
Three safeguards: a maximum iteration limit (typically 10-20) that hard-stops the loop after N cycles, a token budget that tracks cumulative token usage across all LLM calls and terminates when a threshold is exceeded, and a time limit that caps wall-clock execution time. The iteration limit is the simplest and most important — without it, a confused agent can generate unlimited API charges. When any limit is hit, return the best partial answer available rather than an empty failure.
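The iteration and time limits can be sketched as a generic guard around any step function. Here guarded_loop, its step callback, and the stop messages are all invented for illustration; the agent classes above implement only the iteration cap.

```python
import time

def guarded_loop(step, max_iterations: int = 10, time_limit_s: float = 30.0) -> str:
    """Run `step` until it returns a final answer or a safety limit trips."""
    start = time.monotonic()
    for iteration in range(max_iterations):
        if time.monotonic() - start > time_limit_s:
            return "Stopped: time limit exceeded."
        done, answer = step(iteration)  # step returns (finished?, answer)
        if done:
            return answer
    return "Stopped: max iterations reached."

# A step that never converges shows the iteration cap firing
capped = guarded_loop(lambda i: (False, None), max_iterations=5)
# A step that answers immediately passes through untouched
answered = guarded_loop(lambda i: (True, "final answer"))
```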
Q: How would you implement memory for an agent that needs to remember across sessions?
Within a session, memory is the conversation history list. Across sessions, persist the conversation to storage. The simplest approach: at the end of each session, generate a summary of key facts and decisions by asking the LLM to compress the conversation history. Store this summary (in a file, database, or vector store). At the start of the next session, load the summary and inject it into the system prompt or as an initial user message. For more sophisticated retrieval, embed past conversation summaries and use vector similarity search to load only the most relevant past context.
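The file-based version of this takes only a few lines. In the sketch below, the helper names are invented for the example and the summary string is a stand-in for text the LLM would generate when asked to compress the session.

```python
import json
import os
import tempfile

def save_session_summary(path: str, summary: str) -> None:
    """Persist a session summary; a JSON file stands in for a database or vector store."""
    with open(path, "w") as f:
        json.dump({"summary": summary}, f)

def build_system_prompt(path: str, base_prompt: str) -> str:
    """Inject any stored summary into the system prompt at session start."""
    if not os.path.exists(path):
        return base_prompt
    with open(path) as f:
        summary = json.load(f)["summary"]
    return f"{base_prompt}\n\nContext from previous sessions: {summary}"

path = os.path.join(tempfile.mkdtemp(), "agent_memory.json")
save_session_summary(path, "User is comparing the populations of Tokyo and New York City.")
prompt = build_system_prompt(path, "You are a research assistant.")
```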
9. Taking Your Agent to Production
The from-scratch agent works for prototypes and learning. Production systems add three concerns: observability, testing, and cost control.
Observability
Log every iteration of the ReAct loop with: the iteration number, the full assistant message (including any reasoning), each tool call with its arguments and result, token usage per call, and wall-clock time. The execution_log in our ProductionAgent class is a starting point — production systems feed these logs into tools like LangSmith, Arize, or Helicone for trace visualization.
Testing Agents
Agent testing requires evaluation of complete trajectories, not individual function outputs:
- Unit test each tool independently with known inputs and expected outputs
- Test the ReAct loop with mock tool functions that return predefined results — verify the agent makes correct tool choices for given queries
- Trajectory evaluation — for a set of test queries, verify the agent calls the right tools in the right order and produces correct final answers
- Boundary testing — verify the agent handles max iterations gracefully, malformed tool responses, and unknown tool names
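A minimal sketch of the mock-tool idea from the list above: stub the tools to record their calls, drive the dispatch path with scripted tool calls (SimpleNamespace objects standing in for real LLM output), and assert the trajectory. The mock names and scripted arguments are invented for the example.

```python
import json
from types import SimpleNamespace

call_order = []  # records (tool_name, parsed_argument) in execution order

def mock_search(query: str) -> dict:
    call_order.append(("web_search", query))
    return {"results": ["stub result"]}

def mock_calculator(expression: str) -> dict:
    call_order.append(("calculator", expression))
    return {"result": 0}

tool_functions = {"web_search": mock_search, "calculator": mock_calculator}

# Scripted tool calls standing in for what the LLM would emit over two iterations
scripted = [
    SimpleNamespace(function=SimpleNamespace(name="web_search", arguments='{"query": "Tokyo population"}')),
    SimpleNamespace(function=SimpleNamespace(name="calculator", arguments='{"expression": "37.4 / 8.3"}')),
]
for call in scripted:
    args = json.loads(call.function.arguments)
    tool_functions[call.function.name](**args)
```

In a real test suite the scripted calls would come from recorded or mocked API responses, and the assertion would check both tool order and final answer.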
Cost Control
Every iteration costs tokens. Track cumulative usage.total_tokens across all calls within a single run() invocation. Set a per-run budget (e.g., 50,000 tokens) and terminate the loop when it is exceeded. Log the token cost per run so you can detect regressions — if average cost per query doubles after a prompt change, investigate before it reaches production traffic.
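A per-run budget can be a small counter fed with each response's usage.total_tokens; the loop terminates when add() returns False. A sketch, with the class name and token figures chosen purely for illustration:

```python
class TokenBudget:
    """Accumulate usage.total_tokens across calls and flag when a per-run budget is exceeded."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def add(self, total_tokens: int) -> bool:
        """Record one response's token count; return True while still under budget."""
        self.used += total_tokens
        return self.used <= self.limit

budget = TokenBudget(limit=50_000)
under_after_first = budget.add(18_000)   # first response: still under budget
under_after_second = budget.add(35_000)  # second response pushes past the limit
```

Inside run(), check the return value after each completion and break out of the loop (returning the best partial answer) the first time it is False.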
When to Switch to a Framework
Switch when you need capabilities that would be complex to build from scratch:
- Persistent state — your agent needs to pause and resume across server restarts (LangGraph checkpointers)
- Human-in-the-loop — certain tool calls need human approval before execution (LangGraph breakpoints)
- Multi-agent orchestration — your task requires multiple specialized agents coordinating through a supervisor (LangGraph, CrewAI, OpenAI Agents SDK)
- Complex graph routing — conditional branching, parallel execution, and merge nodes that would be error-prone to implement as if/else statements in a while loop
10. Summary
You built a complete AI agent in Python from scratch: a ReAct loop with tool calling via the OpenAI API, conversation memory with sliding window, error recovery with retry logic, and execution tracking for observability. The entire implementation is under 200 lines.
The core insight: every agent framework wraps the same loop you built here. Understanding this loop is the difference between using a framework effectively and being stuck when the framework’s abstractions break down.
Related Guides
- AI Agents Guide — Conceptual deep dive into agent architecture, memory systems, and multi-agent orchestration patterns
- Agentic Design Patterns — ReAct, Plan-and-Execute, Reflection, and supervisor patterns with trade-offs
- Agentic Frameworks Comparison — LangGraph vs CrewAI vs AutoGen vs OpenAI Agents SDK with architecture diagrams
- Python for GenAI — Python fundamentals for GenAI engineering including async, type hints, and SDK patterns
Frequently Asked Questions
Can you build an AI agent in Python without a framework?
Yes. An AI agent is a loop: send messages and tool definitions to an LLM, check if the response contains tool calls, execute those tools, append results, and repeat until the model returns a final text answer. The OpenAI Python SDK provides everything you need — client.chat.completions.create() with the tools parameter handles tool calling natively. Frameworks like LangChain add abstractions on top of this loop, but they are not required.
What is the ReAct loop in an AI agent?
ReAct (Reasoning + Acting) is the core loop that drives most AI agents. The agent reasons about the current state, selects and executes a tool (action), observes the result, and repeats until the task is complete. In code, this is a while loop that calls the LLM, checks for tool_calls in the response, executes the matching function, appends the tool result as a message with role 'tool', and calls the LLM again.
How does tool calling work with the OpenAI API?
You define tools as a list of JSON objects with type 'function', each containing a name, description, and parameters schema. Pass this list in the tools parameter of client.chat.completions.create(). When the model decides to use a tool, the response has finish_reason 'tool_calls' and the message contains a tool_calls array. Each tool call has an id, function name, and JSON string of arguments. You parse the arguments, execute the function, and send the result back as a message with role 'tool' and the matching tool_call_id.
How do you add memory to a Python AI agent?
The simplest memory is the conversation history — a list of message dictionaries that you pass to every LLM call. For short conversations this works directly. For longer sessions, implement a sliding window that keeps the system prompt and the most recent N messages, or summarize older messages into a condensed form. For cross-session memory, store conversation summaries in a file or database and load relevant context at the start of each new session.
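The sliding-window variant is a few lines of list handling. A standalone sketch mirroring the agent's trim logic (the function name here is illustrative):

```python
def sliding_window(messages: list, limit: int) -> list:
    """Keep the system prompt plus the most recent messages."""
    if len(messages) <= limit:
        return messages
    return [messages[0]] + messages[-(limit - 1):]

# 1 system message plus 60 user messages, trimmed to a 50-message window
history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user", "content": f"message {i}"} for i in range(60)
]
trimmed = sliding_window(history, limit=50)
```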
When should you use a framework instead of building from scratch?
Build from scratch when you need to understand agent internals, when your use case is simple enough that a framework adds unnecessary complexity, or when you need full control over the execution loop. Switch to a framework like LangGraph when you need stateful multi-agent orchestration, persistent checkpointing, human-in-the-loop approval workflows, or when your agent graph has complex branching and conditional routing that would be error-prone to implement manually.
What are the most common failure modes when building AI agents?
The most common failures are infinite loops (the agent keeps calling tools without converging on an answer), malformed tool arguments (the LLM generates invalid JSON), unhandled tool errors (a tool throws an exception that crashes the loop), context window exhaustion (conversation history exceeds the model's token limit), and cost overruns (unbounded loops generate excessive API charges). Every production agent needs maximum iteration limits, try/except around tool execution, and token budget tracking.