Build an AI Agent in Python — From Scratch, No Framework (2026)
This tutorial walks you through building a working AI agent in Python from scratch — no LangChain, no CrewAI, no framework. You will implement tool calling, conversation memory, a ReAct planning loop, and error recovery using only the OpenAI Python SDK. By the end, you will have a fully functional agent you can run, extend, and use as a foundation for production systems.
1. Why Build an Agent from Scratch
Frameworks abstract away the agent loop, which is a problem when you need to debug, optimize, or explain what your agent is doing.
Understanding the Internals
Every agent framework — LangChain, LangGraph, CrewAI, OpenAI Agents SDK — implements the same core pattern: a loop that sends messages to an LLM, checks for tool calls, executes those tools, and repeats until the model produces a final answer. When you use a framework without understanding this loop, you cannot debug why your agent is stuck in an infinite cycle, why it is calling the wrong tool, or why it is ignoring context from three turns ago.
Building from scratch forces you to confront every decision the framework normally hides: how tool definitions are structured, how tool results are formatted, how conversation history accumulates, and how the loop terminates. This understanding transfers directly to any framework you use later — because every framework is a wrapper around the same loop.
What This Tutorial Covers
You will build a complete agent in four incremental steps:
- A basic ReAct loop that reasons and acts using OpenAI tool calling
- A tool registry with three practical tools (web search, calculator, file reader)
- Conversation memory that persists across turns
- Error recovery with retry logic and maximum iteration limits
Every code block is copy-paste ready. The complete agent is under 200 lines of Python.
2. What You’ll Build
The agent you build in this tutorial handles multi-step tasks that require reasoning, tool use, and iterative refinement — the same capabilities that production agents need.
Agent Capabilities
Tool calling — The agent receives a set of tool definitions and decides when to call them. It parses the LLM’s structured tool call response, executes the corresponding Python function, and feeds the result back into the conversation.
Conversation memory — Every message (user, assistant, tool result) is stored in a message history list. The agent passes this full history on every LLM call, giving it context about everything that has happened in the session.
Planning loop (ReAct) — The agent does not just answer in one shot. It loops: reason about the current state, select a tool, execute it, observe the result, and decide whether to continue or return a final answer. This loop is what makes it an agent rather than a single LLM call.
Error recovery — Tool calls can fail. The LLM can produce malformed arguments. The agent catches these errors, reports them back to the LLM as tool results, and lets the model retry or adjust its approach. A maximum iteration limit prevents infinite loops.
Prerequisites
- Python 3.10+
- An OpenAI API key (set as the OPENAI_API_KEY environment variable)
- The openai package: pip install openai
3. Agent Architecture
The agent follows a standard ReAct (Reasoning + Acting) loop. The LLM reasons about what to do and selects a tool, the agent executes it, and the result feeds back into the next reasoning step.
AI Agent ReAct Loop
The core execution cycle: reason, act, observe, repeat
The key insight: the LLM itself decides when to use tools and when to stop. When finish_reason is "tool_calls", the agent executes tools and loops. When finish_reason is "stop", the agent returns the final answer. This is the entire control flow.
4. Build the Agent Step by Step
This is the core of the tutorial. Each step builds on the previous one, and every code block runs as-is.
Step 1: Basic ReAct Loop
The minimal agent is a while loop that calls the OpenAI API, checks for tool calls, and either executes them or returns the response.
import json
from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var

def run_agent(
    user_message: str,
    tools: list,
    tool_functions: dict,
    system_prompt: str = "You are a helpful assistant.",
    max_iterations: int = 10,
) -> str:
    """Run a ReAct agent loop until the model produces a final answer."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

    for iteration in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )

        choice = response.choices[0]
        assistant_message = choice.message

        # Append the assistant's response to history
        messages.append(assistant_message)

        # If no tool calls, the model is done — return the answer
        if choice.finish_reason == "stop":
            return assistant_message.content

        # Process each tool call
        if assistant_message.tool_calls:
            for tool_call in assistant_message.tool_calls:
                func_name = tool_call.function.name
                func_args = json.loads(tool_call.function.arguments)

                # Execute the tool
                result = tool_functions[func_name](**func_args)

                # Append the tool result to history
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result),
                })

    return "Agent reached maximum iterations without a final answer."

This is the complete agent loop. Everything else builds on this function.
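The dispatch-and-reply contract inside the loop can be exercised without any API call by stubbing the tool call object. In this sketch, the SimpleNamespace stand-in, the call id, and the trivial calculator entry are all invented for illustration; the real SDK object has the same attribute shape.

```python
import json
from types import SimpleNamespace

# A stand-in for the SDK's tool_call object (hypothetical, for illustration only)
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(
        name="calculator",
        arguments='{"expression": "2 + 2"}',
    ),
)

# A trivial registry entry matching the fake call
tool_functions = {
    "calculator": lambda expression: {"result": eval(expression, {"__builtins__": {}}, {})},
}

# Decode the arguments and dispatch, exactly as the loop does
func_args = json.loads(fake_call.function.arguments)
result = tool_functions[fake_call.function.name](**func_args)

# The result goes back to the model as a role="tool" message with the matching id
tool_message = {
    "role": "tool",
    "tool_call_id": fake_call.id,
    "content": json.dumps(result),
}
```

The matching tool_call_id is what lets the model pair each result with the call that produced it when several tools run in one iteration.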
Step 2: Add Tool Definitions
Tools are defined as JSON objects that the OpenAI API understands. Each tool has a name, description, and parameter schema. The description is critical — it tells the LLM when to use the tool.
import math

# --- Tool implementations ---

def web_search(query: str) -> dict:
    """Simulate a web search. Replace with a real API (SerpAPI, Tavily, etc.)."""
    # In production, call an actual search API here
    return {
        "results": [
            {"title": f"Result for: {query}", "snippet": f"Information about {query}..."},
        ],
        "source": "web_search",
    }

def calculator(expression: str) -> dict:
    """Evaluate a mathematical expression safely."""
    allowed_names = {
        "abs": abs, "round": round, "min": min, "max": max,
        "pow": pow, "sqrt": math.sqrt, "pi": math.pi, "e": math.e,
    }
    try:
        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return {"result": result, "expression": expression}
    except Exception as e:
        return {"error": str(e), "expression": expression}

def read_file(filepath: str) -> dict:
    """Read the contents of a text file."""
    try:
        with open(filepath, "r") as f:
            content = f.read(10000)  # Limit to 10K chars
        return {"content": content, "filepath": filepath}
    except FileNotFoundError:
        return {"error": f"File not found: {filepath}"}
    except Exception as e:
        return {"error": str(e)}

# --- Tool definitions for the OpenAI API ---

tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information. Use when the user asks about recent events, facts you are unsure about, or anything requiring up-to-date data.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query string"}
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a mathematical expression. Use for arithmetic, unit conversions, or any calculation. Pass a Python math expression as a string.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "A Python math expression, e.g. '(25 * 4) + 10' or 'sqrt(144)'"}
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a local text file. Use when the user asks about a specific file or wants to analyze file contents.",
            "parameters": {
                "type": "object",
                "properties": {
                    "filepath": {"type": "string", "description": "Path to the file to read"}
                },
                "required": ["filepath"],
            },
        },
    },
]

# Map function names to actual Python functions
tool_functions = {
    "web_search": web_search,
    "calculator": calculator,
    "read_file": read_file,
}

Step 3: Add Conversation Memory
The agent already has within-session memory (the messages list). To support multi-turn conversations, extract the message history so it persists across calls.
class Agent:
    """AI agent with persistent conversation memory and tool execution."""

    def __init__(
        self,
        system_prompt: str,
        tools: list,
        tool_functions: dict,
        model: str = "gpt-4o",
        max_iterations: int = 10,
        memory_limit: int = 50,
    ):
        self.client = OpenAI()
        self.model = model
        self.tools = tools
        self.tool_functions = tool_functions
        self.max_iterations = max_iterations
        self.memory_limit = memory_limit
        self.messages = [{"role": "system", "content": system_prompt}]

    def _trim_memory(self):
        """Keep the system prompt and the most recent messages."""
        if len(self.messages) > self.memory_limit:
            system_msg = self.messages[0]
            recent = self.messages[-(self.memory_limit - 1):]
            self.messages = [system_msg] + recent

    def run(self, user_message: str) -> str:
        """Process a user message through the ReAct loop."""
        self.messages.append({"role": "user", "content": user_message})
        self._trim_memory()

        for iteration in range(self.max_iterations):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                tools=self.tools,
            )

            choice = response.choices[0]
            assistant_message = choice.message
            self.messages.append(assistant_message)

            if choice.finish_reason == "stop":
                return assistant_message.content

            if assistant_message.tool_calls:
                for tool_call in assistant_message.tool_calls:
                    result = self._execute_tool(tool_call)
                    self.messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": json.dumps(result),
                    })

        return "Reached maximum iterations. Partial results may be in the conversation history."

    def _execute_tool(self, tool_call) -> dict:
        """Execute a single tool call and return the result."""
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)
        return self.tool_functions[func_name](**func_args)

Now the agent remembers previous turns. Ask it a question, get an answer, then ask a follow-up — it has the full context.
agent = Agent(
    system_prompt="You are a research assistant with access to web search, a calculator, and file reading. Break complex tasks into steps. Use tools when you need external information or computation.",
    tools=tools,
    tool_functions=tool_functions,
)

# Multi-turn conversation
print(agent.run("What is the population of Tokyo?"))
print(agent.run("How does that compare to New York City?"))  # Remembers Tokyo
print(agent.run("What is the ratio of the two?"))            # Uses calculator

Step 4: Add Error Recovery and Retry
Production agents need to handle failures at every tool boundary. The LLM can produce malformed JSON arguments. Tools can throw exceptions. Network calls can time out.
class ProductionAgent(Agent):
    """Agent with error recovery, retry logic, and execution tracking."""

    def __init__(self, *args, max_retries: int = 2, **kwargs):
        super().__init__(*args, **kwargs)
        self.max_retries = max_retries
        self.execution_log = []

    def _execute_tool(self, tool_call) -> dict:
        """Execute a tool with error handling and retry logic."""
        func_name = tool_call.function.name

        # Check if the tool exists
        if func_name not in self.tool_functions:
            error_msg = f"Unknown tool: {func_name}. Available: {list(self.tool_functions.keys())}"
            self._log("error", func_name, error_msg)
            return {"error": error_msg}

        # Parse arguments with error handling
        try:
            func_args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError as e:
            error_msg = f"Invalid JSON arguments: {e}"
            self._log("error", func_name, error_msg)
            return {"error": error_msg}

        # Execute with retry
        for attempt in range(self.max_retries + 1):
            try:
                result = self.tool_functions[func_name](**func_args)
                self._log("success", func_name, result, attempt=attempt)
                return result
            except TypeError as e:
                error_msg = f"Wrong arguments for {func_name}: {e}"
                self._log("error", func_name, error_msg, attempt=attempt)
                return {"error": error_msg}
            except Exception as e:
                if attempt < self.max_retries:
                    self._log("retry", func_name, str(e), attempt=attempt)
                    continue
                error_msg = f"Tool {func_name} failed after {self.max_retries + 1} attempts: {e}"
                self._log("error", func_name, error_msg, attempt=attempt)
                return {"error": error_msg}

        return {"error": "Unexpected execution path"}

    def _log(self, status: str, tool: str, detail, attempt: int = 0):
        """Record tool execution for debugging and observability."""
        self.execution_log.append({
            "status": status,
            "tool": tool,
            "detail": str(detail)[:200],
            "attempt": attempt,
        })

    def get_execution_summary(self) -> dict:
        """Return a summary of all tool executions in this session."""
        total = len(self.execution_log)
        successes = sum(1 for e in self.execution_log if e["status"] == "success")
        errors = sum(1 for e in self.execution_log if e["status"] == "error")
        retries = sum(1 for e in self.execution_log if e["status"] == "retry")
        return {"total": total, "successes": successes, "errors": errors, "retries": retries}

This is the complete agent. Under 200 lines total, it handles tool calling, memory, error recovery, and execution tracking — the same core capabilities that frameworks provide.
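The retry path is worth verifying in isolation. This is a standalone sketch of the same retry shape, fed a deliberately flaky function; all names here (execute_with_retry, flaky_tool) are illustrative, not part of the agent above.

```python
def execute_with_retry(func, args: dict, max_retries: int = 2):
    """Retry on failure, giving up after max_retries + 1 attempts."""
    for attempt in range(max_retries + 1):
        try:
            return func(**args)
        except Exception as e:
            if attempt < max_retries:
                continue  # transient failure: try again
            return {"error": f"failed after {max_retries + 1} attempts: {e}"}

# A deliberately flaky tool: fails twice, then succeeds on the third call
calls = {"count": 0}

def flaky_tool(x: int) -> dict:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return {"result": x * 2}

outcome = execute_with_retry(flaky_tool, {"x": 21})
```

With max_retries=2 the flaky tool gets three attempts, so the transient failures are absorbed and the caller sees only the successful result.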
# Full usage example
agent = ProductionAgent(
    system_prompt="You are a research assistant. Use tools to answer questions accurately. If a tool fails, explain what went wrong and try an alternative approach.",
    tools=tools,
    tool_functions=tool_functions,
    max_iterations=10,
    max_retries=2,
    memory_limit=50,
)

answer = agent.run("What is 15% of 2,847, and is that more or less than 400?")
print(answer)
print(agent.get_execution_summary())

5. Agent Component Stack
Every agent — whether built from scratch or with a framework — contains the same six layers. The code above implements each one.
AI Agent Component Stack
Six layers present in every agent implementation, from scratch or framework
When you switch to a framework like LangGraph, these same six layers exist — they are just spread across the framework’s abstractions. The controller becomes a StateGraph. The tool registry becomes ToolNode. Memory becomes a checkpointer. Understanding the raw layers makes the framework patterns immediately recognizable.
6. Agent Enhancement Examples
Once the base agent works, extending it is straightforward because you control every layer.
Adding a New Tool
To add a tool, write a Python function, create a JSON definition, and register it. The agent picks it up automatically.
import datetime

def get_current_time(timezone: str = "UTC") -> dict:
    """Get the current date and time."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {"datetime": now.isoformat(), "timezone": timezone}

# Add the tool definition
tools.append({
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Get the current date and time. Use when the user asks about the current time or needs a timestamp.",
        "parameters": {
            "type": "object",
            "properties": {
                "timezone": {"type": "string", "description": "Timezone name (default: UTC)"}
            },
            "required": [],
        },
    },
})

# Register the function
tool_functions["get_current_time"] = get_current_time

That is the entire process. No framework configuration, no chain rebuilding, no graph recompilation.
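One easy mistake with this two-step registration is drift: a tool defined for the API but never registered as a function (or the reverse) only fails at runtime. A quick consistency check catches it; the definitions below are trimmed mirrors of the registries above, with schemas omitted for brevity.

```python
# Trimmed mirror of the two registries; descriptions and schemas omitted
tools = [
    {"type": "function", "function": {"name": "web_search"}},
    {"type": "function", "function": {"name": "calculator"}},
    {"type": "function", "function": {"name": "read_file"}},
    {"type": "function", "function": {"name": "get_current_time"}},
]
tool_functions = {
    "web_search": lambda **kw: {},
    "calculator": lambda **kw: {},
    "read_file": lambda **kw: {},
    "get_current_time": lambda **kw: {},
}

# Symmetric difference is empty only when every defined tool is registered
# and every registered function has a definition
defined = {t["function"]["name"] for t in tools}
registered = set(tool_functions)
drift = defined ^ registered
```

Running this check at agent startup (and asserting `drift` is empty) turns a confusing mid-conversation failure into an immediate, obvious one.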
Implementing Multi-Step Planning
For complex tasks, add a planning step before the ReAct loop. The model creates an explicit plan, then executes it step by step.
def run_with_planning(self, user_message: str) -> str:
    """Plan first, then execute. Useful for complex multi-step tasks."""
    # Step 1: Ask the LLM to create a plan
    planning_prompt = f"""Break this task into numbered steps. For each step, note which tool (if any) you will need.

Task: {user_message}"""

    self.messages.append({"role": "user", "content": planning_prompt})

    plan_response = self.client.chat.completions.create(
        model=self.model,
        messages=self.messages,
    )
    # Keep the plan in history so the execution step can see it
    self.messages.append(plan_response.choices[0].message)

    # Step 2: Execute the plan using the standard ReAct loop
    execution_prompt = f"Now execute the plan you created above. Use tools as needed for each step. Original task: {user_message}"
    return self.run(execution_prompt)

Adding Streaming Output
For real-time output, switch from create() to create(stream=True) and yield tokens as they arrive.
def run_streaming(self, user_message: str):
    """Stream the final response token by token."""
    self.messages.append({"role": "user", "content": user_message})

    # Run tool calls in non-streaming mode (tool calls need the full response)
    for iteration in range(self.max_iterations):
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
            tools=self.tools,
        )

        choice = response.choices[0]

        if choice.finish_reason == "stop":
            # Final answer is ready. Do not append it — stream a fresh
            # completion below so the answer arrives token by token instead
            # of being generated twice.
            break

        self.messages.append(choice.message)

        if choice.message.tool_calls:
            for tool_call in choice.message.tool_calls:
                result = self._execute_tool(tool_call)
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result),
                })

    # Stream the final response
    stream = self.client.chat.completions.create(
        model=self.model,
        messages=self.messages,
        stream=True,
    )
    chunks = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            chunks.append(delta)
            yield delta

    # Record the streamed answer so later turns keep full context
    self.messages.append({"role": "assistant", "content": "".join(chunks)})

7. Framework vs From-Scratch
Building from scratch and using a framework solve different problems. The right choice depends on your stage and requirements.
Framework vs From-Scratch Agent

From scratch:
- Complete visibility into every LLM call and tool execution
- No dependency lock-in — swap providers or models freely
- Easier to debug because you wrote every line
- Under 200 lines for a fully functional agent
- Multi-agent orchestration requires significant custom code
- No built-in checkpointing, persistence, or human-in-the-loop

With a framework:
- Built-in state management and checkpointing for long-running agents
- Multi-agent orchestration with supervisor patterns out of the box
- Human-in-the-loop breakpoints and approval workflows included
- Community ecosystem of pre-built tools and integrations
- Abstraction layers make debugging harder when things fail
- Framework updates can break existing agent implementations
The most effective path: build from scratch first to understand every layer, then adopt a framework when the complexity of your use case justifies it. Engineers who understand the raw loop can diagnose framework issues that stump engineers who only know the abstraction.
8. Interview Questions
These questions test whether you understand agent internals — not just framework APIs.
Q: How does the OpenAI tool calling API work at the protocol level?
You define tools as JSON objects with type: "function", each containing a name, description, and parameters schema following JSON Schema format. You pass the tools list to client.chat.completions.create(). When the model decides to use a tool, the response has finish_reason: "tool_calls" and message.tool_calls contains an array of tool call objects. Each has an id, function.name, and function.arguments (a JSON string). You parse the arguments with json.loads(), execute the function, and send the result back as a message with role: "tool" and the matching tool_call_id. The model uses this result in its next reasoning step.
Q: What happens if a tool call fails? How should the agent handle it?
Never crash the agent loop on a tool failure. Catch the exception, format a clear error message, and return it as the tool result — the LLM will see the error and can adjust its approach. Implement retry logic for transient failures (network timeouts, rate limits) with a maximum retry count. For permanent failures (unknown tool name, invalid arguments), return the error immediately so the model does not waste iterations retrying the same broken call. Always log every tool execution with status, arguments, and result for post-hoc debugging.
Q: How do you prevent an agent from looping forever?
Three safeguards: a maximum iteration limit (typically 10-20) that hard-stops the loop after N cycles, a token budget that tracks cumulative token usage across all LLM calls and terminates when a threshold is exceeded, and a time limit that caps wall-clock execution time. The iteration limit is the simplest and most important — without it, a confused agent can generate unlimited API charges. When any limit is hit, return the best partial answer available rather than an empty failure.
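The iteration and time limits can be sketched as a generic guard around any step function. Here guarded_loop, its step callback, and the stop messages are all invented for illustration; the agent classes above implement only the iteration cap.

```python
import time

def guarded_loop(step, max_iterations: int = 10, time_limit_s: float = 30.0) -> str:
    """Run `step` until it returns a final answer or a safety limit trips."""
    start = time.monotonic()
    for iteration in range(max_iterations):
        if time.monotonic() - start > time_limit_s:
            return "Stopped: time limit exceeded."
        done, answer = step(iteration)  # step returns (finished?, answer)
        if done:
            return answer
    return "Stopped: max iterations reached."

# A step that never converges shows the iteration cap firing
capped = guarded_loop(lambda i: (False, None), max_iterations=5)
# A step that answers immediately passes through untouched
answered = guarded_loop(lambda i: (True, "final answer"))
```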
Q: How would you implement memory for an agent that needs to remember across sessions?
Within a session, memory is the conversation history list. Across sessions, persist the conversation to storage. The simplest approach: at the end of each session, generate a summary of key facts and decisions by asking the LLM to compress the conversation history. Store this summary (in a file, database, or vector store). At the start of the next session, load the summary and inject it into the system prompt or as an initial user message. For more sophisticated retrieval, embed past conversation summaries and use vector similarity search to load only the most relevant past context.
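The file-based version of this takes only a few lines. In the sketch below, the helper names are invented for the example and the summary string is a stand-in for text the LLM would generate when asked to compress the session.

```python
import json
import os
import tempfile

def save_session_summary(path: str, summary: str) -> None:
    """Persist a session summary; a JSON file stands in for a database or vector store."""
    with open(path, "w") as f:
        json.dump({"summary": summary}, f)

def build_system_prompt(path: str, base_prompt: str) -> str:
    """Inject any stored summary into the system prompt at session start."""
    if not os.path.exists(path):
        return base_prompt
    with open(path) as f:
        summary = json.load(f)["summary"]
    return f"{base_prompt}\n\nContext from previous sessions: {summary}"

path = os.path.join(tempfile.mkdtemp(), "agent_memory.json")
save_session_summary(path, "User is comparing the populations of Tokyo and New York City.")
prompt = build_system_prompt(path, "You are a research assistant.")
```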
9. Taking Your Agent to Production
The from-scratch agent works for prototypes and learning. Production systems add three concerns: observability, testing, and cost control.
Observability
Log every iteration of the ReAct loop with: the iteration number, the full assistant message (including any reasoning), each tool call with its arguments and result, token usage per call, and wall-clock time. The execution_log in our ProductionAgent class is a starting point — production systems feed these logs into tools like LangSmith, Arize, or Helicone for trace visualization.
Testing Agents
Agent testing requires evaluation of complete trajectories, not individual function outputs:
- Unit test each tool independently with known inputs and expected outputs
- Test the ReAct loop with mock tool functions that return predefined results — verify the agent makes correct tool choices for given queries
- Trajectory evaluation — for a set of test queries, verify the agent calls the right tools in the right order and produces correct final answers
- Boundary testing — verify the agent handles max iterations gracefully, malformed tool responses, and unknown tool names
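A minimal sketch of the mock-tool idea from the list above: stub the tools to record their calls, drive the dispatch path with scripted tool calls (SimpleNamespace objects standing in for real LLM output), and assert the trajectory. The mock names and scripted arguments are invented for the example.

```python
import json
from types import SimpleNamespace

call_order = []  # records (tool_name, parsed_argument) in execution order

def mock_search(query: str) -> dict:
    call_order.append(("web_search", query))
    return {"results": ["stub result"]}

def mock_calculator(expression: str) -> dict:
    call_order.append(("calculator", expression))
    return {"result": 0}

tool_functions = {"web_search": mock_search, "calculator": mock_calculator}

# Scripted tool calls standing in for what the LLM would emit over two iterations
scripted = [
    SimpleNamespace(function=SimpleNamespace(name="web_search", arguments='{"query": "Tokyo population"}')),
    SimpleNamespace(function=SimpleNamespace(name="calculator", arguments='{"expression": "37.4 / 8.3"}')),
]
for call in scripted:
    args = json.loads(call.function.arguments)
    tool_functions[call.function.name](**args)
```

In a real test suite the scripted calls would come from recorded or mocked API responses, and the assertion would check both tool order and final answer.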
Cost Control
Every iteration costs tokens. Track cumulative usage.total_tokens across all calls within a single run() invocation. Set a per-run budget (e.g., 50,000 tokens) and terminate the loop when it is exceeded. Log the token cost per run so you can detect regressions — if average cost per query doubles after a prompt change, investigate before it reaches production traffic.
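A per-run budget can be a small counter fed with each response's usage.total_tokens; the loop terminates when add() returns False. A sketch, with the class name and token figures chosen purely for illustration:

```python
class TokenBudget:
    """Accumulate usage.total_tokens across calls and flag when a per-run budget is exceeded."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def add(self, total_tokens: int) -> bool:
        """Record one response's token count; return True while still under budget."""
        self.used += total_tokens
        return self.used <= self.limit

budget = TokenBudget(limit=50_000)
under_after_first = budget.add(18_000)   # first response: still under budget
under_after_second = budget.add(35_000)  # second response pushes past the limit
```

Inside run(), check the return value after each completion and break out of the loop (returning the best partial answer) the first time it is False.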
When to Switch to a Framework
Switch when you need capabilities that would be complex to build from scratch:
- Persistent state — your agent needs to pause and resume across server restarts (LangGraph checkpointers)
- Human-in-the-loop — certain tool calls need human approval before execution (LangGraph breakpoints)
- Multi-agent orchestration — your task requires multiple specialized agents coordinating through a supervisor (LangGraph, CrewAI, OpenAI Agents SDK)
- Complex graph routing — conditional branching, parallel execution, and merge nodes that would be error-prone to implement as if/else statements in a while loop
10. Summary
You built a complete AI agent in Python from scratch: a ReAct loop with tool calling via the OpenAI API, conversation memory with sliding window, error recovery with retry logic, and execution tracking for observability. The entire implementation is under 200 lines.
The core insight: every agent framework wraps the same loop you built here. Understanding this loop is the difference between using a framework effectively and being stuck when the framework’s abstractions break down.
Related Guides
- AI Agents Guide — Conceptual deep dive into agent architecture, memory systems, and multi-agent orchestration patterns
- Agentic Design Patterns — ReAct, Plan-and-Execute, Reflection, and supervisor patterns with trade-offs
- Agentic Frameworks Comparison — LangGraph vs CrewAI vs AutoGen vs OpenAI Agents SDK with architecture diagrams
- Python for GenAI — Python fundamentals for GenAI engineering including async, type hints, and SDK patterns
Frequently Asked Questions
Can you build an AI agent in Python without a framework?
Yes. An AI agent is a loop: send messages and tool definitions to an LLM, check if the response contains tool calls, execute those tools, append results, and repeat until the model returns a final text answer. The OpenAI Python SDK provides everything you need — client.chat.completions.create() with the tools parameter handles tool calling natively. Frameworks like LangChain add abstractions on top of this loop, but they are not required.
What is the ReAct loop in an AI agent?
ReAct (Reasoning + Acting) is the core loop that drives most AI agents. The agent reasons about the current state, selects and executes a tool (action), observes the result, and repeats until the task is complete. In code, this is a while loop that calls the LLM, checks for tool_calls in the response, executes the matching function, appends the tool result as a message with role 'tool', and calls the LLM again.
How does tool calling work with the OpenAI API?
You define tools as a list of JSON objects with type 'function', each containing a name, description, and parameters schema. Pass this list in the tools parameter of client.chat.completions.create(). When the model decides to use a tool, the response has finish_reason 'tool_calls' and the message contains a tool_calls array. Each tool call has an id, function name, and JSON string of arguments. You parse the arguments, execute the function, and send the result back as a message with role 'tool' and the matching tool_call_id.
How do you add memory to a Python AI agent?
The simplest memory is the conversation history — a list of message dictionaries that you pass to every LLM call. For short conversations this works directly. For longer sessions, implement a sliding window that keeps the system prompt and the most recent N messages, or summarize older messages into a condensed form. For cross-session memory, store conversation summaries in a file or database and load relevant context at the start of each new session.
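The sliding-window variant is a few lines of list handling. A standalone sketch mirroring the agent's trim logic (the function name here is illustrative):

```python
def sliding_window(messages: list, limit: int) -> list:
    """Keep the system prompt plus the most recent messages."""
    if len(messages) <= limit:
        return messages
    return [messages[0]] + messages[-(limit - 1):]

# 1 system message plus 60 user messages, trimmed to a 50-message window
history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user", "content": f"message {i}"} for i in range(60)
]
trimmed = sliding_window(history, limit=50)
```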
When should you use a framework instead of building from scratch?
Build from scratch when you need to understand agent internals, when your use case is simple enough that a framework adds unnecessary complexity, or when you need full control over the execution loop. Switch to a framework like LangGraph when you need stateful multi-agent orchestration, persistent checkpointing, human-in-the-loop approval workflows, or when your agent graph has complex branching and conditional routing that would be error-prone to implement manually.
What are the most common failure modes when building AI agents?
The most common failures are infinite loops (the agent keeps calling tools without converging on an answer), malformed tool arguments (the LLM generates invalid JSON), unhandled tool errors (a tool throws an exception that crashes the loop), context window exhaustion (conversation history exceeds the model's token limit), and cost overruns (unbounded loops generate excessive API charges). Every production agent needs maximum iteration limits, try/except around tool execution, and token budget tracking.