LLM Tool Calling — How AI Agents Use Functions (2026)
Tool calling (also called function calling) is the mechanism that turns an LLM from a text generator into an agent. Instead of just producing words, the model can request the execution of functions — calling APIs, querying databases, performing calculations, or triggering workflows. This guide covers the architecture, working code for OpenAI and Anthropic, and the production patterns that make tool calling reliable.
Updated March 2026 — Covers Claude tool_use with streaming, OpenAI parallel tool calls, and the Model Context Protocol (MCP) for standardized tool integration.
1. Why Tool Calling Matters
Tool calling transforms an LLM from a text generator into an agent that can take real actions — calling APIs, querying databases, and triggering external workflows.
Why Tool Calling Changes Everything
Without tool calling, an LLM can only reason about what it already knows from training. Ask it “What’s the current weather in Tokyo?” and it gives you a plausible but potentially outdated answer.
With tool calling, the LLM recognizes that it needs live data, requests a get_weather(city="Tokyo") function call, your code executes the API call, and the LLM incorporates the real result into its response.
This is the foundation of every AI agent. RAG retrieves documents. Tool calling takes actions. Together, they give LLMs access to the real world.
Who this is for:
- Senior engineers building production agent systems
- Junior engineers learning how agents work under the hood
- Teams evaluating tool calling vs. RAG for their use cases
2. Real-World Problem Context
Every production LLM application eventually needs capabilities beyond text generation: live data, database queries, calculations, and side-effecting actions.
What Tool Calling Solves
Every production LLM application eventually needs to do something beyond generating text:
| Need | Without Tool Calling | With Tool Calling |
|---|---|---|
| Current data | Stale training data | Live API calls |
| Database queries | Hallucinated results | Actual query execution |
| Calculations | Approximate math | Exact computation |
| External actions | Impossible | API triggers (emails, payments, etc.) |
| Multi-step reasoning | Single-pass guess | Iterative tool use + reasoning |
The Critical Misconception
The LLM does not execute functions. This is the most important concept in tool calling. The LLM outputs a structured JSON request — “call get_weather with city=Tokyo” — and your application code handles the actual execution. The LLM never touches your API keys, database connections, or file system directly.
This separation is both a security feature and an engineering pattern. You control what gets executed, with what permissions, and with what validation.
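This separation can be made concrete with a small dispatch layer. A minimal sketch, assuming a hypothetical `get_weather` stub and a `TOOL_REGISTRY` dict of your own design:

```python
# A minimal dispatch layer: the LLM's JSON request is looked up in an
# explicit registry, so only functions you registered can ever run.
# (Sketch: get_weather and TOOL_REGISTRY are hypothetical names.)

def get_weather(city: str) -> dict:
    # In production this would call a real weather API.
    return {"temp": 22, "condition": "cloudy"}

TOOL_REGISTRY = {
    "get_weather": get_weather,
}

def dispatch(tool_name: str, tool_input: dict):
    # The model only ever supplies a name and arguments;
    # execution, validation, and credentials stay on this side.
    if tool_name not in TOOL_REGISTRY:
        return {"error": f"Unknown tool: {tool_name}"}
    return TOOL_REGISTRY[tool_name](**tool_input)
```

Calling `dispatch("get_weather", {"city": "Tokyo"})` runs the registered stub, while a request for an unregistered name returns a structured error instead of executing anything.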
3. How Tool Calling Works
Tool calling follows a request-decide-execute-synthesize loop: the LLM requests a function, your code runs it, and the result feeds back into the next LLM turn.
The Tool Calling Loop
Think of tool calling as a conversation with a structured detour:
- User sends a message → “What’s the weather in Tokyo?”
- LLM responds with a tool request → {"name": "get_weather", "input": {"city": "Tokyo"}}
- Your code executes the function → calls the weather API → gets {"temp": 22, "condition": "cloudy"}
- You send the result back → a tool_result message with the JSON response
- LLM generates final answer → “It’s currently 22°C and cloudy in Tokyo.”
Steps 2-4 can repeat multiple times — the LLM might call several tools before generating a final response. This loop is the foundation of every ReAct agent.
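The same detour can be written out as the message list your code actually accumulates across the loop. A sketch using Anthropic-style message shapes as plain dicts (the `tool_use_id` value is illustrative):

```python
# The conversation as your code accumulates it across one loop iteration,
# sketched in Anthropic's message shapes. The id "toolu_01" is illustrative.
transcript = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    # Turn 1: the model answers with a tool_use block instead of text.
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Tokyo"}},
    ]},
    # Your code runs get_weather and sends the result back as a user turn.
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": '{"temp": 22, "condition": "cloudy"}'},
    ]},
    # Turn 2: with the result in context, the model answers in plain text.
    {"role": "assistant", "content": "It's currently 22°C and cloudy in Tokyo."},
]
```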
The Tool Calling Architecture
[Diagram: LLM Tool Calling Loop — the model requests functions, your code executes them, and results feed back; this loop is the foundation of every AI agent.]
4. Step-by-Step Implementation
Implementation requires four steps: define tool schemas with JSON Schema, send them with the API request, handle the tool_use response, and loop until the LLM returns a final text answer.
Step 1: Define Tool Schemas
Both OpenAI and Anthropic use JSON Schema to define tools:
```python
# Anthropic (Claude) tool definition
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Use when the user asks about weather conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'Tokyo' or 'San Francisco'"
                }
            },
            "required": ["city"]
        }
    }
]
```

Key rule: The description field matters more than you think. The LLM uses it to decide when to call the tool. Vague descriptions lead to incorrect tool selection.
Step 2: Send the Request
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)
```
Step 3: Handle the Tool Call
```python
# Check if the model wants to use a tool
for block in response.content:
    if block.type == "tool_use":
        tool_name = block.name      # "get_weather"
        tool_input = block.input    # {"city": "Tokyo"}
        tool_use_id = block.id      # unique ID for this call

        # YOUR code executes the function
        result = get_weather(tool_input["city"])

        # Send result back to the LLM
        follow_up = client.messages.create(
            model="claude-sonnet-4-6-20250514",
            max_tokens=1024,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather in Tokyo?"},
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [{
                        "type": "tool_result",
                        "tool_use_id": tool_use_id,
                        "content": str(result)
                    }]
                }
            ]
        )
```
Step 4: The Multi-Turn Loop
Production agents need a loop that continues until the LLM stops requesting tools:
```python
def run_agent(user_message: str, tools: list, max_turns: int = 10):
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-6-20250514",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # If no tool calls, we're done
        if response.stop_reason == "end_turn":
            return response.content

        # Process tool calls
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []

        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result)
                })

        messages.append({"role": "user", "content": tool_results})

    return "Max turns reached"
```
5. Tool Calling Across Providers
OpenAI and Anthropic share the same core concept but differ in schema keys, response types, and result format — and MCP standardizes tool integration across all providers.
OpenAI vs Anthropic Tool Calling
The concept is identical; the API shapes differ:
| Aspect | OpenAI | Anthropic |
|---|---|---|
| Tool definition key | functions or tools | tools |
| Schema location | parameters | input_schema |
| Response type | function_call or tool_calls | tool_use content block |
| Multi-tool per turn | Yes (parallel) | Yes (parallel) |
| Result format | tool role message | tool_result content block |
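Because the JSON Schema body is the same, the differences in the table amount to a mechanical translation of wrapper keys. A sketch converting an Anthropic-style definition into OpenAI's Chat Completions tools shape, reusing the get_weather example:

```python
# Convert an Anthropic-style tool definition into OpenAI's Chat
# Completions "tools" shape: the JSON Schema body carries over unchanged,
# only the wrapper keys differ (input_schema becomes parameters).
def anthropic_to_openai(tool: dict) -> dict:
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["input_schema"],
        },
    }

claude_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

openai_tool = anthropic_to_openai(claude_tool)
```

Maintaining one canonical schema and translating at the API boundary keeps tool definitions provider-agnostic; this is essentially the problem MCP solves at the protocol level.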
Model Context Protocol (MCP)
MCP standardizes how LLMs connect to external tools and data sources. Instead of defining tools per-provider, MCP provides a universal protocol. Claude Code, Cursor, and other tools use MCP servers for standardized tool access.
6. Tool Calling Code Examples in Python
A research agent with web_search, read_url, and save_finding tools demonstrates how the multi-turn loop handles complex, multi-step tasks automatically.
Example: A Research Agent with Tools
Here’s what a production research agent looks like with tool calling:
```python
research_tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information on a topic",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "read_url",
        "description": "Read and extract text content from a URL",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "URL to read"}
            },
            "required": ["url"]
        }
    },
    {
        "name": "save_finding",
        "description": "Save a research finding with source attribution",
        "input_schema": {
            "type": "object",
            "properties": {
                "finding": {"type": "string"},
                "source_url": {"type": "string"},
                "confidence": {"type": "string", "enum": ["high", "medium", "low"]}
            },
            "required": ["finding", "source_url", "confidence"]
        }
    }
]
```

The agent will search → read results → save findings → search again → synthesize. The multi-turn loop handles the iteration automatically.
7. Tool Calling Trade-offs and Pitfalls
The most common failures are vague tool descriptions, missing argument validation, unbounded tool loops, and hallucinated tool calls — each with a clear mitigation pattern.
Where Engineers Get Burned
Tool description quality: The most common failure is vague tool descriptions. If the LLM can’t tell when to use a tool, it either never calls it or calls it incorrectly. Invest time in descriptions.
Argument validation: The LLM generates arguments, but they can be wrong — misspelled city names, out-of-range numbers, invalid enum values. Always validate before execution.
Cost of tool loops: Each tool call is a separate API turn. An agent that calls 5 tools generates 5x the token usage of a single response. Budget accordingly.
Hallucinated tool calls: The LLM may try to call tools that don’t exist or pass arguments for functions it doesn’t have. Always validate the tool name against your registry before execution.
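Both the argument and the hallucinated-name failure modes can be caught with one pre-execution check. A minimal hand-rolled sketch that validates the tool name against your definitions and the arguments against the schema's required, type, and enum constraints (in production, a full JSON Schema validator such as the third-party jsonschema package is the better choice):

```python
# Minimal pre-execution validation: reject hallucinated tool names and
# malformed arguments before any code runs. Hand-rolled sketch; a full
# JSON Schema validator covers far more (nested objects, formats, etc.).
PY_TYPES = {"string": str, "number": (int, float), "integer": int,
            "boolean": bool, "object": dict, "array": list}

def validate_call(tool_name: str, args: dict, tool_defs: list) -> list:
    schemas = {t["name"]: t["input_schema"] for t in tool_defs}
    if tool_name not in schemas:
        return [f"unknown tool: {tool_name}"]  # hallucinated tool call
    schema, errors = schemas[tool_name], []
    for field in schema.get("required", []):
        if field not in args:
            errors.append(f"missing required argument: {field}")
    for field, value in args.items():
        spec = schema["properties"].get(field)
        if spec is None:
            errors.append(f"unexpected argument: {field}")
        elif not isinstance(value, PY_TYPES[spec["type"]]):
            errors.append(f"{field}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{field}: must be one of {spec['enum']}")
    return errors  # empty list means the call is safe to dispatch
```

Returning the error list to the model as a tool_result lets it correct itself on the next turn instead of failing the whole run.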
Tool Calling vs RAG — When to Use Which
Section titled “Tool Calling vs RAG — When to Use Which”| Dimension | Tool Calling | RAG |
|---|---|---|
| Data freshness | Real-time | As fresh as your index |
| Data scope | Any function you define | Your document corpus |
| Actions | Can trigger side effects | Read-only |
| Cost per query | Higher (multi-turn) | Lower (single retrieval) |
| Best for | Live data, actions, calculations | Knowledge retrieval, Q&A |
Most production agents use both. RAG for knowledge, tool calling for actions and live data.
8. Function Calling Interview Questions
Interviewers test whether you understand the LLM-does-not-execute separation, can design multi-tool agents, and know when to combine tool calling with RAG.
What Interviewers Ask
Q: “Explain how tool calling works in LLMs.”
Strong answer: “Tool calling is a structured output format where the LLM generates a JSON request instead of text. The LLM outputs the tool name and arguments, but never executes the function — my application code handles execution, validation, and error handling. The result is sent back as a tool_result message, and the LLM incorporates it into its next response. This loop continues until the LLM produces a final text answer. It’s the mechanism that turns an LLM into an agent.”
Q: “Design a customer support agent that can look up orders and process refunds.”
Strong answer: “I’d define three tools: lookup_order(order_id) for retrieving order details, check_refund_eligibility(order_id) for policy validation, and process_refund(order_id, amount, reason) for the actual refund. The key design decision is making process_refund require prior check_refund_eligibility — I’d enforce this in the application code, not the LLM prompt. The system prompt instructs the agent to always verify eligibility before processing. I’d add a human-in-the-loop interrupt before any process_refund execution using LangGraph checkpointing.”
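The key claim in that answer, enforcing the ordering in application code rather than the prompt, can be sketched as a guard in the dispatch layer (all function names here are hypothetical stubs for the interview scenario):

```python
# Enforce "eligibility check before refund" in code, not in the prompt:
# process_refund refuses to run unless check_refund_eligibility already
# succeeded for that order. (All names are hypothetical stubs.)
verified_orders: set = set()

def check_refund_eligibility(order_id: str) -> dict:
    # Stub for a real policy lookup; records the verification.
    verified_orders.add(order_id)
    return {"eligible": True}

def process_refund(order_id: str, amount: float, reason: str) -> dict:
    if order_id not in verified_orders:
        # Returned to the LLM as a tool_result so it can recover by
        # calling check_refund_eligibility first.
        return {"error": "eligibility not verified for this order"}
    return {"status": "refunded", "amount": amount}
```

Even if the model skips the eligibility step, the guard makes the unsafe ordering impossible, which is exactly the property a prompt instruction alone cannot guarantee.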
9. Tool Calling in Production
Production tool calling requires narrow tool definitions, timeouts on every execution, rate limiting per external API, and full audit logging for debugging and compliance.
Production Tool Calling Patterns
1. Tool routing with specialization: Define narrow, focused tools rather than broad ones. search_products(query) is better than database_query(sql).
2. Timeout and retry: External API calls fail. Implement timeouts on every tool execution and return structured error messages the LLM can reason about.
3. Rate limiting: Tools that call external APIs need rate limiting. The LLM will happily call the same API 100 times in a loop if you let it.
4. Audit logging: Log every tool call — name, arguments, result, latency. This is essential for debugging agent behavior and for compliance in regulated industries.
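Patterns 2 and 4 combine naturally into one wrapper. A sketch using a thread-based timeout and one log record per call, where `execute_tool` stands in for your own dispatch function (note the caveat in the comments: threads cannot cancel a hung call, so true cancellation needs process isolation):

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

logger = logging.getLogger("tool_calls")
_pool = ThreadPoolExecutor(max_workers=4)

def run_tool(execute_tool, name: str, args: dict, timeout_s: float = 10.0) -> dict:
    # Wrap every tool execution with a timeout, a structured error the
    # LLM can reason about, and an audit log line (name, args, outcome,
    # latency). Sketch: execute_tool stands in for your dispatch function.
    start = time.monotonic()
    future = _pool.submit(execute_tool, name, args)
    try:
        outcome = {"ok": True, "result": future.result(timeout=timeout_s)}
    except FuturesTimeout:
        # Caveat: the worker thread keeps running; true cancellation of a
        # hung call requires process isolation, not threads.
        outcome = {"ok": False, "error": f"{name} timed out after {timeout_s}s"}
    except Exception as exc:
        outcome = {"ok": False, "error": f"{name} failed: {exc}"}
    latency_ms = (time.monotonic() - start) * 1000
    logger.info("tool=%s args=%s ok=%s latency_ms=%.1f",
                name, args, outcome["ok"], latency_ms)
    return outcome
```

Returning the error as data rather than raising means the loop from Step 4 can hand it back to the model as a tool_result, which is what "structured error messages the LLM can reason about" looks like in practice.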
10. Summary and Key Takeaways
- Tool calling lets LLMs invoke functions — the LLM requests, your code executes
- JSON Schema defines tools — name, description, and input_schema are the three required fields
- The multi-turn loop is the foundation of every ReAct agent
- Tool descriptions drive selection quality — invest in clear, specific descriptions
- Always validate arguments before execution — the LLM can produce invalid inputs
- Combine tool calling with RAG — retrieval for knowledge, tools for actions and live data
- MCP standardizes tool integration across providers and environments
Related
- AI Agents — How agents use tool calling for multi-step reasoning
- Anthropic API Guide — Working code for Claude tool_use
- Model Context Protocol — Standardized tool integration across LLM environments
- RAG Architecture — Retrieval-augmented generation for knowledge access
- LangGraph Tutorial — Building stateful agents with tool calling nodes
- Prompt Engineering — System prompts that guide tool selection
Frequently Asked Questions
What is tool calling in LLMs?
Tool calling (also called function calling) lets an LLM request the execution of external functions during a conversation. The LLM does not execute the function itself — it outputs a structured JSON request specifying which function to call and with what arguments. Your application code executes the function and returns the result to the LLM for the next turn.
How is tool calling different from RAG?
RAG retrieves static documents to augment the LLM's context. Tool calling invokes live functions — APIs, databases, calculations, or any code. RAG answers questions from existing knowledge. Tool calling takes actions and retrieves real-time data. Most production agents combine both: RAG for knowledge retrieval and tool calling for actions.
Which LLMs support tool calling?
As of 2026, OpenAI (GPT-4, GPT-4o), Anthropic (Claude 3.5 Sonnet, Claude Opus), Google (Gemini 1.5 Pro), and most major LLM providers support native tool calling. The JSON Schema format for tool definitions is largely standardized across providers, though response formats differ slightly.
Does the LLM actually execute functions during tool calling?
No. The LLM never executes functions directly. It outputs a structured JSON request specifying the tool name and arguments. Your application code handles the actual execution, validation, and error handling. The LLM never touches your API keys, database connections, or file system. This separation is both a security feature and an engineering pattern.
What is the multi-turn tool calling loop?
The multi-turn loop is the foundation of every ReAct agent. The LLM receives a user message, decides whether to call a tool or respond directly, and if it calls a tool, your code executes it and sends the result back. This loop continues — potentially calling multiple tools across multiple turns — until the LLM produces a final text response.
How do you define tools for LLM APIs?
Tools are defined using JSON Schema with three required fields: name (the function identifier), description (tells the LLM when to use the tool), and input_schema (parameter types and constraints). The description field is critical because the LLM uses it to decide when to call the tool. Vague descriptions lead to incorrect tool selection.
What is the Model Context Protocol (MCP)?
MCP standardizes how LLMs connect to external tools and data sources. Instead of defining tools per-provider, MCP provides a universal protocol for tool integration. Claude Code, Cursor, and other tools use MCP servers for standardized tool access, making it easier to share tool definitions across different LLM environments.
What are common tool calling failure modes?
The most common failures are vague tool descriptions (the LLM cannot tell when to use a tool), missing argument validation (the LLM may pass misspelled names or out-of-range values), unbounded tool loops (the LLM calling the same API repeatedly without limits), and hallucinated tool calls (attempting to call tools that do not exist). Always validate tool names and arguments before execution.
How do OpenAI and Anthropic tool calling differ?
The core concept is identical but API shapes differ. OpenAI uses the parameters key for schemas and returns function_call or tool_calls responses. Anthropic uses input_schema for definitions and returns tool_use content blocks. Both support parallel tool calls within a single turn.
What production patterns are essential for tool calling?
Production tool calling requires four patterns: tool routing with narrow, focused tool definitions; timeouts and retries on every external API call with structured error messages the LLM can reason about; rate limiting to prevent unbounded API loops; and full audit logging of every tool call including name, arguments, result, and latency for debugging and compliance.