
LLM Tool Calling — How AI Agents Use Functions (2026)

Tool calling (also called function calling) is the mechanism that turns an LLM from a text generator into an agent. Instead of just producing words, the model can request the execution of functions — calling APIs, querying databases, performing calculations, or triggering workflows. This guide covers the architecture, working code for OpenAI and Anthropic, and the production patterns that make tool calling reliable.

Updated March 2026 — Covers Claude tool_use with streaming, OpenAI parallel tool calls, and the Model Context Protocol (MCP) for standardized tool integration.

Tool calling transforms an LLM from a text generator into an agent that can take real actions — calling APIs, querying databases, and triggering external workflows.

Without tool calling, an LLM can only reason about what it already knows from training. Ask it “What’s the current weather in Tokyo?” and it gives you a plausible but potentially outdated answer.

With tool calling, the LLM recognizes that it needs live data, requests a get_weather(city="Tokyo") function call, your code executes the API call, and the LLM incorporates the real result into its response.

This is the foundation of every AI agent. RAG retrieves documents. Tool calling takes actions. Together, they give LLMs access to the real world.

Who this is for:

  • Senior engineers building production agent systems
  • Junior engineers learning how agents work under the hood
  • Teams evaluating tool calling vs. RAG for their use cases

Every production LLM application eventually needs capabilities beyond text generation: live data, database queries, calculations, and side-effecting actions.

Here is what that gap looks like in practice:

| Need | Without Tool Calling | With Tool Calling |
|---|---|---|
| Current data | Stale training data | Live API calls |
| Database queries | Hallucinated results | Actual query execution |
| Calculations | Approximate math | Exact computation |
| External actions | Impossible | API triggers (emails, payments, etc.) |
| Multi-step reasoning | Single-pass guess | Iterative tool use + reasoning |

The LLM does not execute functions. This is the most important concept in tool calling. The LLM outputs a structured JSON request — “call get_weather with city=Tokyo” — and your application code handles the actual execution. The LLM never touches your API keys, database connections, or file system directly.

This separation is both a security feature and an engineering pattern. You control what gets executed, with what permissions, and with what validation.


Tool calling follows a request-decide-execute-synthesize loop: the LLM requests a function, your code runs it, and the result feeds back into the next LLM turn.

Think of tool calling as a conversation with a structured detour:

  1. User sends a message → “What’s the weather in Tokyo?”
  2. LLM responds with a tool request → {"name": "get_weather", "input": {"city": "Tokyo"}}
  3. Your code executes the function → calls weather API → gets {"temp": 22, "condition": "cloudy"}
  4. You send the result back → tool result message with the JSON response
  5. LLM generates final answer → “It’s currently 22°C and cloudy in Tokyo.”

Steps 2-4 can repeat multiple times — the LLM might call several tools before generating a final response. This loop is the foundation of every ReAct agent.
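The five steps above map onto a message list like this (Anthropic-style content blocks; the tool ID and values are illustrative):

```python
# One complete tool-calling round trip as a message transcript.
conversation = [
    # 1. User message
    {"role": "user", "content": "What's the weather in Tokyo?"},
    # 2. LLM replies with a tool request instead of text
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Tokyo"}},
    ]},
    # 3-4. Your code ran the function; the result goes back as a user turn
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": '{"temp": 22, "condition": "cloudy"}'},
    ]},
    # 5. LLM's final text answer
    {"role": "assistant", "content": "It's currently 22°C and cloudy in Tokyo."},
]
```

Note that the tool result travels in a user-role message: from the API's point of view, your application is just another conversation participant reporting what happened.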

[Diagram] LLM Tool Calling Loop — the model requests functions, your code executes them, results feed back; this loop is the foundation of every AI agent.

  • Request Phase: user prompt + tool definitions sent to the LLM (user message, tool schemas as JSON, system prompt)
  • Decision Phase: the LLM decides whether to respond directly or call a tool, and outputs structured JSON for any tool_use
  • Execution Phase: your code parses the tool name and arguments, validates them, executes the function, and returns a tool_result (the LLM never touches execution)
  • Synthesis Phase: the LLM reasons over the tool result, decides whether more tools are needed, and produces the final text response

Implementation requires four steps: define tool schemas with JSON Schema, send them with the API request, handle the tool_use response, and loop until the LLM returns a final text answer.

Both OpenAI and Anthropic use JSON Schema to define tools:

```python
# Anthropic (Claude) tool definition
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Use when the user asks about weather conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'Tokyo' or 'San Francisco'"
                }
            },
            "required": ["city"]
        }
    }
]
```

Key rule: The description field matters more than you think. The LLM uses it to decide when to call the tool. Vague descriptions lead to incorrect tool selection.
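To make the contrast concrete, here are two definitions side by side. The lookup_order tool is hypothetical, for illustration only:

```python
# A vague description gives the model nothing to route on:
vague = {"name": "lookup", "description": "Looks things up."}

# A specific description says what the tool returns and when to use it:
specific = {
    "name": "lookup_order",
    "description": (
        "Retrieve an order's status, items, and shipping info by order ID. "
        "Use when the user asks about a specific order they placed."
    ),
}
```

The second version lets the model distinguish this tool from, say, a product search tool, and tells it which user intents should trigger it.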

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

# Check if the model wants to use a tool
for block in response.content:
    if block.type == "tool_use":
        tool_name = block.name      # "get_weather"
        tool_input = block.input    # {"city": "Tokyo"}
        tool_use_id = block.id      # unique ID for this call

        # YOUR code executes the function
        result = get_weather(tool_input["city"])

        # Send result back to the LLM
        follow_up = client.messages.create(
            model="claude-sonnet-4-6-20250514",
            max_tokens=1024,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather in Tokyo?"},
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [{
                        "type": "tool_result",
                        "tool_use_id": tool_use_id,
                        "content": str(result)
                    }]
                }
            ]
        )
```

Production agents need a loop that continues until the LLM stops requesting tools:

```python
def run_agent(user_message: str, tools: list, max_turns: int = 10):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-6-20250514",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        # If no tool calls, we're done
        if response.stop_reason == "end_turn":
            return response.content
        # Process tool calls
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result)
                })
        messages.append({"role": "user", "content": tool_results})
    return "Max turns reached"
```

OpenAI and Anthropic share the same core concept but differ in schema keys, response types, and result format — and MCP standardizes tool integration across all providers.

The concept is identical; the API shapes differ:

| Aspect | OpenAI | Anthropic |
|---|---|---|
| Tool definition key | functions or tools | tools |
| Schema location | parameters | input_schema |
| Response type | function_call or tool_calls | tool_use content block |
| Multi-tool per turn | Yes (parallel) | Yes (parallel) |
| Result format | tool role message | tool_result content block |
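For comparison, here is the same get_weather tool from earlier expressed in OpenAI's tools shape, with the schema under parameters instead of input_schema (a sketch of the common Chat Completions format, not an exhaustive definition):

```python
# OpenAI-style definition of the same get_weather tool.
openai_tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city. Use when the user asks about weather conditions.",
            "parameters": {  # OpenAI's key for what Anthropic calls input_schema
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. 'Tokyo'"}
                },
                "required": ["city"]
            }
        }
    }
]
```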

MCP standardizes how LLMs connect to external tools and data sources. Instead of defining tools per-provider, MCP provides a universal protocol. Claude Code, Cursor, and other tools use MCP servers for standardized tool access.


A research agent with web_search, read_url, and save_finding tools demonstrates how the multi-turn loop handles complex, multi-step tasks automatically.

Here’s what a production research agent looks like with tool calling:

```python
research_tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information on a topic",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "read_url",
        "description": "Read and extract text content from a URL",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "URL to read"}
            },
            "required": ["url"]
        }
    },
    {
        "name": "save_finding",
        "description": "Save a research finding with source attribution",
        "input_schema": {
            "type": "object",
            "properties": {
                "finding": {"type": "string"},
                "source_url": {"type": "string"},
                "confidence": {"type": "string", "enum": ["high", "medium", "low"]}
            },
            "required": ["finding", "source_url", "confidence"]
        }
    }
]
```

The agent will search → read results → save findings → search again → synthesize. The multi-turn loop handles the iteration automatically.


The most common failures are vague tool descriptions, missing argument validation, unbounded tool loops, and hallucinated tool calls — each with a clear mitigation pattern.

Tool description quality: The most common failure is vague tool descriptions. If the LLM can’t tell when to use a tool, it either never calls it or calls it incorrectly. Invest time in descriptions.

Argument validation: The LLM generates arguments, but they can be wrong — misspelled city names, out-of-range numbers, invalid enum values. Always validate before execution.
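A minimal stdlib check against a tool's input_schema can run before every execution. This is a sketch covering required keys, basic string typing, and enums; in production you might use the jsonschema package instead. The example schema reuses the confidence enum from the save_finding tool above:

```python
# Validate model-generated arguments against a tool's input_schema
# before execution. Returns a list of errors (empty means valid).
def validate_args(schema: dict, args: dict) -> list:
    errors = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    for key, value in args.items():
        prop = props.get(key)
        if prop is None:
            errors.append(f"unexpected argument: {key}")
            continue
        if prop.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key} must be a string")
        if "enum" in prop and value not in prop["enum"]:
            errors.append(f"{key} must be one of {prop['enum']}")
    return errors

# Example: the confidence enum from the save_finding tool
confidence_schema = {
    "type": "object",
    "properties": {
        "confidence": {"type": "string", "enum": ["high", "medium", "low"]}
    },
    "required": ["confidence"],
}
```

If validation fails, return the error list to the LLM as a tool_result so it can correct the arguments on the next turn.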

Cost of tool loops: Each tool call is a separate API turn. An agent that calls 5 tools generates 5x the token usage of a single response. Budget accordingly.

Hallucinated tool calls: The LLM may try to call tools that don’t exist or pass arguments for functions it doesn’t have. Always validate the tool name against your registry before execution.

| Dimension | Tool Calling | RAG |
|---|---|---|
| Data freshness | Real-time | As fresh as your index |
| Data scope | Any function you define | Your document corpus |
| Actions | Can trigger side effects | Read-only |
| Cost per query | Higher (multi-turn) | Lower (single retrieval) |
| Best for | Live data, actions, calculations | Knowledge retrieval, Q&A |

Most production agents use both. RAG for knowledge, tool calling for actions and live data.


Interviewers test whether you understand the LLM-does-not-execute separation, can design multi-tool agents, and know when to combine tool calling with RAG.

Q: “Explain how tool calling works in LLMs.”

Strong answer: “Tool calling is a structured output format where the LLM generates a JSON request instead of text. The LLM outputs the tool name and arguments, but never executes the function — my application code handles execution, validation, and error handling. The result is sent back as a tool_result message, and the LLM incorporates it into its next response. This loop continues until the LLM produces a final text answer. It’s the mechanism that turns an LLM into an agent.”

Q: “Design a customer support agent that can look up orders and process refunds.”

Strong answer: “I’d define three tools: lookup_order(order_id) for retrieving order details, check_refund_eligibility(order_id) for policy validation, and process_refund(order_id, amount, reason) for the actual refund. The key design decision is making process_refund require prior check_refund_eligibility — I’d enforce this in the application code, not the LLM prompt. The system prompt instructs the agent to always verify eligibility before processing. I’d add a human-in-the-loop interrupt before any process_refund execution using LangGraph checkpointing.”
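The "enforce it in application code, not the prompt" point can be sketched with a hard guard. All names here are hypothetical, and the eligibility check is a stub standing in for real policy rules:

```python
# Application-level guard: process_refund refuses to run unless
# check_refund_eligibility was executed first for that order.
_verified_orders = set()

def check_refund_eligibility(order_id: str) -> dict:
    # Stub: a real implementation would apply refund policy rules here.
    _verified_orders.add(order_id)
    return {"order_id": order_id, "eligible": True}

def process_refund(order_id: str, amount: float, reason: str) -> dict:
    if order_id not in _verified_orders:
        # The LLM cannot talk its way past this check.
        return {"error": "refund blocked: eligibility not verified"}
    return {"order_id": order_id, "refunded": amount, "reason": reason}
```

The prompt still instructs the agent to check eligibility first, but the invariant holds even if the model skips that step.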


Production tool calling requires narrow tool definitions, timeouts on every execution, rate limiting per external API, and full audit logging for debugging and compliance.

1. Tool routing with specialization: Define narrow, focused tools rather than broad ones. search_products(query) is better than database_query(sql).

2. Timeout and retry: External API calls fail. Implement timeouts on every tool execution and return structured error messages the LLM can reason about.

3. Rate limiting: Tools that call external APIs need rate limiting. The LLM will happily call the same API 100 times in a loop if you let it.

4. Audit logging: Log every tool call — name, arguments, result, latency. This is essential for debugging agent behavior and for compliance in regulated industries.
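Patterns 2 and 4 often live in a single execution wrapper. A sketch using only Python's standard library (function and field names are illustrative):

```python
# Wrapper that runs one tool call with a timeout, converts failures into
# structured errors the LLM can reason about, and logs an audit record.
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

logger = logging.getLogger("tool_audit")

def run_tool(fn, tool_input: dict, timeout_s: float = 10.0):
    start = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, **tool_input)
    try:
        result = future.result(timeout=timeout_s)
    except FutureTimeout:
        result = {"error": f"tool timed out after {timeout_s}s"}
    except Exception as exc:
        # Surface failures as data, not crashes, so the loop continues.
        result = {"error": str(exc)}
    finally:
        pool.shutdown(wait=False)  # don't block on a hung tool
    latency_ms = (time.monotonic() - start) * 1000
    logger.info("tool=%s args=%s latency_ms=%.1f result=%s",
                fn.__name__, tool_input, latency_ms, result)
    return result
```

Rate limiting would wrap this one level further out, keyed by the external API each tool hits.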


  • Tool calling lets LLMs invoke functions — the LLM requests, your code executes
  • JSON Schema defines tools: name, description, and input_schema are the three required fields
  • The multi-turn loop is the foundation of every ReAct agent
  • Tool descriptions drive selection quality — invest in clear, specific descriptions
  • Always validate arguments before execution — the LLM can produce invalid inputs
  • Combine tool calling with RAG — retrieval for knowledge, tools for actions and live data
  • MCP standardizes tool integration across providers and environments

Frequently Asked Questions

What is tool calling in LLMs?

Tool calling (also called function calling) lets an LLM request the execution of external functions during a conversation. The LLM does not execute the function itself — it outputs a structured JSON request specifying which function to call and with what arguments. Your application code executes the function and returns the result to the LLM for the next turn.

How is tool calling different from RAG?

RAG retrieves static documents to augment the LLM's context. Tool calling invokes live functions — APIs, databases, calculations, or any code. RAG answers questions from existing knowledge. Tool calling takes actions and retrieves real-time data. Most production agents combine both: RAG for knowledge retrieval and tool calling for actions.

Which LLMs support tool calling?

As of 2026, OpenAI (GPT-4, GPT-4o), Anthropic (Claude 3.5 Sonnet, Claude Opus), Google (Gemini 1.5 Pro), and most major LLM providers support native tool calling. The JSON Schema format for tool definitions is largely standardized across providers, though response formats differ slightly.

Does the LLM actually execute functions during tool calling?

No. The LLM never executes functions directly. It outputs a structured JSON request specifying the tool name and arguments. Your application code handles the actual execution, validation, and error handling. The LLM never touches your API keys, database connections, or file system. This separation is both a security feature and an engineering pattern.

What is the multi-turn tool calling loop?

The multi-turn loop is the foundation of every ReAct agent. The LLM receives a user message, decides whether to call a tool or respond directly, and if it calls a tool, your code executes it and sends the result back. This loop continues — potentially calling multiple tools across multiple turns — until the LLM produces a final text response.

How do you define tools for LLM APIs?

Tools are defined using JSON Schema with three required fields: name (the function identifier), description (tells the LLM when to use the tool), and input_schema (parameter types and constraints). The description field is critical because the LLM uses it to decide when to call the tool. Vague descriptions lead to incorrect tool selection.

What is the Model Context Protocol (MCP)?

MCP standardizes how LLMs connect to external tools and data sources. Instead of defining tools per-provider, MCP provides a universal protocol for tool integration. Claude Code, Cursor, and other tools use MCP servers for standardized tool access, making it easier to share tool definitions across different LLM environments.

What are common tool calling failure modes?

The most common failures are vague tool descriptions (the LLM cannot tell when to use a tool), missing argument validation (the LLM may pass misspelled names or out-of-range values), unbounded tool loops (the LLM calling the same API repeatedly without limits), and hallucinated tool calls (attempting to call tools that do not exist). Always validate tool names and arguments before execution.

How do OpenAI and Anthropic tool calling differ?

The core concept is identical but API shapes differ. OpenAI uses the parameters key for schemas and returns function_call or tool_calls responses. Anthropic uses input_schema for definitions and returns tool_use content blocks. Both support parallel tool calls within a single turn.

What production patterns are essential for tool calling?

Production tool calling requires four patterns: tool routing with narrow, focused tool definitions; timeouts and retries on every external API call with structured error messages the LLM can reason about; rate limiting to prevent unbounded API loops; and full audit logging of every tool call including name, arguments, result, and latency for debugging and compliance.