Agentic Tool Use and Function Calling in Production

Master Agentic Tool Use and Function Calling to turn LLMs into active systems. Learn to define schemas, implement execution loops, and parse outputs safely.


Previously in this course, we covered Context Management and Windowing to ensure our models have the right information. While RAG lets an LLM read data, Agentic Tool Use lets it act on that data. With Function Calling, the model emits structured requests to query databases, trigger APIs, or perform calculations, and your code executes them. This turns the model from a passive text generator into an active system.

Defining Tool Schemas

To make an LLM aware of your tools, you must provide a structured specification, typically in JSON Schema. The model doesn't "see" your Python code; it sees a semantic description of what a function does and what parameters it requires.

A well-defined schema includes:

Name: The identifier the model will use to invoke the tool.
Description: A clear, imperative statement explaining when to use the tool.
Parameters: A JSON Schema object defining the expected inputs, their types, and which fields are required.


PYTHON
# Example: Schema for a database search tool
tool_schema = {
    "type": "function",
    "function": {
        "name": "query_inventory",
        "description": "Look up current stock levels for a specific product ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string", "description": "The SKU of the item"},
                "warehouse_zone": {"type": "string", "enum": ["north", "south", "east"]}
            },
            "required": ["product_id"]
        }
    }
}
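Hand-written schemas can drift out of sync with the functions they describe. One way to reduce that risk is to derive the schema from the function itself. The sketch below is a hypothetical helper, `function_to_tool_schema`, that uses only the standard library (`inspect`, `typing`) and supports just `str`/`int`/`float`/`bool` and `Literal[...]` parameters; the `"north"` default for `warehouse_zone` is an assumption added so that the parameter comes out optional, as in the schema above.

```python
import inspect
from typing import Literal, get_args, get_origin, get_type_hints

# Map a few Python annotations to JSON Schema types (assumption: extend as needed)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool_schema(func, param_descriptions=None):
    """Build an OpenAI-style tool schema from a function signature and docstring.

    Hypothetical helper: supports only str/int/float/bool and Literal[...] params.
    """
    param_descriptions = param_descriptions or {}
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        annotation = hints[name]
        if get_origin(annotation) is Literal:
            # Literal["north", "south", "east"] becomes a string enum
            prop = {"type": "string", "enum": list(get_args(annotation))}
        else:
            prop = {"type": _JSON_TYPES[annotation]}
        if name in param_descriptions:
            prop["description"] = param_descriptions[name]
        properties[name] = prop
        # Parameters without a default value are required
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def query_inventory(product_id: str, warehouse_zone: Literal["north", "south", "east"] = "north") -> dict:
    """Look up current stock levels for a specific product ID."""
    # Hypothetical stub; a real tool would query the inventory database
    return {"product_id": product_id, "zone": warehouse_zone, "stock": 0}


schema = function_to_tool_schema(query_inventory, {"product_id": "The SKU of the item"})
```

With these inputs, `schema` has the same content as `tool_schema` above, so the docstring and the signature become the single source of truth. Libraries such as Pydantic offer a more complete version of this idea.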

The quality of the description is one of the biggest factors in tool-selection accuracy. If the description doesn't make clear when the tool applies, the model may call it when it shouldn't (often with invented arguments) or ignore it when it should use it.

Implementing Tool Execution Loops

An agentic loop is a feedback cycle: the model generates a tool-call request, your system executes it, and the result is fed back to the model, which either requests another tool call or generates the final answer. This is not a linear request-response exchange; it is a state machine.


Flow diagram: User Prompt → LLM → (tool call request) → Execution Engine → (execute function) → API/Database → (observation) → LLM → (final answer) → User Response
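To make the cycle concrete without a network call, here is a provider-agnostic sketch of the diagram as a loop. The reply shape (`{"content": ...}` for a final answer, `{"tool_calls": [...]}` for a request) and the `scripted_model` stub are simplifying assumptions standing in for a real LLM client; they are not any provider's API.

```python
import json


def run_loop(model, tools_impl, prompt, max_turns=5):
    """Drive the prompt -> tool call -> observation -> answer cycle.

    `model` is any callable that takes the message list and returns either
    {"content": str} (final answer) or {"tool_calls": [...]} (a tool request).
    """
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = model(messages)                      # state: waiting on the model
        if "tool_calls" not in reply:
            return reply["content"], messages        # state: final answer
        messages.append({"role": "assistant", "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:             # state: executing tools
            args = json.loads(call["arguments"])
            result = tools_impl[call["name"]](**args)
            # state: observation fed back to the model on the next turn
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": str(result)})
    raise RuntimeError("No final answer within max_turns")


def scripted_model(messages):
    """Stand-in for an LLM: requests one tool call, then answers from the observation."""
    last = messages[-1]
    if last["role"] == "user":
        return {"tool_calls": [{"id": "call_1", "name": "query_inventory",
                                "arguments": json.dumps({"product_id": "SKU-42"})}]}
    return {"content": f"Stock report: {last['content']}"}


answer, history = run_loop(
    scripted_model,
    {"query_inventory": lambda product_id, warehouse_zone="north": 17},
    "How many SKU-42 do we have?",
)
```

Here `answer` is `"Stock report: 17"`, and `history` holds the user, assistant (tool request), and tool (observation) messages in that order. Swapping `scripted_model` for a real client call is what turns this sketch into a production loop.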

To implement this, you need a robust loop that captures the "tool call" state. Do not attempt to parse raw model output manually; use the provider's structured output format (e.g., OpenAI's tool_calls field).


PYTHON
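import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run_agent(prompt, tools, max_turns=5):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # no tool requested: this is the final answer
        # Append the assistant's tool-call request before the tool results
        messages.append(message)
        # The model may request several calls at once; each needs a matching "tool" reply
        for tool_call in message.tool_calls:
            args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
            # execute_tool is your own dispatcher over an allowlist of local functions
            result = execute_tool(tool_call.function.name, args)
            messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": str(result)})
    raise RuntimeError(f"Agent did not produce a final answer within {max_turns} turns")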

Handling Function Output Parsing

A common point of failure in agentic systems is argument hallucination: the model passes parameters that aren't in your schema, omits required ones, or emits malformed JSON (for example, unescaped quotes inside a string).
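A dependency-free sketch of the checks that catch these failures before a function runs is shown next. `check_arguments` is a hypothetical helper that only checks JSON syntax, unknown keys, missing required keys, and `enum` values; Pydantic or a full JSON Schema validator also checks types, formats, and nested objects.

```python
import json


def check_arguments(raw_arguments, parameters_schema):
    """Parse a model's JSON argument string and check it against a tool's schema.

    Returns (args, None) on success or (None, error_message) on failure.
    Only checks JSON syntax, unknown keys, missing required keys, and enums.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, f"Arguments are not valid JSON: {exc.msg}"
    if not isinstance(args, dict):
        return None, "Arguments must be a JSON object."
    properties = parameters_schema.get("properties", {})
    # Hallucinated parameters: keys the schema never declared
    unknown = sorted(set(args) - set(properties))
    if unknown:
        return None, f"Unknown parameter(s): {', '.join(unknown)}"
    missing = [name for name in parameters_schema.get("required", []) if name not in args]
    if missing:
        return None, f"Missing required parameter(s): {', '.join(missing)}"
    for name, value in args.items():
        allowed = properties[name].get("enum")
        if allowed is not None and value not in allowed:
            return None, f"{name} must be one of {allowed}, got {value!r}"
    return args, None


params = {
    "type": "object",
    "properties": {
        "product_id": {"type": "string"},
        "warehouse_zone": {"type": "string", "enum": ["north", "south", "east"]},
    },
    "required": ["product_id"],
}
```

Returning the error string instead of raising makes it easy to send the problem back to the model as a "tool" message.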

Best Practices for Parsing:

Strict Typing: Always validate the LLM's arguments against your schema, using a library like Pydantic, before passing them to your function.
Error Reporting: If a function fails (e.g., invalid product_id), pass the error message back to the LLM as a "tool" message. Let the model attempt to correct itself.
Security: Never eval() or exec() model output. Map tool names to a hardcoded dictionary of allowed functions.
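The security and error-reporting practices above can be combined in one dispatcher. This is one possible shape for an `execute_tool` helper; it expects the arguments already parsed into a dict (for example with `json.loads`), and the `query_inventory` body and its `SKU-` ID rule are hypothetical stand-ins.

```python
def query_inventory(product_id, warehouse_zone="north"):
    # Hypothetical stand-in for a real inventory lookup
    if not product_id.startswith("SKU-"):
        raise ValueError(f"invalid product_id {product_id!r}; expected an ID like 'SKU-123'")
    return {"product_id": product_id, "zone": warehouse_zone, "stock": 17}


# Allowlist: the only functions the model can reach, keyed by tool name
TOOL_REGISTRY = {"query_inventory": query_inventory}


def execute_tool(name, args):
    """Dispatch a tool call through the allowlist; model output is never eval'd or exec'd.

    Errors are returned as text so they can be sent back as a "tool" message,
    giving the model a chance to correct itself.
    """
    func = TOOL_REGISTRY.get(name)
    if func is None:
        allowed = ", ".join(sorted(TOOL_REGISTRY))
        return f"Error: unknown tool {name!r}. Only these tools are available: {allowed}."
    try:
        return func(**args)
    except TypeError as exc:   # e.g. unexpected or missing keyword arguments
        return f"Error: bad arguments for {name}: {exc}"
    except ValueError as exc:  # domain validation failed inside the tool
        return f"Error: {exc}"
```

Note that the `TypeError` branch also catches a `TypeError` raised inside the tool itself; narrow it if that distinction matters to you.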

Method	Benefit	Risk
Direct JSON Parse	Fast, no dependencies	Brittle, fails on malformed JSON
Pydantic Validation	Type-safe, robust	Adds overhead, requires schema sync
LLM-as-a-Parser	Flexible	Higher latency, potential for circular loops

Hands-on Exercise

Create a "Calculator Agent."

Define schemas for add(a, b) and multiply(a, b) functions.
Write a Python script that accepts a user prompt like "What is the sum of 5 and 10, multiplied by 3?".
Implement the loop such that the LLM realizes it needs to call add and multiply in sequence.
Constraint: If the model tries to call a function not in your list, return an error message to the model instructing it to only use provided tools.
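A possible starting point for the exercise is shown here. It covers the two schemas and the constrained dispatch, including the required error message for unknown tools; the names `TOOLS`, `ALLOWED`, and `handle_call` are illustrative, and wiring `handle_call` into the agent loop is left to you.

```python
import json


def add(a: float, b: float) -> float:
    return a + b


def multiply(a: float, b: float) -> float:
    return a * b


def _binary_schema(name, description):
    # Both tools take two required numeric operands
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "First operand"},
                    "b": {"type": "number", "description": "Second operand"},
                },
                "required": ["a", "b"],
            },
        },
    }


TOOLS = [
    _binary_schema("add", "Add two numbers and return the sum."),
    _binary_schema("multiply", "Multiply two numbers and return the product."),
]
ALLOWED = {"add": add, "multiply": multiply}


def handle_call(name, raw_arguments):
    """Run one tool call and return its result as text for the "tool" message."""
    if name not in ALLOWED:
        # Error message sent back to the model, per the exercise constraint
        return f"Error: '{name}' is not an available tool. Only use: {', '.join(ALLOWED)}."
    return str(ALLOWED[name](**json.loads(raw_arguments)))
```

For the example prompt, a well-behaved model should call `add` with 5 and 10, then `multiply` with 15 and 3, yielding `"15"` and then `"45"` as observations.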

Common Pitfalls

The "Infinite Loop" Trap: An agent can get stuck calling the same tool repeatedly if the observation doesn't help it move toward the goal. Make tool outputs descriptive enough to signal when the task is complete, and cap the number of loop iterations as a backstop.
Context Bloat: Each tool call and observation increases the context window usage. In long-running agents, you must prune the history or summarize previous tool interactions.
Implicit Assumptions: If your tool expects a date in one format and the LLM supplies another, the call will fail or, worse, the date will be silently misread (is "03/04" March 4 or April 3?). Always state the required format explicitly in the description field of your schema (e.g., "YYYY-MM-DD").
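For the infinite-loop trap, a turn cap alone lets the agent waste every turn on the same call. The hypothetical `RepeatedCallGuard` below, a sketch rather than a library API, identifies a call by its name plus its arguments normalized with sorted keys, and returns an observation telling the model to stop repeating itself once a threshold is exceeded.

```python
import json
from collections import Counter


class RepeatedCallGuard:
    """Flags an agent that keeps issuing the identical tool call.

    Hypothetical helper: a call is identified by its name plus its JSON
    arguments, re-serialized with sorted keys so key order does not matter.
    """

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def check(self, name, raw_arguments):
        """Return an error observation to send instead of executing, or None to proceed."""
        key = (name, json.dumps(json.loads(raw_arguments), sort_keys=True))
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return (f"Error: {name} was already called {self.max_repeats} times with these "
                    "arguments. Use the previous results or give your final answer.")
        return None
```

Call `check` before executing each tool call; if it returns a string, send that string as the tool result instead of running the function.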
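For context bloat, the simplest form of pruning is to shorten old tool outputs while leaving recent ones intact. The sketch below truncates rather than deletes, because chat APIs such as OpenAI's reject a history in which an assistant tool call has no matching "tool" reply. `truncate_old_tool_outputs` and its defaults are illustrative assumptions; summarizing old outputs with a model call is the heavier alternative the text mentions.

```python
def truncate_old_tool_outputs(messages, keep_last=2, max_chars=200):
    """Shorten all but the newest `keep_last` tool results in a chat history.

    Returns a new list; the input messages are not modified. Tool messages
    are truncated, not removed, so every tool call keeps its reply.
    """
    tool_positions = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    # Guard keep_last == 0, since tool_positions[:-0] would be an empty slice
    old = set(tool_positions[:-keep_last]) if keep_last else set(tool_positions)
    pruned = []
    for i, msg in enumerate(messages):
        if i in old and len(msg["content"]) > max_chars:
            msg = {**msg, "content": msg["content"][:max_chars] + " ...[truncated]"}
        pruned.append(msg)
    return pruned
```
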
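For implicit assumptions, stating the format in the description helps, but the tool should still enforce it. This minimal sketch, using a hypothetical `parse_tool_date` helper, validates a model-supplied date with `datetime.strptime` and returns an error string the model can act on instead of crashing. Note that `%m` and `%d` also accept unpadded values such as `2024-3-5`, which is still unambiguous.

```python
from datetime import date, datetime


def parse_tool_date(value):
    """Validate a model-supplied date against the YYYY-MM-DD format named in the schema.

    Returns (date, None) or (None, error_message); the error is meant to be sent
    back to the model as a "tool" message so it can retry with the right format.
    """
    try:
        # Rejects formats like "03/05/2024" and impossible dates like "2024-02-30"
        return datetime.strptime(value, "%Y-%m-%d").date(), None
    except (TypeError, ValueError):  # TypeError covers non-string input such as None
        return None, f"Error: expected a date in YYYY-MM-DD format, got {value!r}."
```
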

Recap

Agentic tool use requires precise schema definition and a robust, loop-based execution architecture. By treating tool outputs as state updates rather than simple responses, you create systems that can reliably interface with external infrastructure.

Up next: Chain-of-Thought and Multi-Step Reasoning — moving from single-tool calls to complex, multi-stage agentic problem solving.
