AI/MLJune 21, 20264 min read

LLM Function Calling: A Guide to Dynamic Tool Selection

Master LLM function calling to build reliable agentic workflows. Learn to implement dynamic tool selection with strict schema validation for production apps.

LLMAI EngineeringPythonPydanticAPI IntegrationAgentic WorkflowsAIRAGPrompt Engineering

Last month, I spent about three days debugging a "smart" assistant that kept hallucinating arguments for our internal shipping API. It was a classic case of an agentic workflow gone wrong; the model was confident, but its tool selection logic was completely detached from our actual backend requirements.

If you’re building AI features, you know that LLM function calling is the backbone of any non-trivial agent. It allows your model to stop guessing and start interacting with your infrastructure. But getting it right requires more than just passing a JSON blob to an OpenAI or Anthropic endpoint. You need a bridge between the model's intent and your code's safety.

Why Simple Tool Use Fails

When we first started, we simply dumped our entire API documentation into the system prompt. The model would "select" the right tool, but it frequently ignored required fields or passed strings where we needed integers. This is where most prototypes die.

Before you even touch the model, you need to treat your tool definitions as strict contracts. If you aren't already using Structured output: Implementing Deterministic JSON Schema Validation, stop and fix that first. The model is a probabilistic engine; your schema validation is the deterministic gatekeeper.

Implementing LLM Function Calling with Pydantic

I’ve found that using Pydantic models to define tools is the cleanest way to enforce tool use in production. By defining a schema that mirrors your function signatures, you can automate the conversion process.

Here is a simplified pattern I use with the OpenAI SDK (v1.x+):


PYTHON
from pydantic import BaseModel, Field
from typing import Optional

class GetWeather(BaseModel):
    CE9178">"""Get the current weather for a specific location."""
    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
    unit: str = Field("celsius", description="The unit of measurement")

# Map your tool to a function
def execute_weather_tool(args: GetWeather):
    # This is your safe execution zone
    return fetch_weather_api(args.location, args.unit)

By binding these Pydantic models to the tools parameter in your API call, you ensure the model has a clear map of what it can do. If the model tries to call a function with an invalid schema, the API will reject it before your code even tries to run it.

The Reality of Agentic Workflows

Agentic workflows aren't just about calling one tool. They are about the model's ability to chain them. I’ve noticed that as soon as you give an LLM more than five tools, its performance on selecting the "right" one drops significantly.

To mitigate this, I implement a tiered tool-routing strategy:

Contextual Pruning: Only provide tools relevant to the current user intent. If the user is asking about billing, don't pass the shipping or inventory tools into the context.
Strict Schema Validation: Always validate the model's output against your Pydantic schema before passing it to your backend.
Guardrails: Use LLM Guardrails for Production: Input Validation and Output Filtering to ensure the arguments the model produces aren't malicious or nonsensical.

Handling Failures Gracefully

Even with perfect schemas, the model will occasionally output a malformed JSON object. Instead of letting your application crash, catch the validation error and feed it back to the model as a system message.


PYTHON
try:
    tool_args = GetWeather.model_validate_json(raw_json)
except ValidationError as e:
    # Tell the model it messed up and why
    return f"Error: {e}. Please correct the arguments."

This feedback loop is often the difference between a brittle prototype and a resilient production service. If you are struggling with intermittent JSON formatting issues, Getting reliable structured output from an LLM in production goes into detail on how to handle these edge cases at the tokenizer level.

What I'm Still Figuring Out

I’m currently experimenting with "tool-switching" latency. Every tool you add to the prompt consumes tokens, which directly impacts latency—I've seen responses slow down by around 150ms just by adding a few complex tool descriptions.

I'm still not entirely sold on whether it's better to have one massive "God-agent" with 20 tools or a swarm of smaller, specialized agents. The swarm approach is theoretically better for performance, but it introduces a whole new class of state-management bugs. For now, I’m sticking to a hybrid: a router agent that selects the right subset of tools, which keeps the context window manageable and the tool selection accurate.

If you’re just starting, keep it simple. Don't build a complex agent orchestration layer until your basic tool-calling pipeline is stable.

Back to Blog

LLM Function Calling: A Guide to Dynamic Tool Selection

Why Simple Tool Use Fails

Implementing LLM Function Calling with Pydantic

The Reality of Agentic Workflows

Handling Failures Gracefully

What I'm Still Figuring Out

Similar Posts

Mastering Query Decomposition for RAG Pipelines: A Practical Guide

LLM Cost Control: Mastering Dynamic Context Window Management

LLM agents self-correction: Building Recursive Feedback Loops