Master LLM function calling to build reliable agentic workflows. Learn to implement dynamic tool selection with strict schema validation for production apps.
Last month, I spent about three days debugging a "smart" assistant that kept hallucinating arguments for our internal shipping API. It was a classic case of an agentic workflow gone wrong; the model was confident, but its tool selection logic was completely detached from our actual backend requirements.
If you’re building AI features, you know that LLM function calling is the backbone of any non-trivial agent. It allows your model to stop guessing and start interacting with your infrastructure. But getting it right requires more than just passing a JSON blob to an OpenAI or Anthropic endpoint. You need a bridge between the model's intent and your code's safety.
When we first started, we simply dumped our entire API documentation into the system prompt. The model would "select" the right tool, but it frequently ignored required fields or passed strings where we needed integers. This is where most prototypes die.
Before you even touch the model, you need to treat your tool definitions as strict contracts. If you aren't already using Structured output: Implementing Deterministic JSON Schema Validation, stop and fix that first. The model is a probabilistic engine; your schema validation is the deterministic gatekeeper.
I’ve found that using Pydantic models to define tools is the cleanest way to enforce tool use in production. By defining a schema that mirrors your function signatures, you can automate the conversion process.
Here is a simplified pattern I use with the OpenAI SDK (v1.x+):
PYTHONfrom pydantic import BaseModel, Field from typing import Optional class GetWeather(BaseModel): CE9178">"""Get the current weather for a specific location.""" location: str = Field(..., description="The city and state, e.g. San Francisco, CA") unit: str = Field("celsius", description="The unit of measurement") # Map your tool to a function def execute_weather_tool(args: GetWeather): # This is your safe execution zone return fetch_weather_api(args.location, args.unit)
By binding these Pydantic models to the tools parameter in your API call, you ensure the model has a clear map of what it can do. If the model tries to call a function with an invalid schema, the API will reject it before your code even tries to run it.
Agentic workflows aren't just about calling one tool. They are about the model's ability to chain them. I’ve noticed that as soon as you give an LLM more than five tools, its performance on selecting the "right" one drops significantly.
To mitigate this, I implement a tiered tool-routing strategy:
Even with perfect schemas, the model will occasionally output a malformed JSON object. Instead of letting your application crash, catch the validation error and feed it back to the model as a system message.
PYTHONtry: tool_args = GetWeather.model_validate_json(raw_json) except ValidationError as e: # Tell the model it messed up and why return f"Error: {e}. Please correct the arguments."
This feedback loop is often the difference between a brittle prototype and a resilient production service. If you are struggling with intermittent JSON formatting issues, Getting reliable structured output from an LLM in production goes into detail on how to handle these edge cases at the tokenizer level.
I’m currently experimenting with "tool-switching" latency. Every tool you add to the prompt consumes tokens, which directly impacts latency—I've seen responses slow down by around 150ms just by adding a few complex tool descriptions.
I'm still not entirely sold on whether it's better to have one massive "God-agent" with 20 tools or a swarm of smaller, specialized agents. The swarm approach is theoretically better for performance, but it introduces a whole new class of state-management bugs. For now, I’m sticking to a hybrid: a router agent that selects the right subset of tools, which keeps the context window manageable and the tool selection accurate.
If you’re just starting, keep it simple. Don't build a complex agent orchestration layer until your basic tool-calling pipeline is stable.
LLM cost control is vital for production RAG pipelines. Learn how to implement dynamic context window management to optimize token usage and reduce latency.