Mahamudul Hasan Rubel
Lesson 24 of the Advanced AI/ML: Deep Learning, LLMs & Production Systems course
AI/ML · June 28, 2026 · 4 min read

Agentic Tool Use and Function Calling in Production

Master Agentic Tool Use and Function Calling to turn LLMs into active systems. Learn to define schemas, implement execution loops, and parse outputs safely.

Tags: Agentic, Tool Use, Function Calling, LLMs, Production, Python, ai, machine-learning

Previously in this course, we covered Context Management and Windowing to ensure our models have the right information. While RAG allows an LLM to read data, Agentic Tool Use allows it to act on that data. By implementing Function Calling, you transform your model from a passive text generator into an autonomous system capable of querying databases, triggering APIs, or performing calculations.

Defining Tool Schemas

To make an LLM aware of your tools, you must provide a structured specification, typically in JSON Schema. The model doesn't "see" your Python code; it sees a semantic description of what a function does and what parameters it requires.

A well-defined schema includes:

  1. Name: The identifier the model will use to invoke the tool.
  2. Description: A clear, imperative statement explaining when to use the tool.
  3. Parameters: A JSON object defining expected inputs, types, and required fields.
PYTHON
# Example: Schema for a database search tool
tool_schema = {
    "type": "function",
    "function": {
        "name": "query_inventory",
        "description": "Look up current stock levels for a specific product ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string", "description": "The SKU of the item"},
                "warehouse_zone": {"type": "string", "enum": ["north", "south", "east"]}
            },
            "required": ["product_id"]
        }
    }
}

The description usually has the largest influence on whether the model picks the right tool. If it is vague about when the tool applies, the model may call the tool at the wrong time, invent arguments for it, or ignore it entirely.
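
Because so much depends on the schema text, it can help to lint schemas before they reach the model. The sketch below is one possible check, not part of any provider SDK: `lint_tool_schema` is a hypothetical helper that only looks for a missing name or description, required fields that are never declared, and parameters without a description.

```python
def lint_tool_schema(schema):
    """Return a list of human-readable problems found in a tool schema."""
    problems = []
    fn = schema.get("function", {})
    if not fn.get("name"):
        problems.append("missing function name")
    if not fn.get("description"):
        problems.append("missing function description")

    params = fn.get("parameters", {})
    properties = params.get("properties", {})
    # Every required field must actually be declared.
    for field in params.get("required", []):
        if field not in properties:
            problems.append(f"required field '{field}' is not declared in properties")
    # Undescribed parameters leave the model guessing about format and meaning.
    for field, spec in properties.items():
        if not spec.get("description"):
            problems.append(f"parameter '{field}' has no description")
    return problems


tool_schema = {
    "type": "function",
    "function": {
        "name": "query_inventory",
        "description": "Look up current stock levels for a specific product ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string", "description": "The SKU of the item"},
                "warehouse_zone": {"type": "string", "enum": ["north", "south", "east"]},
            },
            "required": ["product_id"],
        },
    },
}

print(lint_tool_schema(tool_schema))
```

Run against the inventory schema from this lesson, the check reports that `warehouse_zone` has no description, which is the kind of gap that tends to show up later as a badly chosen argument.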

Implementing Tool Execution Loops

An agentic loop is a feedback cycle: the model generates a request, your system executes it, and the result is fed back into the model to generate the final answer. This is not a linear request-response; it is a state machine.

Flow diagram: User Prompt → LLM → (tool call request) → Execution Engine → (execute function) → API/Database → (observation) → LLM → (final answer) → User Response

To implement this, you need a robust loop that captures the "tool call" state. Do not attempt to parse raw model output manually; use the provider's structured output format (e.g., OpenAI's tool_calls field).

PYTHON
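import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run_agent(prompt, tools, max_turns=5):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
        message = response.choices[0].message
        if not message.tool_calls:
            return response  # no tool requested: this is the final answer

        # Keep the assistant turn that requested the tools in the history
        messages.append(message)
        for tool_call in message.tool_calls:
            # Arguments arrive as a JSON string, so decode them before dispatching
            args = json.loads(tool_call.function.arguments)
            # execute_tool is your own dispatcher over locally defined functions
            result = execute_tool(tool_call.function.name, args)
            # Every tool call needs a matching "tool" message carrying its id
            messages.append({"role": "tool", "content": str(result), "tool_call_id": tool_call.id})
    raise RuntimeError("Agent exceeded max_turns without producing a final answer")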
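
The loop calls `execute_tool`, which the lesson leaves to you. A minimal sketch of one possible shape follows. The names `TOOL_REGISTRY`, `_FAKE_STOCK`, and the body of `query_inventory` are assumptions made for illustration (an in-memory dictionary stands in for a real database), and the function accepts the arguments either as the raw JSON string or as an already decoded dict.

```python
import json

# Hypothetical in-memory data standing in for a real inventory database.
_FAKE_STOCK = {("SKU-123", "north"): 40, ("SKU-123", "south"): 2}


def query_inventory(product_id, warehouse_zone="north"):
    """Hypothetical tool: return the stock level for a SKU in one zone."""
    return {
        "product_id": product_id,
        "warehouse_zone": warehouse_zone,
        "in_stock": _FAKE_STOCK.get((product_id, warehouse_zone), 0),
    }


# Explicit allow-list: only functions registered here can ever be invoked.
TOOL_REGISTRY = {"query_inventory": query_inventory}


def execute_tool(name, arguments):
    """Run a registered tool and return a JSON string for the "tool" message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown tool '{name}'. Available: {sorted(TOOL_REGISTRY)}"})
    try:
        # The API delivers arguments as a JSON string; accept a dict as well.
        kwargs = json.loads(arguments) if isinstance(arguments, str) else dict(arguments)
        return json.dumps(func(**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or unexpected/missing parameters: report instead of raising.
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})


print(execute_tool("query_inventory", '{"product_id": "SKU-123", "warehouse_zone": "south"}'))
print(execute_tool("drop_tables", "{}"))
```

Returning errors as data rather than raising keeps the loop alive, so the model sees the failure as an observation and can retry. Note that catching `TypeError` this broadly also hides type errors raised inside a tool; a production dispatcher would likely separate the two cases.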

Handling Function Output Parsing

A common point of failure in agentic systems is argument hallucination: the model passes parameters that are not in your schema, omits required ones, or emits malformed JSON.

Best Practices for Parsing:

  • Strict Typing: Always validate the LLM's output against your schema using a library like pydantic before passing it to your function.
  • Error Reporting: If a function fails (e.g., invalid product_id), pass the error message back to the LLM as a "tool" message. Let the model attempt to correct itself.
  • Security: Never eval() or exec() model output. Map tool names to a hardcoded dictionary of allowed functions.

Method | Benefit | Risk
Direct JSON Parse | Fast, no dependencies | Brittle, fails on malformed JSON
Pydantic Validation | Type-safe, robust | Adds overhead, requires schema sync
LLM-as-a-Parser | Flexible | Higher latency, potential for circular loops
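
To make the validation and error-reporting practices concrete, here is a dependency-free sketch. It is a stand-in for what pydantic would do more thoroughly: `validate_arguments` is a hypothetical helper that covers only a small subset of JSON Schema (required fields, unknown keys, basic types, and `enum`), and `call_123` is a placeholder for a real `tool_call.id`.

```python
import json

# Maps JSON Schema type names to Python types (a deliberately small subset).
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def _type_ok(value, type_name):
    expected = _JSON_TYPES.get(type_name)
    if expected is None:
        return True  # types outside the subset are not checked
    if isinstance(value, bool) and type_name != "boolean":
        return False  # bool is a subclass of int in Python
    return isinstance(value, expected)


def validate_arguments(parameters, raw_arguments):
    """Return (args, errors) after checking raw JSON against a parameters schema."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return None, ["arguments must be a JSON object"]

    properties = parameters.get("properties", {})
    errors = [f"missing required parameter '{key}'"
              for key in parameters.get("required", []) if key not in args]
    for key, value in args.items():
        spec = properties.get(key)
        if spec is None:
            errors.append(f"unknown parameter '{key}'")
        elif not _type_ok(value, spec.get("type")):
            errors.append(f"parameter '{key}' must be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"parameter '{key}' must be one of {spec['enum']}")
    return (args if not errors else None), errors


params = {
    "type": "object",
    "properties": {
        "product_id": {"type": "string"},
        "warehouse_zone": {"type": "string", "enum": ["north", "south", "east"]},
    },
    "required": ["product_id"],
}

args, errors = validate_arguments(params, '{"product_id": 42, "zone": "west"}')
if errors:
    # Feed the problems back as a tool message so the model can correct itself.
    tool_message = {"role": "tool", "tool_call_id": "call_123",
                    "content": json.dumps({"error": errors})}
    print(tool_message["content"])
```

Here the model sent a numeric `product_id` and an invented `zone` key, so both problems are reported back in one message instead of crashing the tool.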

Hands-on Exercise

Create a "Calculator Agent."

  1. Define a schema for an add(a, b) and multiply(a, b) function.
  2. Write a Python script that accepts a user prompt like "What is 5 plus 10 multiplied by 3?".
  3. Implement the loop such that the LLM realizes it needs to call add and multiply in sequence.
  4. Constraint: If the model tries to call a function not in your list, return an error message to the model instructing it to only use provided tools.
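
Before wiring in a real model, you may want to check the dispatch logic and the step 4 constraint offline. The sketch below is one way to do that under stated assumptions: `script` is a hypothetical, hand-written list of the tool calls a model might emit (it is not the provider's response format), and `call_tool` and `replay` are illustrative names.

```python
def add(a, b):
    return a + b


def multiply(a, b):
    return a * b


TOOLS = {"add": add, "multiply": multiply}


def call_tool(name, args):
    """Dispatch one tool call; unknown names produce an error for the model."""
    if name not in TOOLS:
        return {"error": f"'{name}' is not available. Only use: {sorted(TOOLS)}"}
    return {"result": TOOLS[name](**args)}


def replay(scripted_calls):
    """Run scripted tool calls in order and collect the observations."""
    return [call_tool(name, args) for name, args in scripted_calls]


# "5 plus 10 multiplied by 3" with normal precedence: multiply first, then add.
script = [
    ("power", {"a": 10, "b": 3}),     # a tool that does not exist
    ("multiply", {"a": 10, "b": 3}),
    ("add", {"a": 5, "b": 30}),       # 30 is the previous observation
]

for observation in replay(script):
    print(observation)
```

The first observation is the error the model would receive for the unknown tool, and the last one carries the result 35. In the real exercise, the model has to read the 30 from the `multiply` observation and pass it to `add` itself.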

Common Pitfalls

  1. The "Infinite Loop" Trap: An agent might get stuck calling the same tool repeatedly if the observation provided doesn't explicitly help it move toward the goal. Ensure your tool outputs are descriptive enough to signal "task complete."
  2. Context Bloat: Each tool call and observation increases the context window usage. In long-running agents, you must prune the history or summarize previous tool interactions.
  3. Implicit Assumptions: If your tool expects a date but the LLM provides it in a different format, the call may fail or silently return wrong results. Always state the required format explicitly in the description field of your schema (e.g., "YYYY-MM-DD").
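
The first two pitfalls can be guarded against mechanically. The following is a rough sketch, assuming the conversation history is a list of plain dicts: `RepeatGuard` and `shrink_old_observations` are hypothetical helpers, and the thresholds are arbitrary defaults rather than recommended values.

```python
import json


class RepeatGuard:
    """Flag a tool call once the same name and arguments recur too often."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.counts = {}

    def should_stop(self, name, arguments):
        # Canonical JSON so that key order does not hide a repeated call.
        key = (name, json.dumps(arguments, sort_keys=True))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] > self.max_repeats


def shrink_old_observations(messages, keep_last=2, max_chars=200):
    """Truncate all but the most recent tool outputs; message order is unchanged."""
    tool_indexes = [i for i, m in enumerate(messages)
                    if isinstance(m, dict) and m.get("role") == "tool"]
    old = tool_indexes[:-keep_last] if keep_last else tool_indexes
    shrunk = list(messages)  # shallow copy, the input list is not modified
    for i in old:
        content = messages[i]["content"]
        if len(content) > max_chars:
            shrunk[i] = {**messages[i], "content": content[:max_chars] + " [truncated]"}
    return shrunk


guard = RepeatGuard(max_repeats=2)
for attempt in range(3):
    if guard.should_stop("query_inventory", {"product_id": "SKU-123"}):
        print(f"stopping after attempt {attempt + 1}: identical call repeated")
```

Truncating old observations in place, rather than deleting messages, keeps each tool message paired with the assistant turn that requested it, which chat APIs with tool calling generally require. Summarizing older turns is an alternative when the truncated text loses too much.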

Recap

Agentic tool use requires precise schema definition and a robust, loop-based execution architecture. By treating tool outputs as state updates rather than simple responses, you create systems that can reliably interface with external infrastructure.

Up next: Chain-of-Thought and Multi-Step Reasoning — moving from single-tool calls to complex, multi-stage agentic problem solving.
