Mahamudul Hasan Rubel

Lesson 44 of the Intermediate Machine Learning: Real-World Pipelines course
AI/ML · June 26, 2026 · 4 min read

Logging and Observability for Production ML Pipelines

Master production logging and observability to track execution times and build robust audit trails for your ML pipelines. Ensure your models remain debuggable.

MLOps · Logging · Observability · Production · Python · Pipeline · ai · machine-learning

Previously in this course, we explored Tracking Performance Degradation in Production ML Pipelines to identify when models fail silently. While that lesson focused on metrics, this lesson adds the "how-to" of system-level visibility: implementing comprehensive logging and observability to ensure every inference request is traceable and every pipeline bottleneck is visible.

In production, silence is not golden—it’s a liability. If a model starts returning unexpected results or latency spikes, you need structured logs to reconstruct the state of the world at that exact moment.

The Principles of Production Observability

Observability in MLOps isn't just about printing statements to the console. It is the practice of emitting high-cardinality, structured data that allows you to ask arbitrary questions about your system's internal state. For ML pipelines, this breaks down into three pillars:

  1. Standardized Logging: Every pipeline component must emit logs in a machine-readable format (JSON).
  2. Performance Tracing: You must record the wall-clock time of every transform and inference step.
  3. Audit Trails: Every prediction must be logged with its corresponding input features, model version, and timestamp.
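
As a rough illustration of the three pillars, a single log record might look like the sketch below. The helper `build_record` and the field names (`duration_seconds`, `input_features`, `model_version`) are assumptions chosen for this example, not a required schema.

```python
import json
from datetime import datetime, timezone


def build_record(message, duration_seconds, input_features, prediction, model_version):
    """Assemble one machine-readable record (hypothetical schema)."""
    return {
        # Pillar 1: standardized, machine-readable fields
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": "INFO",
        "message": message,
        # Pillar 2: performance tracing
        "duration_seconds": duration_seconds,
        # Pillar 3: audit trail
        "input_features": input_features,
        "prediction": prediction,
        "model_version": model_version,
    }


record = build_record(
    message="Prediction generated",
    duration_seconds=0.1203,
    input_features={"age": 30, "income": 50000},
    prediction=0.85,
    model_version="v1.2.0",
)
# One JSON object per line is straightforward for log aggregators to parse
line = json.dumps(record)
print(line)
```

Because every field is a named key rather than free text, a question such as "which model version produced the slow predictions?" becomes a filter on `model_version` and `duration_seconds` instead of a regex over messages.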

Implementing Structured Logging

Avoid print() statements. They lack timestamps, severity levels, and structured context. Instead, use Python's built-in logging module with a custom formatter that emits JSON, since the standard library does not ship a JSON formatter of its own. This allows tools like Datadog, ELK, or CloudWatch to parse your logs automatically.

PYTHON
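import json
import logging
import time
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        log_record = {
            # Use the record's own creation time as timezone-aware UTC
            # (datetime.utcnow() is deprecated since Python 3.12)
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
        }
        # Merge extra context passed as extra={"extra_data": {...}}
        if hasattr(record, "extra_data"):
            log_record.update(record.extra_data)
        # default=str keeps non-JSON types (e.g. datetime) from raising
        return json.dumps(log_record, default=str)


# Configure a structured JSON logger
def get_logger(name="ml_pipeline"):
    logger = logging.getLogger(name)
    # Attach the handler only once so repeated calls don't duplicate output
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


logger = get_logger()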
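
To see how the `extra_data` convention ends up in the output, the following sketch routes a logger into an in-memory buffer and parses the result. It repeats a trimmed-down formatter so it runs on its own; the logger name `ml_pipeline_demo` and the buffer setup are assumptions made for the demonstration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Trimmed-down copy of the formatter above, repeated so this runs standalone."""

    def format(self, record):
        log_record = {"level": record.levelname, "message": record.getMessage()}
        if hasattr(record, "extra_data"):
            log_record.update(record.extra_data)
        return json.dumps(log_record)


# Write to an in-memory buffer instead of stderr so the output can be inspected
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(JsonFormatter())

demo_logger = logging.getLogger("ml_pipeline_demo")  # hypothetical logger name
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo output away from the root logger

# The dict under "extra_data" becomes top-level keys in the JSON output
demo_logger.info("Model loaded", extra={"extra_data": {"model_version": "v1.2.0"}})

parsed = json.loads(buffer.getvalue())
print(parsed)
```

One consequence of merging with `update` is that a key in `extra_data` that shares a name with a base field (for example `message` or `timestamp`) overwrites that field, so it is worth keeping context keys distinct.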

Tracking Execution Times

To identify bottlenecks, we need a decorator that wraps our pipeline steps. This ensures we don't pollute our business logic with timing code.

PYTHON
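import functools

def track_time(func):
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        # Attach the duration as structured context rather than baking it into the message
        logger.info(f"Execution of {func.__name__} completed",
                    extra={"extra_data": {"duration_seconds": round(duration, 4)}})
        return result
    return wrapper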

@track_time
def run_inference(input_data):
    # Simulate inference logic
    time.sleep(0.12)
    return {"prediction": 0.85}
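
Some pipeline steps are blocks of code rather than standalone functions. As a variant on the decorator (not part of the lesson's project code), the same timing idea can be sketched as a context manager. The names `timed_step` and `durations` are hypothetical, and the in-memory dict stands in for whatever logger or metrics sink you actually use.

```python
import time
from contextlib import contextmanager

durations = {}  # hypothetical in-memory store: step name -> seconds


@contextmanager
def timed_step(name):
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed steps are still timed
        durations[name] = time.perf_counter() - start


with timed_step("preprocess"):
    time.sleep(0.01)  # stand-in for feature transforms

with timed_step("inference"):
    time.sleep(0.05)  # stand-in for model.predict

# The step with the largest duration is the first bottleneck candidate
slowest = max(durations, key=durations.get)
print(slowest, round(durations[slowest], 3))
```

Comparing per-step durations side by side like this is usually enough to tell whether latency comes from preprocessing or from the model call itself.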

Storing Audit Trails for Predictions

An audit trail is your "black box" flight recorder. In a production environment, you should never just return a prediction to the user; you must log the request, the prediction, and the metadata (model ID, feature version) to a persistent store or a dedicated log stream.

For our project, we will add a dedicated logging function and call it right after each prediction:

PYTHON
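from datetime import datetime, timezone

def log_prediction(input_features, prediction, model_version):
    audit_log = {
        "event": "prediction_audit",
        "input_features": input_features,
        "prediction": prediction,
        "model_version": model_version,
        # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
        "timestamp": datetime.now(timezone.utc).isoformat()
    }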
    # In production, send this to a database or a structured log aggregator
    logger.info("Prediction generated", extra={"extra_data": audit_log})

# Example usage
features = {"age": 30, "income": 50000}
pred = run_inference(features)
log_prediction(features, pred, model_version="v1.2.0")
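
The comment above mentions sending audit records to a persistent store. One minimal way to approximate that locally is a dedicated audit logger that appends one JSON object per line to a file. This is a sketch under assumptions: the logger name `ml_pipeline.audit`, the helper `write_audit`, and the temporary file path are all hypothetical, and a real deployment would target durable storage or a log aggregator instead.

```python
import json
import logging
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical location; a real deployment would point at durable storage
audit_path = os.path.join(tempfile.mkdtemp(), "prediction_audit.jsonl")

audit_logger = logging.getLogger("ml_pipeline.audit")  # hypothetical logger name
audit_handler = logging.FileHandler(audit_path)
audit_handler.setFormatter(logging.Formatter("%(message)s"))  # message is already JSON
audit_logger.addHandler(audit_handler)
audit_logger.setLevel(logging.INFO)
audit_logger.propagate = False  # keep audit records out of the application log


def write_audit(input_features, prediction, model_version):
    """Append one audit record as a single JSON line."""
    audit_logger.info(json.dumps({
        "event": "prediction_audit",
        "input_features": input_features,
        "prediction": prediction,
        "model_version": model_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }))


write_audit({"age": 30, "income": 50000}, {"prediction": 0.85}, "v1.2.0")
write_audit({"age": 41, "income": 72000}, {"prediction": 0.31}, "v1.2.0")
audit_handler.flush()

# Reading the trail back: one json.loads per line
with open(audit_path) as f:
    records = [json.loads(line) for line in f]
print(len(records), records[0]["model_version"])
```

Keeping audit records on their own logger means their retention and access rules can differ from those of ordinary application logs.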

Hands-on Exercise

Modify your existing inference script to include a try-except block within the track_time decorator. If the model fails, log the error with severity ERROR, capture the input_features that caused the crash, and re-raise the exception so the caller still sees the failure. This ties into Monitoring Data Drift: A Practical Guide for ML Engineers, as you'll eventually need to analyze these failures to detect whether they correlate with specific data segments.
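
If you want something to compare your attempt against, here is one possible shape for the exercise. It is a sketch, not the reference solution: the decorator name `track_time_with_errors`, the `error_type` and `error_message` fields, the in-memory `ListHandler`, and the assumption that the features arrive as the first positional argument are all choices made for this example.

```python
import functools
import logging
import time

logger = logging.getLogger("ml_pipeline_exercise")  # stands in for the JSON logger
logger.setLevel(logging.INFO)
logger.propagate = False


class ListHandler(logging.Handler):
    """Collect records in memory so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


capture = ListHandler()
logger.addHandler(capture)


def track_time_with_errors(func):
    """Variant of track_time that logs failures before re-raising them."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            logger.error(
                f"Execution of {func.__name__} failed",
                extra={"extra_data": {
                    "duration_seconds": round(time.perf_counter() - start, 4),
                    "error_type": type(exc).__name__,
                    "error_message": str(exc),
                    # Assumes the features are the first positional argument
                    "input_features": args[0] if args else kwargs.get("input_data"),
                }},
            )
            raise  # never swallow the error; the caller still needs to see it
        logger.info(
            f"Execution of {func.__name__} completed",
            extra={"extra_data": {"duration_seconds": round(time.perf_counter() - start, 4)}},
        )
        return result
    return wrapper


@track_time_with_errors
def run_inference(input_data):
    if "age" not in input_data:
        raise KeyError("age")  # simulate a failure on malformed input
    return {"prediction": 0.85}


try:
    run_inference({"income": 50000})
except KeyError:
    pass  # the decorator logged the failure, then re-raised

failed = capture.records[-1]
print(failed.levelname, failed.extra_data["input_features"])
```

If the captured features can contain personal data, mask them before they reach the logger.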

Common Pitfalls

  • Logging Sensitive Data: Never log PII (Personally Identifiable Information). If your input features contain names or emails, mask them before calling the logger.
  • Log Verbosity: Logging every single intermediate array in a transformer will destroy your I/O performance and inflate storage costs. Log metadata, not raw data blobs.
  • Blocking Calls: If your logging implementation writes to a network socket synchronously, you will introduce latency into your prediction path. Use non-blocking handlers or log to stdout and let a sidecar process (like Fluentd or Vector) handle the shipping.
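
Two of the pitfalls above can be sketched with the standard library alone: masking sensitive fields before logging, and moving slow log shipping off the request path with `logging.handlers.QueueHandler` and `QueueListener`. The `SENSITIVE_KEYS` set, the `mask_pii` helper, and the in-memory `ListHandler` (standing in for a slow network handler) are assumptions made for illustration.

```python
import logging
import logging.handlers
import queue

SENSITIVE_KEYS = {"name", "email"}  # hypothetical list; adapt to your schema


def mask_pii(features):
    """Return a copy of `features` with sensitive values replaced."""
    return {k: ("***" if k in SENSITIVE_KEYS else v) for k, v in features.items()}


class ListHandler(logging.Handler):
    """Stand-in for a slow network handler; collects records in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


slow_handler = ListHandler()
log_queue = queue.Queue(-1)  # unbounded queue between request path and shipper

# The request path only pays for a queue put...
nb_logger = logging.getLogger("ml_pipeline_nonblocking")  # hypothetical name
nb_logger.addHandler(logging.handlers.QueueHandler(log_queue))
nb_logger.setLevel(logging.INFO)
nb_logger.propagate = False

# ...while a background thread drains the queue into the slow handler
listener = logging.handlers.QueueListener(log_queue, slow_handler)
listener.start()

features = {"name": "Ada", "email": "ada@example.com", "age": 30}
nb_logger.info(
    "Prediction generated",
    extra={"extra_data": {"input_features": mask_pii(features)}},
)

listener.stop()  # processes remaining records and joins the background thread
print(slow_handler.records[0].extra_data)
```

Masking by key name only catches fields you already know about; free-text features may still carry personal data and need separate handling.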

Recap

Effective logging and observability are what separate a "notebook model" from a reliable production service. By standardizing your logs into JSON, wrapping execution steps with timing decorators, and maintaining a strict audit trail, you ensure that when the system fails—and it will—you have the data required to perform a post-mortem.

Up next: Automated Retraining Triggers. We will take these logs and turn them into actionable signals that force your pipeline to retrain when performance dips below a threshold.

Part of the course

Intermediate Machine Learning: Real-World Pipelines

intermediate · Lesson 44 of 49