Learn to implement structured logging and track request latency for production ML systems. Master MLOps observability to catch failures before they scale.
Previously in this course, we covered CI/CD for ML pipelines, establishing the foundation for automated deployments. However, a deployment is not the end of the lifecycle. In production, "it works on my machine" is a dangerous fallacy. You need Observability—the ability to infer the internal state of your system based on its external outputs—to ensure your model is performing as expected.
While "Logging and Observability for Production ML Pipelines" covers the high-level strategy, this lesson focuses on the implementation: how to instrument your code to track request latency and error rates, transforming raw logs into actionable intelligence.
Standard print statements and unstructured log strings are hard to use in production. Extracting fields from them requires brittle, expensive parsing, which makes it difficult to filter by request ID or model version. Structured logging (typically one JSON object per event) turns logs into queryable data.
Instead of `print(f"Request {id} took {time}s")`, emit a JSON object:
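```python
import structlog  # a widely used library for structured logging

# structlog's default renderer is a human-readable console format,
# so configure it explicitly to emit one JSON object per event.
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ]
)

logger = structlog.get_logger()


def log_inference_event(request_id, model_version, latency, status_code):
    logger.info(
        "inference_completed",
        request_id=request_id,
        model_version=model_version,
        latency_seconds=latency,
        status=status_code,
    )
```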
By standardizing your schema, you can immediately run queries in tools like Datadog, ELK, or CloudWatch to calculate the p99 latency of specific model versions across your entire fleet.
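To make the idea of "queryable logs" concrete, here is a minimal sketch of the same p99 calculation done by hand over JSON log lines. It assumes the field names used in the logging example (`model_version`, `latency_seconds`) and that the event name is stored under the `event` key, which is where structlog puts it. The function name `p99_by_model_version` and the sample lines are made up for illustration; a real log platform would run the equivalent aggregation for you.

```python
import json
import math
from collections import defaultdict


def p99_by_model_version(log_lines):
    """Return {model_version: p99 latency in seconds} from JSON log lines."""
    latencies = defaultdict(list)
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON event
        if not isinstance(event, dict):
            continue
        if event.get("event") != "inference_completed":
            continue
        latencies[event["model_version"]].append(event["latency_seconds"])

    result = {}
    for version, values in latencies.items():
        values.sort()
        # Nearest-rank percentile: the smallest value with at least
        # 99% of the samples at or below it.
        rank = math.ceil(0.99 * len(values))
        result[version] = values[rank - 1]
    return result


# Hypothetical sample lines in the schema from the logging example.
sample = [
    '{"event": "inference_completed", "request_id": "a1", "model_version": "v1", "latency_seconds": 0.12, "status": 200}',
    '{"event": "inference_completed", "request_id": "a2", "model_version": "v1", "latency_seconds": 0.50, "status": 200}',
    '{"event": "inference_completed", "request_id": "b1", "model_version": "v2", "latency_seconds": 0.08, "status": 200}',
    "plain text line that is not JSON",
]
print(p99_by_model_version(sample))  # {'v1': 0.5, 'v2': 0.08}
```

With unstructured strings, the same query would need a regular expression per message format. Here the only requirement is that every service agrees on the field names.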
In MLOps, latency is often a proxy for hardware saturation: a spike frequently points to GPU memory pressure or inefficient batching. To track this, wrap your inference calls in a decorator that records the duration and logs any exception before re-raising it.
```python
import time
from functools import wraps

import structlog

logger = structlog.get_logger()


def track_performance(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "error"  # assume failure until the call returns normally
        try:
            result = func(*args, **kwargs)
            status = "success"
            return result
        except Exception as e:
            logger.error("inference_failed", error=str(e))
            raise  # re-raise with the original traceback intact
        finally:
            # Runs on both success and failure, so every call is measured.
            latency = time.perf_counter() - start
            logger.info("request_metrics", latency_seconds=latency, status=status)
    return wrapper


@track_performance
def run_model_inference(input_data):
    # Your model call here
    pass
```
This decorator provides a non-intrusive way to gather telemetry. Every decorated call emits a metrics event whether it succeeds or fails, providing the data necessary to monitor production performance effectively.
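When the code you want to time is a block inside a larger function rather than a whole function, the same pattern can be written as a context manager. The sketch below is one possible variant, not part of the lesson's project: the name `track_block` and the `block` field are hypothetical, and it uses the standard library's `logging` and `json` in place of structlog so that it runs without extra dependencies.

```python
import json
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("inference")


@contextmanager
def track_block(name, **fields):
    """Log the duration and outcome of the enclosed block as one JSON line."""
    start = time.perf_counter()
    status = "error"  # assume failure until the block finishes normally
    try:
        yield
        status = "success"
    finally:
        # Runs whether the block completed or raised; an exception
        # still propagates to the caller after the event is logged.
        record = {
            "event": "block_metrics",
            "block": name,
            "latency_seconds": time.perf_counter() - start,
            "status": status,
            **fields,
        }
        logger.info(json.dumps(record))


# Usage: time only the tokenization step of a larger handler.
with track_block("tokenize", model_version="v1"):
    tokens = "hello world".split()
```

A decorator suits whole endpoints, while a context manager lets you time preprocessing, inference, and postprocessing separately within one request.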
Raw logs are for debugging; dashboards are for observability. A production-ready dashboard for an LLM application should prioritize four key metrics:
| Metric | Business Impact | Infrastructure Signal |
|---|---|---|
| p99 Latency | User churn | GPU/Memory bottleneck |
| Error Rate | Service downtime | Model/Dependency failure |
| RPS | Capacity planning | Traffic spikes/DDoS |
| Token Count | Operational cost | Prompt complexity |
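As a rough illustration of how three of these numbers fall out of structured events, the sketch below aggregates one time window of request records. The `total_tokens` field, the function name `summarize_window`, and the sample events are assumptions made for this example. In practice a metrics backend usually performs this aggregation, along with the latency percentiles.

```python
def summarize_window(events, window_seconds):
    """Aggregate the request events from one time window into dashboard numbers."""
    total = len(events)
    errors = sum(1 for e in events if e["status"] == "error")
    tokens = sum(e.get("total_tokens", 0) for e in events)
    return {
        "rps": total / window_seconds,
        "error_rate": errors / total if total else 0.0,
        "tokens_per_request": tokens / total if total else 0.0,
    }


# Hypothetical events collected over a 2-second window.
events = [
    {"status": "success", "total_tokens": 300},
    {"status": "success", "total_tokens": 500},
    {"status": "error", "total_tokens": 0},
    {"status": "success", "total_tokens": 400},
]
print(summarize_window(events, window_seconds=2))
# {'rps': 2.0, 'error_rate': 0.25, 'tokens_per_request': 300.0}
```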
For our running project, add a custom logging middleware to your serving layer (e.g., FastAPI or Flask).
1. Create a `logger.py` that configures structlog to output JSON.
2. Apply the `track_performance` decorator shown above to your inference function.
3. If a request passes through several services or pipeline stages, ensure a `correlation_id` is passed through all of them. Without it, you cannot trace a single request's lifecycle.
4. Use a log shipper (e.g., Fluentd or Logstash) to ship logs out-of-band.

We have moved beyond simple print statements to structured JSON logging, implemented decorators for automated latency tracking, and defined the essential metrics for our production dashboard. These practices are the prerequisite to detecting the silent model failures that plague unmonitored systems.
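The correlation ID requirement from the project steps can be sketched in a framework-agnostic way. This example is an assumption-laden illustration, not the course's reference implementation: the `X-Correlation-ID` header name is a common convention rather than a standard, `handle_request` and `JsonFormatter` are hypothetical names, and the standard library's `logging` stands in for structlog so the block runs on its own.

```python
import contextvars
import json
import logging
import uuid

# Holds the ID for whichever request the current thread or task is handling.
correlation_id = contextvars.ContextVar("correlation_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that includes the correlation ID."""

    def format(self, record):
        return json.dumps({
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "correlation_id": correlation_id.get(),
        })


def handle_request(headers, logger):
    """Reuse the caller's ID if one was sent, otherwise start a new trace."""
    token = correlation_id.set(headers.get("X-Correlation-ID") or str(uuid.uuid4()))
    try:
        logger.info("request_received")
        # ... preprocessing, inference, and postprocessing would log here too,
        # and every line would carry the same correlation_id ...
        logger.info("request_completed")
        return correlation_id.get()
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("serving")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

handle_request({"X-Correlation-ID": "req-123"}, logger)
```

In a FastAPI or Flask service, the body of `handle_request` would live in a middleware or request hook, and the same ID would be forwarded as a header on any downstream call. structlog offers a `structlog.contextvars` module built on the same mechanism.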
Up next: We will dive into Drift Detection and Data Monitoring, where we’ll learn to identify when your production input data no longer matches the distribution your model was trained on.