Master Kubernetes observability by implementing distributed tracing with OpenTelemetry and Grafana Tempo. Follow this guide to debug microservices faster.
If you’ve ever stared at a 500 error in a microservices architecture and wondered which service dropped the ball, you know the pain of "distributed debugging." Logs are great, but they don't tell the whole story. To get the full picture, you need distributed tracing.
In this post, I’ll show you how to implement Kubernetes observability using OpenTelemetry and Grafana Tempo. We’ll move from scattered logs to a unified view of every request traversing your cluster.
I’ve spent years wrestling with proprietary agents, and the shift toward vendor-neutral standards has been a relief. OpenTelemetry (OTel) is now the industry standard for collecting traces, metrics, and logs. Grafana Tempo is my go-to backend because it’s cost-effective—it stores traces in object storage (S3/GCS) rather than an expensive, high-memory database.
First, we need a central point to receive, process, and export our trace data. I prefer using the OTel Collector as a Deployment in Kubernetes.
Here’s a simplified configuration for your otel-collector-config.yaml:
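```yaml
receivers:
  otlp:
    protocols:
      # Explicit endpoints are an addition here: some Collector versions
      # bind to localhost by default, which other pods cannot reach.
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true  # plaintext gRPC to Tempo inside the cluster

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```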
Deploy this as a service within your cluster. Your application pods will send their trace spans to this collector, which then forwards them to Tempo.
You don't want to manually write tracing code for every function. That’s a fast track to burnout. Instead, use the OpenTelemetry auto-instrumentation libraries.
If you’re running a Go service, start by importing the SDK and pointing the OTLP exporter at your collector:
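```go
import (
	"context"
	"log"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
)

// Configure the exporter to point to your OTel Collector service.
// WithInsecure assumes plaintext gRPC inside the cluster.
exporter, err := otlptracegrpc.New(context.Background(),
	otlptracegrpc.WithEndpoint("otel-collector:4317"), otlptracegrpc.WithInsecure())
if err != nil {
	log.Fatalf("creating OTLP exporter: %v", err)
}
```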
For Java or Python, use the OTel agent or auto-instrumentation wrappers. This injects context propagation headers (like traceparent) into your HTTP requests automatically. This is the "magic" that links Service A to Service B.
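To make that header a little less magical, here’s a minimal sketch in Go (standard library only) that parses a version-00 traceparent value into the four fields defined by the W3C Trace Context format: version, trace ID, parent span ID, and flags. The type and function names are my own, not part of any OTel package, and a real SDK handles more edge cases (future versions, for example).

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// TraceParent holds the four fields of a W3C traceparent header.
// Hypothetical type for illustration only.
type TraceParent struct {
	Version  string
	TraceID  string
	ParentID string
	Sampled  bool
}

// ParseTraceParent validates and splits a version-00 traceparent value.
func ParseTraceParent(h string) (TraceParent, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 {
		return TraceParent{}, fmt.Errorf("traceparent: want 4 fields, got %d", len(parts))
	}
	// Field widths in hex characters: version, trace-id, parent-id, flags.
	wantLen := []int{2, 32, 16, 2}
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return TraceParent{}, fmt.Errorf("traceparent: field %d has length %d, want %d", i, len(p), wantLen[i])
		}
		if _, err := hex.DecodeString(p); err != nil || p != strings.ToLower(p) {
			return TraceParent{}, fmt.Errorf("traceparent: field %d is not lowercase hex", i)
		}
	}
	if parts[0] != "00" {
		return TraceParent{}, fmt.Errorf("traceparent: unsupported version %q", parts[0])
	}
	// All-zero trace or parent IDs are invalid.
	if strings.Trim(parts[1], "0") == "" || strings.Trim(parts[2], "0") == "" {
		return TraceParent{}, fmt.Errorf("traceparent: all-zero trace or parent ID")
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return TraceParent{
		Version:  parts[0],
		TraceID:  parts[1],
		ParentID: parts[2],
		Sampled:  flags[0]&0x01 == 0x01, // lowest bit is the "sampled" flag
	}, nil
}

func main() {
	tp, err := ParseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("invalid header:", err)
		return
	}
	fmt.Printf("trace=%s parent=%s sampled=%t\n", tp.TraceID, tp.ParentID, tp.Sampled)
}
```

Every service that forwards the same trace ID with a new parent ID ends up in the same trace, which is why a proxy that drops this header splits one request into several disconnected traces.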
Tempo needs to know where to put the data. If you’re on AWS, use S3. If you’re on-prem, use MinIO.
```yaml
# In Tempo's config file, storage is a top-level block.
storage:
  trace:
    backend: s3
    s3:
      bucket: my-tempo-traces
      endpoint: s3.amazonaws.com
```
Once Tempo is running, add it as a Data Source in Grafana. Go to Connections > Data Sources > Add data source > Tempo. Point the URL to your Tempo query frontend service.
Now that the plumbing is in place, go to your Grafana dashboard. Open the Explore tab and select Tempo.
When you query a trace ID, you’ll see a waterfall chart that finally shows the latency breakdown of the request, span by span.
This is cloud-native monitoring at its best. You stop guessing and start fixing.
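The same lookup can be scripted outside Grafana, since Tempo exposes an HTTP endpoint for fetching a trace by ID at /api/traces/<traceID>. Here’s a minimal Go sketch of that call. The service name tempo-query-frontend and port 3200 are assumptions about your deployment, and the helper names are mine; main only prints the URL so the example runs without a cluster.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"regexp"
	"strings"
)

// hexTraceID accepts hex trace IDs of up to 32 characters.
var hexTraceID = regexp.MustCompile(`^[0-9a-fA-F]{1,32}$`)

// traceURL builds the trace-by-ID URL for a Tempo base URL.
func traceURL(base, traceID string) (string, error) {
	if !hexTraceID.MatchString(traceID) {
		return "", fmt.Errorf("invalid trace ID %q", traceID)
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("invalid base URL: %w", err)
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("base URL %q needs a scheme and host", base)
	}
	return strings.TrimRight(base, "/") + "/api/traces/" + traceID, nil
}

// fetchTrace downloads the raw trace payload from Tempo.
func fetchTrace(ctx context.Context, client *http.Client, base, traceID string) ([]byte, error) {
	u, err := traceURL(base, traceID)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("tempo returned %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Hypothetical in-cluster address; adjust to your Tempo service.
	u, err := traceURL("http://tempo-query-frontend:3200", "4bf92f3577b34da6a3ce929d0e0e4736")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("GET", u)
}
```

From inside the cluster, calling fetchTrace with a context that carries a timeout and a plain http.Client should return the trace payload, which is handy for attaching a trace to an incident ticket.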
Use the probabilistic_sampler processor to keep only 5-10% of traces unless you have an infinite budget.
Make sure your Ingress is configured to pass the traceparent and tracestate headers.
Implementing distributed tracing isn't just about adding cool graphs to your dashboard; it’s about reducing MTTR (Mean Time To Resolution). When a production incident hits, you'll be glad you spent the time setting this up.