OpenTelemetry, Kubernetes observability, and distributed tracing made simple. Learn to collect traces, metrics, and logs in one pipeline with ready‑to‑use configs and code samples.
I spent two years juggling Jaeger, Prometheus, and Fluent Bit separately. The context switch cost me time and introduced gaps—some requests never showed up in the trace graph, metrics lagged, and logs were hard to correlate. When the OpenTelemetry project hit v1.26.0 (Oct 2023) with stable collector binaries, I finally had a single, vendor‑agnostic data plane.
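In this post I’ll walk you through a production‑ready setup: instrumenting a Go service for traces, metrics, and logs, running the Collector as a DaemonSet, and wiring it to Jaeger, Prometheus, and Loki.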
You can copy the snippets, drop them into a cluster running Kubernetes 1.27, and be up and running in under an hour.
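For Go services I use go.opentelemetry.io/otel v1.20.0. The core module gives you the API and SDK; instrumentation for HTTP, gRPC, and database drivers comes from wrapper packages (for example otelhttp and otelgrpc in opentelemetry-go-contrib) that you add explicitly. If you’re on Java, the equivalent is io.opentelemetry:opentelemetry-sdk 1.32.0.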
```go
package main

import (
	"context"
	"log"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.22.0"
)

func initTracer() func(context.Context) error {
	// Collector endpoint is injected via env var OTEL_EXPORTER_OTLP_ENDPOINT
	exporter, err := otlptracegrpc.New(context.Background())
	if err != nil {
		log.Fatalf("failed to create exporter: %v", err)
	}

	// Resource attributes are attached to every span this service emits.
	r := resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceNameKey.String("orders-api"),
		attribute.String("environment", "prod"),
	)

	bsp := sdktrace.NewBatchSpanProcessor(exporter)
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithResource(r),
		sdktrace.WithSpanProcessor(bsp),
	)
	otel.SetTracerProvider(tp)
	return tp.Shutdown
}

func main() {
	shutdown := initTracer()
	defer func() {
		// Shutdown flushes any spans still buffered in the batch processor.
		if err := shutdown(context.Background()); err != nil {
			log.Printf("tracer shutdown error: %v", err)
		}
	}()

	mux := http.NewServeMux()
	mux.HandleFunc("/order", func(w http.ResponseWriter, r *http.Request) {
		ctx, span := otel.Tracer("orders-handler").Start(r.Context(), "CreateOrder")
		defer span.End()

		// Business logic here; pass ctx to downstream calls so child spans nest correctly.
		_ = ctx
		w.WriteHeader(http.StatusCreated)
	})

	// Log instead of exiting so the deferred shutdown above still runs.
	if err := http.ListenAndServe(":8080", mux); err != nil {
		log.Printf("server stopped: %v", err)
	}
}
```
Key points
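- The exporter reads its target from OTEL_EXPORTER_OTLP_ENDPOINT, which the collector will expose on localhost:4317.
- The resource tags every span with service name and environment, which is critical for filtering in Jaeger.

Metrics follow the same pattern, with a periodic reader pushing to the collector every 15 seconds:

```go
import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"
)

// initMeter takes the resource built for tracing so spans and metrics share the same labels.
func initMeter(r *resource.Resource) func(context.Context) error {
	exp, err := otlpmetricgrpc.New(context.Background())
	if err != nil {
		log.Fatalf("failed to create metric exporter: %v", err)
	}

	provider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exp, sdkmetric.WithInterval(15*time.Second))),
		sdkmetric.WithResource(r),
	)
	otel.SetMeterProvider(provider)
	return provider.Shutdown
}
```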
Define a counter:
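```go
// Requires "go.opentelemetry.io/otel/metric" for metric.WithAttributes.
// Int64Counter returns (counter, error), so it cannot be used in a one-value var declaration.
ordersCreated, err := otel.Meter("orders-api").Int64Counter("orders_created_total")
if err != nil {
	log.Fatalf("failed to create counter: %v", err)
}

// In the request handler:
ordersCreated.Add(ctx, 1, metric.WithAttributes(attribute.String("status", "success")))
```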
OpenTelemetry doesn’t ship a logger yet, but you can inject trace IDs into logs. Using logrus v1.9.0:
```go
import (
	"context"

	"github.com/sirupsen/logrus"
	"go.opentelemetry.io/otel/trace"
)

func logWithTrace(ctx context.Context, msg string) {
	sc := trace.SpanFromContext(ctx).SpanContext()
	if !sc.IsValid() {
		// No active span: log without IDs rather than emitting all-zero values.
		logrus.Info(msg)
		return
	}
	fields := logrus.Fields{
		"trace_id": sc.TraceID().String(),
		"span_id":  sc.SpanID().String(),
	}
	logrus.WithFields(fields).Info(msg)
}
```
Now every log line carries the trace context, making correlation in Loki trivial.
I run otel/opentelemetry-collector-contrib:0.103.0. The “contrib” build includes receivers for Prometheus, Loki, and Jaeger.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      serviceAccountName: otel-collector-sa
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:0.103.0
          args: ["--config=/etc/otel-collector-config.yaml"]
          ports:
            - containerPort: 4317   # OTLP gRPC
            - containerPort: 8888   # Prometheus-format metrics
          volumeMounts:
            - name: config
              mountPath: /etc/otel-collector-config.yaml
              subPath: otel-collector-config.yaml
            - name: varlogpods      # host logs for the filelog receiver
              mountPath: /var/log/pods
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
        - name: varlogpods
          hostPath:
            path: /var/log/pods
```
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: observability
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
      prometheus:
        config:
          scrape_configs:
            - job_name: 'kubernetes-nodes'
              kubernetes_sd_configs:
                - role: node
              relabel_configs:
                - source_labels: [__address__]
                  regex: '(.*):10250'
                  target_label: __address__
                  # "$$" escapes "$" so the collector does not expand ${1} as an env var
                  replacement: '$${1}:10250'
      filelog:
        include:
          - /var/log/pods/**/*.log
        operators:
          - type: json_parser
            timestamp:
              parse_from: attributes.time
            severity:
              parse_from: attributes.severity
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 400
        spike_limit_mib: 100
      batch:
        timeout: 5s
        send_batch_size: 1024      # must not exceed send_batch_max_size
        send_batch_max_size: 1024
    exporters:
      # Contrib 0.103.0 no longer ships the dedicated jaeger exporter;
      # Jaeger ingests OTLP directly on its gRPC port.
      otlp/jaeger:
        endpoint: jaeger-collector.observability.svc:4317
        tls:
          insecure: true
      prometheus:
        endpoint: "0.0.0.0:8889"   # 8888 is used by the collector's own metrics
      loki:
        endpoint: http://loki.observability.svc:3100/loki/api/v1/push
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]   # memory_limiter goes first
          exporters: [otlp/jaeger]
        metrics:
          receivers: [otlp, prometheus]
          processors: [memory_limiter, batch]
          exporters: [prometheus]
        logs:
          receivers: [filelog]
          processors: [memory_limiter, batch]
          exporters: [loki]
```
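
To make the memory_limiter numbers concrete: with limit_mib: 400 and spike_limit_mib: 100, the processor's soft limit is 400 - 100 = 300 MiB. The sketch below is a simplified reading of the documented behavior (refuse data above the soft limit, additionally force garbage collection above the hard limit), not the collector's actual implementation; `classify` and the state names are hypothetical.

```go
package main

import "fmt"

// limiterState is a hypothetical label for what the memory limiter would do.
type limiterState string

const (
	stateAccept  limiterState = "accept"
	stateRefuse  limiterState = "refuse"
	stateForceGC limiterState = "refuse+gc"
)

// classify applies the documented rule: soft limit = limit - spike.
// Above the soft limit data is refused; above the hard limit GC is forced as well.
func classify(usageMiB, limitMiB, spikeMiB uint64) limiterState {
	var soft uint64
	if spikeMiB < limitMiB {
		soft = limitMiB - spikeMiB
	}
	switch {
	case usageMiB > limitMiB:
		return stateForceGC
	case usageMiB > soft:
		return stateRefuse
	default:
		return stateAccept
	}
}

func main() {
	// Values from the config above: limit_mib=400, spike_limit_mib=100.
	for _, usage := range []uint64{250, 350, 450} {
		fmt.Printf("%d MiB -> %s\n", usage, classify(usage, 400, 100))
	}
}
```

In practice this means the collector starts pushing back on senders at roughly 300 MiB, so container memory limits should leave headroom above the 400 MiB hard limit.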
Why this shape?
- A DaemonSet places one collector on every node, so the filelog receiver can read that node's pod logs (/var/log/pods/**).

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector-sa
  namespace: observability
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector-binding
subjects:
  - kind: ServiceAccount
    name: otel-collector-sa
    namespace: observability
roleRef:
  kind: ClusterRole
  name: otel-collector-role
  apiGroup: rbac.authorization.k8s.io
```
Apply everything:
```bash
kubectl apply -f rbac.yaml -f collector-config.yaml -f collector-daemonset.yaml
```
You’ll see the collector pods in the observability namespace, each exposing 4317/tcp.
```bash
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm install jaeger jaegertracing/jaeger \
  --namespace observability \
  --set collector.service.enabled=true \
  --set query.service.type=LoadBalancer \
  --set collector.image.tag=1.53.0
```
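The Jaeger collector service we referenced earlier (jaeger-collector.observability.svc) resolves through cluster DNS. Collector image 0.103.0 no longer includes the dedicated jaeger exporter, so send traces to Jaeger's OTLP gRPC port (4317) rather than 14250, and confirm that the chart exposes that port on the collector service.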
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace observability \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
```
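Point Prometheus at the collector's /metrics endpoint (otel-collector:8888), for example with a ServiceMonitor, and build the Grafana dashboard on top of Prometheus rather than querying the collector directly. For a ready‑made dashboard, search grafana.com for "OpenTelemetry Collector"; note that ID 1860 is Node Exporter Full, not a collector dashboard. A collector dashboard should visualize spans per minute, CPU usage, and queue depth.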
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace observability \
  --set loki.image.tag=2.9.2 \
  --set promtail.enabled=false
```
In Grafana, create a Loki data source pointing at http://loki.observability.svc:3100. You can now search logs by trace_id thanks to the fields we injected.
1. Send a test request: curl -X POST http://orders-api.prod.svc.cluster.local:8080/order.
2. Open the Jaeger UI at http://<lb-ip>:16686 and look for service orders-api. You should see the custom CreateOrder span, plus HTTP and DB spans once those libraries are instrumented.
3. In Prometheus, check that otelcol_exporter_sent_spans increments.
4. In Loki, run {trace_id=~".+"} |~ "CreateOrder" to see logs tied to the same request (this assumes trace_id is indexed as a label).

If any step fails, start with kubectl logs -n observability ds/otel-collector and look for pipeline errors. Common gotchas:
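- Set the collector endpoint through a ConfigMap or Downward API env var in the pod spec.
- Reduce the batch size sent to Loki (for example send_batch_max_size in the batch processor) if you see “request body too large”.

| Concern | Recommended Setting |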
|---|---|
| CPU limits | resources: limits: cpu: "500m" requests: cpu: "200m" per collector container |
| Memory | Keep memory_limiter limit at 400 MiB (see config) and monitor process_resident_memory_bytes |
| TLS | Switch OTLP exporter to mTLS (tls: {cert_file: /certs/client.crt, key_file: /certs/client.key}) and enable tls on Jaeger/Loki |
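| Sampling | Sample in the SDK with OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.2, or add the probabilistic_sampler processor (sampling_percentage: 20) to the trace pipeline, to cut roughly 80% of trace traffic in high‑load clusters |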
| Autoscaling | If you prefer a Deployment over DaemonSet, enable HPA based on otelcol_exporter_sent_spans metric |
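
The Sampling row relies on a ratio decision keyed on the trace ID, so every service that sees the same trace reaches the same verdict. The sketch below illustrates that idea under stated assumptions: `shouldSample` is a hypothetical function, it takes the trace ID as a 32-character hex string, and it is modeled loosely on how ratio samplers work rather than copied from any SDK.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// shouldSample makes a deterministic keep/drop decision from the trace ID.
// It compares a 63-bit value taken from the low 8 bytes against ratio * 2^63.
func shouldSample(traceIDHex string, ratio float64) (bool, error) {
	raw, err := hex.DecodeString(traceIDHex)
	if err != nil {
		return false, fmt.Errorf("invalid trace ID: %w", err)
	}
	if len(raw) != 16 {
		return false, fmt.Errorf("trace ID must be 16 bytes, got %d", len(raw))
	}
	if ratio >= 1 {
		return true, nil
	}
	if ratio <= 0 {
		return false, nil
	}
	x := binary.BigEndian.Uint64(raw[8:16]) >> 1 // drop the top bit: 63-bit value
	bound := uint64(ratio * (1 << 63))
	return x < bound, nil
}

func main() {
	ids := []string{
		"4bf92f3577b34da6a3ce929d0e0e4736",
		"00000000000000000000000000000001",
		"0000000000000000ffffffffffffffff",
	}
	for _, id := range ids {
		keep, err := shouldSample(id, 0.2)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s keep=%v\n", id, keep)
	}
}
```

Because the decision depends only on the trace ID and the ratio, a trace is either kept in full or dropped in full, which avoids half-sampled span trees in Jaeger.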
Implementing OpenTelemetry in Kubernetes isn’t a “set‑and‑forget” task, but once the pipeline runs, you gain a unified view that cuts mean‑time‑to‑detect (MTTD) by half in my experience. Give it a try, tweak the processors to match your traffic, and watch your observability maturity climb.
Next steps
- Enrich telemetry with Kubernetes metadata through the k8sattributes processor (k8s.pod.name, k8s.namespace.name).

Happy tracing!