Master Next.js OpenTelemetry observability by tracing Server Actions and React components. Learn to build resilient distributed tracing for complex apps.
Last month, we spent three days hunting down a silent failure in a distributed checkout flow. The logs showed the request hit the API, but the database mutation never finished. We were flying blind because our traces broke the moment we crossed the boundary from a Server Component into a Server Action.
If you’re building production-grade apps with Next.js, you’ve likely realized that standard logging isn't enough. You need Next.js OpenTelemetry instrumentation to map the journey of a single request across your entire stack.
When a user triggers a Server Action, Next.js handles it as a separate POST request. The code feels like it runs on a standard Node.js server, but it executes in its own context rather than inside the render that produced the page. If your observability setup doesn't correctly propagate the traceparent header, your traces will be fragmented. You'll see the HTTP request, but the internal logic (the database calls, the external API fetches, and the cache lookups) will appear as orphaned spans.
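For reference, the traceparent header follows the W3C Trace Context layout: a version, a trace ID, a parent span ID, and a flags byte, joined by hyphens. The sketch below parses one by hand to show what has to survive the hop into a Server Action. The names parseTraceparent and TraceParent are illustrative, and in a real app the OpenTelemetry propagator handles this for you.

```typescript
// Illustrative only: OpenTelemetry's W3C propagator normally does this parsing.
interface TraceParent {
  version: string;
  traceId: string;  // 32 hex chars, shared by every span in the trace
  parentId: string; // 16 hex chars, the span that made the call
  sampled: boolean; // lowest bit of the flags byte
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;

  const [, version, traceId, parentId, flags] = match;

  // Version "ff" and all-zero IDs are invalid in the W3C format.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) {
    return null;
  }

  return {
    version,
    traceId,
    parentId,
    sampled: (parseInt(flags, 16) & 1) === 1,
  };
}
```

If the trace ID seen inside the action differs from the one on the incoming request, the context was lost somewhere in between, and that is where the orphaned spans come from.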
We initially tried to patch this by manually passing IDs through our function arguments. It was a disaster. It polluted our business logic and made our Zod-driven request serialization layer (covered in "Next.js Server Actions: Implementing Zod-Driven Request Serialization") nearly impossible to maintain. We needed something that lived "outside" the function arguments.
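To get this right, you need an instrumentation.ts file in your project root (or in src, if you use that layout). Next.js calls the register function that this file exports once when a new server instance starts, before any request is handled, which makes it the right place to initialize the OpenTelemetry SDK.

```typescript
// instrumentation.ts
export async function register() {
  // The Node SDK cannot load in the Edge runtime, so only start it under Node.js.
  if (process.env.NEXT_RUNTIME !== 'nodejs') return;

  const { NodeSDK } = await import('@opentelemetry/sdk-node');
  const { OTLPTraceExporter } = await import('@opentelemetry/exporter-trace-otlp-grpc');
  // Assumes the 1.x SDK packages; 2.x replaces `new Resource(...)`
  // with `resourceFromAttributes(...)`.
  const { Resource } = await import('@opentelemetry/resources');
  const { SemanticResourceAttributes } = await import('@opentelemetry/semantic-conventions');

  const sdk = new NodeSDK({
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: 'my-next-app',
    }),
    // With no options, the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT.
    traceExporter: new OTLPTraceExporter(),
  });

  sdk.start();
}
```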
By initializing the SDK here, you ensure that the instrumentation hooks are attached before the Next.js runtime begins processing requests. However, simply starting the SDK isn't enough for distributed tracing in a complex environment. You have to ensure that your context propagation works across the asynchronous boundaries of the App Router.
If you are dealing with complex workflows that span multiple services, you might want to look at how we kept context consistent in "Next.js AsyncLocalStorage: Implementing Distributed Tracing in Server Actions".
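As a rough illustration of keeping context "outside" the function arguments, Node's AsyncLocalStorage can carry a trace ID across await boundaries. This is a standalone sketch rather than the OpenTelemetry implementation, although the Node.js SDK's default context manager is built on the same primitive. TraceStore, runWithTrace, currentTraceId, and saveOrder are hypothetical names.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface TraceStore {
  traceId: string;
}

const traceStorage = new AsyncLocalStorage<TraceStore>();

// Run `fn` with a trace ID attached to the current async call chain.
function runWithTrace<T>(traceId: string, fn: () => Promise<T>): Promise<T> {
  return traceStorage.run({ traceId }, fn);
}

// Read the ID anywhere downstream without threading it through arguments.
function currentTraceId(): string | undefined {
  return traceStorage.getStore()?.traceId;
}

// Stand-in for a database call buried deep in the business logic.
async function saveOrder(): Promise<string | undefined> {
  await new Promise((resolve) => setTimeout(resolve, 5));
  return currentTraceId();
}
```

Two concurrent calls such as runWithTrace('trace-a', saveOrder) and runWithTrace('trace-b', saveOrder) each see their own ID, even though saveOrder takes no arguments. Outside any run call, currentTraceId() returns undefined.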
The real magic happens when you connect your component tree to your backend spans. OpenTelemetry automatically instruments fetch calls, but it doesn't automatically know that your UserDashboard component is waiting on a specific fetch inside a Server Action.
To bridge this, we use manual span creation within our actions:
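```typescript
'use server';

import { trace, SpanStatusCode } from '@opentelemetry/api';

export async function updateProfile(data: FormData) {
  const tracer = trace.getTracer('my-server-action');

  // startActiveSpan makes this span the parent of anything started in the callback.
  return await tracer.startActiveSpan('updateProfile', async (span) => {
    try {
      // Your logic here; `result` is a placeholder for whatever the action returns.
      const result = { ok: true };
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      // Always end the span, or it will never be exported.
      span.end();
    }
  });
}
```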
This ensures that every time the action is invoked, a dedicated span is created. Because it is started as the active span, it nests under the request span for the action, and any instrumented fetch or database call made inside it becomes its child. If you're running this in a clustered environment, you should also consider the approach in "Kubernetes Observability: Implementing Distributed Tracing with Tempo" to visualize these traces effectively.
We learned the hard way that over-instrumenting can be as bad as under-instrumenting. We saw a latency spike of around 45ms per request when we added too many custom attributes to our spans.
Keep your span attributes lean. Only log what you actually need to debug a production issue. If you’re tracking state transitions, don't store the entire object; store the ID and the event name.
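One way to keep attributes lean is to project a domain object down to a few primitive fields before it reaches the span. The sketch below assumes a hypothetical OrderTransition shape and hypothetical attribute names. The flat record it returns is the kind of object you could pass to span.setAttributes.

```typescript
// Hypothetical domain object; the shape is an assumption for illustration.
interface OrderTransition {
  orderId: string;
  event: string;
  cart: { items: unknown[] };
  customer: Record<string, unknown>;
}

type LeanAttributes = Record<string, string | number | boolean>;

// Keep only the ID, the event name, and one cheap count; drop the payload.
function toSpanAttributes(transition: OrderTransition): LeanAttributes {
  return {
    'order.id': transition.orderId,
    'order.event': transition.event,
    'order.item_count': transition.cart.items.length,
  };
}
```

The full cart and customer objects never touch the span, so the exporter serializes three small values per transition instead of an arbitrarily large payload.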
1. Does OpenTelemetry add significant overhead to my Next.js app?
It's generally negligible when configured correctly, usually adding no more than 5 to 10 ms to the request lifecycle. Avoid heavy processing inside your traceExporter logic.
2. Can I use OpenTelemetry with Vercel or other serverless platforms?
Yes, but you need to be careful with the exporter. Serverless functions are ephemeral, so you usually want to use an OTLP collector that can buffer and flush spans asynchronously so you don't block the request response.
3. Why aren't my traces showing up in my collector?
Check your OTEL_EXPORTER_OTLP_ENDPOINT. In local development, it’s often a port mismatch. In production, ensure your network policies allow egress traffic to your observability backend.
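A quick startup check can catch the port mismatch mentioned in the last answer. This sketch assumes the conventional OTLP defaults (4317 for gRPC, 4318 for HTTP). checkOtlpEndpoint and its warning messages are illustrative and not part of any OpenTelemetry package.

```typescript
type OtlpProtocol = 'grpc' | 'http';

// Conventional OTLP collector ports; a custom collector may use others.
const DEFAULT_PORTS: Record<OtlpProtocol, string> = { grpc: '4317', http: '4318' };

// Returns human-readable warnings; an empty list means nothing looked off.
function checkOtlpEndpoint(endpoint: string | undefined, protocol: OtlpProtocol): string[] {
  if (!endpoint) {
    return ['OTEL_EXPORTER_OTLP_ENDPOINT is not set; the exporter will use its default.'];
  }

  let url: URL;
  try {
    url = new URL(endpoint);
  } catch {
    return [`"${endpoint}" is not a valid URL.`];
  }

  const warnings: string[] = [];
  const other: OtlpProtocol = protocol === 'grpc' ? 'http' : 'grpc';

  // The common local mistake: pointing a gRPC exporter at the HTTP port, or vice versa.
  if (url.port === DEFAULT_PORTS[other]) {
    warnings.push(`Port ${url.port} is the usual ${other} port, but the exporter speaks ${protocol}.`);
  }
  return warnings;
}
```

With the gRPC exporter from the setup above, calling checkOtlpEndpoint(process.env.OTEL_EXPORTER_OTLP_ENDPOINT, 'grpc') during startup and logging any warnings gives an early signal before you start wondering where the traces went.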
I’m still not entirely satisfied with how we handle trace sampling during high-traffic bursts. We’re currently dropping about 10% of traces to keep our ingestion costs down, but I worry we’re missing edge-case errors. If you have a better strategy for dynamic sampling, I’d love to hear how you’re handling it.