API performance depends on managing tail latency. Learn how to implement request hedging to fire redundant requests and keep your distributed system fast.

During an on-call rotation last year, we noticed our user-facing dashboard was periodically hanging for 5% of our traffic. It wasn't a total outage, but the p99 latency spiked to nearly 1.5 seconds, causing frustration across our team. We were already using API rate limiting at the edge to protect our backend, but that didn't help when a single node in our cluster started garbage collecting or hit a noisy neighbor issue.
That’s when we turned to request hedging. It’s a simple concept: if you don’t get a response from your primary request within a specific threshold, you fire an identical, redundant request to a different node. You take whichever response comes back first and discard the other.
In a distributed system, your overall response time is only as fast as your slowest dependency. If you have a chain of five services, and each has a 1% chance of hitting a slow tail-latency event, the chance that a request hits at least one slow hop is 1 - 0.99^5, or about 4.9%. That number keeps climbing as the chain grows. Request hedging targets those specific "hiccups" rather than systemic failures.
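To make the compounding concrete, here is a minimal sketch of the arithmetic. It assumes each hop is slow independently of the others, which real systems only approximate, and the function name `slowHopProbability` is purely illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// slowHopProbability returns the chance that a request passing through
// `hops` services in sequence hits at least one slow response, assuming
// each hop is slow independently with probability p.
func slowHopProbability(p float64, hops int) float64 {
	return 1 - math.Pow(1-p, float64(hops))
}

func main() {
	// Print the share of requests affected for a few chain lengths.
	for _, hops := range []int{1, 5, 10, 20} {
		fmt.Printf("%2d hops: %.1f%% of requests hit a slow hop\n",
			hops, 100*slowHopProbability(0.01, hops))
	}
}
```

With a 1% tail per service, a five-hop chain already puts roughly one request in twenty on the slow path.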
We initially tried simply increasing our timeout values. That was a mistake. All it did was make the user wait longer before seeing an error. We also considered aggressive retries, but retries are dangerous—they often trigger a "retry storm" that can take down a struggling service. Hedging is different because you aren't waiting for a failure; you're proactively racing for a faster success.

To implement this, you need a client-side strategy that is both aware of the p95 latency of your service and careful not to double your load during a genuine outage.
Don't hedge every request. That’s a recipe for doubling your traffic under normal conditions. Instead, set your hedge threshold to your p95 or p99 latency. For example, if your service typically responds in 100ms and its p95 sits around 150ms, set your hedge timer to 150ms.
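If you don't already have a percentile from your metrics system, one way to derive the threshold is from a window of recent latency samples. The sketch below uses the nearest-rank method; `hedgeThreshold` and `millis` are hypothetical helper names, and in production you would more likely read this value from your existing histogram metrics.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// millis converts integer millisecond values into durations (demo helper).
func millis(values ...int) []time.Duration {
	out := make([]time.Duration, len(values))
	for i, v := range values {
		out[i] = time.Duration(v) * time.Millisecond
	}
	return out
}

// hedgeThreshold returns the q-th quantile (0 < q <= 1) of the observed
// latencies using the nearest-rank method. It returns 0 for an empty input.
func hedgeThreshold(samples []time.Duration, q float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	samples := millis(90, 95, 100, 102, 105, 110, 98, 101, 150, 400)
	fmt.Println("hedge after:", hedgeThreshold(samples, 0.95))
}
```

Recompute the threshold periodically rather than hard-coding it, since a value that was the p95 last month may be the median after a deploy.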
Here is a simplified pattern using a Go-like concurrency model. You start the primary request, wait for a timer, and if the timer expires, you spawn the second request.
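```go
import (
	"context"
	"time"
)

func fetchWithHedging(ctx context.Context, url string) (*Response, error) {
	// Cancel the losing request as soon as we return a winner.
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	// Buffered for both results so the losing goroutine never blocks.
	ch := make(chan *Response, 2)
	go func() { ch <- sendRequest(ctx, url) }() // primary request

	// Hedging timer (e.g., 150ms)
	select {
	case res := <-ch:
		return res, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	case <-time.After(150 * time.Millisecond):
		go func() { ch <- sendRequest(ctx, url) }() // fire the hedge
	}

	// Take whichever response arrives first.
	select {
	case res := <-ch:
		return res, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```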
The biggest trade-off is the extra load. If you hedge at the p95, you are effectively adding 5% more traffic to your backend. You must ensure your services are provisioned to handle that extra overhead.
If your service is already running at 80% CPU utilization, hedging will likely push it over the edge. Before enabling this, I suggest reviewing your caching strategy and other data-fetching layers to make sure you aren't spending backend capacity on responses that could be cached instead.
Never use request hedging for non-idempotent operations. If your API endpoint performs a POST that creates a record or charges a credit card, hedging will result in duplicate side effects unless you have strict idempotency logic in place, such as idempotency keys checked on the server.
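One cheap safeguard is to gate hedging on the HTTP method before a second request can ever be fired. The sketch below is deliberately conservative and only allows GET and HEAD; `canHedge` is a hypothetical helper, not part of any library, and an endpoint that mutates state behind a GET would still slip through.

```go
package main

import (
	"fmt"
	"net/http"
)

// canHedge reports whether a request with the given HTTP method may be
// sent twice. Only read-only methods are allowed; everything else,
// including POST, PUT, PATCH, and DELETE, is never hedged.
func canHedge(method string) bool {
	switch method {
	case http.MethodGet, http.MethodHead:
		return true
	default:
		return false
	}
}

func main() {
	for _, m := range []string{http.MethodGet, http.MethodPost, http.MethodDelete} {
		fmt.Printf("%-6s hedge allowed: %v\n", m, canHedge(m))
	}
}
```

Call the check at the top of your client wrapper and fall back to a single plain request whenever it returns false.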
Even with idempotency keys, you’re adding complexity. Reserve hedging for read-heavy operations—like fetching user profiles, configuration settings, or catalog data—where the cost of a duplicate request is negligible.

We saw our p99 latency drop from 1.5s to about 300ms after implementing this. It wasn't a silver bullet; it just smoothed out the noise.
One thing I'd do differently next time? I’d implement a circuit breaker that disables hedging if the backend starts returning 5xx errors. We once had a scenario where the backend was failing intermittently, and our hedging logic just sent twice as many requests to a service that was already dying. That taught me that hedging is for latency spikes, not for service recovery.
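Here is roughly what that guard could look like: a small sliding window over recent status codes that turns hedging off once too many of them are 5xx. This is a sketch under my own assumptions, not what we ran in production; `hedgeBreaker`, the window size, and the 20% cutoff are all illustrative, and the window size must be greater than zero.

```go
package main

import (
	"fmt"
	"sync"
)

// hedgeBreaker remembers whether each of the last N responses was a 5xx
// and reports whether hedging should stay enabled.
type hedgeBreaker struct {
	mu       sync.Mutex
	window   []bool // ring buffer; true means the response was a 5xx
	next     int    // index of the slot to overwrite next
	filled   int    // number of slots holding real data
	errors   int    // number of true values currently in the window
	maxRatio float64
}

func newHedgeBreaker(size int, maxRatio float64) *hedgeBreaker {
	return &hedgeBreaker{window: make([]bool, size), maxRatio: maxRatio}
}

// Record stores the status code of a completed request.
func (b *hedgeBreaker) Record(status int) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.filled == len(b.window) {
		if b.window[b.next] {
			b.errors-- // the oldest entry is about to be evicted
		}
	} else {
		b.filled++
	}
	isErr := status >= 500
	b.window[b.next] = isErr
	if isErr {
		b.errors++
	}
	b.next = (b.next + 1) % len(b.window)
}

// AllowHedge reports whether the recent 5xx ratio is at or below maxRatio.
func (b *hedgeBreaker) AllowHedge() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.filled == 0 {
		return true
	}
	return float64(b.errors)/float64(b.filled) <= b.maxRatio
}

func main() {
	b := newHedgeBreaker(10, 0.2)
	for _, status := range []int{200, 200, 503, 503, 503} {
		b.Record(status)
	}
	fmt.Println("hedging allowed:", b.AllowHedge())
}
```

Check `AllowHedge()` right before firing the second request and call `Record` for every response, hedged or not, so the breaker reflects what the backend is really doing.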
Q: Does hedging increase the likelihood of hitting rate limits?
A: Yes. If your API gateway is configured to limit requests based on client IP, your hedged requests count toward that limit. You may need to adjust your bucket size or use a more granular rate-limiting strategy.
Q: Should I hedge every service in a chain?
A: No. Hedging at every layer creates an exponential explosion of requests. Hedge only at the edge or the outermost layer of your request chain.
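Q: Is there a tool that does this automatically?
A: Sometimes. Envoy, the proxy that Istio is built on, has a route-level hedge policy, and gRPC supports a hedging policy in its service config. Be careful not to confuse this with retry settings such as "max_retries": a retry fires after a request has failed, while a hedge fires while the first request is still in flight. Check what your service mesh or RPC framework actually exposes before writing your own implementation.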
Request hedging is a high-leverage move, but it requires operational maturity. Start small, monitor your backend CPU, and ensure your endpoints are idempotent. When done right, it’s one of the most effective ways to provide a consistent experience to your users, regardless of the underlying infrastructure’s occasional bad day.