Laravel | PHP | June 22, 2026 | 4 min read

Laravel Middleware Request Collapsing for High-Concurrency APIs

Master Laravel middleware request collapsing to solve high-concurrency bottlenecks. Learn to implement deterministic memoization and batching for faster APIs.

Tags: Laravel, PHP, Performance, Middleware, Caching, Concurrency, Backend

Last month, we hit a wall where a sudden spike in traffic caused our primary user-profile endpoint to thrash the database. We were processing the same expensive payload calculation hundreds of times per second because dozens of clients were requesting identical resources simultaneously.

Solving this required moving beyond basic caching. We needed Laravel middleware that could detect duplicate in-flight requests and "collapse" them into a single execution, returning that one result to every waiter.

The Problem: The Thundering Herd

When you have a high-concurrency Laravel API, the standard approach is often just adding Cache::remember(). That works for the second request, but it does nothing for the first fifty requests that arrive at the exact same millisecond. They all see a cache miss, and they all hit the database.
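To make the failure mode concrete, here is a minimal single-process sketch. It is not real concurrency: it only replays the interleaving in which every request checks the cache before any of them has written to it. The function name simulateStampede and the cache key are hypothetical.

```php
<?php

/**
 * Replays the worst-case interleaving of a cache-aside read:
 * all requests check the cache first, and only then does anyone write.
 * Returns the number of simulated database hits.
 */
function simulateStampede(int $concurrentRequests): int
{
    $cache = [];
    $dbHits = 0;

    // Phase 1: every request checks the cache at the "same millisecond".
    $missed = [];
    for ($i = 0; $i < $concurrentRequests; $i++) {
        if (! array_key_exists('profile:42', $cache)) {
            $missed[] = $i;
        }
    }

    // Phase 2: every request that missed runs the expensive query itself.
    foreach ($missed as $requestId) {
        $dbHits++;                         // stands in for the expensive query
        $cache['profile:42'] = 'payload';  // too late to help the others
    }

    return $dbHits;
}

echo simulateStampede(50), " database hits for 50 identical requests\n";
// prints "50 database hits for 50 identical requests"
```

The cache only helps requests that arrive after the first write lands, which is exactly the gap request collapsing is meant to close.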

I first tried to solve this with a simple Redis lock. It was a disaster. The lock contention became a new bottleneck, and the latency added by the lock overhead was worse than the original database hit. We needed something that didn't just lock, but actually waited for the result of the first ongoing request.

Implementing Request Collapsing Middleware

To achieve effective request collapsing, we need a middleware that identifies a unique request signature, checks if an identical process is already running, and if so, waits for that original process to complete.
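The "unique request signature" step can be sketched in plain PHP. This is an illustration under assumptions rather than what we shipped: requestSignature is a hypothetical helper, and sorting the query parameters is one possible normalization so that reordered but equivalent URLs collapse together.

```php
<?php

/**
 * Builds a deterministic signature for a request.
 * Query parameters are sorted so ?a=1&b=2 and ?b=2&a=1 collapse together.
 */
function requestSignature(string $method, string $path, array $query, string $body = ''): string
{
    ksort($query); // top-level ordering only; nested arrays keep their order

    $canonical = implode("\n", [
        strtoupper($method),
        '/' . trim($path, '/'),
        http_build_query($query),
        $body,
    ]);

    return 'collapse:' . hash('sha256', $canonical);
}

$a = requestSignature('GET', '/users/42', ['fields' => 'name', 'include' => 'teams']);
$b = requestSignature('get', 'users/42/', ['include' => 'teams', 'fields' => 'name']);

var_dump($a === $b); // bool(true)
```

If responses differ per user, the user or token identifier would also need to go into the signature, otherwise one user's payload could be handed to another.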

Here is a simplified version of what we shipped:


PHP
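namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Cache;

class CollapseRequests
{
    public function handle(Request $request, Closure $next)
    {
        $key = 'collapse:' . md5($request->fullUrl() . $request->getContent());

        // The first request takes the lock and runs; get() returns false for everyone else.
        $response = Cache::lock("{$key}:lock", 5)->get(function () use ($next, $request, $key) {
            $response = $next($request);

            // Publish the result briefly for waiters. Cache TTLs are in whole seconds.
            Cache::put("{$key}:result", [
                $response->getStatusCode(),
                $response->getContent(),
                $response->headers->get('Content-Type'),
            ], 1);

            return $response;
        });

        return $response ?: $this->waitForResult($key, $request, $next);
    }

    private function waitForResult(string $key, Request $request, Closure $next)
    {
        // Poll for the result of the original request (up to ~5s, matching the lock TTL).
        for ($attempt = 0; $attempt < 100; $attempt++) {
            if ($result = Cache::get("{$key}:result")) {
                [$status, $body, $type] = $result;
                return response($body, $status, ['Content-Type' => $type]);
            }

            usleep(50000); // 50ms between polls
        }

        // The original request never published a result, so run normally.
        return $next($request);
    }
}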

This approach works best when you combine it with content-aware batching pipelines, so that even requests that aren't strictly identical can be grouped into a single database query.
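As a rough illustration of grouping non-identical requests, the sketch below folds several pending profile lookups into one parameterised query. The helper name buildBatchedQuery, the waiter IDs, and the users table are assumptions made for the example.

```php
<?php

/**
 * Groups pending lookups into one parameterised IN query.
 * $pending maps a hypothetical waiter ID to the user ID it asked for.
 * Returns [sql, bindings], or [null, []] when nothing is pending.
 */
function buildBatchedQuery(array $pending): array
{
    // Deduplicate so two waiters asking for the same user cost one row lookup.
    $ids = array_values(array_unique(array_map('intval', $pending)));
    sort($ids);

    if ($ids === []) {
        return [null, []];
    }

    $placeholders = implode(', ', array_fill(0, count($ids), '?'));

    return ["SELECT * FROM users WHERE id IN ({$placeholders})", $ids];
}

[$sql, $bindings] = buildBatchedQuery(['req-a' => 42, 'req-b' => 7, 'req-c' => 42]);

echo $sql, "\n";                    // SELECT * FROM users WHERE id IN (?, ?)
echo implode(',', $bindings), "\n"; // 7,42
```

Inside a Laravel app the same idea would more likely be expressed with the query builder's whereIn(), with the results then fanned back out to each waiter by ID.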

The Trade-offs of Memoization

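We found that PHP memoization within the request lifecycle is only half the battle. If you're running PHP-FPM, memory is isolated per request, so anything you memoize disappears when the request ends. You cannot share state between workers unless you use a shared store: APCu for workers on the same host, or Redis when the state must be visible across servers.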
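For reference, in-request memoization can be as small as the sketch below. The memoize helper and the fake profile calculation are hypothetical, and the cache lives only in the current PHP process, which is the limitation described above.

```php
<?php

/**
 * Wraps a callable so repeated calls with the same arguments reuse the
 * first result. The cache lives in this PHP process only.
 */
function memoize(callable $fn): Closure
{
    $cache = [];

    return function (...$args) use (&$cache, $fn) {
        $key = serialize($args);

        if (! array_key_exists($key, $cache)) {
            $cache[$key] = $fn(...$args);
        }

        return $cache[$key];
    };
}

$calls = 0;
$expensiveProfile = memoize(function (int $userId) use (&$calls) {
    $calls++; // stands in for the expensive payload calculation
    return ['id' => $userId, 'score' => $userId * 2];
});

$expensiveProfile(42);
$expensiveProfile(42); // served from the in-process cache
$expensiveProfile(7);

echo $calls, "\n"; // 2
```

This saves repeated work inside one request, but fifty concurrent FPM workers would still each run the calculation once.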

We eventually moved our performance optimization strategy to use a combination of:

Request Collapsing: Ensuring only one worker processes a specific unique resource.
Deterministic Result Storage: Storing the response for a short TTL (usually around 200ms) to satisfy "waiters."
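The two pieces fit together roughly as follows. This is an in-memory stand-in, not the Redis-backed version: ShortLivedStore and collapsed are hypothetical names, the clock is passed in so expiry can be shown without sleeping, and the mixed type assumes PHP 8.0 or later.

```php
<?php

/**
 * In-memory stand-in for a shared store with millisecond TTLs.
 * The clock is injected so expiry can be demonstrated without sleeping.
 */
final class ShortLivedStore
{
    private array $items = [];

    public function put(string $key, mixed $value, int $ttlMs, int $nowMs): void
    {
        $this->items[$key] = ['value' => $value, 'expiresAt' => $nowMs + $ttlMs];
    }

    public function get(string $key, int $nowMs): mixed
    {
        $item = $this->items[$key] ?? null;

        if ($item === null || $nowMs >= $item['expiresAt']) {
            unset($this->items[$key]);
            return null;
        }

        return $item['value'];
    }
}

/** Serves from the store while the entry is fresh; otherwise computes and stores for 200ms. */
function collapsed(ShortLivedStore $store, string $key, int $nowMs, callable $compute): mixed
{
    $hit = $store->get($key, $nowMs);
    if ($hit !== null) {
        return $hit;
    }

    $value = $compute();
    $store->put($key, $value, 200, $nowMs);

    return $value;
}

$store = new ShortLivedStore();
$executions = 0;
$compute = function () use (&$executions) {
    $executions++;
    return 'profile-payload';
};

collapsed($store, 'collapse:abc', 0, $compute);   // leader computes and stores
collapsed($store, 'collapse:abc', 150, $compute); // waiter, still inside the 200ms window
collapsed($store, 'collapse:abc', 250, $compute); // entry expired, computes again

echo $executions, "\n"; // 2
```

With Redis as the shared store, a sub-second expiry like this would be set with the PX option of the SET command, since Laravel's cache API expresses TTLs in whole seconds.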

By using Laravel queues and Redis Lua scripts for atomic job batching, we moved the heavy lifting out of the request cycle entirely for non-time-sensitive data.

Why This Matters for High-Concurrency

In high-concurrency scenarios, every millisecond counts. If your API is doing heavy lifting, you're effectively limited by your database connection pool. Request collapsing deduplicates in-flight work (the pattern is also known as single-flight), so a burst of identical requests costs one query instead of dozens and the "thundering herd" never reaches your read replicas.

When we deployed this, we saw database CPU usage drop by about 40% during peak hours. It wasn't just about speed; it was about stability.

FAQ

Does this increase latency for the first request?

Yes, slightly. You're adding the overhead of a cache-lock check. However, for the subsequent 99 requests, you're saving the cost of a full database query, which is a massive net win.

Can I use this for POST requests?

Be very careful. Request collapsing is typically safe for GET requests. If you collapse POST requests, only the first one actually executes, so you can silently drop writes that were meant to happen separately and hand one client's response to another. Only do it for endpoints that are strictly idempotent.

What about data staleness?

Because we use a short TTL (200 to 500 ms), the data can be stale, but only by a fraction of a second. For most dashboard or profile data, this is an acceptable trade-off for the massive throughput gain.

Final Thoughts

Next time, I'd probably look into implementing this at the Nginx level using proxy_cache_lock (or fastcgi_cache_lock when Nginx passes requests to PHP-FPM directly) to stop the traffic before it even hits the PHP-FPM pool. Handling this in Laravel is great for flexibility, but moving it closer to the edge is always the ultimate goal for scaling. Keep your middleware lean, monitor your Redis latency, and don't assume that every request needs to be unique.
