Databases | June 22, 2026 | 4 min read

Database performance: Preventing cache stampedes with request coalescing

Database performance drops when cache stampedes hit. Learn how to implement request coalescing to collapse concurrent queries and protect your backend.

database performance, backend architecture, golang, caching, concurrency, PostgreSQL, MySQL, Redis, Database

Last month, our primary dashboard API started timing out whenever a popular marketing email went out. We had a solid cache-aside strategy in place, but that wasn't enough when the cache expired for a high-traffic key—hundreds of requests hit the database simultaneously, effectively creating a self-inflicted denial of service.

If you’ve ever stared at a spike in RDS CPU utilization while your application latency climbed to several seconds, you’ve likely dealt with a cache stampede. While we often look at "Database caching: Mastering the Cache-Aside Pattern for Scale" to solve read pressure, sometimes the cache itself becomes the source of the problem. When a key expires, the "thundering herd" of requests all decide to re-fetch the data from the source of truth at the same time.

Understanding the Stampede

A cache stampede happens when a cached resource is both heavily requested and expensive to compute or fetch. When the cache entry expires, the first request sees a miss and starts the work. Before that first request finishes and updates the cache, the second, third, and hundredth requests arrive, see the same miss, and all trigger the same heavy query.

We initially tried just increasing our TTLs, but longer TTLs mean staler data, and the stampede still happens whenever the key finally expires. We then looked into "Redis Caching Patterns That Prevent Stampedes in Production", specifically distributed locks. Locks work, but they add complexity, and if the process holding the lock crashes, other callers are stuck until the lock expires (or indefinitely, if it has no TTL). That’s when I started looking deeper into request coalescing: collapsing those redundant queries into a single execution at the database level or application layer.

Implementing Request Coalescing

Request coalescing is the practice of ensuring that for any given set of parameters, only one execution of a query is "in flight" at any time. If other requests arrive while the first is pending, they should wait for the result of that initial call rather than starting their own.

In a Go-based microservice environment, I’ve found the singleflight package to be the gold standard. It’s elegant and handles the edge cases of concurrency better than I ever could manually. Here is how I structured it:


Go
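import "golang.org/x/sync/singleflight"

// db is assumed to be a *sql.DB opened elsewhere.
var g singleflight.Group

func GetProductData(id string) (Data, error) {
    // The key ensures we only coalesce identical requests
    v, err, _ := g.Do(id, func() (interface{}, error) {
        // Columns and Data fields here are illustrative assumptions.
        var d Data
        err := db.QueryRow("SELECT id, name FROM products WHERE id = ?", id).
            Scan(&d.ID, &d.Name)
        return d, err
    })
    if err != nil {
        // Check the error first: asserting on a failed result is unsafe.
        return Data{}, err
    }
    return v.(Data), nil
}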

With g.Do, if ten requests come in for the same product ID at once, the first one triggers the database query. The other nine block until it returns, and the result is then shared among all callers. This single change reduced our p99 latency during cache expiration events from around 3 seconds down to roughly 280ms.

When to Use This Strategy

You shouldn't apply coalescing to every single query. If a query is cheap—like a primary key lookup on a tiny table—the overhead of managing the singleflight group might actually slow you down. I reserve this for:

Heavy Aggregations: Reports or complex JOINs that take longer than 100ms to compute.
High-Concurrency Keys: Data that is requested by thousands of concurrent users simultaneously.
External API calls: If the work behind a cache miss involves calling a third-party service, you definitely want to coalesce those calls to avoid being rate-limited.

The Trade-offs of Query Optimization

While request coalescing improves database performance, it doesn't make your queries faster; it just makes them fewer. If your query is fundamentally broken (missing an index or performing a full table scan), coalescing just means the whole "thundering herd" waits on one slow query instead of launching many.

Always pair this with a deep dive into your EXPLAIN plans. If you are still seeing high latency, check out "Database performance: Asynchronous Materialized Views for High-Load Reads" to see if you can offload those reads entirely.

Common Questions

Does coalescing introduce latency? It adds a negligible amount of overhead for managing the function call, but it prevents the massive latency spikes caused by database contention.

What happens if the primary query fails? With singleflight, the same error is returned to every caller that was waiting on that call. The failed result is not retained, so the next request that arrives after the call completes triggers a fresh query. Make sure every caller checks the error before using the returned value.

Can I use this across multiple server instances? No, singleflight works within a single process. If you have a distributed system, you would need a distributed locking mechanism or a more sophisticated queue-based approach to collapse requests across nodes.

Final Thoughts

I'm still experimenting with how to combine this with proactive cache warming. Right now, I'm waiting for a miss to trigger the coalescing, but I’d prefer to refresh the cache in the background before it expires. However, for a quick win that stabilizes a shaky system, request coalescing is one of the most effective tools I've added to my backend architecture toolkit. It’s not a silver bullet, but it keeps the database alive long enough for me to fix the underlying query performance issues.
