Redis caching patterns that prevent stampedes are essential for scaling. Learn how to stop the thundering herd effect and keep your backend performance stable.

Last month, our primary dashboard API started timing out every time a popular campaign went live. We were using a standard cache-aside pattern, but the moment a key expired, dozens of concurrent requests would see a cache miss and slam the database simultaneously. It wasn't just a slow query issue—it was a classic cache stampede.
If you've spent any time killing N+1 queries at the database layer, you know that database load is the enemy of stability. When your cache expires, that load spikes instantly. Here is how I handle these stampedes using robust Redis patterns.
A cache stampede (or "thundering herd") happens when a highly requested key expires. Multiple application threads check the cache, find it empty, and all decide to recompute the value by hitting the database at the same time.
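If your database query takes 300 ms and 50 requests arrive during the miss, your database suddenly receives 15 seconds of work (50 × 300 ms) in a fraction of a second. It’s a vicious cycle: the extra load slows every query down, which keeps the key empty for longer, which lets even more requests miss.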
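To make the arithmetic concrete, here is a small, self-contained Ruby simulation of a naive cache-aside read. It is illustrative only: a Hash stands in for Redis, `sleep` stands in for the slow query (50 ms rather than 300 ms to keep it quick), and `NaiveCacheAside` and `simulate_stampede` are hypothetical names. Releasing many threads at once on an empty key should show far more than one database call for that single key.

```ruby
# Naive cache-aside under concurrency. A Hash stands in for Redis and
# sleep stands in for the slow query; all names here are illustrative.
class NaiveCacheAside
  attr_reader :db_calls

  def initialize(query_time: 0.05)
    @cache = {}
    @lock = Mutex.new # guards the Hash and counter only, not the rebuild
    @db_calls = 0
    @query_time = query_time
  end

  def fetch(key)
    cached = @lock.synchronize { @cache[key] }
    return cached if cached

    # Cache miss: every request that reaches this line queries the "database".
    @lock.synchronize { @db_calls += 1 }
    sleep(@query_time)
    value = "report for #{key}"
    @lock.synchronize { @cache[key] = value }
    value
  end
end

# Fires `requests` concurrent reads at an empty (just-expired) key and
# returns how many of them hit the database.
def simulate_stampede(requests: 50)
  cache = NaiveCacheAside.new
  start = Queue.new
  threads = Array.new(requests) do
    Thread.new do
      start.pop # block until every thread is released at once
      cache.fetch("dashboard:campaign")
    end
  end
  requests.times { start << :go }
  threads.each(&:join)
  cache.db_calls
end

puts "database calls for one expired key: #{simulate_stampede}"
```

Nothing in `fetch` coordinates the rebuild, so every request that arrives while the query is still running pays for it again.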
The most elegant fix isn't just setting longer TTLs. It’s "Probabilistic Early Recomputation." Instead of waiting for a hard expiration, you let the application decide to refresh the cache before it expires based on a probability calculation.
You store the "refresh-at" time inside the cached object itself. When a request comes in, you check if the current time is nearing the expiration. If it is, you use a probability function to decide if this specific request should trigger a background update.
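```ruby
require "json"
require "redis"

REDIS = Redis.new
TTL = 300           # hard expiry, in seconds
REFRESH_WINDOW = 30 # start considering a refresh 30 s before expiry

# Each cached entry stores its own refresh_at timestamp next to the value.
def get_with_early_recomputation(key)
  raw = REDIS.get(key)
  return update_cache(key) if raw.nil? # hard miss: compute synchronously

  data = JSON.parse(raw)
  now = Time.now.to_f
  if now >= data["refresh_at"]
    # Refresh probability ramps from 0 to 1 across the refresh window.
    probability = (now - data["refresh_at"]) / REFRESH_WINDOW
    # Thread.new stands in for a background job; use your job queue in production.
    Thread.new { update_cache(key) } if rand < probability
  end
  data["value"] # serve the cached (possibly slightly stale) value
end

def update_cache(key)
  value = load_from_database(key) # hypothetical: your slow query
  entry = { "value" => value, "refresh_at" => Time.now.to_f + TTL - REFRESH_WINDOW }
  REDIS.set(key, JSON.generate(entry), ex: TTL)
  value
end
```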
By spreading out the refresh requests, you make it likely that only one request (or a small handful) hits the database, while the rest keep serving slightly stale but still fast cached data.
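The refresh probability can be computed in many ways. One established choice, swapped in here and not necessarily what I run in production, is the "XFetch" rule from Vattani, Chierichetti and Lowenstein's work on probabilistic cache stampede prevention: a request refreshes early when `now - delta * beta * ln(rand) >= expiry`, where `delta` is how long the last recomputation took and `beta` tunes how eagerly to refresh. The helper name `should_recompute?` is hypothetical, and the sketch assumes each cached entry stores `delta` and `expiry` next to its value.

```ruby
# XFetch-style early recomputation check (hypothetical helper name).
#   now    - current time in seconds
#   expiry - absolute time the cached entry expires
#   delta  - seconds the last recomputation took (stored with the entry)
#   beta   - eagerness knob; 1.0 is the usual default, > 1.0 refreshes earlier
def should_recompute?(now:, expiry:, delta:, beta: 1.0, rng: Random)
  # Math.log(u) is <= 0 for u in [0, 1), so this subtracts a negative number:
  # each request "looks ahead" by a random, exponentially distributed amount.
  now - delta * beta * Math.log(rng.rand) >= expiry
end

rng = Random.new(42)
# A request arriving 10 s before expiry, when recomputation takes 10 s:
refreshes = 10_000.times.count do
  should_recompute?(now: 0.0, expiry: 10.0, delta: 10.0, rng: rng)
end
puts "early refresh rate: #{refreshes / 10_000.0}"
```

Because `-ln(rand)` is exponentially distributed, a request arriving `gap` seconds before expiry refreshes with probability `exp(-gap / (delta * beta))`: almost never when expiry is far off, about 37% when the gap equals one recompute time, and always once the entry has expired. Slow-to-rebuild keys therefore start refreshing earlier, which is exactly where a stampede hurts most.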

Sometimes, you can't afford any stale data. In those cases, I prefer a distributed lock using Redis SET NX (Set if Not Exists).
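When a thread sees a cache miss, it attempts to acquire a lock for that specific key. Only the thread that wins the lock queries the database and repopulates the cache; the others wait briefly and re-read the cache instead of hitting the database.
It’s cleaner than it sounds, but be careful with timeouts. If your database query hangs while holding the lock, you’ll block all other requests for that key. Always give the lock its own TTL, a bit longer than your slowest expected query, so a hung or crashed worker releases it automatically.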
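Here is a minimal sketch of that lock-on-miss flow, assuming a redis-rb-style client with `get`, `set(key, value, nx: true, ex: seconds)` (which returns `true` or `false` when `nx:` is given) and `del`. To keep it runnable without a server, it includes a small in-memory stand-in, `MemoryRedis`, which is hypothetical and not part of any library; with the real gem you would pass `Redis.new` instead. `fetch_with_lock`, the `lock:` key prefix, and the timing parameters are illustrative choices.

```ruby
require "securerandom"

# In-memory stand-in (hypothetical) for the few redis-rb calls used below:
# get, set with nx:/ex:, and del. Pass Redis.new instead in real code.
class MemoryRedis
  def initialize
    @data = {} # key => [value, expires_at or nil]
    @mutex = Mutex.new
  end

  def get(key)
    @mutex.synchronize { live(key) }
  end

  # Mirrors SET key value [NX] [EX seconds]; false means NX blocked the write.
  def set(key, value, nx: false, ex: nil)
    @mutex.synchronize do
      next false if nx && live(key)
      @data[key] = [value, ex && Time.now.to_f + ex]
      true
    end
  end

  def del(key)
    @mutex.synchronize { @data.delete(key) ? 1 : 0 }
  end

  private

  # Value if present and unexpired; expired keys are dropped on read.
  def live(key)
    value, expires_at = @data[key]
    return nil if value.nil?
    return value if expires_at.nil? || Time.now.to_f < expires_at
    @data.delete(key)
    nil
  end
end

# Cache-aside read where only the lock holder queries the database.
def fetch_with_lock(redis, key, ttl: 300, lock_ttl: 10, wait: 0.05, max_waits: 100)
  lock_key = "lock:#{key}"
  max_waits.times do
    cached = redis.get(key)
    return cached if cached

    token = SecureRandom.hex(16) # proves which worker owns the lock
    if redis.set(lock_key, token, nx: true, ex: lock_ttl)
      begin
        # Double-check: another worker may have just filled the cache.
        cached = redis.get(key)
        return cached if cached

        value = yield # the expensive database query
        redis.set(key, value, ex: ttl)
        return value
      ensure
        # Release only our own lock. Real code should make this check-and-delete
        # atomic (e.g. a small Lua script via EVAL); two separate calls can race.
        redis.del(lock_key) if redis.get(lock_key) == token
      end
    end

    sleep(wait) # another worker is rebuilding; re-check shortly
  end
  raise "timed out waiting for #{key} to be rebuilt"
end

# Usage: 20 concurrent misses on one key.
redis = MemoryRedis.new
db_calls = 0
counter = Mutex.new
threads = Array.new(20) do
  Thread.new do
    fetch_with_lock(redis, "dashboard:campaign") do
      counter.synchronize { db_calls += 1 }
      sleep(0.1) # pretend this is the slow query
      "fresh report"
    end
  end
end
results = threads.map(&:value)
puts "database calls: #{db_calls}"
```

The unique token keeps a worker whose lock has already expired from deleting a lock that now belongs to someone else, and `max_waits` bounds how long a request waits before giving up instead of blocking forever.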
While caching is a great shield, it isn't a substitute for a solid indexing strategy. If your database query is fundamentally slow, no amount of caching will save you when the cache does need to be populated.
I’ve seen engineers try to "cache away" performance problems caused by missing indexes. It works until the cache is cleared, and then the site goes down. Always ensure your database can handle the "cold start" before you rely on Redis to mask the latency.
One mistake I see often is setting static, long-lived TTLs. If thousands of keys are written at the same moment with the same TTL (during a site-wide event, for example), you end up with "synchronized expiration," where they all expire at exactly the same time.
The fix is to add random jitter to every TTL (for example, `base_ttl + rand(0..300)` seconds). This ensures that keys expire at different times, effectively smoothing out the load on your database.
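A minimal sketch of the jitter rule as a reusable helper. `jittered_ttl` is a hypothetical name, the default 300-second spread mirrors the `rand(0..300)` example, and the commented `redis.set(..., ex:)` line assumes the redis-rb client.

```ruby
# Adds random jitter so keys written together don't expire together.
# base_ttl and max_jitter are in seconds.
def jittered_ttl(base_ttl, max_jitter: 300, rng: Random)
  base_ttl + rng.rand(0..max_jitter)
end

# Simulate 1,000 keys cached at the same instant with a one-hour base TTL.
rng = Random.new(7)
ttls = Array.new(1_000) { jittered_ttl(3_600, rng: rng) }
puts "TTLs range from #{ttls.min} to #{ttls.max} seconds"

# With the redis-rb client this would be used as:
#   redis.set(key, value, ex: jittered_ttl(3_600))
```

Keys written in the same instant with a one-hour base TTL now expire across a five-minute window instead of all at once, so the rebuild load arrives as a trickle rather than a spike.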
| Pattern | Complexity | Best For |
|---|---|---|
| Probabilistic | High | High-traffic keys with tolerable staleness |
| Mutex (Locking) | Medium | Keys that require strict consistency |
| Jittered TTL | Low | General purpose caching |
I’m still experimenting with using Redis Streams to handle cache invalidation more gracefully for microservices. There’s a constant trade-off between how much protection your cache logic provides and how much complexity it adds to your code.
What I’ve learned is that you should start with simple jittered TTLs. If you still see spikes in your monitoring tools, move to a Mutex lock. Only reach for Probabilistic Recomputation when you’re dealing with massive scale where even a millisecond of lock contention is too much.
Don't over-engineer your cache until you have the metrics to prove it's the bottleneck.