Caching is a fundamental optimization strategy in modern systems. It reduces latency, lowers backend load, and improves user experience. But when many requests encounter an expired cache key at the same time, they can all attempt to regenerate the same data in parallel. This is known as a cache stampede. It’s not a bug in code or a misconfigured cache policy — it’s a failure in coordination under load.
A cache stampede creates a burst of redundant computation. Instead of one process refreshing the cache, many do. In high-traffic environments, this can cause unnecessary pressure on databases, APIs, or rendering engines — the very systems caching was meant to protect.
How a Cache Stampede Happens
A typical example involves a popular page or API endpoint. Let’s say a cache key for an aggregated homepage expires at 12:00:00. Within that second, hundreds or thousands of users make requests that check the cache. The key is missing for all of them. Each request proceeds to recompute the page and attempts to write the result back. The backend is now processing the same expensive operation multiple times in parallel.
In isolation, each request is doing the right thing. But together, they create duplicated work that could have been avoided if only one regeneration were allowed.
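Here is a minimal sketch of the naive read-through pattern that produces this behavior, assuming Redis via redis-py; build_homepage() is a hypothetical stand-in for the expensive aggregation:

```python
import time
import redis

r = redis.Redis()  # assumes a local Redis instance


def build_homepage() -> bytes:
    # Stand-in for the expensive aggregation query or template render.
    time.sleep(2)
    return b"<html>...</html>"


def get_homepage() -> bytes:
    cached = r.get("homepage")
    if cached is not None:
        return cached
    # On a miss, this request recomputes the page itself. When the key
    # expires under heavy traffic, many requests reach this point at the
    # same moment and all run the expensive build in parallel.
    page = build_homepage()
    r.set("homepage", page, ex=300)  # 5-minute TTL
    return page
```

Nothing in this code is wrong per request; the problem only appears when many requests hit the miss branch concurrently.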
Symptoms in Production
- Sudden spikes in backend CPU or database queries, despite steady traffic volume.
- Increased latency during specific windows — usually just after cache expirations.
- Log entries showing the same expensive operation performed repeatedly within milliseconds.
- Metrics suggesting load balancer or origin server stress during “quiet” periods.
Stampedes are often transient and hard to reproduce. The system stabilizes once the cache is populated again, but the pattern repeats whenever that key expires.
Practical Examples
- A trending topic on a news site causes a surge of traffic. The cache for that section expires, and all users trigger regeneration.
- An e-commerce backend caches personalized recommendations. The key for a popular user cohort expires at once. Each server recomputes it.
- A pricing API caches external data. Cache expires and hundreds of requests hit the third-party service simultaneously, triggering rate limiting.
Techniques to Prevent Stampedes
Request coalescing with locking
Ensure that only one process regenerates a cache key at a time. Others wait or serve a fallback. This can be done using distributed locks in Redis (SETNX) or internal mutexes for in-process caching.
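A sketch of request coalescing with a Redis lock, assuming redis-py; the lock key prefix, timeouts, and the rebuild callable are illustrative choices, not fixed conventions:

```python
import time
import redis

r = redis.Redis()


def get_with_lock(key: str, rebuild, ttl: int = 300, lock_ttl: int = 30) -> bytes:
    cached = r.get(key)
    if cached is not None:
        return cached

    # Only the request that wins this SET NX rebuilds the value; the lock
    # expires on its own if the rebuilding process crashes.
    if r.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
        try:
            value = rebuild()
            r.set(key, value, ex=ttl)
            return value
        finally:
            r.delete(f"lock:{key}")

    # Everyone else polls briefly for the fresh value instead of rebuilding.
    for _ in range(50):
        time.sleep(0.1)
        cached = r.get(key)
        if cached is not None:
            return cached
    return rebuild()  # last-resort fallback if the lock holder is slow
```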
Serve stale while revalidating
Allow slightly stale data to be served while the system refreshes the cache in the background. This avoids blocking users and prevents mass regeneration.
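One possible shape for this, assuming values are JSON-serializable and a soft expiry is stored alongside the payload; the helper names and TTLs are illustrative:

```python
import json
import threading
import time
import redis

r = redis.Redis()


def get_stale_while_revalidate(key: str, rebuild, soft_ttl: int = 300, hard_ttl: int = 3600):
    raw = r.get(key)
    if raw is not None:
        entry = json.loads(raw)
        if time.time() < entry["soft_expiry"]:
            return entry["value"]  # still fresh
        # Stale but usable: return it immediately and refresh in the background.
        threading.Thread(
            target=_refresh, args=(key, rebuild, soft_ttl, hard_ttl), daemon=True
        ).start()
        return entry["value"]
    return _refresh(key, rebuild, soft_ttl, hard_ttl)  # true miss: rebuild inline


def _refresh(key, rebuild, soft_ttl, hard_ttl):
    value = rebuild()
    entry = {"value": value, "soft_expiry": time.time() + soft_ttl}
    r.set(key, json.dumps(entry), ex=hard_ttl)  # hard TTL well past the soft one
    return value
```

In practice the background refresh is usually combined with the locking approach above so that only one worker actually rebuilds.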
Randomized TTLs
Introduce small variations in cache expiration times across similar keys. Instead of setting all to 5 minutes, add jitter (e.g., 280 to 320 seconds). This reduces the chance of synchronized misses.
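For example, a small helper that spreads a base TTL by roughly ±7% (the name and percentage are illustrative):

```python
import random


def jittered_ttl(base_seconds: int = 300, jitter: float = 0.07) -> int:
    """Return a TTL near the base (roughly 280-320s for 300s) so that keys
    written at the same time do not all expire in the same instant."""
    spread = int(base_seconds * jitter)
    return base_seconds + random.randint(-spread, spread)


# usage: r.set("homepage", page, ex=jittered_ttl(300))
```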
Pre-warming popular keys
If a cache key is known to receive frequent traffic, proactively refresh it before it expires. Scheduled jobs can keep high-traffic keys always warm.
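A sketch of a periodic warmer, assuming a scheduler (cron, Celery beat, or similar) invokes it; HOT_KEYS and its rebuild callables are placeholders for whatever your application considers hot:

```python
import redis

r = redis.Redis()

# Hypothetical registry of hot keys and how to rebuild each one.
HOT_KEYS = {
    "homepage": lambda: b"<html>...</html>",
    "trending": lambda: b"[...]",
}


def warm_hot_keys(ttl: int = 300, refresh_margin: int = 60) -> None:
    """Rewrite each hot key before it expires so user requests rarely see a cold miss."""
    for key, rebuild in HOT_KEYS.items():
        remaining = r.ttl(key)  # -2 means missing, -1 means no expiry set
        if remaining < refresh_margin:
            r.set(key, rebuild(), ex=ttl)
```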
Multi-tier caching
Use layered caches (memory, disk, and an external cache) to allow partial fallbacks. If the in-memory cache misses, disk or Redis might still have the data, reducing regeneration pressure.
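A minimal two-tier read path, assuming a small in-process dictionary in front of Redis; the names and TTLs are illustrative:

```python
import time
import redis

r = redis.Redis()
_local: dict[str, tuple[float, bytes]] = {}  # in-process tier: key -> (expiry, value)


def get_two_tier(key: str, rebuild, local_ttl: int = 30, redis_ttl: int = 300) -> bytes:
    # Tier 1: process-local memory (fastest, shortest-lived).
    hit = _local.get(key)
    if hit and hit[0] > time.time():
        return hit[1]

    # Tier 2: shared Redis. A hit here refills the local tier and avoids a
    # full regeneration even though the process-local copy expired.
    value = r.get(key)
    if value is None:
        value = rebuild()  # only reached when both tiers miss
        r.set(key, value, ex=redis_ttl)
    _local[key] = (time.time() + local_ttl, value)
    return value
```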
Cache Design as Coordination
The cache stampede problem is a coordination issue, not a failure of logic. Every part of the system is behaving as designed, but without awareness of concurrent activity. The solution isn’t more caching — it’s smarter caching.
Cache expiration should be treated as a controlled event, not a surprise. For systems under high concurrency, it’s important to think not only about what to cache, but also how and when that data gets refreshed. Designing for that moment — the cache miss under load — can prevent unnecessary strain and keep performance steady even under pressure.