Caching & retry amplification
Core details
Thundering herd: many clients miss the cache at the same moment (cold key or synchronized expiry) → all of them hit the origin DB/API at once.
Mitigations:
| Technique | Role |
|---|---|
| TTL jitter | spread expirations |
| Single-flight / request coalescing | one fetch per key |
| Probabilistic early refresh | refresh before hard expiry |
| Circuit breaker | fail fast when origin unhealthy |
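The first two rows of the table can be sketched in a few lines. This is a minimal illustration, not a production cache: `jittered_ttl`, `SingleFlight`, and the `spread` parameter are hypothetical names chosen for this example, and the coalescing shown here only works within one process.

```python
import random
import threading
from concurrent.futures import Future


def jittered_ttl(base_ttl, spread=0.1, rng=random):
    """Scale base_ttl by a random factor in [1 - spread, 1 + spread]
    so keys written together do not expire together."""
    return base_ttl * (1 + rng.uniform(-spread, spread))


class SingleFlight:
    """Coalesce concurrent loads of the same key into one loader call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Future shared by all waiters

    def do(self, key, loader):
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._inflight[key] = fut
        if leader:
            # Only the first caller for this key reaches the origin.
            try:
                fut.set_result(loader())
            except Exception as exc:
                fut.set_exception(exc)
            finally:
                with self._lock:
                    del self._inflight[key]
        # Followers block here until the leader publishes a result or error.
        return fut.result()
```

A cache miss handler would call something like `sf.do(key, lambda: fetch_from_origin(key))` and then store the value with `jittered_ttl(300)`; `fetch_from_origin` stands in for whatever origin call the service actually makes.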
Retry amplification: clients that time out together retry together → a synchronized second load spike on an origin that is already struggling. Use capped retries, exponential backoff + jitter, per-hop retry budgets, and honor Retry-After.
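A rough sketch of capped retries with exponential backoff and full jitter follows. `RetryableError`, `backoff_delay`, and `call_with_retries` are hypothetical names for this example, and `retry_after` is assumed to be the server's Retry-After hint already parsed into seconds. Per-hop budgets are not shown.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical error type for timeouts and overload responses.
    retry_after: seconds suggested by the server, or None."""

    def __init__(self, message="", retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random):
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(op, max_attempts=3, base=0.1, cap=5.0,
                      sleep=time.sleep, rng=random):
    """Call op() up to max_attempts times, sleeping between failures."""
    for attempt in range(max_attempts):
        try:
            return op()
        except RetryableError as exc:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the failure
            delay = backoff_delay(attempt, base, cap, rng)
            if exc.retry_after is not None:
                # Never retry sooner than the server asked for.
                delay = max(delay, exc.retry_after)
            sleep(delay)
```

The random delay is what breaks the synchronization: clients that failed at the same instant no longer retry at the same instant. In this sketch the cap applies only to the computed backoff, not to the server's hint.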
Hedging (send a duplicate request after a delay): can cut tail latency or double the load. Use sparingly, and only for idempotent operations.
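A hedged request might look like the sketch below, using `asyncio`. The function name `hedged` and the `hedge_delay` parameter are assumptions for this example, and `make_request` is assumed to be an idempotent coroutine factory. Error handling is deliberately minimal: if the first request to finish failed, its exception propagates.

```python
import asyncio


async def hedged(make_request, hedge_delay):
    """Start one request; if it is still pending after hedge_delay
    seconds, start a duplicate and return whichever finishes first."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        # Fast path: no duplicate was sent, so no extra load.
        return primary.result()

    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop waiting on the loser
    return done.pop().result()
```

Setting `hedge_delay` near a high latency percentile keeps the duplicate rate low. Cancelling the losing task only stops the client from waiting; the origin may still finish the work, which is why the load can double.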
Understanding
Caches trade freshness for cost. Retries trade perceived availability for downstream load. A staff-level design names who loses when both fire at once (usually the database).
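The downstream cost of retries compounds across hops. As a back-of-the-envelope sketch, assume every attempt fails and each hop retries independently; the helper names below are hypothetical.

```python
def worst_case_origin_calls(attempts_per_hop):
    """Each attempt at one hop triggers every attempt at the next,
    so the worst-case call count at the origin is the product."""
    total = 1
    for attempts in attempts_per_hop:
        total *= attempts
    return total


def budgeted_amplification(hops, retry_ratio):
    """With a budget that limits retries to retry_ratio of normal
    traffic at each hop, amplification is (1 + retry_ratio) ** hops."""
    return (1 + retry_ratio) ** hops


# Three layers, three attempts each: one user request can become 27 origin calls.
print(worst_case_origin_calls([3, 3, 3]))  # 27
```

Under the same assumptions, a 10% retry budget at each of three hops bounds the amplification at roughly 1.33x instead of 27x.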
The visual model below is the failure pattern to keep in mind: synchronized TTL expiry creates the first load wave, and synchronized retries create the second. Jitter, single-flight, explicit budgets, and overload responses spread demand before it reaches the origin.
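One way to spread the first wave is probabilistic early refresh, listed in the mitigations table. The sketch below follows the commonly described XFetch-style rule: refresh when `now - cost * beta * ln(u) >= expiry` for a uniform random `u`. The function and parameter names are assumptions for this example, and `recompute_cost` is assumed to be the measured time in seconds to rebuild the value.

```python
import math
import random
import time


def should_refresh_early(expiry, recompute_cost, beta=1.0,
                         now=None, rng=random.random):
    """Return True if this caller should refresh before the hard expiry.

    The chance of refreshing rises as now approaches expiry, so callers
    refresh at scattered times instead of all missing at expiry.
    """
    now = time.time() if now is None else now
    # random.random() is in [0, 1); shift to (0, 1] so log() is defined.
    u = 1.0 - rng()
    return now - recompute_cost * beta * math.log(u) >= expiry
```

A larger `beta` refreshes earlier and more often; `beta=1.0` is a common default. Combined with single-flight, typically only one caller performs the early refresh while the rest keep serving the cached value.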

Senior understanding
Connect to product: reads that involve money need routing to a fresh source, or a short TTL plus honest UX about staleness, not “cache for speed” alone. See Application caching & consistency.
Diagram
See also