Caching
What it is
A cache stores copies of frequently read data in faster storage (memory, local SSD, edge) to reduce latency and load on authoritative stores. Write policies and eviction control consistency and memory use.

Write policies
- Write-through: each write updates the cache and the backing store together. Cached reads stay consistent with the store; write latency is higher because every write waits on the store.
- Write-back (write-behind): writes go to the cache first and are persisted asynchronously. Lower perceived write latency; risk of data loss on a crash unless pending writes are replicated or held in a durable buffer.
- Write-around: writes bypass cache to store; cache fills on read. Avoids polluting cache with one-off writes; next read may miss.
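The three policies above can be compared in a minimal sketch. The class name `WritePolicyCache`, the dict-based store, and the synchronous `flush()` are illustrative assumptions, not a real library API; a production write-back cache would flush asynchronously. Dropping the cached copy in `write_around` is also an assumption, added so that a stale copy cannot outlive the write.

```python
class WritePolicyCache:
    """Toy cache in front of a dict-based store (both are stand-ins)."""

    def __init__(self, store):
        self.store = store   # authoritative store (assumed: a plain dict)
        self.cache = {}      # fast copy
        self.dirty = set()   # keys written to the cache but not yet persisted

    def write_through(self, key, value):
        # Update store and cache in the same call; the write pays store latency.
        self.store[key] = value
        self.cache[key] = value

    def write_back(self, key, value):
        # Acknowledge after the cache write; persistence is deferred to flush().
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Dirty entries are lost if the process crashes before this runs.
        for key in list(self.dirty):
            self.store[key] = self.cache[key]
        self.dirty.clear()

    def write_around(self, key, value):
        # Write only to the store; drop any cached copy so it cannot go stale.
        self.store[key] = value
        self.cache.pop(key, None)

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store[key]   # miss: load from the store, then fill the cache
        self.cache[key] = value
        return value
```

Note that after `write_back` the store does not contain the value until `flush()` runs, which is exactly the window in which a crash loses data.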
Concrete example
For a product detail page, cache product:{id} with a short TTL and invalidate it when price or inventory changes. The first request after expiry loads from the database, fills the cache, and returns the value; later requests hit memory. After a write, update the database first, then delete or refresh the cached key so readers do not keep seeing the old price.
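A minimal cache-aside sketch of this flow, assuming an in-process dict as the cache and a dict standing in for the database. `ProductCache`, `get_product`, `update_price`, and the 30-second TTL are hypothetical names and values chosen for illustration.

```python
import time


class ProductCache:
    def __init__(self, db, ttl_seconds=30.0, clock=time.monotonic):
        self.db = db              # assumed: dict of product_id -> product dict
        self.ttl = ttl_seconds    # short TTL bounds staleness if invalidation is missed
        self.clock = clock        # injectable so expiry can be tested without sleeping
        self.entries = {}         # cache key -> (expires_at, value)

    def get_product(self, product_id):
        key = f"product:{product_id}"
        entry = self.entries.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]                        # hit: served from memory
        value = dict(self.db[product_id])          # miss or expired: load from the DB
        self.entries[key] = (self.clock() + self.ttl, value)
        return value

    def update_price(self, product_id, price):
        self.db[product_id]["price"] = price               # 1. write the database first
        self.entries.pop(f"product:{product_id}", None)    # 2. then delete the cached key
```

Deleting the key rather than rewriting it keeps the write path simple: the next read repopulates the entry from the database.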
Eviction
- LRU (Least Recently Used): evict the entry unused for the longest time. Works well with temporal locality; vulnerable to one-off scans that flush the working set.
- LFU (Least Frequently Used): evict the entry with the lowest access count. Resists scans better than LRU; needs count aging so that keys that were hot in the past do not stay cached indefinitely.
- TTL: time-based expiry. Simple; in practice, combine it with LRU or LFU.
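As an illustration of LRU eviction, here is a minimal sketch built on Python's `collections.OrderedDict`. The class name `LRUCache` and the capacity of 2 are assumptions for the example; it is not thread-safe and has no TTL.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # least recently used entry is first

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" becomes more recent than "b"
cache.put("c", 3)    # over capacity: "b" is evicted
```

The same structure shows the scan weakness: inserting many keys that are read only once pushes every older entry out, regardless of how often it was used before.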
Stampede mitigation
When a popular key expires, many requests miss simultaneously and reload the same value from DB.
Mitigations:
- Probabilistic early expiration: refresh before hard TTL with random jitter.
- Single-flight / mutex per key: only one request recomputes the value; the others wait for its result or are served the stale value.
- Stale-while-revalidate: return old value while one worker refreshes.
- Request coalescing at cache layer (Redis/Memcached patterns, application locks).
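The single-flight mitigation listed above can be sketched in-process with the standard `threading` module. `SingleFlight`, `_Call`, and `do` are hypothetical names for this illustration; this version makes followers wait for the leader's result and does not implement the serve-stale variant or cross-process coordination.

```python
import threading


class _Call:
    """State shared between the leader and the followers for one key."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Collapse concurrent loads of the same key into one loader call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> _Call currently being computed

    def do(self, key, loader):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = loader()       # only the leader hits the backend
            except Exception as exc:
                call.error = exc             # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()              # wake the followers
        else:
            call.done.wait()                 # follower: wait for the leader
        if call.error is not None:
            raise call.error
        return call.result
```

A caller would wrap its cache-miss path, for example `sf.do("product:42", load_from_db)`, where `load_from_db` is whatever function rebuilds the value.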
Without mitigation: [miss][miss][miss] --> DB spike
With single-flight: [miss] --> one rebuild; others get result or stale
When to use
- Read-heavy workloads with repeatable keys (objects, query results, rendered fragments).
- Downstream services that are sensitive to request rate (DB, search); measure the benefit against explicit latency and throughput goals.
Alternatives
- No cache: simpler consistency; use scaling and read replicas instead (see replication).
- Materialized views / precomputed tables: heavier but queryable like primary data.
- CDN for HTTP assets (see reverse-proxy-cdn).
Failure modes
- Stale reads: TTL too long or invalidation bugs.
- Thundering herd: see stampede above.
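- Memory pressure: an eviction policy that does not match the access pattern causes thrashing, where entries are evicted shortly before they are needed again.
- Cache-aside inconsistency: updating the DB without invalidating the cache (classic bug).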
Common mistakes
- Caching low-reuse data: hit ratio stays poor but memory and invalidation complexity still grow.
- Treating TTL as correctness: expiry limits staleness, but it does not guarantee fresh reads after writes.
- Ignoring hot-key distribution: one viral key can overload a cache node even when total hit ratio looks healthy.
- Caching errors or empty values without a policy: can amplify short outages or hide newly created data.
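One possible policy for the last point, sketched under assumptions: cache "not found" results with a much shorter TTL than real values, and never cache errors. `NegativeCache`, the `load` callback contract (returns `None` when the key is absent, raises on failure), and the TTL values are all hypothetical.

```python
import time

_MISSING = object()   # sentinel meaning "the store had no such key"


class NegativeCache:
    def __init__(self, load, ttl=60.0, negative_ttl=5.0, clock=time.monotonic):
        self.load = load                  # assumed contract: value, None if absent, raises on error
        self.ttl = ttl                    # lifetime of real values
        self.negative_ttl = negative_ttl  # short lifetime of "not found" markers
        self.clock = clock
        self.entries = {}                 # key -> (expires_at, value or _MISSING)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return None if entry[1] is _MISSING else entry[1]
        value = self.load(key)            # exceptions propagate and are not cached
        if value is None:
            self.entries[key] = (now + self.negative_ttl, _MISSING)
            return None
        self.entries[key] = (now + self.ttl, value)
        return value
```

The short negative TTL limits how long newly created data stays hidden, while still absorbing repeated lookups for keys that do not exist.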
Interview talking points
- State the consistency model (eventual vs strong) that your write policy provides.
- Size the cache with a back-of-envelope estimate: working set, hit ratio, eviction under load.
- Connect to downstream protection: circuit breakers + caching layers.
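The back-of-envelope sizing mentioned above can be made concrete with a small calculation. The function name `cache_estimate` and every input number below are illustrative assumptions, not measurements; the model assumes a miss costs a cache lookup plus a backend read and ignores per-entry overhead.

```python
def cache_estimate(total_qps, hit_ratio, cache_ms, backend_ms,
                   working_set_items, bytes_per_item):
    # Requests that miss the cache still reach the backend.
    backend_qps = total_qps * (1.0 - hit_ratio)
    # A hit costs one cache lookup; a miss costs the lookup plus the backend read.
    avg_latency_ms = (hit_ratio * cache_ms
                      + (1.0 - hit_ratio) * (cache_ms + backend_ms))
    # Raw payload size of the working set, without per-entry overhead.
    memory_gib = working_set_items * bytes_per_item / 2**30
    return backend_qps, avg_latency_ms, memory_gib


backend_qps, avg_latency_ms, memory_gib = cache_estimate(
    total_qps=10_000, hit_ratio=0.95, cache_ms=1.0, backend_ms=20.0,
    working_set_items=5_000_000, bytes_per_item=2048,
)
```

With these assumed inputs the backend sees roughly 500 QPS, average latency is about 2 ms, and the working set needs about 9.5 GiB before overhead. Rerunning with a hit ratio of 0.90 doubles the backend load, which is why hit ratio is worth stating explicitly.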
Interview answer shape
- Start with workload: read-heavy keys, freshness needs, working-set size, and tolerated staleness.
- Pick the pattern: cache-aside for simple read scaling, write-through for tighter read freshness, write-back only when loss and replay are handled.
- Explain correctness: invalidate after writes, add version checks for sensitive reads, and avoid caching data with strong transactional invariants.
- Explain operations: hit ratio, eviction rate, hot keys, backend QPS, and stampede protection.
Common follow-ups: Redis vs Memcached, TTL vs explicit invalidation, local cache vs distributed cache, and how to keep read-your-write behavior after a user edits data.