Design Distributed Cache
1. Requirements
Functional
- Provide low-latency key-value and structured data (strings, hashes, sorted sets) access for application tiers across many stateless API nodes.
- Support TTL-based eviction, explicit delete, and optional pub/sub for cache invalidation fan-out.
- Horizontal scale-out by adding shards without full cluster downtime (rolling).
- Observable hit ratio, latency histograms, memory pressure, and eviction reasons per pool.
- Security: TLS on the wire; per-application-cluster ACLs in enterprise setups.
Non-Functional
- Latency: p99 < 1 ms same-AZ for a hit under quiet network conditions; p99 < 3 ms cross-AZ within a region is a typical target.
- Throughput: millions of ops/s cluster-wide; a hot key can cap a single shard at roughly 200k–400k ops/s on Redis-class hardware.
- Availability: 99.99% with multi-AZ replicas; automatic failover in under 30 s for managed offerings.
- Durability: the cache is not the source of truth — data is ephemeral; optional AOF/RDB snapshots enable warm restarts, trading write latency against recovery time.
- Consistency: eventual between primary and replica; replication lag is usually acceptable for cache reads.
Out of Scope
- Acting as a full replacement for a transactional OLTP database (anti-pattern guardrails only).
- Geo-distributed active-active session cache with strict linearizability (specialized systems).
- Detailed Redis internals single-threaded event loop source walkthrough.
2. Back-of-Envelope Estimations
Assume 50 stateless API nodes, each peaking at 20k cache ops/s → 1M ops/s aggregate cluster requirement.
Memory: 200 GB hot working set across the app portfolio — at replication factor 3 for HA → 600 GB DRAM provisioned (exact overhead depends on the replication mode).
Network: average 200 B key+value per op → 1M ops/s × 200 B = 200 MB/s (~1.6 Gbps) aggregate each way — 25 Gbps NICs leave cluster headroom for replication traffic and protocol overhead.
Shard count: if each shard comfortably handles 50k ops/s → 1M / 50k = 20 primary shards minimum; replicas multiply the node count.
Cost: DRAM is expensive — compress large values (Snappy/LZ4), weighing app-side compression against Redis-module tradeoffs.
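A runnable sanity check of the arithmetic above; the constants simply mirror this section's assumptions:

```go
package main

import "fmt"

func main() {
	const (
		nodes       = 50     // stateless API nodes
		opsPerNode  = 20_000 // peak cache ops/s per node
		avgOpBytes  = 200    // average key+value bytes per op
		hotSetGB    = 200    // hot working set
		replication = 3      // 1 primary + 2 replicas
		opsPerShard = 50_000 // comfortable ops/s per primary shard
	)
	totalOps := nodes * opsPerNode
	fmt.Println("aggregate ops/s:", totalOps)                      // 1000000
	fmt.Println("bandwidth MB/s: ", totalOps*avgOpBytes/1_000_000) // 200
	fmt.Println("DRAM GB:        ", hotSetGB*replication)          // 600
	fmt.Println("primary shards: ", totalOps/opsPerShard)          // 20
}
```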
3. API Design
The cache is usually consumed through an embedded client library speaking the Redis protocol (RESP), not a public REST API.
An illustrative HTTP proxy for a polyglot control plane:

```
GET /v1/cache/{key}
X-Pool: sessions
-> 200 { "value": "..." }
-> 404 { "error": "miss" }

DELETE /v1/cache/{key}
-> 204

POST /v1/admin/cluster/rebalance
Authorization: Bearer <ops-token>
-> 202 { "jobId": "rb_1" }
```

Client libraries: Jedis, Lettuce, node-redis, go-redis — all provide connection pooling, timeouts, and circuit breakers.
4. Data Model
Logical: arbitrary byte keys with namespacing convention service:entity:id.
Redis data structures
- String: simple cache entry.
- Hash: object fields under one key — better locality and fewer keys than many separate strings.
- Sorted set: leaderboards, rate-limit windows (distributed rate limiter; see the sliding-window sketch after this list).
- Stream: lighter-weight Kafka alternative for small internal queues — overlaps with message-queue responsibilities.
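A minimal sliding-window rate limiter over a sorted set with go-redis — a sketch only: the rl:token:ip:* key layout matches the schema table below, and the limit/window values are illustrative:

```go
package cache

import (
	"context"
	"strconv"
	"time"

	"github.com/redis/go-redis/v9"
)

// Allow implements a sliding-window rate limiter on a sorted set:
// member = event timestamp, score = the same timestamp, so old events
// can be trimmed by score.
func Allow(ctx context.Context, rdb *redis.Client, ip string, limit int64, window time.Duration) (bool, error) {
	key := "rl:token:ip:" + ip
	now := time.Now()
	cutoff := now.Add(-window).UnixNano()

	pipe := rdb.TxPipeline()
	pipe.ZRemRangeByScore(ctx, key, "0", strconv.FormatInt(cutoff, 10)) // drop events outside the window
	pipe.ZAdd(ctx, key, redis.Z{Score: float64(now.UnixNano()), Member: now.UnixNano()})
	count := pipe.ZCard(ctx, key)
	pipe.Expire(ctx, key, window) // idle buckets evict via TTL
	if _, err := pipe.Exec(ctx); err != nil {
		return false, err
	}
	return count.Val() <= limit, nil
}
```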
Memcached alternative
- Pure key-value, multithreaded, simpler — no Lua, no rich structures; sometimes higher raw GET throughput per node — choose it when the workload is purely GET/SET caching of large blobs.
Redis Cluster vs Sentinel
- Cluster: automatic sharding by hash slots (16,384 slots) mapped to nodes — client aware.
- Sentinel: HA for a single non-sharded master — simpler operations at smaller scale.
Sample key schema
| Key | TTL | Purpose |
|---|---|---|
| user:profile:9182 | 300 s | JSON profile cache |
| rl:token:ip:1.2.3.4 | 60 s | Rate-limit bucket |
5. High-Level Architecture
Managed equivalents: AWS ElastiCache, GCP Memorystore, Azure Cache for Redis — they reduce the operational burden.
6. Component Deep-Dives
Sharding strategy
- Hash slot on the key string — stable mapping; resharding uses migration tooling (Redis Cluster ASK/MOVED redirection); see the slot-mapping sketch after this list.
- Consistent hashing when using client-side proxies like twemproxy — minimizes reshuffling when nodes are added.
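For intuition, a minimal sketch of the slot computation Redis Cluster performs: CRC16 (XMODEM variant) mod 16,384, honoring the {hash tag} convention so related keys co-locate on one shard:

```go
package cache

import "strings"

// crc16 is the XMODEM CRC variant Redis Cluster uses for key hashing.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// Slot maps a key to one of 16,384 hash slots. If the key contains a
// non-empty {hash tag}, only the tag is hashed, so user:{42}:profile and
// user:{42}:settings land on the same shard.
func Slot(key string) uint16 {
	if i := strings.Index(key, "{"); i >= 0 {
		if j := strings.Index(key[i+1:], "}"); j > 0 {
			key = key[i+1 : i+1+j]
		}
	}
	return crc16([]byte(key)) % 16384
}
```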
Eviction policies
- allkeys-lru vs volatile-lru — pick based on whether keys carry TTLs; LFU (Redis 4.0+) suits skewed popularity tails.
Cache-aside pattern
- App reads the cache; on a miss it loads from the DB and populates the cache — data goes stale if the DB is updated without invalidation — publish invalidations or rely on short TTLs. A sketch follows.
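A cache-aside read path with go-redis; a sketch only, where loadProfileFromDB is a hypothetical stand-in for the real database read:

```go
package cache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// GetProfile is a cache-aside read: try the cache, fall back to the DB on
// a miss, then repopulate with a short TTL to bound staleness.
func GetProfile(ctx context.Context, rdb *redis.Client, id string) (string, error) {
	key := "user:profile:" + id
	val, err := rdb.Get(ctx, key).Result()
	if err == nil {
		return val, nil // hit
	}
	if err != redis.Nil {
		return "", err // transport/server error, not a miss
	}
	val, err = loadProfileFromDB(ctx, id) // hypothetical DB call
	if err != nil {
		return "", err
	}
	// Short TTL is the safety net when an invalidation is missed.
	_ = rdb.Set(ctx, key, val, 300*time.Second).Err()
	return val, nil
}

// loadProfileFromDB stands in for the real database read.
func loadProfileFromDB(ctx context.Context, id string) (string, error) {
	return `{"id":"` + id + `"}`, nil // placeholder
}
```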
Write-through vs write-back
- Write-through: slower writes, consistent reads.
- Write-back: data-loss risk on crash — rarely used outside specialized buffering.
Thundering herd / stampede
- Singleflight / request coalescing — the first miss loads from the DB while the rest wait — implemented as a Redis lock with a short TTL or an in-process per-key mutex (sketch below).
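The in-process half of this, using golang.org/x/sync/singleflight; note it only collapses misses within one process — cross-process stampedes still need the short-TTL Redis lock:

```go
package cache

import (
	"context"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// GetCoalesced collapses concurrent misses for the same key into one
// loader call per process; the other callers block and share the result.
func GetCoalesced(ctx context.Context, key string, load func(context.Context) (string, error)) (string, error) {
	v, err, _ := group.Do(key, func() (interface{}, error) {
		return load(ctx)
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}
```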
Lua scripts
- Atomic compare-and-swap patterns for inventory-like semantics — but be careful: long scripts block Redis's single-threaded event loop (example below).
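A compare-and-swap as a short Lua script, run via go-redis; atomic because Redis executes the script without interleaving other commands:

```go
package cache

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// casScript swaps the value only if the current value matches the
// expected one; keep scripts this short so they don't stall the event loop.
var casScript = redis.NewScript(`
if redis.call("GET", KEYS[1]) == ARGV[1] then
  redis.call("SET", KEYS[1], ARGV[2])
  return 1
end
return 0`)

// CompareAndSwap returns true if the swap happened.
func CompareAndSwap(ctx context.Context, rdb *redis.Client, key, expected, next string) (bool, error) {
	n, err := casScript.Run(ctx, rdb, []string{key}, expected, next).Int()
	return n == 1, err
}
```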
Replica lag
- Reads from replicas may be stale — prefer the primary for just-written data such as fresh session tokens — exposed as a routing flag in the client (sketch below).
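A sketch of that routing flag using go-redis v9 cluster options (addresses illustrative): the read-mostly client is allowed onto replicas, while a separate default client keeps read-your-writes paths on the primary.

```go
package cache

import "github.com/redis/go-redis/v9"

// NewReadMostlyClient routes reads to replicas for stale-tolerant paths;
// keep the default primary-only client for flows like freshly issued
// session tokens that must read their own writes.
func NewReadMostlyClient() *redis.ClusterClient {
	return redis.NewClusterClient(&redis.ClusterOptions{
		Addrs:         []string{"cache-1:6379", "cache-2:6379"},
		ReadOnly:      true, // permit reads from replica nodes
		RouteRandomly: true, // spread reads across primaries and replicas
	})
}
```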
Why Redis over Memcached here
- Data structures + Lua + per-key TTL granularity + ecosystem — choose Memcached when pure horizontal GET/SET simplicity wins the benchmarks.
Alternative: Hazelcast / JVM grid
- Good inside Java orgs for embedded data grids — ops model differs.
7. Bottlenecks & Mitigations
| Bottleneck | Signal | Mitigation |
|---|---|---|
| Hot key | Single shard CPU at 100% | Local LRU in app; read replicas; duplicate as key:partN so copies hash to different slots |
| Big values | Latency spikes | Split list; compress; move to object store with pointer in cache |
| Full memory | Eviction thrash | Scale out the cluster; tune TTLs; alert early |
| Cluster bus bandwidth | High shard churn | Avoid huge MIGRATE during peak |
| Client connection storm | Redis maxclients | Pooling, proxy aggregation |
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Redis Cluster | Many single masters + client hash | Operations automation at large shard count |
| TLS to proxy | Plaintext in VPC | Compliance and multi-tenant safety |
| Read replicas for scale | Bigger single node | RAM ceiling and blast radius |
| Allkeys LRU | TTL only on some keys | Safety net when devs forget TTL |
| Managed service | Self-hosted | Staff expertise vs cost |
| RESP3 protocol | RESP2 | Smarter types if clients upgraded |
9. Follow-ups (interviewer drill-downs)
- Multi-region cache: Global Datastore-style products replicate cross-region with lag — active-active conflict resolution is hard — prefer regional stickiness behind load balancing.
- Strong consistency: reach for etcd/Consul, not Redis — a different tool for a different job.
- Cache penetration attack: misses for random nonexistent keys fall through to the DB — use an app-side bloom filter or negative cache entries (sketch after this list).
- Memory migration to SSD tier: Redis on Flash / KeyDB — latency trade study.
- Serverless Redis (Upstash) for spiky workloads — cold start and connection limits.
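A negative-caching sketch for the penetration case: confirmed absences are cached under a sentinel value with a short TTL, so repeated lookups for nonexistent keys stop reaching the database. The sentinel constant and the load callback are illustrative assumptions:

```go
package cache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// negSentinel marks "confirmed absent in DB"; any value that cannot
// collide with real data works.
const negSentinel = "\x00nil"

// GetWithNegativeCache returns (value, found, error). load is a
// hypothetical DB lookup reporting whether the row exists.
func GetWithNegativeCache(ctx context.Context, rdb *redis.Client, key string, load func(context.Context) (string, bool, error)) (string, bool, error) {
	val, err := rdb.Get(ctx, key).Result()
	switch {
	case err == nil && val == negSentinel:
		return "", false, nil // known miss, DB not touched
	case err == nil:
		return val, true, nil
	case err != redis.Nil:
		return "", false, err
	}
	val, found, err := load(ctx)
	if err != nil {
		return "", false, err
	}
	if !found {
		// Short TTL so a later insert becomes visible quickly.
		_ = rdb.Set(ctx, key, negSentinel, 60*time.Second).Err()
		return "", false, nil
	}
	_ = rdb.Set(ctx, key, val, 300*time.Second).Err()
	return val, true, nil
}
```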