Design Distributed Cache
1. Requirements
Functional
- Provide low-latency key-value and structured data (strings, hashes, sorted sets) access for application tiers across many stateless API nodes.
- Support TTL-based eviction, explicit delete, and optional pub/sub for cache invalidation fan-out.
- Horizontal scale-out by adding shards without full cluster downtime (rolling).
- Observable hit ratio, latency histograms, memory pressure, and eviction reasons per pool.
- Security: TLS on the wire; per-application-cluster ACLs in enterprise setups.
Non-Functional
- Latency: p99 < 1 ms same-AZ for a hit under quiet network conditions; p99 < 3 ms cross-AZ within a region is a typical target.
- Throughput: millions of ops/s cluster-wide; a hot key can cap a single shard at roughly 200k–400k ops/s on Redis-class hardware.
- Availability: 99.99% with multi-AZ replicas; automatic failover in under 30 s for managed offerings.
- Durability: the cache is not the source of truth — data is ephemeral; optional AOF/RDB snapshots enable warm restarts, trading write latency against recovery time.
- Consistency: eventual between primary and replica; replication lag is usually acceptable for cache reads.
Out of Scope
- Acting as a full replacement for a transactional OLTP database (anti-pattern guardrails only).
- Geo-distributed active-active session cache with strict linearizability (specialized systems).
- Detailed Redis internals single-threaded event loop source walkthrough.
2. Back-of-Envelope Estimations
Assume 50 stateless API nodes, each peaking at 20k cache ops/s → 1M ops/s aggregate cluster requirement.
Memory: 200 GB hot working set across the app portfolio — at replication factor 3 for HA → 600 GB DRAM provisioned (exact overhead depends on the replication mode).
Network: average 200 B key+value per op → 1M ops/s × 200 B = 200 MB/s (~1.6 Gbps) aggregate each way — 25 Gbps NICs leave cluster headroom for replication traffic and protocol overhead.
Shard count: if each shard comfortably handles 50k ops/s → 1M / 50k = 20 primary shards minimum; replicas multiply the node count.
Cost: DRAM is expensive — compress large values (Snappy/LZ4), weighing app-side compression against Redis-module tradeoffs.
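A runnable sanity check of the arithmetic above; the constants simply mirror this section's assumptions:

```go
package main

import "fmt"

func main() {
	const (
		nodes       = 50     // stateless API nodes
		opsPerNode  = 20_000 // peak cache ops/s per node
		avgOpBytes  = 200    // average key+value bytes per op
		hotSetGB    = 200    // hot working set
		replication = 3      // 1 primary + 2 replicas
		opsPerShard = 50_000 // comfortable ops/s per primary shard
	)
	totalOps := nodes * opsPerNode
	fmt.Println("aggregate ops/s:", totalOps)                      // 1000000
	fmt.Println("bandwidth MB/s: ", totalOps*avgOpBytes/1_000_000) // 200
	fmt.Println("DRAM GB:        ", hotSetGB*replication)          // 600
	fmt.Println("primary shards: ", totalOps/opsPerShard)          // 20
}
```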
3. API Design
The cache is usually consumed through an embedded client library speaking the Redis protocol (RESP), not a public REST API.
An illustrative HTTP proxy for a polyglot control plane:

```
GET /v1/cache/{key}
X-Pool: sessions
-> 200 { "value": "..." }
-> 404 { "error": "miss" }

DELETE /v1/cache/{key}
-> 204

POST /v1/admin/cluster/rebalance
Authorization: Bearer <ops-token>
-> 202 { "jobId": "rb_1" }
```

Client libraries: Jedis, Lettuce, node-redis, go-redis — all provide connection pooling, timeouts, and circuit breakers.
4. Data Model
Logical: arbitrary byte keys with namespacing convention service:entity:id.
Redis data structures
- String: simple cache entry.
- Hash: object fields under one key — better locality and fewer keys than many separate strings.
- Sorted set: leaderboards, rate-limit windows (distributed rate limiter; see the sliding-window sketch after this list).
- Stream: lighter-weight Kafka alternative for small internal queues — overlaps with message-queue responsibilities.
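A minimal sliding-window rate limiter over a sorted set with go-redis — a sketch only: the rl:token:ip:* key layout matches the schema table below, and the limit/window values are illustrative:

```go
package cache

import (
	"context"
	"strconv"
	"time"

	"github.com/redis/go-redis/v9"
)

// Allow implements a sliding-window rate limiter on a sorted set:
// member = event timestamp, score = the same timestamp, so old events
// can be trimmed by score.
func Allow(ctx context.Context, rdb *redis.Client, ip string, limit int64, window time.Duration) (bool, error) {
	key := "rl:token:ip:" + ip
	now := time.Now()
	cutoff := now.Add(-window).UnixNano()

	pipe := rdb.TxPipeline()
	pipe.ZRemRangeByScore(ctx, key, "0", strconv.FormatInt(cutoff, 10)) // drop events outside the window
	pipe.ZAdd(ctx, key, redis.Z{Score: float64(now.UnixNano()), Member: now.UnixNano()})
	count := pipe.ZCard(ctx, key)
	pipe.Expire(ctx, key, window) // idle buckets evict via TTL
	if _, err := pipe.Exec(ctx); err != nil {
		return false, err
	}
	return count.Val() <= limit, nil
}
```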
Memcached alternative
- Pure key-value, multithreaded, simpler — no Lua, no rich structures; sometimes higher raw GET throughput per node — choose it when the workload is purely GET/SET caching of large blobs.
Redis Cluster vs Sentinel
- Cluster: automatic sharding by hash slots (16,384 slots) mapped to nodes — client aware.
- Sentinel: HA for a single non-sharded master — simpler operations at smaller scale.
Sample key schema
| Key | TTL | Purpose |
|---|---|---|
| user:profile:9182 | 300 s | JSON profile cache |
| rl:token:ip:1.2.3.4 | 60 s | Rate-limit bucket |
5. High-Level Architecture
Managed equivalents: AWS ElastiCache, GCP Memorystore, Azure Cache for Redis — they reduce the operational burden.
6. Component Deep-Dives
Sharding strategy
- Hash slot on the key string — stable mapping; resharding uses migration tooling (Redis Cluster ASK/MOVED redirection); see the slot-mapping sketch after this list.
- Consistent hashing when using client-side proxies like twemproxy — minimizes reshuffling when nodes are added.
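For intuition, a minimal sketch of the slot computation Redis Cluster performs: CRC16 (XMODEM variant) mod 16,384, honoring the {hash tag} convention so related keys co-locate on one shard:

```go
package cache

import "strings"

// crc16 is the XMODEM CRC variant Redis Cluster uses for key hashing.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// Slot maps a key to one of 16,384 hash slots. If the key contains a
// non-empty {hash tag}, only the tag is hashed, so user:{42}:profile and
// user:{42}:settings land on the same shard.
func Slot(key string) uint16 {
	if i := strings.Index(key, "{"); i >= 0 {
		if j := strings.Index(key[i+1:], "}"); j > 0 {
			key = key[i+1 : i+1+j]
		}
	}
	return crc16([]byte(key)) % 16384
}
```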
Eviction policies
- allkeys-lru vs volatile-lru — pick based on whether keys carry TTLs; LFU (Redis 4.0+) suits skewed popularity tails.
Cache-aside pattern
- App reads the cache; on a miss it loads from the DB and populates the cache — data goes stale if the DB is updated without invalidation — publish invalidations or rely on short TTLs. A sketch follows.
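A cache-aside read path with go-redis; a sketch only, where loadProfileFromDB is a hypothetical stand-in for the real database read:

```go
package cache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// GetProfile is a cache-aside read: try the cache, fall back to the DB on
// a miss, then repopulate with a short TTL to bound staleness.
func GetProfile(ctx context.Context, rdb *redis.Client, id string) (string, error) {
	key := "user:profile:" + id
	val, err := rdb.Get(ctx, key).Result()
	if err == nil {
		return val, nil // hit
	}
	if err != redis.Nil {
		return "", err // transport/server error, not a miss
	}
	val, err = loadProfileFromDB(ctx, id) // hypothetical DB call
	if err != nil {
		return "", err
	}
	// Short TTL is the safety net when an invalidation is missed.
	_ = rdb.Set(ctx, key, val, 300*time.Second).Err()
	return val, nil
}

// loadProfileFromDB stands in for the real database read.
func loadProfileFromDB(ctx context.Context, id string) (string, error) {
	return `{"id":"` + id + `"}`, nil // placeholder
}
```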
Write-through vs write-back
- Write-through: slower writes, consistent reads.
- Write-back: data-loss risk on crash — rarely used outside specialized buffering.
Thundering herd / stampede
- Singleflight / request coalescing — the first miss loads from the DB while the rest wait — implemented as a Redis lock with a short TTL or an in-process per-key mutex (sketch below).
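The in-process half of this, using golang.org/x/sync/singleflight; note it only collapses misses within one process — cross-process stampedes still need the short-TTL Redis lock:

```go
package cache

import (
	"context"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// GetCoalesced collapses concurrent misses for the same key into one
// loader call per process; the other callers block and share the result.
func GetCoalesced(ctx context.Context, key string, load func(context.Context) (string, error)) (string, error) {
	v, err, _ := group.Do(key, func() (interface{}, error) {
		return load(ctx)
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}
```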
Lua scripts
- Atomic compare-and-swap patterns for inventory-like semantics — but be careful: long scripts block Redis's single-threaded event loop (example below).
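A compare-and-swap as a short Lua script, run via go-redis; atomic because Redis executes the script without interleaving other commands:

```go
package cache

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// casScript swaps the value only if the current value matches the
// expected one; keep scripts this short so they don't stall the event loop.
var casScript = redis.NewScript(`
if redis.call("GET", KEYS[1]) == ARGV[1] then
  redis.call("SET", KEYS[1], ARGV[2])
  return 1
end
return 0`)

// CompareAndSwap returns true if the swap happened.
func CompareAndSwap(ctx context.Context, rdb *redis.Client, key, expected, next string) (bool, error) {
	n, err := casScript.Run(ctx, rdb, []string{key}, expected, next).Int()
	return n == 1, err
}
```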
Replica lag
- Reads from replicas may be stale — prefer the primary for just-written data such as fresh session tokens — exposed as a routing flag in the client (sketch below).
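A sketch of that routing flag using go-redis v9 cluster options (addresses illustrative): the read-mostly client is allowed onto replicas, while a separate default client keeps read-your-writes paths on the primary.

```go
package cache

import "github.com/redis/go-redis/v9"

// NewReadMostlyClient routes reads to replicas for stale-tolerant paths;
// keep the default primary-only client for flows like freshly issued
// session tokens that must read their own writes.
func NewReadMostlyClient() *redis.ClusterClient {
	return redis.NewClusterClient(&redis.ClusterOptions{
		Addrs:         []string{"cache-1:6379", "cache-2:6379"},
		ReadOnly:      true, // permit reads from replica nodes
		RouteRandomly: true, // spread reads across primaries and replicas
	})
}
```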
Why Redis over Memcached here
- Data structures + Lua + per-key TTL granularity + ecosystem — choose Memcached when pure horizontal GET/SET simplicity wins the benchmarks.
Alternative: Hazelcast / JVM grid
- Good inside Java orgs for embedded data grids — ops model differs.
7. Bottlenecks & Mitigations
| Bottleneck | Signal | Mitigation |
|---|---|---|
| Hot key | Single shard CPU at 100% | Local LRU in app; read replicas; duplicate as key:partN so copies hash to different slots |
| Big values | Latency spikes | Split list; compress; move to object store with pointer in cache |
| Full memory | Eviction thrash | Scale out the cluster; tune TTLs; alert early |
| Cluster bus bandwidth | High shard churn | Avoid huge MIGRATE during peak |
| Client connection storm | Redis maxclients | Pooling, proxy aggregation |
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Redis Cluster | Many single masters + client hash | Operations automation at large shard count |
| TLS to proxy | Plaintext in VPC | Compliance and multi-tenant safety |
| Read replicas for scale | Bigger single node | RAM ceiling and blast radius |
| Allkeys LRU | TTL only on some keys | Safety net when devs forget TTL |
| Managed service | Self-hosted | Staff expertise vs cost |
| RESP3 protocol | RESP2 | Smarter types if clients upgraded |
9. Follow-ups (interviewer drill-downs)
- Multi-region cache: Global Datastore-style products replicate cross-region with lag — active-active conflict resolution is hard — prefer regional stickiness behind load balancing.
- Strong consistency: reach for etcd/Consul, not Redis — a different tool for a different job.
- Cache penetration attack: misses for random nonexistent keys fall through to the DB — use an app-side bloom filter or negative cache entries (sketch after this list).
- Memory migration to SSD tier: Redis on Flash / KeyDB — latency trade study.
- Serverless Redis (Upstash) for spiky workloads — cold start and connection limits.
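A negative-caching sketch for the penetration case: confirmed absences are cached under a sentinel value with a short TTL, so repeated lookups for nonexistent keys stop reaching the database. The sentinel constant and the load callback are illustrative assumptions:

```go
package cache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// negSentinel marks "confirmed absent in DB"; any value that cannot
// collide with real data works.
const negSentinel = "\x00nil"

// GetWithNegativeCache returns (value, found, error). load is a
// hypothetical DB lookup reporting whether the row exists.
func GetWithNegativeCache(ctx context.Context, rdb *redis.Client, key string, load func(context.Context) (string, bool, error)) (string, bool, error) {
	val, err := rdb.Get(ctx, key).Result()
	switch {
	case err == nil && val == negSentinel:
		return "", false, nil // known miss, DB not touched
	case err == nil:
		return val, true, nil
	case err != redis.Nil:
		return "", false, err
	}
	val, found, err := load(ctx)
	if err != nil {
		return "", false, err
	}
	if !found {
		// Short TTL so a later insert becomes visible quickly.
		_ = rdb.Set(ctx, key, negSentinel, 60*time.Second).Err()
		return "", false, nil
	}
	_ = rdb.Set(ctx, key, val, 300*time.Second).Err()
	return val, true, nil
}
```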