THN Interview Prep

Design Distributed Cache

1. Requirements

Functional

  • Provide low-latency access to key-value and structured data (strings, hashes, sorted sets) for application tiers spread across many stateless API nodes.
  • Support TTL-based eviction, explicit delete, and optional pub/sub for cache invalidation fan-out.
  • Horizontal scale-out by adding shards without full cluster downtime (rolling).
  • Observable hit ratio, latency histograms, memory pressure, and eviction reasons per pool.
  • Security: TLS on wire, ACL per application cluster in enterprise setups.

Non-Functional

  • Latency: p99 < 1 ms for same-AZ hits on a quiet network; p99 < 3 ms cross-AZ within a region is a typical target.
  • Throughput: millions of ops/s cluster-wide; a single shard tops out around 200k–400k ops/s on Redis-class hardware, so hot keys can pin one shard.
  • Availability: 99.99% with multi-AZ replicas; automatic failover in <30 s for managed offerings.
  • Durability: the cache is not the source of truth — data is ephemeral; optional AOF/RDB persistence enables warm restarts, trading write latency against recovery time.
  • Consistency: eventual between primary and replicas; replication lag is usually acceptable for cache reads.

Out of Scope

  • Full replacement for transactional OLTP database (anti-pattern guardrails only).
  • Geo-distributed active-active session cache with strict linearizability (specialized systems).
  • Detailed walkthrough of Redis internals (single-threaded event loop source code).

2. Back-of-Envelope Estimations

Assume 50 stateless API nodes, each doing 20k cache ops/s at peak → 1M ops/s aggregate cluster requirement.

Memory: working set of 200 GB hot data across the app portfolio — with a replication factor of 3 for HA → 600 GB DRAM provisioned (exact overhead depends on replication mode).

Network: average 200 B key+value per op → 1M ops/s × 200 B = 200 MB/s (≈1.6 Gbps) aggregate each way — 25 Gbps NICs give the cluster comfortable headroom once replication traffic and protocol overhead are included.

Shard count: if each shard comfortably handles 50k ops/s → at least 20 primary shards; adding replicas multiplies the node count.

Cost: DRAM is expensive — compress large values (Snappy/LZ4), weighing in-app compression against a Redis-module approach.
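
A quick sanity check of the arithmetic above, as a Python sketch; every input is an assumption already stated in this section:

# Back-of-envelope check of the numbers above (inputs are this section's assumptions).
api_nodes = 50
ops_per_node = 20_000                              # peak cache ops/s per API node
total_ops = api_nodes * ops_per_node               # 1,000,000 ops/s

working_set_gb = 200
replication_factor = 3
dram_gb = working_set_gb * replication_factor      # 600 GB provisioned

bytes_per_op = 200
mb_per_s = total_ops * bytes_per_op / 1e6          # 200 MB/s each direction
gbps = mb_per_s * 8 / 1000                         # ~1.6 Gbps

shard_capacity = 50_000
primary_shards = total_ops // shard_capacity       # 20 primaries

print(total_ops, dram_gb, round(gbps, 1), primary_shards)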


3. API Design

The cache is usually consumed through an embedded client library speaking the Redis protocol (RESP), not exposed as a public REST API.

Illustrative HTTP proxy for a polyglot control plane:

GET /v1/cache/{key}
X-Pool: sessions
-> 200 { "value": "..." }
-> 404 { "error": "miss" }

DELETE /v1/cache/{key}
-> 204

POST /v1/admin/cluster/rebalance
Authorization: Bearer <ops-token>
-> 202 { "jobId": "rb_1" }

Client libraries: Jedis, Lettuce, node-redis, go-redis — they handle connection pooling, timeouts, and circuit breakers.
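
A minimal redis-py sketch of that client-side posture — shared pool, tight timeouts, fail-open on errors; the host name, pool size, timeout values, and key prefix are illustrative, and the circuit breaker is left to the app:

import redis

# Shared connection pool with tight socket timeouts (host, sizes, and timeouts are illustrative).
pool = redis.ConnectionPool(
    host="cache.internal", port=6379, max_connections=64,
    socket_connect_timeout=0.05, socket_timeout=0.05,
)
r = redis.Redis(connection_pool=pool)

def get_session(session_id: str) -> bytes | None:
    try:
        return r.get(f"sessions:token:{session_id}")   # None on a miss
    except redis.RedisError:
        return None                                    # fail open: treat cache errors as a miss

r.set("sessions:token:abc", b"payload", ex=300)        # write with a 300 s TTL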


4. Data Model

Logical: arbitrary byte keys with namespacing convention service:entity:id.

Redis data structures

  • String: simple cache entry.
  • Hash: object fields — fewer keys than many strings for locality.
  • Sorted set: leaderboards, rate limit windows (distributed rate limiter) — see the sliding-window sketch after this list.
  • Stream: a lighter-weight Kafka alternative for small internal queues — overlaps with message-queue responsibilities.
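
For the sorted-set bullet, a sliding-window rate limiter is the classic example. A sketch assuming a redis-py client; the limit and window values are arbitrary:

import time
import uuid
import redis

r = redis.Redis(host="cache.internal", port=6379)

def allow_request(ip: str, limit: int = 100, window_s: int = 60) -> bool:
    # Sorted set per client: member = unique event id, score = timestamp.
    key = f"rl:token:ip:{ip}"
    now = time.time()
    p = r.pipeline()
    p.zremrangebyscore(key, 0, now - window_s)    # drop events that left the window
    p.zadd(key, {str(uuid.uuid4()): now})         # record this request
    p.zcard(key)                                  # how many remain in the window
    p.expire(key, window_s)                       # idle keys age out on their own
    _, _, count, _ = p.execute()
    return count <= limit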

Memcached alternative

  • Pure key-value, multithreaded, simpler — no Lua, no rich structures; sometimes higher raw GET throughput per node — choose it when the workload is plain GET/SET caching of large blobs.

Redis Cluster vs Sentinel

  • Cluster: automatic sharding by hash slots (16,384 slots) mapped to nodes — clients are cluster-aware (slot computation sketched after this list).
  • Sentinel: HA for a single, non-sharded master — simpler operations at smaller scale.
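
How a cluster-aware client maps a key onto one of the 16,384 slots — a sketch of the slot computation, including the {hash tag} rule; binascii.crc_hqx is the standard-library CRC-CCITT (XMODEM) that matches the CRC16 Redis Cluster specifies:

import binascii

def hash_slot(key: str) -> int:
    # If the key contains a non-empty {hash tag}, only the tag is hashed,
    # so related keys land on the same slot (and therefore the same node).
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key.encode(), 0) % 16384   # CRC16 (XMODEM) mod 16,384

print(hash_slot("user:profile:9182"))                                     # a slot in 0..16383
print(hash_slot("user:{42}:profile") == hash_slot("user:{42}:settings"))  # True: same slot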

Sample key schema

Key                    TTL     Purpose
user:profile:9182      300 s   JSON profile cache
rl:token:ip:1.2.3.4    60 s    Rate limit bucket

5. High-Level Architecture


Managed equivalents — AWS ElastiCache, GCP Memorystore, Azure Cache for Redis — reduce the ops burden.


6. Component Deep-Dives

Sharding strategy

  • Hash slot on the key string — stable mapping; resharding uses migration tools (Redis Cluster ASK/MOVED redirection).
  • Consistent hashing when fronted by proxies like twemproxy — minimizes reshuffling when nodes are added (a minimal ring is sketched below).
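
A minimal consistent-hash ring to illustrate the idea — not twemproxy's exact ketama layout; the node names and virtual-node count are made up:

import bisect
import hashlib

class HashRing:
    # Minimal consistent-hash ring with virtual nodes; real proxies differ in details.
    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
print(ring.node_for("user:profile:9182"))   # adding cache-4 later remaps only ~1/4 of keys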

Eviction policies

  • allkeys-lru vs volatile-lru — pick based on whether keys carry TTLs; LFU (Redis 4.0+) for skewed popularity tails (config sketch below).
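
The policy is a server-side setting; a sketch of flipping it at runtime through redis-py's CONFIG SET wrapper — in practice it usually lives in redis.conf or the managed service's parameter group, and the 4 GB cap is illustrative:

import redis

r = redis.Redis(host="cache.internal", port=6379)
r.config_set("maxmemory", "4gb")                  # cap memory before the OS OOM-kills the process
r.config_set("maxmemory-policy", "allkeys-lru")   # or volatile-lru / allkeys-lfu
print(r.config_get("maxmemory-policy"))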

Cache-aside pattern

  • App reads the cache; on a miss it loads from the DB and populates the cache — data goes stale if the DB is updated without invalidation — so publish invalidations or keep TTLs short (sketch below).
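
A cache-aside read path as a Python sketch; load_profile_from_db is a hypothetical stand-in for the real DB query, and the 300 s TTL is arbitrary:

import json
import redis

r = redis.Redis(host="cache.internal", port=6379)
TTL_S = 300   # short TTL bounds staleness when an invalidation is missed

def get_profile(user_id: int) -> dict:
    key = f"user:profile:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)                   # hit
    profile = load_profile_from_db(user_id)         # miss: read the source of truth (hypothetical helper)
    r.set(key, json.dumps(profile), ex=TTL_S)       # populate for the next reader
    return profile

def invalidate_profile(user_id: int) -> None:
    r.delete(f"user:profile:{user_id}")             # call after every DB write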

Write-through vs write-back

  • Write-through: slower writes, consistent reads (sketched below).
  • Write-back: data-loss risk on crash — rarely used outside specialized buffering.
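
Write-through in the same style — a sketch; write_profile_to_db is again a hypothetical DB helper:

import json
import redis

r = redis.Redis(host="cache.internal", port=6379)

def save_profile(user_id: int, profile: dict) -> None:
    # Write-through: acknowledge only after both stores accept the write.
    write_profile_to_db(user_id, profile)                            # source of truth first (hypothetical helper)
    r.set(f"user:profile:{user_id}", json.dumps(profile), ex=300)    # then refresh the cache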

Thundering herd / stampede

  • Singleflight / request coalescing — the first miss loads from the DB while the others wait — implemented with a short-TTL Redis lock or a per-key in-process mutex in the app (sketch below).
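
One way to coalesce the herd with a short-TTL Redis lock (SET NX EX) — a sketch; loader is a hypothetical callable that performs the DB read, and the lock TTL and retry timings are arbitrary:

import time
import redis

r = redis.Redis(host="cache.internal", port=6379)

def get_coalesced(key: str, loader, ttl_s: int = 300):
    for _ in range(50):                                   # ~0.5 s worst case before giving up
        value = r.get(key)
        if value is not None:
            return value
        if r.set(f"lock:{key}", "1", nx=True, ex=5):      # SET NX EX: only one caller becomes the loader
            try:
                value = loader()                          # hypothetical DB read
                r.set(key, value, ex=ttl_s)
                return value
            finally:
                r.delete(f"lock:{key}")
        time.sleep(0.01)                                  # losers poll briefly for the winner's write
    return loader()                                       # stop coalescing rather than fail the request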

Lua scripts

  • Atomic compare-and-swap patterns for inventory-like semantics (example below) — but be careful: long scripts block Redis's single-threaded event loop.
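
An illustrative check-then-decrement script for inventory-style semantics — both steps run atomically inside Redis; the key name and quantities are made up:

import redis

r = redis.Redis(host="cache.internal", port=6379)

# Check-then-decrement runs as one atomic unit inside Redis.
DECR_IF_AVAILABLE = """
local stock = tonumber(redis.call('GET', KEYS[1]) or '0')
if stock >= tonumber(ARGV[1]) then
  return redis.call('DECRBY', KEYS[1], ARGV[1])
end
return -1
"""

r.set("inventory:sku:123", 10)
remaining = r.eval(DECR_IF_AVAILABLE, 1, "inventory:sku:123", 3)
print(remaining)   # 7 on success, -1 if the reservation would oversell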

Replica lag

  • Reads from replicas may be stale — prefer the primary for just-written data such as session tokens — controlled by a routing flag in the client.

Why Redis over Memcached here

  • Structures + Lua + TTL granularity + ecosystem — Memcached when pure horizontal GET/SET simplicity wins benchmarks.

Alternative: Hazelcast / JVM grid

  • Good inside Java orgs for embedded data grids — ops model differs.

7. Bottlenecks & Mitigations

Bottleneck                Signal                     Mitigation
Hot key                   Single slot at 100% CPU    Local LRU in app; read replicas; subkey sharding key:{slot}:partN
Big values                Latency spikes             Split the list; compress; move to object store with a pointer in cache
Full memory               Eviction thrash            Grow the cluster; tune TTLs; alert early
Cluster bus bandwidth     High shard churn           Avoid large MIGRATE operations during peak
Client connection storm   Redis maxclients hit       Pooling, proxy aggregation
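
For the hot-key row above, the usual first mitigation is a tiny in-process cache in front of Redis. A sketch with an unbounded dict for brevity (a real one would cap entries); the 1 s local TTL is arbitrary:

import time
import redis

r = redis.Redis(host="cache.internal", port=6379)
_local: dict[str, tuple[float, bytes]] = {}   # key -> (expiry, value)

def get_hot(key: str, local_ttl_s: float = 1.0) -> bytes | None:
    # Serve very hot keys from process memory for ~1 s to take load off a single shard.
    hit = _local.get(key)
    if hit is not None and hit[0] > time.monotonic():
        return hit[1]
    value = r.get(key)
    if value is not None:
        _local[key] = (time.monotonic() + local_ttl_s, value)
    return value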

8. Tradeoffs

Decision                  Alternative                         Why we picked it
Redis Cluster             Many single masters + client hash   Operations automation at large shard count
TLS to proxy              Plaintext in VPC                    Compliance and multi-tenant safety
Read replicas for scale   Bigger single node                  RAM ceiling and blast radius
allkeys-lru               TTL only on some keys               Safety net when devs forget TTLs
Managed service           Self-hosted                         Staff expertise vs cost
RESP3 protocol            RESP2                               Smarter reply types once clients upgrade

9. Follow-ups (interviewer drill-downs)

  • Multi-region cache: Global Datastore-style products with cross-region replication lag — active-active conflict resolution is hard — prefer regional stickiness with load balancing.
  • Strong consistency: consider etcd/Consul rather than Redis — a different tool for a different job.
  • Cache penetration attack: misses on random keys fall through to the DB — mitigate with an in-app Bloom filter or dummy negative cache entries (sketch after this list).
  • Memory migration to an SSD tier: Redis on Flash / KeyDB — study the latency tradeoff.
  • Serverless Redis (Upstash) for spiky workloads — watch cold starts and connection limits.
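
For the cache-penetration item, a negative-cache sketch — load_profile_bytes_from_db is a hypothetical helper returning serialized bytes or None, and the sentinel value and TTLs are arbitrary:

import redis

r = redis.Redis(host="cache.internal", port=6379)
MISS = b"__miss__"   # sentinel meaning "we checked the DB and it does not exist"

def get_user(user_id: int) -> bytes | None:
    key = f"user:profile:{user_id}"
    cached = r.get(key)
    if cached == MISS:
        return None                                   # known-absent: skip the DB entirely
    if cached is not None:
        return cached
    row = load_profile_bytes_from_db(user_id)         # hypothetical helper: bytes or None
    if row is None:
        r.set(key, MISS, ex=60)                       # short negative TTL
    else:
        r.set(key, row, ex=300)
    return row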
