THN Interview Prep

Design Distributed Rate Limiter

1. Requirements

Functional

  • Enforce per-actor limits: by user id, API key, IP address, or composite route (e.g., POST /v1/messages separate from GET /v1/feed).
  • Support multiple algorithms: token bucket (smooth burst), sliding window (strict per minute), fixed window (simplest), leaky bucket (constant outflow).
  • Return standard headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (or RFC draft RateLimit headers).
  • Distributed: same limit across many API nodes and regions within defined tolerance.
  • Optional global vs per-region limits (e.g., 1k/s global, 200/s per region cap).

Non-Functional

  • Latency: the limiter check should add under 1–2 ms at p99 in the data path when the decision is local, and under 5 ms if a remote call is acceptable for strict global synchronization.
  • Throughput: 1M+ checks/s cluster-wide for high-traffic gateway.
  • Availability: fail-open vs fail-closed is a configurable policy; default to fail-open (with alerting) for product UX, or fail-closed for security-sensitive endpoints.
  • Consistency: eventual consistency is fine for most APIs; strong consistency per key may be required for financial operations (single leader per key or a distributed lock, which is expensive).
  • Durability: counters are ephemeral; no disk durability is needed beyond Redis AOF, provided that losing a few seconds of counts on crash recovery is acceptable.

Out of Scope

  • User-facing billing for overage (only signal 429).
  • Long-term analytics of every check (sampling to metrics pipeline is enough).
  • Application-level fair queuing of work inside business logic (only HTTP/gRPC admission control).
  • DDoS at network edge (assume Cloudflare/AWS Shield in front).

2. Back-of-Envelope Estimations

Assume the service sits behind 100k RPS of total API traffic; only ~10% of requests (10k RPS) need a distributed check, because a local in-process token bucket already provides the smooth per-node guard.

  • Per check: 1–2 Redis commands (GET+INCR or Lua). 10k * 2 = 20k cmd/s to Redis. Single shard ~100k–200k ops/s typical; one cluster sufficient with headroom.

  • Memory: worst case 10M active keys (one per user) * 64 B ≈ 640 MB of raw counters; with overhead, budget 1–2 GB. Realistic: 1M hot keys * 100 B = 100 MB. Use key TTLs aligned to the window.

  • Cross-AZ network: 10k * 1 KB = 10 MB/s — negligible.

If traffic grows 10x, shard Redis by actor hash; scale horizontally.


3. API Design

The data plane is embedded in an Envoy filter or application middleware; the control plane is a REST API for operators.

Middleware contract (conceptual)

GET /internal/rl/decision?actor=ip:1.2.3.4&route=createOrder&cost=1
-> 200 { "allow": true, "limit": 100, "remaining": 37, "resetAt": 1714406400 }
-> 200 { "allow": false, "retryAfter": 3 }

Control plane

PUT /v1/quotas/{tenantId}/routes/{routeKey}
Body: { "algorithm": "token_bucket", "rate": 100, "burst": 150, "windowSeconds": 60 }
-> 204

gRPC for sidecar: CheckRateLimit(CheckRequest) returns (Decision) for lower latency than HTTP.

Client-visible response when blocked:

HTTP/1.1 429 Too Many Requests
Retry-After: 3
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
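
Putting the data-plane contract together, here is a minimal middleware sketch in Python. The /internal/rl/decision contract and header names are the ones above; the limiter URL, the 5 ms timeout, and the framework-free shape are assumptions for illustration.

import requests

LIMITER_URL = "http://rl.internal:8080/internal/rl/decision"  # hypothetical address

def check_and_annotate(actor: str, route: str, cost: int = 1):
    """Ask the limiter for a decision; return (allowed, headers to attach)."""
    try:
        resp = requests.get(
            LIMITER_URL,
            params={"actor": actor, "route": route, "cost": cost},
            timeout=0.005,  # keep the check inside the latency budget
        )
        decision = resp.json()
    except requests.RequestException:
        return True, {}  # fail-open default (see Non-Functional); alert instead of blocking

    headers = {
        "X-RateLimit-Limit": str(decision.get("limit", "")),
        "X-RateLimit-Remaining": str(decision.get("remaining", "")),
    }
    if decision["allow"]:
        return True, headers
    headers["Retry-After"] = str(decision.get("retryAfter", 1))
    return False, headers  # caller responds 429 Too Many Requests with these headers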

4. Data Model

Logical keys

  • rl:{tenant}:{route}:{actorId}:{windowBucket} for fixed/sliding window.
  • rl:tb:{tenant}:{route}:{actorId} for token bucket fields: tokens, last refill timestamp.

Redis vs Memcached

  • Redis: atomic Lua scripts, TTL, optional persistence. Industry default for rate limiting (caching layer often doubles as counter store).
  • Memcached: plain counters, no Lua; careful CAS loops are needed. A simpler store, but chattier under contention.

Alternative: dedicated service

  • Envoy Global Rate Limit Service with Redis/pluggable backend: standardizes protocol for polyglot microservices.

DynamoDB for extreme multi-region setups without Redis: conditional writes per key. Costlier per check; use only when counters must survive a regional Redis loss (see the sketch below).
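
A rough sketch of that conditional-write pattern with boto3; the table name, key schema, and attribute names are made up for illustration.

import boto3
from botocore.exceptions import ClientError

# Hypothetical table; enable a DynamoDB TTL attribute to purge old window items.
table = boto3.resource("dynamodb").Table("rate_limits")

def allow(actor: str, window_bucket: int, limit: int) -> bool:
    """Fixed-window counter: one conditional ADD per check."""
    try:
        table.update_item(
            Key={"pk": f"rl:{actor}:{window_bucket}"},
            UpdateExpression="ADD hits :one",
            ConditionExpression="attribute_not_exists(hits) OR hits < :limit",
            ExpressionAttributeValues={":one": 1, ":limit": limit},
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # over the limit for this window
        raise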

Sample Redis Lua sketch (token bucket)

  • Keys: tokens, ts. Atomically: read the current time (Redis TIME), refill tokens at rate r up to burst, subtract cost, write back, and set a TTL. A sketch follows below.
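
A minimal sketch of that script driven from redis-py, following the rl:tb:{tenant}:{route}:{actorId} key scheme above. The rate/burst/TTL values are illustrative, and it assumes Redis 5+ so TIME can be used inside a script that writes.

import redis

TOKEN_BUCKET_LUA = """
-- KEYS[1] = rl:tb:{tenant}:{route}:{actorId}
-- ARGV    = rate (tokens/s), burst, cost, ttl (s)
local rate, burst = tonumber(ARGV[1]), tonumber(ARGV[2])
local cost, ttl   = tonumber(ARGV[3]), tonumber(ARGV[4])
local t   = redis.call('TIME')                        -- single clock source (see Clock skew)
local now = tonumber(t[1]) + tonumber(t[2]) / 1e6
local b   = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(b[1]) or burst
local ts     = tonumber(b[2]) or now
tokens = math.min(burst, tokens + rate * (now - ts))  -- refill since last update
local allowed = tokens >= cost
if allowed then tokens = tokens - cost end
redis.call('HSET', KEYS[1], 'tokens', tostring(tokens), 'ts', tostring(now))
redis.call('EXPIRE', KEYS[1], ttl)
return allowed and 1 or 0
"""

r = redis.Redis()  # placeholder connection
token_bucket = r.register_script(TOKEN_BUCKET_LUA)

def allow(tenant: str, route: str, actor: str, rate=100, burst=150, cost=1) -> bool:
    key = f"rl:tb:{tenant}:{route}:{actor}"
    return token_bucket(keys=[key], args=[rate, burst, cost, 120]) == 1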

5. High-Level Architecture


Two-tier design: a cheap local token bucket on each API node stops obvious floods; the global Redis-backed limit enforces cross-node fairness.


6. Component Deep-Dives

Token bucket (smooth bursts)

  • Stores tokens and the last update time. Refill: tokens = min(burst, tokens + rate * dt). Consumption subtracts cost; deny if tokens are insufficient (a per-node sketch follows this list).
  • Why token bucket vs leaky bucket: the token bucket allows bursts, which matches user expectations; the leaky bucket's output is smoother, so its burst semantics differ.
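
The same refill rule as a per-node, in-process guard (tier one of the two-tier design); parameter values and thread-safety details are simplified assumptions.

import threading, time

class LocalTokenBucket:
    """Per-node guard: tokens = min(burst, tokens + rate * dt), then subtract cost."""

    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.ts = time.monotonic()
        self.lock = threading.Lock()

    def allow(self, cost: float = 1.0) -> bool:
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.burst, self.tokens + self.rate * (now - self.ts))
            self.ts = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

# e.g. 200 requests/s with a burst of 300 per node, in front of the global check
node_guard = LocalTokenBucket(rate=200, burst=300)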

Sliding window log (accuracy)

  • Store sorted set of timestamps per actor in Redis; remove older than window; count members vs limit — memory heavy for chatty actors.
  • Sliding window counter approximation: weight the previous fixed window's count by the fraction of it still inside the sliding window and add the current window's count; far fewer keys than a full log (a common Redis blog pattern). See the sketch below.
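
A non-atomic sketch of that approximation with two fixed-window counters in Redis; the key names and the 60 s window are illustrative, and production versions fold this into a single Lua script.

import time

def sliding_window_allow(r, actor: str, limit: int, window: int = 60) -> bool:
    """Approximate sliding window using the current and previous fixed windows."""
    now = time.time()
    bucket = int(now // window)
    curr_key = f"rl:sw:{actor}:{bucket}"
    prev_key = f"rl:sw:{actor}:{bucket - 1}"

    prev = int(r.get(prev_key) or 0)
    curr = int(r.get(curr_key) or 0)
    # Weight the previous window by how much of it still overlaps the sliding window.
    overlap = 1.0 - (now % window) / window
    if prev * overlap + curr >= limit:
        return False

    pipe = r.pipeline()
    pipe.incr(curr_key)
    pipe.expire(curr_key, window * 2)  # keep the bucket around while it is still "previous"
    pipe.execute()
    return True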

Fixed window

  • Simplest: INCR a per-window key and set a TTL; cheap, but allows up to 2x the limit across the edge of adjacent windows. Mitigate with small sub-windows or a hybrid approach (sketch below).
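
A minimal fixed-window sketch with redis-py, keyed per window bucket as in the data model above; the key prefix and window size are placeholders.

import time

def fixed_window_allow(r, key_prefix: str, limit: int, window: int = 60) -> bool:
    """One INCR per request; the first hit in a window sets the TTL."""
    bucket = int(time.time() // window)
    key = f"{key_prefix}:{bucket}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window)
    return count <= limit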

Why Envoy Global Rate Limit + Redis vs every service rolling own

  • Central policy, consistent headers, and a smaller duplicated-bug surface; the alternative of a library per language risks drift and is acceptable only with a strong platform team.

Synchronization across regions

  • Independent counters per region multiply the effective allowance by the number of regions unless the quota is divided among them; use weighted per-region quotas (example after this list) or a single global Redis (at the cost of latency and capex).
  • CRDT-style relaxed counters for non-financial use cases.
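
A toy illustration of dividing a global quota by region weight so independent regional counters do not multiply the allowance; the region names and weights are made up.

# Split a 1,000 RPS global limit into per-region quotas by observed traffic share.
GLOBAL_LIMIT = 1000
REGION_WEIGHTS = {"us-east": 0.5, "eu-west": 0.3, "ap-south": 0.2}  # illustrative

regional_limits = {region: int(GLOBAL_LIMIT * w) for region, w in REGION_WEIGHTS.items()}
# -> {'us-east': 500, 'eu-west': 300, 'ap-south': 200}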

Failure modes

  • Redis down: local only or fail-closed for sensitive routes; trip a circuit breaker on the Redis client so API nodes back off instead of hot-looping failed checks (pairs with health-aware load balancing).
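
A simplified sketch of that behavior: wrap the distributed check so Redis failures trip a breaker and the route's fail-open/fail-closed policy decides the outcome. The breaker state handling here is deliberately minimal.

import time

_breaker_open_until = 0.0  # module-level breaker state; real clients track this per pool

def guarded_check(check_fn, *, fail_open: bool, cooldown: float = 5.0) -> bool:
    """Call the distributed check; on failure, back off and apply the route's policy."""
    global _breaker_open_until
    if time.monotonic() < _breaker_open_until:
        return fail_open                        # breaker open: skip Redis entirely
    try:
        return check_fn()                       # e.g. the Lua token-bucket call above
    except Exception:
        _breaker_open_until = time.monotonic() + cooldown
        return fail_open                        # fail-open for product routes, fail-closed for security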

Clock skew

  • Run NTP on all nodes; inside Lua scripts, take now() from Redis TIME so there is a single source of truth rather than per-node clocks.

7. Bottlenecks & Mitigations

| Bottleneck | Signal | Mitigation |
| --- | --- | --- |
| Hot key (one IP, DDoS-like) | Single Redis slot runs hot | Shard the actor key with a hash salt only if the product allows; block at edge/WAF first |
| Long Lua script | Redis single thread contends | Keep scripts O(1); avoid large keys |
| Cross-region Redis | RTT adds milliseconds | Regional limit + async global settlement; or accept a soft limit |
| Thundering herd on 429 | Clients retry together | Jittered backoff; honor Retry-After |
| Config push lag | Old limits after deploy | Version quotas; eventual consistency acceptable |

8. Tradeoffs

| Decision | Alternative | Why we picked it |
| --- | --- | --- |
| Redis + Lua atomicity | DB row lock | Sub-ms latency; horizontal scale |
| Token bucket by default | Fixed window only | Better UX for legitimate bursts |
| Fail-open on Redis outage | Fail-closed | Availability bias; security endpoints override |
| Envoy GRLS | Custom Go service | Standard integration; less code |
| Per-region limits | Single global Redis | Latency vs accuracy trade-off |
| Approximate sliding window | Exact log | Memory and CPU savings at scale |

9. Follow-ups (interviewer drill-downs)

  • Strictly exact global 100 RPS? Either a single leader per key (slow) or accept ~5% overcount with eventual merging; discuss the CAP trade-off.
  • Weighted limits (expensive operations cost 10)? Pass cost into script; decrement tokens by cost.
  • GraphQL single endpoint — rate limit by resolver via persisted queries + operation cost.
  • Kubernetes — sidecar Envoy vs eBPF future; both valid LLD topics.
  • Multi-tenant noisy neighbor: separate Redis clusters per large tenant.
  • Audit: compare with caching stampede prevention — different problem but shared Redis expertise.
