Design Distributed Rate Limiter
1. Requirements
Functional
- Enforce per-actor limits: by user id, API key, IP address, or composite route (e.g., POST /v1/messages separate from GET /v1/feed).
- Support multiple algorithms: token bucket (smooth bursts), sliding window (strict per minute), fixed window (simplest), leaky bucket (constant outflow).
- Return standard headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (or the IETF draft RateLimit headers).
- Distributed: the same limit is enforced across many API nodes and regions within a defined tolerance.
- Optional global vs per-region limits (e.g., 1k/s global, 200/s per region cap).
Non-Functional
- Latency: the limiter check adds under 1–2 ms at p99 in the data path when local; under 5 ms if a remote call is acceptable for strict global synchronization.
- Throughput: 1M+ checks/s cluster-wide for high-traffic gateway.
- Availability: fail-open vs fail-closed policy is configurable; default to fail-open (with alerting) for product UX, or fail-closed for security-sensitive endpoints.
- Consistency: eventual consistency is OK for most APIs; strong consistency per key may be required for financial operations (use a single leader or a distributed lock—expensive).
- Durability: counters are ephemeral; no disk durability is needed beyond Redis AOF, provided losing a few seconds of counters after a crash is acceptable.
Out of Scope
- User-facing billing for overage (only signal 429).
- Long-term analytics of every check (sampling to metrics pipeline is enough).
- Application-level fair queuing of work inside business logic (only HTTP/gRPC admission control).
- DDoS at network edge (assume Cloudflare/AWS Shield in front).
2. Back-of-Envelope Estimations
Service behind 100k RPS total API traffic; 10% of requests need a distributed check (10k RPS) because a local in-process token bucket already handles the smooth per-node guard.
- Per check: 1–2 Redis commands (GET+INCR or a Lua script). 10k * 2 = 20k commands/s to Redis. A single shard typically handles ~100k–200k ops/s, so one cluster is sufficient with headroom.
- Memory: worst case 10M active keys (one per user) * 64 B ~ 640 MB for counters; with overhead, ~1–2 GB. Realistic: 1M hot keys * 100 B = 100 MB; use key TTLs aligned to the window.
- Cross-AZ network: 10k * 1 KB = 10 MB/s — negligible.
If traffic grows 10x, shard Redis by actor hash; scale horizontally.
3. API Design
The data plane is embedded in an Envoy filter or middleware; the control plane is a REST API for operators.
Middleware contract (conceptual)
GET /internal/rl/decision?actor=ip:1.2.3.4&route=createOrder&cost=1
-> 200 { "allow": true, "limit": 100, "remaining": 37, "resetAt": 1714406400 }
-> 200 { "allow": false, "retryAfter": 3 }Control plane
PUT /v1/quotas/{tenantId}/routes/{routeKey}
Body: { "algorithm": "token_bucket", "rate": 100, "burst": 150, "windowSeconds": 60 }
-> 204

gRPC for sidecar: CheckRateLimit(CheckRequest) returns (Decision) for lower latency than HTTP.
Client-visible when blocked:
HTTP/1.1 429 Too Many Requests
Retry-After: 3
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
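A minimal sketch (Python; build_response and the decision dict are illustrative, mirroring the conceptual decision payload above) of how middleware could map a limiter decision onto this client-visible response:

```python
import time

# Hypothetical helper: maps a limiter decision (shape as in the contract above)
# to an HTTP status, headers, and body. Not a real framework API.
def build_response(decision: dict, limit: int = 100) -> tuple[int, dict, str]:
    if decision["allow"]:
        headers = {
            "X-RateLimit-Limit": str(decision.get("limit", limit)),
            "X-RateLimit-Remaining": str(decision.get("remaining", 0)),
            "X-RateLimit-Reset": str(decision.get("resetAt", int(time.time()))),
        }
        return 200, headers, "ok"
    headers = {
        "Retry-After": str(decision.get("retryAfter", 1)),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
    }
    return 429, headers, "rate limit exceeded"

# A blocked decision, as returned by the data plane:
print(build_response({"allow": False, "retryAfter": 3}))
```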
4. Data Model
Logical keys
- rl:{tenant}:{route}:{actorId}:{windowBucket} for fixed/sliding window.
- rl:tb:{tenant}:{route}:{actorId} for token bucket fields: tokens, last refill timestamp.
Redis vs Memcached
- Redis: atomic Lua scripts, TTL, optional persistence. Industry default for rate limiting (caching layer often doubles as counter store).
- Memcached: pure counters without Lua; need careful CAS loops — simpler but more chatty under contention.
Alternative: dedicated service
- Envoy Global Rate Limit Service with Redis/pluggable backend: standardizes protocol for polyglot microservices.
- DynamoDB for extreme multi-region without Redis: conditional writes per key — costlier per check; use only when managed counters must survive regional Redis loss.
Sample Redis Lua sketch (token bucket)
- Keys/fields: tokens, ts. Atomically read the time, refill tokens at rate r up to burst, subtract the cost, and set a TTL; see the sketch below.
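A hedged sketch of that script, wrapped with redis-py so it can be called from middleware. The key layout follows rl:tb:{tenant}:{route}:{actorId} from the data model; the rate/burst/cost defaults are illustrative, and it assumes Redis 5+ so the script may call TIME before writing (effect-based replication).

```python
import redis

# Token bucket as an atomic Lua script: read TIME, refill, try to spend `cost`.
TOKEN_BUCKET_LUA = """
local key   = KEYS[1]
local rate  = tonumber(ARGV[1])      -- refill rate, tokens per second
local burst = tonumber(ARGV[2])      -- bucket capacity
local cost  = tonumber(ARGV[3])      -- weight of this request
local t     = redis.call('TIME')     -- Redis clock as the single time source
local now   = t[1] + t[2] / 1e6
local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or burst
local ts     = tonumber(state[2]) or now
tokens = math.min(burst, tokens + (now - ts) * rate)   -- refill since last seen
local allowed = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
-- safe to drop idle actors once the bucket would have refilled anyway
redis.call('EXPIRE', key, math.ceil(burst / rate) * 2)
return {allowed, tostring(tokens)}
"""

r = redis.Redis()
check = r.register_script(TOKEN_BUCKET_LUA)

def allow(tenant: str, route: str, actor: str, rate=100, burst=150, cost=1) -> bool:
    key = f"rl:tb:{tenant}:{route}:{actor}"
    allowed, _tokens_left = check(keys=[key], args=[rate, burst, cost])
    return allowed == 1
```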
5. High-Level Architecture
Two-tier: local cheap bucket stops obvious floods; global Redis-backed limit enforces cross-node fairness.
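Illustrative ordering of the two tiers (the callables are placeholders for the in-process bucket and the Redis-backed check described below):

```python
def admit(actor: str, local_allow, global_allow) -> bool:
    # Tier 1: per-node bucket, no network hop; stops obvious floods cheaply.
    if not local_allow(actor):
        return False
    # Tier 2: shared Redis-backed counter; enforces cross-node/region fairness.
    return global_allow(actor)

# Example wiring with trivial stand-ins:
print(admit("user:42", lambda a: True, lambda a: True))  # -> True
```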
6. Component Deep-Dives
Token bucket (smooth bursts)
- Stores tokens and the last update time. Refill: tokens = min(burst, tokens + rate * dt). Consumption subtracts cost; deny if insufficient (see the local sketch below).
- Why vs leaky bucket: a token bucket allows bursts, which matches user expectations; leaky bucket output is smoother, but the burst semantics differ.
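A minimal in-process token bucket matching that refill formula (a sketch, suitable as the cheap local tier rather than the shared global limiter):

```python
import time

class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst                 # start full
        self.last = time.monotonic()

    def try_consume(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        dt, self.last = now - self.last, now
        self.tokens = min(self.burst, self.tokens + self.rate * dt)  # refill
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=100, burst=150)
print(bucket.try_consume())  # True while tokens remain
```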
Sliding window log (accuracy)
- Store sorted set of timestamps per actor in Redis; remove older than window; count members vs limit — memory heavy for chatty actors.
- Sliding window counter approximation: weight the previous fixed window by its remaining overlap and add the current window's count — fewer keys (a common Redis pattern); see the sketch below.
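A sketch of the counter approximation (assumes a reachable Redis via redis-py; the key naming is simplified from the rl:{tenant}:{route}:{actorId}:{windowBucket} layout above):

```python
import time
import redis

r = redis.Redis()

def sliding_window_allow(actor: str, limit: int = 100, window: int = 60) -> bool:
    now = time.time()
    bucket = int(now // window)
    cur_key, prev_key = f"rl:{actor}:{bucket}", f"rl:{actor}:{bucket - 1}"

    pipe = r.pipeline()
    pipe.incr(cur_key)                # count the attempt in the current window
    pipe.expire(cur_key, window * 2)  # keep it long enough to serve as "previous"
    pipe.get(prev_key)
    cur_count, _, prev_raw = pipe.execute()

    # Weight the previous window by how much of it still overlaps the sliding window.
    elapsed_fraction = (now % window) / window
    estimate = int(prev_raw or 0) * (1 - elapsed_fraction) + cur_count
    return estimate <= limit
```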
Fixed window
- Simplest: INCR a key with a TTL at the window boundary—cheap, but allows up to a 2x burst at the edges of adjacent windows. Mitigate with small sub-windows or a hybrid; a minimal version is sketched below.
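A minimal fixed-window check (sketch; the non-atomic INCR/EXPIRE pair is fine for illustration, while a Lua script or SET NX EX closes the gap where a crash leaves a key without a TTL):

```python
import time
import redis

r = redis.Redis()

def fixed_window_allow(actor: str, limit: int = 100, window: int = 60) -> bool:
    key = f"rl:{actor}:{int(time.time() // window)}"
    count = r.incr(key)
    if count == 1:              # first hit in this window: set its expiry
        r.expire(key, window)
    return count <= limit
```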
Why Envoy Global Rate Limit + Redis vs every service rolling its own
- Central policy, consistent headers, less duplicated bug surface; the alternative (a library in each language) risks drift—acceptable only with a strong platform team.
Synchronization across regions
- Independent counters per region multiply the allowance by the number of regions unless the quota is split among them — use weighted quotas (see the sketch after this list) or a single global Redis (latency/capex).
- CRDT-style relaxed counters for non-financial use cases.
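A toy example of the weighted-quota split (numbers illustrative): each region enforces its share locally so independent counters do not multiply the global allowance.

```python
def regional_quota(global_limit: int, traffic_share: dict[str, float]) -> dict[str, int]:
    # traffic_share: observed fraction of traffic per region, summing to ~1.0
    return {region: int(global_limit * share) for region, share in traffic_share.items()}

print(regional_quota(1000, {"us-east": 0.5, "eu-west": 0.3, "ap-south": 0.2}))
# {'us-east': 500, 'eu-west': 300, 'ap-south': 200}
```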
Failure modes
- Redis down: fall back to local-only limiting, or fail closed for sensitive routes; trip a circuit breaker on the Redis client so API nodes back off instead of hot-looping failed checks (pairs with health-aware load balancing). A sketch follows.
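A simplified sketch of that behavior (the class and breaker logic are illustrative, not a specific client library): on errors the wrapper stops calling Redis for a cooldown period and applies the configured fail-open or fail-closed policy.

```python
import time

class FailSafeLimiter:
    def __init__(self, check, fail_open: bool = True, cooldown: float = 5.0):
        self.check = check            # callable doing the real Redis-backed check
        self.fail_open = fail_open    # per-route policy: open for UX, closed for security
        self.cooldown = cooldown
        self.open_until = 0.0         # breaker state: skip Redis until this time

    def allow(self, actor: str) -> bool:
        if time.monotonic() < self.open_until:
            return self.fail_open     # breaker open: decide locally, no Redis call
        try:
            return self.check(actor)
        except Exception:
            self.open_until = time.monotonic() + self.cooldown
            return self.fail_open

limiter = FailSafeLimiter(check=lambda actor: True)
print(limiter.allow("user:42"))
```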
Clock skew
- NTP on all nodes; inside Lua, now() uses Redis TIME as the single source of truth (local buckets should use a monotonic clock).
7. Bottlenecks & Mitigations
| Bottleneck | Signal | Mitigation |
|---|---|---|
| Hot key (one IP DDoS-like) | Single Redis slot hot | Shard actor key with hash salt only if product allows; block at edge/WAF first |
| Lua long script | Redis single-thread contends | Keep scripts O(1); avoid large keys |
| Cross-region Redis | RTT adds ms | Regional limit + async global settle; or accept soft limit |
| Thundering herd on 429 | Clients retry together | Jittered backoff; Retry-After honored |
| Config push lag | Old limits after deploy | Version quotas; eventual consistency acceptable |
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Redis + Lua atomicity | DB row lock | Sub-ms; horizontal scale |
| Token bucket default | Fixed window only | Better UX for legitimate bursts |
| Fail-open on Redis outage | Fail-closed | Availability bias; security endpoints override |
| Envoy GRLS | Custom Go service | Standard integration; less code |
| Per-region limits | Single global Redis | Latency vs accuracy trade |
| Approximate sliding window | Exact log | Memory and CPU savings at scale |
9. Follow-ups (interviewer drill-downs)
- Exactly strict global 100 RPS? Single leader per key (slow), or accept ~5% overcount with eventual merge — discuss CAP.
- Weighted limits (expensive operations cost 10)? Pass cost into the script; decrement tokens by the cost.
- GraphQL single endpoint — rate limit by resolver via persisted queries + operation cost.
- Kubernetes — sidecar Envoy vs eBPF future; both valid LLD topics.
- Multi-tenant noisy neighbor: separate Redis clusters per large tenant.
- Audit: compare with caching stampede prevention — different problem but shared Redis expertise.