Design Distributed Rate Limiter
1. Requirements
Functional
- Enforce per-actor limits: by user id, API key, IP address, or composite route (e.g., POST /v1/messages separate from GET /v1/feed).
- Support multiple algorithms: token bucket (smooth bursts), sliding window (strict per minute), fixed window (simplest), leaky bucket (constant outflow).
- Return standard headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (or the IETF draft RateLimit headers).
- Distributed: the same limit is enforced across many API nodes and regions within a defined tolerance.
- Optional global vs per-region limits (e.g., 1k/s global, 200/s per region cap).
Non-Functional
- Latency: the limiter check adds under 1–2 ms at p99 in the data path when local; under 5 ms if a remote call is acceptable for strict global synchronization.
- Throughput: 1M+ checks/s cluster-wide for high-traffic gateway.
- Availability: fail-open vs fail-closed policy is configurable; default to fail-open (with alerting) for product UX, or fail-closed for security-sensitive endpoints.
- Consistency: eventual consistency is OK for most APIs; strong consistency per key may be required for financial operations (use a single leader or a distributed lock—expensive).
- Durability: counters are ephemeral; no disk durability is needed beyond Redis AOF, provided losing a few seconds of counters after a crash is acceptable.
Out of Scope
- User-facing billing for overage (only signal 429).
- Long-term analytics of every check (sampling to metrics pipeline is enough).
- Application-level fair queuing of work inside business logic (only HTTP/gRPC admission control).
- DDoS at network edge (assume Cloudflare/AWS Shield in front).
2. Back-of-Envelope Estimations
Service behind 100k RPS total API traffic; 10% of requests need a distributed check (10k RPS) because a local in-process token bucket already handles the smooth per-node guard.
- Per check: 1–2 Redis commands (GET+INCR or a Lua script). 10k * 2 = 20k commands/s to Redis. A single shard typically handles ~100k–200k ops/s, so one cluster is sufficient with headroom.
- Memory: worst case 10M active keys (one per user) * 64 B ~ 640 MB for counters; with overhead, ~1–2 GB. Realistic: 1M hot keys * 100 B = 100 MB; use key TTLs aligned to the window.
- Cross-AZ network: 10k * 1 KB = 10 MB/s — negligible.
If traffic grows 10x, shard Redis by actor hash; scale horizontally.
3. API Design
The data plane is embedded in an Envoy filter or middleware; the control plane is a REST API for operators.
Middleware contract (conceptual)
GET /internal/rl/decision?actor=ip:1.2.3.4&route=createOrder&cost=1
-> 200 { "allow": true, "limit": 100, "remaining": 37, "resetAt": 1714406400 }
-> 200 { "allow": false, "retryAfter": 3 }Control plane
PUT /v1/quotas/{tenantId}/routes/{routeKey}
Body: { "algorithm": "token_bucket", "rate": 100, "burst": 150, "windowSeconds": 60 }
-> 204

gRPC for sidecar: CheckRateLimit(CheckRequest) returns (Decision) for lower latency than HTTP.
Client-visible when blocked:
HTTP/1.1 429 Too Many Requests
Retry-After: 3
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
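A minimal sketch (Python; build_response and the decision dict are illustrative, mirroring the conceptual decision payload above) of how middleware could map a limiter decision onto this client-visible response:

```python
import time

# Hypothetical helper: maps a limiter decision (shape as in the contract above)
# to an HTTP status, headers, and body. Not a real framework API.
def build_response(decision: dict, limit: int = 100) -> tuple[int, dict, str]:
    if decision["allow"]:
        headers = {
            "X-RateLimit-Limit": str(decision.get("limit", limit)),
            "X-RateLimit-Remaining": str(decision.get("remaining", 0)),
            "X-RateLimit-Reset": str(decision.get("resetAt", int(time.time()))),
        }
        return 200, headers, "ok"
    headers = {
        "Retry-After": str(decision.get("retryAfter", 1)),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
    }
    return 429, headers, "rate limit exceeded"

# A blocked decision, as returned by the data plane:
print(build_response({"allow": False, "retryAfter": 3}))
```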
4. Data Model
Logical keys
- rl:{tenant}:{route}:{actorId}:{windowBucket} for fixed/sliding window.
- rl:tb:{tenant}:{route}:{actorId} for token bucket fields: tokens, last refill timestamp.
Redis vs Memcached
- Redis: atomic Lua scripts, TTL, optional persistence. Industry default for rate limiting (caching layer often doubles as counter store).
- Memcached: pure counters without Lua; need careful CAS loops — simpler but more chatty under contention.
Alternative: dedicated service
- Envoy Global Rate Limit Service with Redis/pluggable backend: standardizes protocol for polyglot microservices.
- DynamoDB for extreme multi-region without Redis: conditional writes per key — costlier per check; use only when managed counters must survive regional Redis loss.
Sample Redis Lua sketch (token bucket)
- Keys/fields: tokens, ts. Atomically read the time, refill tokens at rate r up to burst, subtract the cost, and set a TTL; see the sketch below.
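A hedged sketch of that script, wrapped with redis-py so it can be called from middleware. The key layout follows rl:tb:{tenant}:{route}:{actorId} from the data model; the rate/burst/cost defaults are illustrative, and it assumes Redis 5+ so the script may call TIME before writing (effect-based replication).

```python
import redis

# Token bucket as an atomic Lua script: read TIME, refill, try to spend `cost`.
TOKEN_BUCKET_LUA = """
local key   = KEYS[1]
local rate  = tonumber(ARGV[1])      -- refill rate, tokens per second
local burst = tonumber(ARGV[2])      -- bucket capacity
local cost  = tonumber(ARGV[3])      -- weight of this request
local t     = redis.call('TIME')     -- Redis clock as the single time source
local now   = t[1] + t[2] / 1e6
local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or burst
local ts     = tonumber(state[2]) or now
tokens = math.min(burst, tokens + (now - ts) * rate)   -- refill since last seen
local allowed = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
end
redis.call('HSET', key, 'tokens', tokens, 'ts', now)
-- safe to drop idle actors once the bucket would have refilled anyway
redis.call('EXPIRE', key, math.ceil(burst / rate) * 2)
return {allowed, tostring(tokens)}
"""

r = redis.Redis()
check = r.register_script(TOKEN_BUCKET_LUA)

def allow(tenant: str, route: str, actor: str, rate=100, burst=150, cost=1) -> bool:
    key = f"rl:tb:{tenant}:{route}:{actor}"
    allowed, _tokens_left = check(keys=[key], args=[rate, burst, cost])
    return allowed == 1
```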
5. High-Level Architecture
Two-tier: local cheap bucket stops obvious floods; global Redis-backed limit enforces cross-node fairness.
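Illustrative ordering of the two tiers (the callables are placeholders for the in-process bucket and the Redis-backed check described below):

```python
def admit(actor: str, local_allow, global_allow) -> bool:
    # Tier 1: per-node bucket, no network hop; stops obvious floods cheaply.
    if not local_allow(actor):
        return False
    # Tier 2: shared Redis-backed counter; enforces cross-node/region fairness.
    return global_allow(actor)

# Example wiring with trivial stand-ins:
print(admit("user:42", lambda a: True, lambda a: True))  # -> True
```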
6. Component Deep-Dives
Token bucket (smooth bursts)
- Stores tokens and the last update time. Refill: tokens = min(burst, tokens + rate * dt). Consumption subtracts cost; deny if insufficient (see the local sketch below).
- Why vs leaky bucket: a token bucket allows bursts, which matches user expectations; leaky bucket output is smoother, but the burst semantics differ.
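A minimal in-process token bucket matching that refill formula (a sketch, suitable as the cheap local tier rather than the shared global limiter):

```python
import time

class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst                 # start full
        self.last = time.monotonic()

    def try_consume(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        dt, self.last = now - self.last, now
        self.tokens = min(self.burst, self.tokens + self.rate * dt)  # refill
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=100, burst=150)
print(bucket.try_consume())  # True while tokens remain
```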
Sliding window log (accuracy)
- Store sorted set of timestamps per actor in Redis; remove older than window; count members vs limit — memory heavy for chatty actors.
- Sliding window counter approximation: weight the previous fixed window by its remaining overlap and add the current window's count — fewer keys (a common Redis pattern); see the sketch below.
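A sketch of the counter approximation (assumes a reachable Redis via redis-py; the key naming is simplified from the rl:{tenant}:{route}:{actorId}:{windowBucket} layout above):

```python
import time
import redis

r = redis.Redis()

def sliding_window_allow(actor: str, limit: int = 100, window: int = 60) -> bool:
    now = time.time()
    bucket = int(now // window)
    cur_key, prev_key = f"rl:{actor}:{bucket}", f"rl:{actor}:{bucket - 1}"

    pipe = r.pipeline()
    pipe.incr(cur_key)                # count the attempt in the current window
    pipe.expire(cur_key, window * 2)  # keep it long enough to serve as "previous"
    pipe.get(prev_key)
    cur_count, _, prev_raw = pipe.execute()

    # Weight the previous window by how much of it still overlaps the sliding window.
    elapsed_fraction = (now % window) / window
    estimate = int(prev_raw or 0) * (1 - elapsed_fraction) + cur_count
    return estimate <= limit
```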
Fixed window
- Simplest: INCR a key with a TTL at the window boundary—cheap, but allows up to a 2x burst at the edges of adjacent windows. Mitigate with small sub-windows or a hybrid; a minimal version is sketched below.
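A minimal fixed-window check (sketch; the non-atomic INCR/EXPIRE pair is fine for illustration, while a Lua script or SET NX EX closes the gap where a crash leaves a key without a TTL):

```python
import time
import redis

r = redis.Redis()

def fixed_window_allow(actor: str, limit: int = 100, window: int = 60) -> bool:
    key = f"rl:{actor}:{int(time.time() // window)}"
    count = r.incr(key)
    if count == 1:              # first hit in this window: set its expiry
        r.expire(key, window)
    return count <= limit
```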
Why Envoy Global Rate Limit + Redis vs every service rolling its own
- Central policy, consistent headers, less duplicated bug surface; the alternative (a library in each language) risks drift—acceptable only with a strong platform team.
Synchronization across regions
- Independent counters per region multiply the allowance by the number of regions unless the quota is split among them — use weighted quotas (see the sketch after this list) or a single global Redis (latency/capex).
- CRDT-style relaxed counters for non-financial use cases.
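A toy example of the weighted-quota split (numbers illustrative): each region enforces its share locally so independent counters do not multiply the global allowance.

```python
def regional_quota(global_limit: int, traffic_share: dict[str, float]) -> dict[str, int]:
    # traffic_share: observed fraction of traffic per region, summing to ~1.0
    return {region: int(global_limit * share) for region, share in traffic_share.items()}

print(regional_quota(1000, {"us-east": 0.5, "eu-west": 0.3, "ap-south": 0.2}))
# {'us-east': 500, 'eu-west': 300, 'ap-south': 200}
```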
Failure modes
- Redis down: fall back to local-only limiting, or fail closed for sensitive routes; trip a circuit breaker on the Redis client so API nodes back off instead of hot-looping failed checks (pairs with health-aware load balancing). A sketch follows.
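A simplified sketch of that behavior (the class and breaker logic are illustrative, not a specific client library): on errors the wrapper stops calling Redis for a cooldown period and applies the configured fail-open or fail-closed policy.

```python
import time

class FailSafeLimiter:
    def __init__(self, check, fail_open: bool = True, cooldown: float = 5.0):
        self.check = check            # callable doing the real Redis-backed check
        self.fail_open = fail_open    # per-route policy: open for UX, closed for security
        self.cooldown = cooldown
        self.open_until = 0.0         # breaker state: skip Redis until this time

    def allow(self, actor: str) -> bool:
        if time.monotonic() < self.open_until:
            return self.fail_open     # breaker open: decide locally, no Redis call
        try:
            return self.check(actor)
        except Exception:
            self.open_until = time.monotonic() + self.cooldown
            return self.fail_open

limiter = FailSafeLimiter(check=lambda actor: True)
print(limiter.allow("user:42"))
```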
Clock skew
- NTP on all nodes; inside Lua, now() uses Redis TIME as the single source of truth (local buckets should use a monotonic clock).
7. Bottlenecks & Mitigations
| Bottleneck | Signal | Mitigation |
|---|---|---|
| Hot key (one IP DDoS-like) | Single Redis slot hot | Shard actor key with hash salt only if product allows; block at edge/WAF first |
| Lua long script | Redis single-thread contends | Keep scripts O(1); avoid large keys |
| Cross-region Redis | RTT adds ms | Regional limit + async global settle; or accept soft limit |
| Thundering herd on 429 | Clients retry together | Jittered backoff; Retry-After honored |
| Config push lag | Old limits after deploy | Version quotas; eventual consistency acceptable |
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Redis + Lua atomicity | DB row lock | Sub-ms; horizontal scale |
| Token bucket default | Fixed window only | Better UX for legitimate bursts |
| Fail-open on Redis outage | Fail-closed | Availability bias; security endpoints override |
| Envoy GRLS | Custom Go service | Standard integration; less code |
| Per-region limits | Single global Redis | Latency vs accuracy trade |
| Approximate sliding window | Exact log | Memory and CPU savings at scale |
9. Follow-ups (interviewer drill-downs)
- Exactly strict global 100 RPS? Single leader per key (slow), or accept ~5% overcount with eventual merge — discuss CAP.
- Weighted limits (expensive operations cost 10)? Pass cost into the script; decrement tokens by the cost.
- GraphQL single endpoint — rate limit by resolver via persisted queries + operation cost.
- Kubernetes — sidecar Envoy vs eBPF future; both valid LLD topics.
- Multi-tenant noisy neighbor: separate Redis clusters per large tenant.
- Audit: compare with caching stampede prevention — different problem but shared Redis expertise.