THN Interview Prep

Rate Limiter

What it is

A rate limiter caps how many requests a client (user id, IP, API key) can make in a window to protect services and ensure fair usage. Algorithms differ in burst behavior and memory costs.

Rate limiter decision path showing request identity selection, token bucket refill and consume, shared Redis counter checks, allow and reject outcomes, and failover policy.

Algorithms

Token bucket

  • Tokens refill at rate refillPerSecond up to capacity.
  • Allows smooth bursts up to bucket size; common for APIs that tolerate short spikes.

Leaky bucket

  • Requests enter a queue leaking at fixed rate; smoothes output to steady rate; can queue or drop when full.

Sliding window log

  • Store timestamps of each request in a window; precise count; memory-heavy for high QPS (optimize with sliding window counter hybrids).
  Token bucket:   [tokens + refill] --allow--> forward request
  Leaky bucket:   queue --drip fixed rate--> steady downstream

Distributed approaches

  • Central Redis (or similar): INCR with TTL, Lua scripts for atomicity, or Redis Cell-style modules.
  • Sticky routing: limit per gateway instance—inconsistent across nodes unless synchronized.
  • Approximate counting (e.g. Gossip, probabilistic) for global scale—trade accuracy for cost.

Synchronization cost affects latency-throughput; local + async sync is a common compromise.

When to use

  • Public APIs, login endpoints, expensive operations.
  • Multi-tenant fairness and DDoS mitigation at application layer (with network-layer defenses).

Alternatives

  • Admission control at load balancer (see load-balancer); coarser granularity.
  • Back-pressure via queues instead of hard reject—different UX.

Failure modes

  • Clock skew across nodes skews windows—use server time authority or tolerate fuzz.
  • Redis down: fail-open (risky) vs fail-closed (availability hit)—document policy.
  • Hot keys for global counters—shard limit keys or use local token buckets with small error.

Interview talking points

  • State limit dimensions: per user, per IP, per endpoint, global vs regional.
  • Choose algorithm by burst tolerance (token bucket) vs strict smoothness (leaky bucket).
  • Estimate allowed QPS with back-of-envelope given fleet size.

Mark this page when you finish learning it.

Last updated on

Spotted something unclear or wrong on this page?

On this page