Design Rate Limiter (LLD)
1. Requirements
Functional
- For each client key, permit up to N requests per fixed window, or refill tokens at a configured rate.
- Answer allow or deny, with optional retry-after metadata.
- Share the limiter across cooperative workers in one process, or coordinate externally for a cluster.
Non-Functional
- O(1) hot path for each decision, with bounded memory per key.
- Correct under concurrency: no double-spending of tokens.
Assumptions / Out of Scope
- A distributed Redis sliding-window limiter belongs to the HLD; this document covers in-process abstractions.
2. Core Entities
| Entity | Responsibility | Key Attributes |
|---|---|---|
| RateLimiter | Public API | algorithm ref |
| ClientKey | Identifier | string or tenant id |
| TokenBucket | Refill model | capacity, tokens, refillRate, lastRefill |
| FixedWindow | Counter model | windowStart, count, limit |
| Decision | Result | allowed, retryAfter |
| Clock | Time source | now function injectable |
3. Class Diagram
RateLimiter delegates to a LimiterAlgorithm strategy; TokenBucket and FixedWindow implement that interface, each returning a Decision, with time supplied by an injectable Clock.
4. State / Sequence Diagram (where relevant)
Typical sequence: the caller invokes RateLimiter.Allow(key); the algorithm refills or resets state for that key, then returns an allow/deny Decision with optional retry-after.
5. Design Patterns Applied
- Strategy: swappable LimiterAlgorithm implementations (token bucket, fixed window).
- Singleton (per scope): one limiter instance per service process when shared globally (use sparingly).
6. Implementation
Go

```go
package ratelimit

import "time"

type ClientKey string

type Decision struct {
	Allowed    bool
	RetryAfter time.Duration
}

type LimiterAlgorithm interface {
	TryAcquire(key ClientKey) Decision
}

type RateLimiter struct {
	Algorithm LimiterAlgorithm
}

func (limiter *RateLimiter) Allow(key ClientKey) Decision {
	return limiter.Algorithm.TryAcquire(key)
}
```

JavaScript

```javascript
class TokenBucketLimiter {
  constructor({ capacity, refillPerSecond, clock }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.clock = clock;
    this.buckets = new Map();
  }

  tryAcquire(clientKey) { /* ... */ }
}

class RateLimiter {
  constructor({ algorithm }) {
    this.algorithm = algorithm;
  }

  allow(clientKey) {
    return this.algorithm.tryAcquire(clientKey);
  }
}
```

7. Concurrency / Thread Safety
- Collisions: parallel Allow calls on the same key race on the token decrement.
- Granularity: a mutex per key (sharded by hash), or atomic operations on the token fields with careful refill math.
- Go: a map of per-key sync.Mutex keyed by ClientKey, or sync.Map for bucket storage.
- JavaScript: single-threaded, so no in-process races; distributed workers need Redis Lua scripts or compare-and-swap.
8. Extensibility & Follow-ups
- Sliding window log with deque per key for smoother limiting.
- Hierarchical quotas (tenant then user).
- Edge cases: clock skew, burst after long idle, memory growth from many keys (eviction).