THN Interview Prep

Back-of-the-Envelope Estimation: QPS, Storage, Bandwidth, Rules of Thumb

Definition

Back-of-envelope calculations are order-of-magnitude estimates used to sanity-check an architecture before detailed design: daily active users to QPS, write volume to storage, payload sizes to network bandwidth, and rough CPU budgets. The goal is ranges and bottleneck hints, not false precision.

Why it matters in interviews

The E (Estimations) step in the RESHADED interview framework expects you to drive the numbers early: they shape database choice, shard count, cache size, and cost. Interviewers reward explicit assumptions ("assume 1 KB per row") and peak factors.

Tradeoffs

  • Too precise — Wastes time; wrong confidence.
  • Too vague — Cannot justify sharding vs single DB.
  • Peak factors — Real traffic is spiky; take 2–3× average for rough peak QPS unless given better data.

Core formulas (intuition)

  • Average RPS from DAU — peakRPS ≈ (DAU × actionsPerUserPerDay / secondsPerDay) × peakFactor
    • Example: 10M DAU, 20 reads/user/day → 10e6 × 20 / 86400 ≈ 2.3k average read RPS; peak might be ~7k with peakFactor 3.
  • Storage growth — dailyWrites × recordSize × retentionDays (plus index overhead; 2–3× is a rule of thumb for relational stores).
  • Egress — RPS × responseSize; watch cross-AZ and CDN costs.
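As a sketch, the formulas above translate directly into code; the function names, the default peak factor, and the index-overhead multiplier are illustrative assumptions, not benchmarks:

```python
# Rough capacity estimator mirroring the formulas above.
# All defaults are rules of thumb, not measured values.

SECONDS_PER_DAY = 86_400

def peak_rps(dau: int, actions_per_user_per_day: float,
             peak_factor: float = 3.0) -> float:
    """Average RPS scaled by a peak factor (2-3x is a common rule of thumb)."""
    avg = dau * actions_per_user_per_day / SECONDS_PER_DAY
    return avg * peak_factor

def storage_bytes(daily_writes: int, record_size_bytes: int,
                  retention_days: int, index_overhead: float = 2.5) -> float:
    """Raw growth times a 2-3x index/overhead multiplier for relational stores."""
    return daily_writes * record_size_bytes * retention_days * index_overhead

def egress_bytes_per_sec(rps: float, response_size_bytes: int) -> float:
    """Outbound bandwidth: requests per second times response payload size."""
    return rps * response_size_bytes

# Worked example from the text: 10M DAU, 20 reads/user/day.
avg = 10_000_000 * 20 / SECONDS_PER_DAY      # ~2.3k average read RPS
peak = peak_rps(10_000_000, 20)              # ~7k with peakFactor 3
print(f"avg ~{avg:,.0f} RPS, peak ~{peak:,.0f} RPS")
```

Running the worked example reproduces the ~2.3k average and ~7k peak read RPS stated above.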

Rule-of-thumb ranges (not benchmarks—vary by workload)

Resource → ballpark (order of magnitude):

  • HTTP API work per core — Often ~100–2,000 RPS for simple CRUD (in memory + DB); ~1–50 RPS for heavy CPU per request; I/O bound far lower if naive.
  • SSD random read — Microseconds to low milliseconds application-visible with caching.
  • Cross-region RTT — ~20–200 ms+ one-way depending on distance.
  • JSON overhead — Factor of 2–5× vs compact binary for the same logical data.

State ranges, then narrow with profiling—never invent "exactly 847 RPS per core."

Concrete examples

  1. Photo hosting — 50M users, 5 uploads/user/day, 5 MB average image → 50e6 × 5 × 5 MB ≈ 1.25 PB/day of raw ingest before compression—this immediately forces object storage, async processing, and a CDN.
  2. Chat message path — 100k concurrent rooms at 1 msg/sec per room at peak → 100k write QPS into the message pipeline—partitioning by roomId becomes a mandatory conversation.
  3. Small B2B SaaS — 5k companies × 20 users → 100k seats; 10 actions/user/day → low average RPS—single-region SQL may suffice until growth proves otherwise.
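The three examples above can be re-derived in a few lines; the counts and sizes are the assumptions stated in the text:

```python
# Re-deriving the three concrete examples; inputs are the
# assumptions from the text, using decimal units (1 MB = 1e6 bytes).

PB = 1e15  # bytes in a petabyte (decimal)

# 1. Photo hosting: 50M users x 5 uploads/user/day x 5 MB each.
photo_ingest_per_day = 50e6 * 5 * 5e6            # bytes/day
print(f"photo ingest ~{photo_ingest_per_day / PB:.2f} PB/day")

# 2. Chat: 100k concurrent rooms x 1 msg/sec per room at peak.
chat_write_qps = 100_000 * 1
print(f"chat writes ~{chat_write_qps:,} QPS at peak")

# 3. B2B SaaS: 5k companies x 20 users x 10 actions/user/day.
saas_avg_rps = 5_000 * 20 * 10 / 86_400
print(f"SaaS avg ~{saas_avg_rps:.0f} RPS")
```

The output (~1.25 PB/day, 100k QPS, ~12 average RPS) matches the conclusions each example draws: object storage, mandatory partitioning, and a single-region SQL setup, respectively.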

How to say it in 30 seconds

"I state assumptions: DAU, actions, payload, retention. I compute average RPS and apply a peak factor. I check storage and egress—one usually screams first. I give ranges for per-core throughput and validate in load tests."

Common follow-up questions

  • How do you handle unknown spike factor? Use peak-to-average from similar products or provision + autoscale + queue.
  • Where do estimates go wrong? Hot keys, N+1 queries, fat joins, unbounded fan-out.
  • Why separate read vs write QPS? Different caching and partitioning stories.

See also: System design curriculum overview
