Back-of-the-Envelope Estimation: QPS, Storage, Bandwidth, Rules of Thumb
Definition
Back-of-the-envelope calculations are order-of-magnitude estimates used to sanity-check an architecture before detailed design: daily active users to QPS, write volume to storage, payload sizes to network bandwidth, and rough CPU budgets. The goal is ranges and bottleneck hints, not false precision.
Why it matters in interviews
The Estimation step (the E in the RESHADED interview framework) expects you to drive the numbers early: they determine database choice, shard count, cache size, and cost. Interviewers reward explicit assumptions ("assume 1 KB per row") and peak factors.
Tradeoffs
- Too precise — wastes time and signals false confidence.
- Too vague — cannot justify sharding versus a single DB.
- Peak factors — real traffic is spiky; take 2–3× the average for a rough peak QPS unless given better data.
Core formulas (intuition)
- Average and peak RPS from DAU — `peakRPS ≈ (DAU × actionsPerUserPerDay / secondsPerDay) × peakFactor`
- Example: 10M DAU, 20 reads/user/day → `10e6 × 20 / 86400 ≈ 2.3k` average read RPS; peak might be ~7k with a peakFactor of 3.
- Storage growth — `dailyWrites × recordSize × retentionDays` (plus index overhead; 2–3× is a rule of thumb for relational).
- Egress — `RPS × responseSize`; watch cross-AZ and CDN costs.
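A minimal Python sketch of these formulas. The 10M DAU and 20 reads/user/day come from the example above; the 2 writes/user/day, 1 KB record, 90-day retention, and 1 KB response size are illustrative assumptions, not benchmarks.

```python
SECONDS_PER_DAY = 86_400

def average_rps(dau: float, actions_per_user_per_day: float) -> float:
    """Average request rate implied by daily activity."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY

def peak_rps(dau: float, actions_per_user_per_day: float, peak_factor: float = 3.0) -> float:
    """Rough peak: the average scaled by a spikiness factor (2-3x unless better data)."""
    return average_rps(dau, actions_per_user_per_day) * peak_factor

def storage_bytes(daily_writes: float, record_size_bytes: float,
                  retention_days: float, index_overhead: float = 2.5) -> float:
    """Raw rows plus a rule-of-thumb multiplier for relational index overhead."""
    return daily_writes * record_size_bytes * retention_days * index_overhead

def egress_bytes_per_sec(rps: float, response_size_bytes: float) -> float:
    """Steady-state outbound bandwidth for a given request rate."""
    return rps * response_size_bytes

# 10M DAU, 20 reads/user/day (from the example); 2 writes/user/day, 1 KB rows,
# 90-day retention, 1 KB responses are assumptions for illustration.
avg = average_rps(10e6, 20)                    # ~2.3k RPS
peak = peak_rps(10e6, 20)                      # ~7k RPS
store = storage_bytes(10e6 * 2, 1_000, 90)     # ~4.5 TB including index overhead
egress = egress_bytes_per_sec(peak, 1_000)     # ~7 MB/s at peak
print(f"avg={avg:,.0f} rps  peak={peak:,.0f} rps  "
      f"storage={store / 1e12:.1f} TB  egress={egress / 1e6:.1f} MB/s")
```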
Rule-of-thumb ranges (not benchmarks—vary by workload)
| Resource | Ballpark (order of magnitude) |
|---|---|
| HTTP API work per core | Often ~100–2,000 RPS for simple CRUD in memory + DB; ~1–50 RPS for heavy CPU per request; I/O bound far lower if naive |
| SSD random read | Microseconds to low milliseconds as seen by the application, depending on caching |
| Cross-region RTT | ~20–200 ms+ round trip, depending on distance |
| JSON overhead | Factor 2–5× vs compact binary for same logical data |
State ranges, then narrow with profiling—never invent "exactly 847 RPS per core."
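One way to use those ranges is to carry them through the math and report a range of cores rather than a single number. A quick sketch: the 7k peak RPS is the earlier example, the 100–2,000 RPS-per-core bounds come from the table, and the 60% utilization target is an assumption.

```python
import math

def cores_needed(peak_rps: float, per_core_rps_low: float, per_core_rps_high: float,
                 target_utilization: float = 0.6) -> tuple[int, int]:
    """Return (optimistic, pessimistic) core counts at a given utilization target."""
    optimistic = math.ceil(peak_rps / (per_core_rps_high * target_utilization))
    pessimistic = math.ceil(peak_rps / (per_core_rps_low * target_utilization))
    return optimistic, pessimistic

low, high = cores_needed(7_000, 100, 2_000)
print(f"somewhere between {low} and {high} cores")   # ~6 to ~117 -- a range, then load-test
```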
Concrete examples
- Photo hosting — 50M users, 5 uploads/user/day, 5 MB average image → `50e6 × 5 × 5 MB / day`, raw ingest on the order of ~1.25 PB/day before compression; this immediately forces object storage, async processing, and a CDN.
- Chat write path — 100k concurrent rooms, 1 msg/sec per room at peak → ~100k write QPS to the message pipeline; partitioning by roomId becomes an unavoidable conversation.
- Small B2B SaaS — 5k companies × 20 users → 100k seats; 10 actions/user/day → roughly 12 RPS on average; single-region SQL may suffice until growth proves otherwise.
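The same arithmetic as a quick script, using only the numbers stated in the examples above:

```python
SECONDS_PER_DAY = 86_400

# Photo hosting: 50M users x 5 uploads/user/day x 5 MB per image.
photo_ingest = 50e6 * 5 * 5e6                               # bytes/day
print(f"photo ingest ~ {photo_ingest / 1e15:.2f} PB/day")   # ~1.25 PB/day

# Chat: 100k concurrent rooms x 1 msg/sec per room at peak.
chat_write_qps = 100_000 * 1
print(f"chat writes ~ {chat_write_qps:,} QPS")              # 100k write QPS

# Small B2B SaaS: 5k companies x 20 users x 10 actions/user/day.
saas_avg_rps = 5_000 * 20 * 10 / SECONDS_PER_DAY
print(f"SaaS average ~ {saas_avg_rps:.0f} RPS")             # ~12 RPS
```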
How to say it in 30 seconds
"I state assumptions: DAU, actions, payload, retention. I compute average RPS and apply a peak factor. I check storage and egress—one usually screams first. I give ranges for per-core throughput and validate in load tests."
Common follow-up questions
- How do you handle an unknown spike factor? Borrow peak-to-average ratios from similar products, or over-provision, autoscale, and buffer bursts behind a queue.
- Where do estimates go wrong? Hot keys, N+1 queries, fat joins, unbounded fan-out.
- Why separate read vs write QPS? Different caching and partitioning stories.
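A tiny illustration of that split, assuming a 10:1 read-to-write ratio (the ratio is an assumption for illustration, not a measurement):

```python
def split_rps(total_rps: float, read_write_ratio: float = 10.0) -> tuple[float, float]:
    """Split a total request rate into (reads, writes) for a given read:write ratio."""
    writes = total_rps / (read_write_ratio + 1)
    return total_rps - writes, writes

reads, writes = split_rps(7_000)   # the 7k peak RPS from the earlier example
# Reads (~6.4k) are cache- and replica-friendly; writes (~640) drive partitioning and durability.
print(f"reads ~ {reads:,.0f} rps, writes ~ {writes:,.0f} rps")
```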
Cross-links (building blocks)
- Sharding, caching, CDN, and message queues turn estimation results into architecture—see System design curriculum overview.