THN Interview Prep

Scalability Axes: Vertical, Horizontal, Stateless, Read vs Write

Definition

Scalability is the ability to handle more load (users, requests, data) with acceptable latency and cost by changing capacity. Two primary axes of scale are vertical scaling (bigger machine) and horizontal scaling (more machines). On top of that, stateless services let you add instances without shared in-memory state, and read scaling (replicas, caches) is often easier than write scaling (sharding, partitioning, ordering).

Why it matters in interviews

Interviewers want to see that you decompose load: where the bottleneck is (CPU, memory, I/O, network) and which lever (scale up, scale out, cache, split reads/writes) you reach for first. Confusing vertical with horizontal scaling, or claiming "we'll just add servers" without discussing state and data ownership, is a red flag. Articulating read and write paths separately shows you can design for real systems (feeds, payments, object storage) where these concerns diverge.

Tradeoffs

| Approach | Pros | Cons |
| --- | --- | --- |
| Vertical | Simpler ops, no re-partitioning, strong fit for a monolithic DB | Hard upper bound, expensive, the node is a single point of failure |
| Horizontal | Near-linear capacity growth, fault isolation (with good design) | Needs load balancing and consistent hashing; introduces distributed failure modes |
| Stateless app tier | Easy autoscaling, rolling deploys | Session/auth state is pushed to a DB, cache, or tokens; some protocols with long-lived connections still need sticky routing |
| Read scaling | Caches and replicas are well understood | Stale reads, replica lag, cache invalidation complexity |
| Write scaling | Sharding/partitioning unlocks write throughput | Hot keys, cross-shard transactions, rebalancing pain |
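The "consistent hashing" entry in the Horizontal row can be illustrated with a small hash ring. This is a minimal sketch, not a production implementation: the class name `HashRing`, the `vnodes` parameter, and the node names are all hypothetical, and SHA-256 is used only as a convenient stable hash.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._nodes = set(nodes)
        self._rebuild()

    def _rebuild(self):
        # Each node owns `vnodes` points on the ring to smooth the distribution.
        points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in self._nodes
            for i in range(self._vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]

    def add(self, node):
        self._nodes.add(node)
        self._rebuild()

    def remove(self, node):
        self._nodes.discard(node)
        self._rebuild()

    def node_for(self, key: str) -> str:
        if not self._hashes:
            raise ValueError("ring is empty")
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[idx]
```

The property worth stating in an interview: when a node leaves, only the keys it owned move; every other key keeps its owner. With plain `hash(key) % N`, changing N remaps most keys.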

Stateless services do not keep session-specific data in process memory; they externalize it to shared stores (Redis, a database) or carry it in signed tokens (for example, JWT). That is what allows N identical instances to sit behind a load balancer (see Building blocks: load balancing).
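To make the signed-token option concrete, here is a minimal sketch using only the Python standard library. It is a simplified HMAC-signed token, not a real JWT (no header, no expiry, no key rotation), and `SECRET`, `issue_token`, and `verify_token` are hypothetical names chosen for illustration. The point is that any instance holding the shared secret can verify a request without consulting in-process session state.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Assumption: every app instance is configured with the same secret.
SECRET = b"demo-secret-shared-by-all-instances"


def issue_token(session: dict) -> str:
    """Serialize session data and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(session, sort_keys=True).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def verify_token(token: str) -> Optional[dict]:
    """Return the session data if the signature is valid, else None."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking signature bytes via timing.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))
```

Usage: one instance calls `issue_token({"user_id": 42})` at login; any other instance can later call `verify_token(token)` and get the same dictionary back, or `None` if the token was altered. In a real system you would more likely use an established JWT library and add an expiry claim.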

Concrete examples

  1. E-commerce product catalog: Read-heavy. You scale reads with a CDN, read replicas, and an application cache; writes go to the primary. App servers are stateless, so you can add instances during a sale.
  2. Order checkout (writes): You partition orders by customerId or by orderId range/hash to scale writes. Payment and inventory need stronger consistency on hot items, so you do not "just cache" without a consistency story.
  3. Real-time leaderboards: High write QPS to scores. You might first scale a specialized store vertically, then shard by game or region; reads can be served from eventually consistent aggregated views.
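The routing implied by examples 1 and 2 can be sketched as follows: writes go to the primary of the shard chosen by hashing the customer ID, while reads rotate across that shard's replicas. All names here (`shard_for`, `ShardRouter`, the `db0-primary` style node labels) are hypothetical, and the sketch ignores replica lag and failover.

```python
import hashlib
import itertools


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardRouter:
    """Route writes to a shard's primary and reads to its replicas (round-robin)."""

    def __init__(self, shards):
        # shards: list of {"primary": str, "replicas": [str, ...]}
        self._shards = shards
        # Fall back to the primary for reads if a shard has no replicas.
        self._read_cycles = [
            itertools.cycle(s["replicas"] or [s["primary"]]) for s in shards
        ]

    def node_for_write(self, customer_id: str) -> str:
        return self._shards[shard_for(customer_id, len(self._shards))]["primary"]

    def node_for_read(self, customer_id: str) -> str:
        return next(self._read_cycles[shard_for(customer_id, len(self._shards))])


router = ShardRouter([
    {"primary": "db0-primary", "replicas": ["db0-replica-a", "db0-replica-b"]},
    {"primary": "db1-primary", "replicas": ["db1-replica-a"]},
])
```

Hashing by customerId keeps one customer's orders on one shard, which avoids cross-shard transactions for per-customer operations. Note that simple modulo routing remaps most keys when the shard count changes, which is part of the "rebalancing pain" in the tradeoffs table.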

How to say it in 30 seconds

"I separate the stateless API tier from stateful data. I scale reads with caches and replicas; writes need partitioning, or serialization where ordering matters. I choose vertical scaling when the load fits comfortably within one machine's limits, and horizontal scaling when I need elastic growth and fault isolation. Either way, I always call out hot keys and cross-shard work."

Common follow-up questions

  • When would you not use horizontal scaling for the database? Small datasets, strong transactional needs across all rows, or when operational simplicity dominates.
  • What breaks if the app is not stateless? Sticky sessions required, uneven load after deploys, harder autoscaling and failover.
  • How do you scale writes to a single hot key? Queue and serialize the writes, shard the logical key space, or redesign (e.g., counters with sharded aggregates).
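The "counters with sharded aggregates" idea from the last answer can be sketched like this. A plain dictionary stands in for a key-value store, and `ShardedCounter`, `num_slots`, and the `key#slot` naming scheme are hypothetical choices for illustration. Each increment lands on one of several sub-keys, so no single key absorbs all writes; a read sums the sub-keys.

```python
import random


class ShardedCounter:
    """Spread writes for one logical counter across several sub-keys."""

    def __init__(self, num_slots: int = 8):
        self._num_slots = num_slots
        self._store = {}  # stand-in for a key-value store such as Redis

    def increment(self, key: str, amount: int = 1) -> None:
        # Pick a random slot so concurrent writers rarely contend on one sub-key.
        slot = random.randrange(self._num_slots)
        sub_key = f"{key}#{slot}"
        self._store[sub_key] = self._store.get(sub_key, 0) + amount

    def total(self, key: str) -> int:
        # Reads are more expensive: they aggregate every slot.
        return sum(
            self._store.get(f"{key}#{slot}", 0) for slot in range(self._num_slots)
        )
```

The tradeoff is explicit: write contention drops by roughly the number of slots, while each read costs one lookup per slot, which is why the aggregate is often cached or computed periodically.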

See also: System design curriculum overview
