Scalability Axes: Vertical, Horizontal, Stateless, Read vs Write
Definition
Scalability is the ability to handle more load (users, requests, data) with acceptable latency and cost by changing capacity. Two primary axes of scale are vertical scaling (bigger machine) and horizontal scaling (more machines). On top of that, stateless services let you add instances without shared in-memory state, and read scaling (replicas, caches) is often easier than write scaling (sharding, partitioning, ordering).
Why it matters in interviews
Interviewers want to see that you decompose load: where is the bottleneck (CPU, memory, I/O, network), and which lever (scale up, scale out, cache, split reads/writes) you reach for first. Confusing vertical vs horizontal, or claiming "we'll just add servers" without discussing state and data ownership, is a red flag. Articulating read vs write paths shows you can design for real systems (feeds, payments, object storage) where these concerns split.
Tradeoffs
| Approach | Pros | Cons |
|---|---|---|
| Vertical | Simpler ops, no re-partitioning, strong for monolithic DB | Hard upper bound, expensive, single point of failure for that node |
| Horizontal | Linear capacity, fault isolation (with good design) | Need load balancing, consistent hashing, distributed failure modes |
| Stateless app tier | Easy autoscaling, rolling deploys | Session/auth pushed to DB, cache, or tokens; some protocols (e.g., long-lived connections) may still need stickiness |
| Read scaling | Caches and replicas are well understood | Stale reads, replica lag, cache invalidation complexity |
| Write scaling | Sharding/partitioning unlocks write throughput | Hot keys, cross-shard transactions, rebalancing pain |
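Horizontal scaling needs a way to route keys to nodes without remapping everything when capacity changes; the table above mentions consistent hashing for exactly this. A minimal sketch (class name, virtual-node count, and MD5 choice are illustrative, not a production design):

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Minimal consistent-hash ring: adding or removing a node
    remaps only the keys adjacent to it, not the whole key space."""

    def __init__(self, nodes, vnodes=100):
        self.ring = {}  # ring position -> node
        for node in nodes:
            # Virtual nodes spread each physical node around the ring
            # so load stays roughly even.
            for i in range(vnodes):
                self.ring[self._hash(f"{node}#{i}")] = node
        self.points = sorted(self.ring)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first ring position at or past the key's hash.
        idx = bisect(self.points, self._hash(key)) % len(self.points)
        return self.ring[self.points[idx]]
```

Routing is deterministic, so any stateless load balancer instance computes the same owner for a given key.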
Stateless services do not store session-specific data in process memory; externalize to shared stores (Redis, DB) or signed tokens (JWT). That is what allows N identical instances behind a load balancer (see Building blocks: load balancing).
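One way to externalize session state into a signed token (as JWTs do) is an HMAC over the claims; this sketch uses a hardcoded demo secret and a simplified format, purely to show why any of N identical instances can verify a request without shared memory:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical; in production, load from a secret store

def issue_token(claims: dict) -> str:
    """Encode claims and sign them, so no server needs to remember the session."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_token(token: str):
    """Any instance holding SECRET can validate the token statelessly.
    Returns the claims dict, or None if the signature does not match."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

Because verification depends only on the shared secret, instances can be added or removed freely behind the load balancer.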
Concrete examples
- E-commerce product catalog — Read-heavy. You scale reads with CDN + read replicas + application cache; writes go to primary. App servers are stateless so you can add instances during a sale.
- Order checkout (writes) — You partition orders by `customerId` or `orderId` range/hash to scale writes; payment and inventory need stronger consistency on hot items, so you do not "just cache" without a story.
- Real-time leaderboards — High write QPS to scores. You might scale vertically a specialized store first, then shard by game or region; reads might be eventually consistent aggregated views.
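Partitioning orders by a hash of `customerId`, as in the checkout example, can be sketched like this (shard count and function names are illustrative):

```python
import hashlib

NUM_SHARDS = 8  # hypothetical fixed shard count

def shard_for(customer_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash: the same customer always maps to the same shard,
    keeping one customer's orders co-located for single-shard queries."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

Note the tradeoff the table flags: a fixed `num_shards` makes rebalancing painful, which is why systems often layer consistent hashing or directory-based routing on top.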
How to say it in 30 seconds
"I separate stateless API tier from stateful data. I scale reads with caches and replicas; writes need partitioning or serialization where ordering matters. I choose vertical when limits are clear and horizontal when I need elastic growth and fault isolation—and I always call out hot keys and cross-shard work."
Common follow-up questions
- When would you not use horizontal scaling for the database? Small datasets, strong transactional needs across all rows, or when operational simplicity dominates.
- What breaks if the app is not stateless? Sticky sessions required, uneven load after deploys, harder autoscaling and failover.
- How do you scale writes to a single hot key? Queue + serialize, shard logical key space, or redesign (e.g., counters with sharded aggregates).
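The "counters with sharded aggregates" answer above can be sketched in a few lines (in-memory for illustration; a real system would put each sub-counter in a separate row or key):

```python
import random

class ShardedCounter:
    """Spread increments of one hot logical counter across N sub-counters.
    Writes scale because no single key is hot; reads pay an aggregation cost."""

    def __init__(self, num_shards: int = 16):
        self.shards = [0] * num_shards

    def incr(self, amount: int = 1) -> None:
        # Each writer picks a random shard, so concurrent increments
        # rarely contend on the same sub-counter.
        self.shards[random.randrange(len(self.shards))] += amount

    def value(self) -> int:
        # Reads sum all shards; this can be cached if slight staleness is OK.
        return sum(self.shards)
```

This is the classic write-throughput-for-read-cost trade: cheap, contention-free increments, slightly more expensive (and cacheable) reads.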
Cross-links (building blocks)
- Load balancing, CDN, caching, replication, and sharding are where these axes become concrete—see the Folders table in System design curriculum overview.
See also: System design curriculum overview