Scalability Axes: Vertical, Horizontal, Stateless, Read vs Write
Definition
Scalability is the ability to handle more load (users, requests, data) with acceptable latency and cost by changing capacity. Two primary axes of scale are vertical scaling (bigger machine) and horizontal scaling (more machines). On top of that, stateless services let you add instances without shared in-memory state, and read scaling (replicas, caches) is often easier than write scaling (sharding, partitioning, ordering).
Why it matters in interviews
Interviewers want to see that you decompose load: where is the bottleneck (CPU, memory, I/O, network), and which lever (scale up, scale out, cache, split reads/writes) you reach for first. Confusing vertical vs horizontal, or claiming "we'll just add servers" without discussing state and data ownership, is a red flag. Articulating read vs write paths shows you can design for real systems (feeds, payments, object storage) where these concerns split.
Tradeoffs
| Approach | Pros | Cons |
|---|---|---|
| Vertical | Simpler ops, no re-partitioning, strong for monolithic DB | Hard upper bound, expensive, single point of failure for that node |
| Horizontal | Linear capacity, fault isolation (with good design) | Need load balancing, consistent hashing, distributed failure modes |
| Stateless app tier | Easy autoscaling, rolling deploys | Session/auth pushed to DB, cache, or tokens; some protocols (e.g., long-lived connections) may still need stickiness |
| Read scaling | Caches and replicas are well understood | Stale reads, replica lag, cache invalidation complexity |
| Write scaling | Sharding/partitioning unlocks write throughput | Hot keys, cross-shard transactions, rebalancing pain |
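Horizontal scaling needs a way to route keys to nodes without remapping everything when capacity changes; the table above mentions consistent hashing for exactly this. A minimal sketch (class name, virtual-node count, and MD5 choice are illustrative, not a production design):

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Minimal consistent-hash ring: adding or removing a node
    remaps only the keys adjacent to it, not the whole key space."""

    def __init__(self, nodes, vnodes=100):
        self.ring = {}  # ring position -> node
        for node in nodes:
            # Virtual nodes spread each physical node around the ring
            # so load stays roughly even.
            for i in range(vnodes):
                self.ring[self._hash(f"{node}#{i}")] = node
        self.points = sorted(self.ring)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first ring position at or past the key's hash.
        idx = bisect(self.points, self._hash(key)) % len(self.points)
        return self.ring[self.points[idx]]
```

Routing is deterministic, so any stateless load balancer instance computes the same owner for a given key.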
Stateless services do not store session-specific data in process memory; externalize to shared stores (Redis, DB) or signed tokens (JWT). That is what allows N identical instances behind a load balancer (see Building blocks: load balancing).
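One way to externalize session state into a signed token (as JWTs do) is an HMAC over the claims; this sketch uses a hardcoded demo secret and a simplified format, purely to show why any of N identical instances can verify a request without shared memory:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical; in production, load from a secret store

def issue_token(claims: dict) -> str:
    """Encode claims and sign them, so no server needs to remember the session."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_token(token: str):
    """Any instance holding SECRET can validate the token statelessly.
    Returns the claims dict, or None if the signature does not match."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

Because verification depends only on the shared secret, instances can be added or removed freely behind the load balancer.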
Concrete examples
- E-commerce product catalog — Read-heavy. You scale reads with CDN + read replicas + application cache; writes go to primary. App servers are stateless so you can add instances during a sale.
- Order checkout (writes) — You partition orders by `customerId` or `orderId` range/hash to scale writes; payment and inventory need stronger consistency on hot items, so you do not "just cache" without a story.
- Real-time leaderboards — High write QPS to scores. You might scale vertically a specialized store first, then shard by game or region; reads might be eventually consistent aggregated views.
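Partitioning orders by a hash of `customerId`, as in the checkout example, can be sketched like this (shard count and function names are illustrative):

```python
import hashlib

NUM_SHARDS = 8  # hypothetical fixed shard count

def shard_for(customer_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash: the same customer always maps to the same shard,
    keeping one customer's orders co-located for single-shard queries."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

Note the tradeoff the table flags: a fixed `num_shards` makes rebalancing painful, which is why systems often layer consistent hashing or directory-based routing on top.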
How to say it in 30 seconds
"I separate stateless API tier from stateful data. I scale reads with caches and replicas; writes need partitioning or serialization where ordering matters. I choose vertical when limits are clear and horizontal when I need elastic growth and fault isolation—and I always call out hot keys and cross-shard work."
Common follow-up questions
- When would you not use horizontal scaling for the database? Small datasets, strong transactional needs across all rows, or when operational simplicity dominates.
- What breaks if the app is not stateless? Sticky sessions required, uneven load after deploys, harder autoscaling and failover.
- How do you scale writes to a single hot key? Queue + serialize, shard logical key space, or redesign (e.g., counters with sharded aggregates).
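The "counters with sharded aggregates" answer above can be sketched in a few lines (in-memory for illustration; a real system would put each sub-counter in a separate row or key):

```python
import random

class ShardedCounter:
    """Spread increments of one hot logical counter across N sub-counters.
    Writes scale because no single key is hot; reads pay an aggregation cost."""

    def __init__(self, num_shards: int = 16):
        self.shards = [0] * num_shards

    def incr(self, amount: int = 1) -> None:
        # Each writer picks a random shard, so concurrent increments
        # rarely contend on the same sub-counter.
        self.shards[random.randrange(len(self.shards))] += amount

    def value(self) -> int:
        # Reads sum all shards; this can be cached if slight staleness is OK.
        return sum(self.shards)
```

This is the classic write-throughput-for-read-cost trade: cheap, contention-free increments, slightly more expensive (and cacheable) reads.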
Cross-links (building blocks)
- Load balancing, CDN, caching, replication, and sharding are where these axes become concrete—see the Folders table in System design curriculum overview.
See also: System design curriculum overview