THN Interview Prep

Sharding

What it is

Sharding (horizontal partitioning) splits data across multiple physical nodes so each holds a subset of rows or keys. Clients or middleware route queries to the correct shard using a shard key.

Strategies

  • Range sharding: consecutive key ranges per shard (e.g. user ids 0–1M on shard A). Range scans are easy; hot ranges can overload one shard.
  • Hash sharding: hash(shardKey) mod numShards spreads load more evenly; range queries across shards are harder.
  • Directory / lookup service: central map from key to shard; flexible resharding at cost of extra hop and component to scale.
  hash(userId) --> shard index --> DB instance
  range(time)  --> time shard    --> (watch hot days)

Hot keys

A shard key that is too popular (celebrity user, viral object) puts disproportionate load on one node.

Mitigations:

  • Cache at edge (see caching).
  • Sub-shard or split hot key into synthetic suffixes with merger in app (advanced).
  • Avoid low-cardinality shard keys that collapse traffic.

Resharding

Growing data or skew forces split, merge, or rebalance.

  • Double-write / dual read during migration; verify checksums; switch traffic gradually.
  • Consistent hashing on ring can reduce full reshuffles when adding nodes (also used by load-balancer algorithms).
  • Plan downtime vs online migration; use backlog and rate limits to protect source DB.

When to use

  • Single-node DB cannot fit data or QPS after vertical scaling and replication read replicas are insufficient for writes.
  • Regulatory data residency (shard by region).

Alternatives

  • Partitioned tables inside one managed cluster (BigQuery, Citus-style)—shard-like without you operating shards.
  • No sharding: stronger consistency story; simpler ops until you must scale out.

Failure modes

  • Cross-shard transactions: expensive or unsupported; design aggregates without global transactions.
  • Rebalancing bugs: partial writes, wrong routing table.
  • Uneven growth: monitor shard size and QPS per shard.

Interview talking points

  • Always state shard key choice and hot key plan; use back-of-envelope for growth per shard.
  • Link resharding to zero-downtime stories and monitoring.
  • Compare to Dynamo-style partitioning in sql-vs-nosql.

Last updated on

Spotted something unclear or wrong on this page?

On this page