Replication, sharding & operations
For B-tree vs LSM, CAP/PACELC, quorum, consistent hashing, and leaderless models, see *Storage engines, CAP & distributed data*; this page stays closer to operations, RDS/Dynamo-style failure stories, and backup math.
Core details
Single-leader replication: writes go to the primary; async replicas serve read scaling or analytics. Replication lag is normal; reads may be seconds behind.
Failover: promoting a replica carries split-brain and data-loss-window risks; the procedure (automatic vs operator-led) must be rehearsed.
Sharding / partitioning: split data by a partition key so most queries hit one shard; resharding is painful, so design the key early from the access paths.
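A minimal sketch of partition-key routing, assuming simple hash-modulo placement (names like `NUM_SHARDS` and `shard_for` are illustrative, not any particular database's API):

```python
import hashlib

NUM_SHARDS = 8  # illustrative; real systems use many more shards, or consistent hashing

def shard_for(partition_key: str) -> int:
    """Stable hash of the partition key -> shard id in [0, NUM_SHARDS)."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# A query keyed by the partition key touches exactly one shard:
print(shard_for("user:12345"))
# A query without it ("all orders from last week") must fan out to every shard,
# which is why the key should be derived from the dominant access paths.
```

Hash-modulo placement is also exactly why resharding hurts: changing `NUM_SHARDS` remaps most keys, which is where consistent hashing (covered on the other page) comes in.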
Hot partition: one key (celebrity user) dominates a shard—mitigate with key redesign, salting, caching, or rate limits.
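A sketch of salting a hot key, assuming a counter-style workload; the bucket count and the in-memory store are stand-ins for whatever the real system uses:

```python
import random
from collections import defaultdict

SALT_BUCKETS = 16                 # assumption: enough buckets to spread the hot key
store = defaultdict(int)          # stand-in for the sharded store

def incr(hot_key: str) -> None:
    salt = random.randrange(SALT_BUCKETS)
    store[f"{hot_key}#{salt}"] += 1   # writes land on 16 sub-keys instead of one

def read(hot_key: str) -> int:
    # Reads fan in across the sub-keys; salting trades read cost for write spread.
    return sum(store[f"{hot_key}#{s}"] for s in range(SALT_BUCKETS))

for _ in range(1000):
    incr("likes:celebrity_user")
print(read("likes:celebrity_user"))   # 1000
```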
Backups:
| Concept | Interview one-liner |
|---|---|
| RPO | max acceptable data loss window |
| RTO | max acceptable downtime to restore |
| Restore drill | periodic proof that backups are actually recoverable |
Logical vs physical backups trade portability against speed and size: logical dumps are portable across versions and engines but slow to restore; physical (file- or block-level) backups restore fast but are tied to the engine and version.
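Back-of-envelope backup math, with assumed numbers (nightly base backup, WAL archived every 5 minutes; restore throughput and replay time are placeholders to swap for your own):

```python
# RPO: worst-case data loss if the primary dies right before the next WAL archive.
wal_archive_interval_min = 5
rpo_minutes = wal_archive_interval_min

# RTO: copy the base backup back, then replay WAL accumulated since it was taken.
base_backup_gb = 500
restore_throughput_gb_per_s = 0.5     # assumption: network/disk limited
wal_replay_hours = 1.5                # assumption: depends on write volume since backup
rto_hours = base_backup_gb / restore_throughput_gb_per_s / 3600 + wal_replay_hours

print(f"RPO ~ {rpo_minutes} min, RTO ~ {rto_hours:.1f} h")
```

The restore drill is what keeps these numbers honest: the RTO you quote should be the measured restore time, not the theoretical one.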
Observability: lag metrics, replication slot backlog, disk growth, checkpoint behavior—tie alerts to customer-visible symptoms where possible.
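A tiny sketch of tying a lag alert to a customer-visible promise rather than an arbitrary internal number (the 30-second staleness budget is an assumption):

```python
def should_page(replica_lag_seconds: float, staleness_budget_s: float = 30.0) -> bool:
    """Page only when lag exceeds what the product promises users about staleness."""
    return replica_lag_seconds > staleness_budget_s

print(should_page(4.2))    # False: lag exists, but users will not notice
print(should_page(95.0))   # True: stale reads are now customer-visible
```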
Understanding
Claiming “strong consistency on replicas” without sticky routing to the primary or version checks invites bugs: the user sees a stale profile right after an update unless you implement read-your-writes.
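A sketch of read-your-writes via sticky routing, assuming a primary/replica pair behind a tiny session-aware router; `Session`, the window length, and the in-memory dicts are all illustrative:

```python
import time

primary: dict = {}
replica: dict = {}   # stand-in: refreshed asynchronously, so it lags the primary

class Session:
    def __init__(self):
        self.last_write_at = float("-inf")

    def write(self, key, value):
        primary[key] = value
        self.last_write_at = time.monotonic()

    def read(self, key, sticky_window_s: float = 5.0):
        # Sticky routing: for a short window after this session writes,
        # read from the primary so the user sees their own update.
        if time.monotonic() - self.last_write_at < sticky_window_s:
            return primary.get(key)
        return replica.get(key)   # otherwise the (possibly stale) replica is fine

s = Session()
s.write("profile:42", "new bio")
print(s.read("profile:42"))   # "new bio", even though the replica is still stale
```

The alternative is a version check: the client remembers the version it wrote and retries or falls back to the primary if a replica returns an older one.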
Global secondary indexes in sharded systems mean cross-shard (scatter-gather) work, so cost jumps; prefer locality with the primary access pattern.
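A sketch of why that costs more: a partition-key lookup touches one shard, while a query on a non-key attribute must scatter to every shard and gather the results (shard layout and data are illustrative):

```python
shards = [
    {"user:1": {"email": "a@x.com"}},   # shard 0
    {"user:2": {"email": "b@x.com"}},   # shard 1
    {"user:3": {"email": "c@x.com"}},   # shard 2
]

def find_by_email(email: str) -> list[str]:
    # GSI-style query: fan out to all shards, filter locally, merge the results.
    # Latency tracks the slowest shard; cost grows with the shard count.
    return [key for shard in shards
                for key, row in shard.items() if row["email"] == email]

print(find_by_email("c@x.com"))   # ['user:3']
```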
Senior understanding
| Failure | Story elements |
|---|---|
| Replica promoted with lag | in-flight writes, split brain, fencing the old primary |
| Backup missing WAL | cannot point-in-time recover—policy gap |
| Shard imbalance | monitoring per-shard QPS/size; reshard plan |
| AZ outage | availability zone placement, quorum if using consensus stores |
Cross-link to idempotent writers: when failover causes duplicate delivery, writes must be safe to apply twice (idempotency).
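A sketch of an idempotent writer, assuming each logical write carries an idempotency key; the in-memory `processed` set stands in for a durable record kept in the same transaction as the write:

```python
processed: set[str] = set()
balance = 0

def apply_payment(idempotency_key: str, amount: int) -> None:
    global balance
    if idempotency_key in processed:   # redelivered after failover
        return                         # no-op: the effect already happened
    balance += amount
    processed.add(idempotency_key)     # in production: same transaction as the write

apply_payment("order-123-charge", 50)
apply_payment("order-123-charge", 50)  # duplicate delivery; safely ignored
print(balance)                         # 50, not 100
```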
Fencing tokens & stale primaries
After a failover, the old primary might still accept writes (split brain) if it still believes it is the leader.
Fencing: attach the current epoch (lease / fencing token / ballot number) to each write; storage rejects writes carrying an outdated token. Alternative: a STONITH-class “make the old node stop” mechanism at the infrastructure level.
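A sketch of fencing at the storage layer, assuming each newly elected leader receives a strictly larger epoch; the class and field names are illustrative:

```python
class FencedStore:
    def __init__(self):
        self.highest_epoch = 0   # highest epoch this store has ever seen
        self.data = {}

    def write(self, epoch: int, key: str, value) -> bool:
        if epoch < self.highest_epoch:
            return False          # stale primary still thinks it leads: reject
        self.highest_epoch = epoch
        self.data[key] = value
        return True

store = FencedStore()
print(store.write(epoch=1, key="x", value="from old primary"))    # True
print(store.write(epoch=2, key="x", value="from new primary"))    # True
print(store.write(epoch=1, key="x", value="late, stale write"))   # False: fenced off
```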