Replication
What it is
Replication copies data from a primary (leader) to replicas (followers) for durability and read scaling. Semantics depend on sync vs async and quorum rules.
Leader-follower
- Single leader: all writes go to leader; replicas apply log or statement/row updates.
- Multi-leader: writes on multiple nodes—conflict resolution required (last-write-wins, CRDTs, app merge).
Typical async follower: lower write latency; replication lag means stale reads from replicas.
Loading diagram…
Quorum reads and writes (distributed stores)
In classical quorum replication (N replicas, write quorum W, read quorum R):
- R + W > N can provide stronger read-your-writes style guarantees when coordinated.
- W = 1, R = N favors writes; W = N, R = 1 favors durability but slow writes.
Concrete systems layer additional protocols (e.g. Paxos/Raft for leader election—see consensus).
Replication lag
- Async replication: reads from follower may be seconds behind—broken read-after-write if user expects immediate visibility.
- Mitigations: read from leader after write; monotonic reads by sticking to one replica; version checks in app.
When to use
- HA: failover to replica when leader dies.
- Read scaling: route read-heavy traffic to replicas (watch lag).
- Backups and DR: cross-region replica.
Alternatives
- No replication (single node): simplest; single point of failure.
- Sharding instead of replicas for write scale (see sharding).
Failure modes
- Split-brain if two nodes think they are leader—needs fencing and consensus.
- Unbounded lag: replica disk slow; replication slot backlog (Postgres) can fill disk.
- Lost writes if leader acknowledges before durable replicate (depends on sync level).
Interview talking points
- State RPO/RTO for failover; sync replication improves RPO, hurts latency—tie to latency-throughput.
- Explicitly handle read-your-own-write for UX-sensitive flows.
- Connect HA failover to health checks from load-balancer.
Related fundamentals
Last updated on
Spotted something unclear or wrong on this page?