Replication

What it is

Replication copies data from a primary (leader) to replicas (followers) for durability and read scaling. The guarantees depend on whether replication is synchronous or asynchronous and on any quorum rules in use.

Diagram: replication lag and failover flow, showing writes to the leader, log shipping to replicas, stale follower reads, read-your-write choices, and promotion fencing.

Leader-follower

Single leader: all writes go to the leader; replicas apply its log or its statement/row updates.
Multi-leader: writes are accepted on multiple nodes, so conflict resolution is required (last-write-wins, CRDTs, or application-level merge).
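
As an illustration of the last-write-wins option, here is a minimal sketch. It assumes every write carries a (timestamp, node_id) tag and that clocks are close enough for timestamps to be meaningful. The names `Versioned`, `lww_merge`, and `merge_stores` are hypothetical and not taken from any particular database. Note that last-write-wins silently discards the losing concurrent write.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # assigned by the node that accepted the write (assumption)
    node_id: str      # tie-breaker so every node picks the same winner


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Return the write with the higher (timestamp, node_id) tag."""
    return a if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id) else b


def merge_stores(
    local: dict[str, Versioned], remote: dict[str, Versioned]
) -> dict[str, Versioned]:
    """Merge a remote leader's state into local state, key by key."""
    merged = dict(local)
    for key, incoming in remote.items():
        merged[key] = lww_merge(merged[key], incoming) if key in merged else incoming
    return merged
```

Because the comparison is deterministic, merging in either direction converges to the same state, which is the property a multi-leader setup needs from its conflict rule.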

Typical async follower: the leader acknowledges a write without waiting for followers, which lowers write latency, but replication lag means replicas can serve stale reads.


Quorum reads and writes (distributed stores)

In classical quorum replication (N replicas, write quorum W, read quorum R):

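R + W > N ensures that every read quorum overlaps every write quorum in at least one replica, so a read that compares versions across its responses can return the latest acknowledged write. This gives stronger read-your-writes style guarantees, although concurrent writes and partially failed writes can still weaken them.
W = 1, R = N makes writes fast but reads expensive, because every replica must respond to each read.
W = N, R = 1 makes writes slow, because every replica must acknowledge, but durable across all replicas; reads can then be served quickly from any single replica.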
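
The overlap condition can be checked by brute force for small N. The sketch below is illustrative only: `quorums_overlap` and `read_latest` are hypothetical helper names, and the version numbers attached to responses are an assumption about how replicas tag their values.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Check whether every write quorum shares a replica with every read quorum."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def read_latest(responses: list[tuple[int, str]]) -> str:
    """Pick the value with the highest version among the R replica responses."""
    return max(responses, key=lambda pair: pair[0])[1]
```

For N = 3, `quorums_overlap(3, 2, 2)` holds while `quorums_overlap(3, 1, 1)` does not, which matches the R + W > N rule. The overlap only helps if the reader actually picks the newest version, which is what `read_latest` stands in for.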

Concrete systems layer additional protocols (e.g. Paxos/Raft for leader election and log agreement; see consensus).

Do not mix these models casually in interviews: async primary-replica databases, Dynamo-style quorum stores, and consensus-backed replicated logs expose different guarantees and failure behavior.

Replication lag

Async replication: reads from a follower may be seconds behind, which breaks read-after-write when a user expects their change to be visible immediately.
Mitigations: read from the leader after a write; get monotonic reads by sticking to one replica; check versions in the application.

After a user changes their email, route that user's next few profile reads to the leader or require a replica version at least as new as the write response. Other users can still read from replicas because a short delay is less harmful. This keeps the UX strong without forcing every read in the system through the leader.
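
The version-based variant of this routing rule could look roughly like the sketch below. It assumes the write response returns a monotonically increasing version (for example a log position) and that each replica reports the highest version it has applied. `Replica`, `ReadRouter`, and the per-user session map are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    applied_version: int  # highest log position this replica has applied


@dataclass
class ReadRouter:
    replicas: list
    # Hypothetical session state: last write version acknowledged to each user.
    last_write_version: dict = field(default_factory=dict)

    def record_write(self, user_id: str, version: int) -> None:
        self.last_write_version[user_id] = version

    def choose_target(self, user_id: str) -> str:
        """Return the name of a replica that is fresh enough, else 'leader'."""
        required = self.last_write_version.get(user_id, 0)
        for replica in self.replicas:
            if replica.applied_version >= required:
                return replica.name
        return "leader"
```

A user who has not written recently has a required version of 0, so any replica qualifies; only the writer is sent to the leader until some replica catches up.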

When to use

HA: failover to replica when leader dies.
Read scaling: route read-heavy traffic to replicas (watch lag).
Backups and DR: cross-region replica.

Alternatives

No replication (single node): simplest; single point of failure.
Sharding instead of replicas for write scale (see sharding).

Failure modes

Split-brain if two nodes think they are leader; preventing it needs fencing and consensus.
Unbounded lag: a slow replica disk lets lag grow without limit; in Postgres, a replication slot backlog retains WAL on the primary and can fill its disk.
Lost writes if the leader acknowledges a write before it is durably replicated (depends on the sync level).
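
One common way to implement fencing against split-brain is an epoch (fencing token) that increases with every promotion. The sketch below assumes the shared resource can see the epoch on each write and remember the highest one. `FencedStore` and `StaleLeaderError` are hypothetical names, and how epochs are issued (typically by a consensus service) is outside the sketch.

```python
class StaleLeaderError(Exception):
    """Raised when a write carries an epoch older than one already seen."""


class FencedStore:
    def __init__(self) -> None:
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch: int, key: str, value: str) -> None:
        # Reject any writer whose epoch has been superseded by a newer leader.
        if epoch < self.highest_epoch:
            raise StaleLeaderError(f"epoch {epoch} < {self.highest_epoch}")
        self.highest_epoch = epoch
        self.data[key] = value
```

An old leader that wakes up after a pause still holds its old epoch, so its writes are rejected once the new leader has written with a higher one.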

Common mistakes

Sending all reads to replicas without handling read-your-write paths.
Promoting a stale replica during failover and silently losing acknowledged writes.
Confusing backup with HA: a backup can restore data, but it does not serve traffic during a leader failure.
Ignoring write latency from synchronous replication across long network distances.
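
To avoid the stale-promotion mistake above, a failover controller can compare replica positions before promoting. This is a sketch under strong assumptions: that the controller knows the last acknowledged log position (which is not always available once the leader is gone) and that replicas report their applied position truthfully. `ReplicaStatus` and `pick_promotion_candidate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaStatus:
    name: str
    healthy: bool
    applied_position: int  # last log position applied, as reported by the replica


def pick_promotion_candidate(
    replicas: List[ReplicaStatus], last_acked_position: int
) -> Optional[ReplicaStatus]:
    """Return the healthy replica with the most log applied, or None if
    promoting any of them would drop acknowledged writes."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        return None
    best = max(healthy, key=lambda r: r.applied_position)
    if best.applied_position < last_acked_position:
        return None  # refuse silent data loss; leave the decision to an operator
    return best
```

Returning None rather than the best available replica makes the data-loss decision explicit instead of silent; whether to promote anyway is an RPO tradeoff.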

Interview talking points

State RPO/RTO for failover; sync replication improves RPO but hurts latency (tie to latency-throughput).
Explicitly handle read-your-own-write for UX-sensitive flows.
Connect HA failover to health checks from load-balancer.

Interview answer shape

Start with the goal: read scale, high availability, disaster recovery, or stronger durability.
Choose sync or async replication and state the latency and data-loss tradeoff.
Define read routing: leader reads for fresh paths, follower reads for scalable stale-tolerant paths.
Explain failover: health checks, fencing, promotion order, and how clients discover the new leader.
Name metrics: replica lag, replication backlog, failover time, write latency, and RPO/RTO.

Common follow-ups: leader election, split-brain prevention, quorum reads and writes, cross-region replication, and how to handle stale reads after a write.