THN Interview Prep

Replication

What it is

Replication copies data from a primary (leader) to replicas (followers) for durability and read scaling. Semantics depend on sync vs async and quorum rules.

Leader-follower

  • Single leader: all writes go to leader; replicas apply log or statement/row updates.
  • Multi-leader: writes on multiple nodes—conflict resolution required (last-write-wins, CRDTs, app merge).

Typical async follower: lower write latency; replication lag means stale reads from replicas.

Loading diagram…

Quorum reads and writes (distributed stores)

In classical quorum replication (N replicas, write quorum W, read quorum R):

  • R + W > N can provide stronger read-your-writes style guarantees when coordinated.
  • W = 1, R = N favors writes; W = N, R = 1 favors durability but slow writes.

Concrete systems layer additional protocols (e.g. Paxos/Raft for leader election—see consensus).

Replication lag

  • Async replication: reads from follower may be seconds behind—broken read-after-write if user expects immediate visibility.
  • Mitigations: read from leader after write; monotonic reads by sticking to one replica; version checks in app.

When to use

  • HA: failover to replica when leader dies.
  • Read scaling: route read-heavy traffic to replicas (watch lag).
  • Backups and DR: cross-region replica.

Alternatives

  • No replication (single node): simplest; single point of failure.
  • Sharding instead of replicas for write scale (see sharding).

Failure modes

  • Split-brain if two nodes think they are leader—needs fencing and consensus.
  • Unbounded lag: replica disk slow; replication slot backlog (Postgres) can fill disk.
  • Lost writes if leader acknowledges before durable replicate (depends on sync level).

Interview talking points

  • State RPO/RTO for failover; sync replication improves RPO, hurts latency—tie to latency-throughput.
  • Explicitly handle read-your-own-write for UX-sensitive flows.
  • Connect HA failover to health checks from load-balancer.

Last updated on

Spotted something unclear or wrong on this page?

On this page