THN Interview Prep

Storage engines, CAP & distributed data

This page targets the SDE3 / staff bar: not just using RDS or DynamoDB, but explaining on-disk structure, trade-offs under partitions, and what breaks at scale. Stacks that combine Node.js, AWS, and Docker usually touch Amazon RDS (Postgres/MySQL), DynamoDB (or a Cassandra-class wide-column store), and ElastiCache, so answers should be consistent with those failure modes.


B-trees vs LSM-trees (storage engine families)

|            | B-tree family (many RDBMS)                                    | LSM-tree family (DynamoDB/RocksDB/Cassandra-style)                                  |
|------------|---------------------------------------------------------------|--------------------------------------------------------------------------------------|
| Write path | In-place or copy-on-write pages; updates touch indexed pages   | Append to memtable, then flush to sorted SSTables; compaction rewrites                |
| Read path  | Typically fewer seeks per key lookup (tree height)             | May read several SSTable generations + bloom filters                                  |
| Strength   | Predictable read latency; mature range scans in RDBMS          | High write throughput; good for append-heavy workloads that tune compaction           |
| Pain       | Write amplification on hot pages; page splits                  | Compaction storms; read amplification if tuning is poor; stale merged view during backlog |

Interview line: “Postgres/MySQL engines are B-tree–centric for OLTP; Dynamo and many NoSQL stacks use LSM-style layers because they optimize for partitioned, high-ingest paths—then I’d ask about p99 during compaction and hot keys.”
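
To make the LSM column concrete, here is a toy sketch of the write and read paths (memtable, flush to immutable sorted runs, newest-first reads, merge-style compaction). Class names and the flush threshold are illustrative, not any real engine's API:

```ts
// Toy LSM: writes go to a sorted in-memory memtable; past a threshold it is
// flushed as an immutable, sorted "SSTable"; reads check the memtable first,
// then SSTables newest-first. Real engines add bloom filters, levels, and
// background compaction on top of this.
type SSTable = Map<string, string>; // immutable once flushed

class ToyLsm {
  private memtable = new Map<string, string>();
  private sstables: SSTable[] = []; // newest run first
  constructor(private flushThreshold = 4) {}

  put(key: string, value: string): void {
    this.memtable.set(key, value); // no on-disk page is touched in place
    if (this.memtable.size >= this.flushThreshold) this.flush();
  }

  get(key: string): string | undefined {
    if (this.memtable.has(key)) return this.memtable.get(key);
    for (const table of this.sstables) { // read amplification: may touch many runs
      if (table.has(key)) return table.get(key);
    }
    return undefined;
  }

  private flush(): void {
    const sorted = new Map(
      [...this.memtable.entries()].sort(([a], [b]) => a.localeCompare(b))
    );
    this.sstables.unshift(sorted);
    this.memtable = new Map();
  }

  // "Compaction": merge all runs, newest value wins. Real engines do this
  // incrementally in the background, which is where p99 spikes come from.
  compact(): void {
    const merged = new Map<string, string>();
    for (const table of [...this.sstables].reverse()) {
      for (const [k, v] of table) merged.set(k, v);
    }
    this.sstables = [merged];
  }
}
```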


AWS lens: RDS (Postgres/MySQL) → think B-tree, WAL, checkpoints, vacuum/bloat. DynamoDB → Amazon’s managed LSM-like partitioned store; throttling and hot partitions dominate ops stories.

Topic mind map (ASCII)

Storage engines & distributed data
├── On disk
│   ├── B-tree family (RDS OLTP) .......... seeks, pages, splits
│   └── LSM family (Dynamo/Cassandra) ... memtable, SSTables, compaction
├── Indexes
│   ├── clustered vs secondary vs heap .... locality & hops
│   └── composite / spatial / over-index . planner + write cost
├── Theory
│   ├── CAP .............................. partition → A vs C
│   └── PACELC ........................... else → L vs C (replica lag)
├── Replication
│   ├── single-leader .................... RDS primary + replicas
│   ├── multi-leader ..................... conflicts / merge
│   └── leaderless + quorum ............ W+R>N overlap intuition
└── Partitioning
    ├── hash vs range .................... skew patterns
    ├── consistent hashing ............... ring, virtual nodes
    └── hot partition (celebrity) ...... throttle, cache, key split

Indexing deep dive: clustered, composite, spatial, over-indexing

Clustered vs non-clustered

| Concept                   | Meaning                                                                                | Typical engines                                                                  |
|---------------------------|----------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| Clustered index           | Table rows are stored in the order of that index (or directly in the index structure)   | InnoDB primary key is clustered; SQL Server clustered PK                           |
| Non-clustered / secondary | Index entries point to the row / PK / row id; extra hop                                 | Secondary indexes on InnoDB → back to the PK                                       |
| Heap table + indexes      | Rows live in a heap; indexes reference row locations                                    | PostgreSQL default: heap + separate indexes (no clustered PK in the MySQL sense)   |

Staff: “A clustered PK chooses physical locality for range scans on that key; a random UUID PK can fragment inserts. Time-ordered or sequenced keys recover insert locality at the cost of guessable IDs, so it is a locality vs security trade.”
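
A small illustration of that key-design trade using Node's built-in crypto; the time-ordered id below is a ULID-style sketch for illustration, not a spec-compliant ULID:

```ts
import { randomUUID, randomBytes } from "node:crypto";

// Random v4 UUID primary key: inserts land on random leaf pages of a
// clustered index, causing page splits and fragmentation under load.
const randomPk = randomUUID();

// Time-prefixed key (ULID-ish sketch): new keys sort after old ones, so
// inserts stay on the right-most pages. Better insert locality, but the key
// now leaks creation time and is easier to enumerate.
function timeOrderedId(): string {
  const ts = Date.now().toString(16).padStart(12, "0"); // 48-bit ms timestamp
  const rand = randomBytes(10).toString("hex");          // 80 bits of randomness
  return `${ts}${rand}`;
}

console.log(randomPk, timeOrderedId());
```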

Composite indexes

  • The leading column must appear in the WHERE clause (ideally as an equality) for the index to be used as you expect.
  • A range predicate on a middle column can stop later columns from being used for navigation in classic B-tree composites.

(Covered in more depth on Indexing & query tuning.)
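
One hedged way to check this from Node, assuming the node-postgres (pg) driver with connection settings in PG* environment variables; the orders table, columns, and index name are hypothetical:

```ts
import { Pool } from "pg"; // assumes a reachable Postgres instance

const pool = new Pool(); // connection settings come from PG* env vars

async function main() {
  // Hypothetical composite index: (tenant_id, created_at, status).
  await pool.query(
    `CREATE INDEX IF NOT EXISTS idx_orders_tenant_created_status
       ON orders (tenant_id, created_at, status)`
  );

  // Leading column equality + range on the middle column: the index is used,
  // but the trailing "status" predicate is typically applied as a filter,
  // not used to navigate the B-tree past the created_at range.
  const { rows } = await pool.query(
    `EXPLAIN (ANALYZE, BUFFERS)
     SELECT * FROM orders
      WHERE tenant_id = 't-123'
        AND created_at >= now() - interval '7 days'
        AND status = 'paid'`
  );
  for (const r of rows) console.log(r["QUERY PLAN"]);
  await pool.end();
}

main().catch(console.error);
```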

Spatial indexes

  • R-tree, GiST (Postgres), etc. support nearest-neighbour and containment queries for geo data.
  • Cost: larger structures and trickier EXPLAIN plans; still subject to selectivity and SRID mistakes.

Over-indexing cost

| Cost                | Why it matters                                   |
|---------------------|--------------------------------------------------|
| Write amplification | Every INSERT/UPDATE/DELETE touches each index    |
| Planner confusion   | Too many similar indexes → wrong plan choice     |
| Disk / memory       | Buffer pool churn, slower backups & replication  |
| Lock granularity    | Extra indexes can widen lock or version pressure |

Rule: index from measured slow queries, not anticipation.
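
A minimal "index from measurement" sketch, assuming the pg driver and that the pg_stat_statements extension is enabled on the instance (on RDS, via the parameter group):

```ts
import { Pool } from "pg";

// Find the queries actually worth indexing for, instead of guessing.
// Column names are those of PostgreSQL 13+ (older versions use mean_time).
const pool = new Pool();

async function topSlowQueries(limit = 10) {
  const { rows } = await pool.query(
    `SELECT query, calls, mean_exec_time, total_exec_time
       FROM pg_stat_statements
      ORDER BY mean_exec_time DESC
      LIMIT $1`,
    [limit]
  );
  return rows; // candidates for EXPLAIN + a targeted index, nothing more
}

topSlowQueries().then(console.table).finally(() => pool.end());
```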


CAP theorem & PACELC

CAP (simplified): during a network partition you cannot have both linearizable consistency (the “C” most talks mean) and full availability for every operation; P forces the choice.

Reality check: CAP is not “pick two forever.” It’s a partition scenario lens; most engineering is latency + consistency under normal conditions.

PACELC extends: Else (no partition)—choose Latency vs Consistency.

| If…       | Trade                                                                  |
|-----------|------------------------------------------------------------------------|
| Partition | A vs C (availability of conflicting operations vs rejecting/stalling)  |
| Normal    | L vs C (fast but possibly stale vs slower but fresher)                 |

DynamoDB / Cassandra class: Cassandra exposes tunable R/W consistency levels; DynamoDB exposes an eventual-vs-strong read knob. Normal operation optimizes L until you ask for stronger consistency.

RDS: single-region primary + replicas → app still sees replica lag (PACELC “else”: you chose L on the read path).
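
The everyday ELC knob on DynamoDB is per-request read consistency. A minimal sketch with the AWS SDK v3 document client; the orders table and pk attribute are placeholders:

```ts
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function readOrder(userId: string) {
  // Default: eventually consistent read -- cheaper, lower latency, may lag (the "L" choice).
  const fast = await ddb.send(
    new GetCommand({ TableName: "orders", Key: { pk: userId } })
  );

  // Strongly consistent read: reflects all prior successful writes, consumes
  // twice the read capacity, and is not available on GSIs (the "C" choice).
  const fresh = await ddb.send(
    new GetCommand({ TableName: "orders", Key: { pk: userId }, ConsistentRead: true })
  );

  return { fast: fast.Item, fresh: fresh.Item };
}
```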


Replication strategies & quorum

Single-leader (primary / replica)

  • All writes to leader; replicas async or sync chain.
  • RDS: Multi-AZ for durability/failover; read replicas off the write path for read scale (expect lag).

Multi-leader

  • Multiple acceptors of writes; conflict resolution required (last-write-wins, CRDTs, app merge).
  • Useful for multi-region deployments, but hard in practice; interviewers probe conflict stories.
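
A tiny sketch of the two conflict-resolution stories interviewers usually probe: last-write-wins versus an application-level merge that keeps both sides. Types and field choices are illustrative only:

```ts
// Two replicas accepted writes for the same key during a partition.
type Versioned<T> = { value: T; writtenAt: number; replica: string };

// Last-write-wins: pick the later timestamp. Simple, but the "losing" write
// is silently discarded and clock skew decides the winner.
function lww<T>(a: Versioned<T>, b: Versioned<T>): Versioned<T> {
  return a.writtenAt >= b.writtenAt ? a : b;
}

// App-level merge for a set-valued field (e.g. tags): union keeps both sides,
// the shape CRDTs formalize. Deletes need tombstones, which this omits.
function mergeSets(a: Versioned<Set<string>>, b: Versioned<Set<string>>): Set<string> {
  return new Set([...a.value, ...b.value]);
}
```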

Leaderless (Dynamo-style)

  • Any replica can serve reads/writes; correctness leans on quorum, version vectors or LWW, and background convergence:
    • Read repair — a read touches R replicas; if one is stale, issue a repair write so replicas catch up.
    • Hinted handoff — if a replica is down, a peer stores a hint and hands off writes when it returns (reduces temporary divergence).
    • Anti-entropy — periodic Merkle-tree (or similar) diffs between replicas to find missing data.
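
A simplified sketch of a leaderless quorum read with read repair layered on top; the Replica interface is hypothetical, and transport, vector clocks, and failure handling are all elided:

```ts
type Version = number;
interface Replica {
  id: string;
  get(key: string): Promise<{ value: string; version: Version } | undefined>;
  put(key: string, value: string, version: Version): Promise<void>;
}

// Read from R replicas, take the newest version, push it back to stale ones.
async function quorumRead(replicas: Replica[], key: string, R: number) {
  const answers = await Promise.all(
    replicas.slice(0, R).map(async (r) => ({ replica: r, result: await r.get(key) }))
  );
  const newest = answers
    .filter((a) => a.result)
    .sort((a, b) => b.result!.version - a.result!.version)[0];
  if (!newest) return undefined;

  // Read repair: any replica that is behind gets the newest version written back.
  await Promise.all(
    answers
      .filter((a) => (a.result?.version ?? -1) < newest.result!.version)
      .map((a) => a.replica.put(key, newest.result!.value, newest.result!.version))
  );
  return newest.result!.value;
}
```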

Quorum (W + R > N intuition)

For N replicas:

  • W = minimum nodes that ack write
  • R = minimum nodes read from

Often taught: if W + R > N, the read set overlaps the write set, so a read sees the latest acknowledged write (whether that amounts to strong or merely tunable eventual consistency is vendor-specific). DynamoDB exposes explicit consistency knobs (e.g. strongly consistent reads per request) rather than letting you set W/R manually, but the mental model is still asked about.

| Pattern    | Narrative                       |
|------------|---------------------------------|
| R=1, W=1   | fast, risky stale               |
| W majority | durability bias                 |
| R+W > N    | overlap argument in interviews  |
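
A brute-force check of the overlap claim, purely to build intuition (real systems add sloppy quorums, hinted writes, and failure modes this ignores):

```ts
// If W + R > N, every possible write quorum shares at least one node with
// every possible read quorum; if not, a read can miss the write entirely.
function subsetsOfSize(nodes: number[], size: number): number[][] {
  if (size === 0) return [[]];
  if (nodes.length < size) return [];
  const [head, ...rest] = nodes;
  return [
    ...subsetsOfSize(rest, size - 1).map((s) => [head, ...s]),
    ...subsetsOfSize(rest, size),
  ];
}

function quorumsAlwaysOverlap(N: number, W: number, R: number): boolean {
  const nodes = [...Array(N).keys()];
  return subsetsOfSize(nodes, W).every((w) =>
    subsetsOfSize(nodes, R).every((r) => r.some((n) => w.includes(n)))
  );
}

console.log(quorumsAlwaysOverlap(3, 2, 2)); // true  (2 + 2 > 3)
console.log(quorumsAlwaysOverlap(3, 1, 1)); // false (a read can miss the write)
```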

Partitioning (sharding): hash vs range

| Strategy              | Pros                         | Cons                                  |
|-----------------------|------------------------------|---------------------------------------|
| Hash of partition key | Spread load, simple routing  | Range queries across keys expensive   |
| Range                 | Range scans local to shard   | Hot ranges (“latest” feeds)           |
| Composite             | SKU + tenant etc.            | Design mistakes → uneven shards       |

Consistent hashing

  • Reduces full reshuffles when nodes join/leave—keys map to a ring; only neighbors move data.
  • Interview: mention virtual nodes to balance physical host skew.
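
A minimal consistent-hash ring sketch with virtual nodes, assuming Node's crypto for hashing; the vnode count and md5 choice are illustrative, not a recommendation:

```ts
import { createHash } from "node:crypto";

// Each physical node is placed on the ring many times (virtual nodes), so
// when a node joins or leaves only the keys in its segments move, and load
// skew between hosts shrinks.
function hash(s: string): number {
  return createHash("md5").update(s).digest().readUInt32BE(0);
}

class Ring {
  private points: { point: number; node: string }[] = [];
  constructor(nodes: string[], private vnodes = 100) {
    for (const node of nodes) this.add(node);
  }
  add(node: string) {
    for (let i = 0; i < this.vnodes; i++) {
      this.points.push({ point: hash(`${node}#${i}`), node });
    }
    this.points.sort((a, b) => a.point - b.point);
  }
  remove(node: string) {
    this.points = this.points.filter((p) => p.node !== node);
  }
  // Key owner = first virtual node clockwise from the key's hash.
  owner(key: string): string {
    const h = hash(key);
    const hit = this.points.find((p) => p.point >= h) ?? this.points[0];
    return hit.node;
  }
}

const ring = new Ring(["node-a", "node-b", "node-c"]);
console.log(ring.owner("user#42"));
```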

Hot partitions (“celebrity problem”)

  • One key absorbs disproportionate traffic → throttling (DynamoDB). Mitigations: split the keyspace, cache, write sharding (split the logical key into N physical keys; see the sketch below), rate limits.

Node + Docker: many containers share one logical “service”; they still hit one partition if the key is wrong. The fix is the data model, not “more pods.”
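
A write-sharding sketch for a hot logical key; the shard count and key format are application design choices, not a DynamoDB feature:

```ts
// Split one hot logical key into N physical keys so load spreads across
// partitions; reads fan out across the shards and merge the results.
const SHARDS = 8;

// Write path: each event lands on a random shard of the hot key.
function shardedKey(logicalKey: string): string {
  const suffix = Math.floor(Math.random() * SHARDS);
  return `${logicalKey}#${suffix}`; // e.g. "celebrity#42#5"
}

// Read path: query all shard keys (or a cached aggregate) and merge.
function allShards(logicalKey: string): string[] {
  return Array.from({ length: SHARDS }, (_, i) => `${logicalKey}#${i}`);
}
```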


Node.js, AWS, Docker (how this shows up)

| Piece            | Practice                                                                                                          |
|------------------|--------------------------------------------------------------------------------------------------------------------|
| Node             | Single-threaded event loop; don’t do CPU-heavy compaction in-process; use RDS/Dynamo as dedicated stores            |
| Connection pools | pg / mysql2 pools per container; too many replicas × too many pods = connection storm on RDS                        |
| Docker           | Horizontal pods don’t fix a hot partition; sticky in-memory cache in each pod ≠ shared consistency                  |
| RDS              | B-tree + WAL mental model; failover DNS; replica lag for reads                                                      |
| DynamoDB         | Provisioned vs on-demand, GSI/LSI access patterns, partition limits; consider DAX only with a clear staleness story |
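
A per-container pool-sizing sketch with node-postgres; the numbers are illustrative and must be budgeted against the RDS instance's max_connections, not copied:

```ts
import { Pool } from "pg";

// Total connections hitting RDS is roughly (pool max) x (containers) x (services),
// so "max" has to be budgeted against the instance's max_connections rather than
// left at a per-pod default.
const pool = new Pool({
  host: process.env.PGHOST,
  user: process.env.PGUSER,
  password: process.env.PGPASSWORD,
  database: process.env.PGDATABASE,
  max: 10,                        // per container; 30 pods x 10 = 300 conns on the primary
  idleTimeoutMillis: 30_000,      // release idle connections so failover drains faster
  connectionTimeoutMillis: 2_000, // fail fast instead of queueing forever during storms
});

export async function getOrder(id: string) {
  const { rows } = await pool.query("SELECT * FROM orders WHERE id = $1", [id]);
  return rows[0];
}
```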

Interview questions & short answers

Q: When does an LSM store beat a B-tree for your workload?
A: Sustained high write or append-heavy ingest where batched compaction is acceptable and read p99 is tuned with bloom filters, level shape, and hot key isolation—not “because NoSQL.”

Q: Explain PACELC in one breath.
A: If partitioned: availability vs consistency; if not: latency vs freshness—replica reads are the everyday ELC trade.

Q: Why W+R>N?
A: Guarantees at least one replica in the read set saw the write’s quorum—overlap argument; map to vendor semantics honestly.

