Databases & caching

Data systems are where constraints become real: latency, correctness, money, and compliance all meet the storage layer. Staff-level answers name consistency guarantees, failure modes (replica lag, split brain, hot keys), and operational habits (migrations, backups, observability)—not only “pick Postgres.”

How to use this page

Use Core database ideas as a vocabulary checklist (ACID, isolation, plans, replication).
Use Caching alongside databases to keep source of truth vs derived copies explicit.
Skim Diagrams before mock interviews—practice narrating arrows out loud.
Run Things to prepare as a literal checklist for projects or interview stories.
Staff / SDE3 bar (Node.js + AWS): pair RDS (B-tree OLTP) with Dynamo-style partitioned thinking where relevant; treat ElastiCache as a consistency problem, not only latency—internals live under Storage engines, CAP & distributed data and Application caching & consistency.

Topic study plan (deep pages)

Each topic under /databases/topics/... follows: Core details → Understanding → Senior understanding → Diagram.

Topic	Focus
Storage models & access paths	SQL vs NoSQL, document modeling, data warehouse, use cases, when to choose which
Storage engines, CAP & distributed data	B-tree vs LSM, CAP/PACELC, quorum, read repair, sharding—RDS / Dynamo mental models
AWS data services & Node.js	RDS Proxy, DynamoDB GSI/LSI/streams, ElastiCache cluster, Memcached vs Redis
CDC & outbox projections	Transactional outbox, CDC, dual-write antipattern, projections
Transactions & isolation	ACID vs BASE, isolation & MVCC, 2PC vs saga, scopes & locks
Indexing & query tuning	Normalization, index patterns, query optimization, indexing Q&A
Replication, sharding & operations	Leaders, lag, partitions, backups, RPO/RTO
Application caching & consistency	ElastiCache/Redis topologies, eviction (LRU/LFU/W-TinyLFU), penetration/herd/avalanche, dual-write

Core database ideas

Storage models (when you’d reach for each)

Deep page: Storage models & access paths (SQL vs NoSQL, warehouse vs OLTP, document modeling, when to choose which).

Family	Strength	Typical costs / caveats
Relational (OLTP)	joins, constraints, transactions	schema migration discipline
Wide-column / KV	huge scale, simple access patterns	awkward ad-hoc analytics
Document	flexible shapes, nested reads	cross-document consistency harder
Search / analytics stores	rankings, aggregations	not a source of truth for money
Time-series	metrics, retention policies	cardinality & retention ops

Interview line: “I’d choose the model after access paths and consistency requirements, not the logo on the box.”

ACID (know what each letter guards)

Letter	Guards against	Implication
Atomicity	partial writes	batch work or explicit compensation
Consistency	invalid states your rules forbid	constraints + app invariants together
Isolation	concurrent anomalies	pick level per transaction, not globally maximal
Durability	acknowledged loss	WAL / replication; fsync semantics matter

Isolation anomalies (name → fix intuition)

Anomaly	One-line intuition	Common levers
Dirty read	seeing uncommitted work	isolation ↑, shorter txns
Non-repeatable read	row changes between reads in one txn	locking, snapshot isolation
Phantom read	new rows appear in range	range locks / higher isolation sparingly
Lost update	two writers clobber	optimistic version column, `SELECT … FOR UPDATE`

Not every endpoint needs SERIALIZABLE—cost and deadlock risk are real; scope transactions to the smallest critical section.

Indexing & the query path

Deep page: Indexing & query tuning (normalization, index Q&A, optimization patterns).

B-tree (and friends) power most default indexes; composite index column order matches filter + sort patterns.
Covering indexes avoid heap/table lookups when the planner can satisfy the query from the index alone.
Cardinality & statistics skew plans—stale stats → wrong nested loops vs seq scans.

Operational loop:

Identify hot query text (slow log / pg_stat_statements class metrics).
EXPLAIN (ANALYZE) or vendor equivalent—compare estimated vs actual rows.
Fix with query rewrite before index soup; add indexes only when the plan proves it.

Schema evolution

Treat migrations as deploy trains: additive-first columns, dual-write / backfill phases, backward-compatible reads until cutover. Breaking changes are timelines, not single PRs.

Connections & pools

Exhausted connection pools look like “the database is slow.” Watch wait time to acquire a connection, not only query duration.

Related depth: Database query path, Profiling services & async, /backend.
AWS + integration: AWS data services & Node.js, CDC & outbox.

Distributed data (production realism)

Replication basics

Pattern	Read freshness	Ops story
Single leader	strong on primary; replicas may lag	failover, promotion drills
Multi-leader	conflict handling required	rare; careful merge rules
quorum / consensus	pricey; correctness-critical configs	operational complexity

Replica lag surfaces as intermittent “wrong data” unless you engineer read-your-writes (sticky to primary, version checks, or user-visible staleness).

Partitioning / sharding

Choose a partition key from real access paths—resharding later is expensive. Watch hot partitions (celebrity keys) and plan salting or key redesign early.

Backup & recovery (you should be able to narrate)

RPO/RTO targets tied to product risk.
Restore drills on a schedule—not “we assume backups work.”
Logical vs physical backup tradeoffs (portability vs speed).

Caching alongside databases

Caches are derived state. The database (or durable event log) remains authority unless you consciously accept staleness with product guardrails.

Layers (stack north to south)

Layer	Latency win	Staleness / safety traps
Browser / CDN	huge for assets & static	personalized HTML at edge
Application memory	microsecond class	per-instance inconsistency
Shared remote (Redis class)	cross-instance sharing	thundering herd, key poisoning
DB buffer pool	automatic	not a substitute for app cache design

Patterns (who owns invalidation?)

Pattern	Flow	Owns freshness
Cache-aside	app reads cache, loads DB on miss, populates	app + TTL/events
Read-through	cache library loads on miss	cache service + TTL
Write-through	writes go to cache + DB together	tighter but write latency
Write-behind	writes acknowledged before durable flush	risk—queue loss

Mitigate cache stampede: TTL jitter, request coalescing / single-flight, probabilistic early refresh.

Related depth: Caching & consistency, Caching & retry amplification.

Access paths: list the 5–10 hottest queries and their filters/sorts.
Consistency: which reads must be linearizable vs OK with seconds of lag?
Growth: data volume curve, retention, archival, legal holds.
Multi-tenancy: tenant key in every table/index story; noisy neighbor controls.
Migration runway: additive schema steps, backfills, dual reads/writes.

Before production hardening

Backups with documented restore drill frequency and owner.
Replication lag dashboards and alerts tied to user-facing symptoms.
Connection pool limits per service; saturation metrics.
Slow query logging and top N review cadence.
Secrets rotation path; least privilege DB roles.
Idempotency for writers (see /backend/topics/idempotency-at-least-once).

Interview prep (90 minutes, repeatable)

Block	Minutes	Output
Draw read + write path with cache + primary + replica	15	can narrate every arrow
Walk one slow query from log → plan → fix	20	EXPLAIN vocabulary
Cache invalidation story for one feature	15	events, TTL, stampede
Failure: replica lag + dual write	15	UX + detection
Schema change without downtime	15	additive phases

Memorize one crisp line for each: isolation level choice, stampede mitigation, RPO/RTO, when not to cache.

Pitfalls

Treating read replicas as strongly consistent without version checks or routing rules.
Giant transactions holding locks while calling HTTP or queues.
Unbounded caches without eviction, TTL, or memory caps—OOM incidents.
Cache as source of truth for balances or inventory without durable reconciliation.
Indexing every column without examining plans—write amplification and bloated storage.

/backend — lifecycle, timeouts, messaging, caching entry points.
/performance — tail latency, retries, thundering herds, DB performance notes.
/security — injection, least privilege, secrets.