Storage models & access paths
Core knowledge (memorize the spine)
| Idea | One sentence |
|---|---|
| Access path | How a query reaches rows with the fewest round-trips for your hottest patterns. |
| OLTP | Many short read/write transactions; row-oriented; strong constraints. |
| Warehouse / OLAP | Heavy scans & aggregates over history; columnar-friendly; batch / interactive BI. |
| SQL | Declarative language; relational engines implement it with tables, joins, transactions. |
| NoSQL | Family of stores (document, KV, wide-column, graph)—different consistency and query models, not “no schema.” |
| Polyglot persistence | Multiple stores on purpose, each owning a bounded workload with clear sync rules. |
Interviews reward naming workloads first, then the vendor: "We need Postgres" without QPS shape, consistency needs, and query shape is a weak signal.
SQL vs NoSQL (what the distinction actually means)
SQL usually implies a relational engine: tabular data, joins, ACID transactions, constraints, mature query planner. You think in schemas and normal forms (then denormalize where measured).
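A minimal sketch of what that buys you, using Python's built-in sqlite3 as a stand-in relational engine (table and column names are made up for illustration): the balance invariant lives in the schema, and a multi-row transfer either fully commits or fully rolls back.

```python
import sqlite3

# Illustrative schema: the invariant (no negative balances) is enforced by the engine.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE accounts (
    id      INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
CREATE TABLE transfers (
    id      INTEGER PRIMARY KEY,
    from_id INTEGER NOT NULL REFERENCES accounts(id),
    to_id   INTEGER NOT NULL REFERENCES accounts(id),
    amount  INTEGER NOT NULL CHECK (amount > 0)
);
""")
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 0)")
conn.commit()

def transfer(conn, from_id, to_id, amount):
    # All three statements commit or roll back together (ACID).
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id))
        conn.execute("INSERT INTO transfers (from_id, to_id, amount) VALUES (?, ?, ?)",
                     (from_id, to_id, amount))

transfer(conn, 1, 2, 40)          # succeeds
try:
    transfer(conn, 1, 2, 500)     # violates CHECK (balance >= 0) -> whole transaction rolls back
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())  # [(1, 60), (2, 40)]
```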
NoSQL is an umbrella:
| Category | Data unit | Sweet spot | Rough tradeoff |
|---|---|---|---|
| Document | BSON/JSON-like documents | Aggregate-root reads, flexible attributes | Cross-document transactions often limited |
| KV / wide-column | key → value / wide rows | Massive scale, simple get/put by key | Ad-hoc relational queries painful |
| Graph | nodes & edges | Traversal workloads | Ops & query discipline at scale |
| Time-series | (time, tags) → metrics | Retention, downsampling | Not general OLTP |
Misleading shortcut: “NoSQL = infinitely scalable SQL.” Scale depends on partition key, hot keys, replication, and accepting consistency windows—not the label.
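A minimal sketch of that point, assuming simple hash partitioning over 8 shards and a synthetic traffic mix (both are illustrative, not any particular store's behavior): one hot key is enough to skew a shard regardless of the database label.

```python
import hashlib
from collections import Counter

SHARDS = 8  # assumed shard count for illustration

def shard_for(key: str) -> int:
    # Stable hash -> shard id; real systems use consistent hashing or range splits.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % SHARDS

# 10,000 requests spread over 1,000 users, plus one celebrity key hit 5,000 times.
requests = [f"user:{i % 1000}" for i in range(10_000)] + ["user:celebrity"] * 5_000

load = Counter(shard_for(k) for k in requests)
print(sorted(load.items()))
# One shard carries roughly 5x the traffic of the others: the partition key choice,
# not the "NoSQL" label, decides whether the cluster actually scales.
```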
Document databases (NoSQL) — how teams use them
A document bundles fields and often nested arrays/objects (e.g. an order with its line items); the sketch after the table shows the embedded and referenced patterns.
| Pattern | Meaning | When |
|---|---|---|
| Embedded | Child array/objects inside parent document | Bounded children; read with parent in one fetch |
| Referenced | Parent stores id of other docs; app or DB resolves | Large / growing child sets; shared entities across parents |
| Aggregate root (DDD) | Document boundary = one consistency unit | Updates stay inside one doc when possible |
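A minimal sketch of the two patterns, using plain Python dicts as stand-ins for JSON/BSON documents (all field names are illustrative):

```python
# Embedded: bounded line items live inside the order, so one fetch by _id
# returns the whole aggregate and one update stays inside one document.
order_embedded = {
    "_id": "order_1001",
    "customer_id": "cust_42",
    "status": "paid",
    "line_items": [                       # bounded child array
        {"sku": "A-1", "qty": 2, "price_cents": 1999},
        {"sku": "B-7", "qty": 1, "price_cents": 4500},
    ],
}

# Referenced: a large or shared child set (reviews can grow without bound and
# are read on their own) lives in its own collection and points back by id.
product = {"_id": "sku_A-1", "name": "Mug", "attrs": {"color": "red"}}
review = {
    "_id": "rev_9",
    "product_id": "sku_A-1",              # reference resolved by the app or a $lookup-style join
    "rating": 5,
    "body": "Great mug",
}

def order_total_cents(order: dict) -> int:
    # Aggregate-root style read: everything needed is already in the document.
    return sum(item["qty"] * item["price_cents"] for item in order["line_items"])

print(order_total_cents(order_embedded))  # 8498
```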
Use cases document stores fit well:
- Catalog / product blobs with varying attributes (facets in search layer).
- User profiles with moderate nested settings (watch document size).
- Content / CMS with revision metadata.
- Session / cart style state with TTL (still mind durability rules).
Use cases that hurt if you only have documents:
- Ledger, balances, inventory needing serializable-ish invariants → relational or event-sourced core + projections.
- Many-to-many with heavy join-like fan-out across keys → relational or graph + careful caching.
- Warehouse-style analytics → copy to OLAP / columnar—do not brute-force on OLTP.
OLTP vs data warehouse
| Dimension | OLTP (operational) | Data warehouse / OLAP |
|---|---|---|
| Queries | Point lookups, short joins, writes | Scans, aggregations, window logic |
| Latency | ms to low hundreds of ms on user paths | Seconds to minutes (or precomputed) |
| Freshness | Near-real-time source of truth | Minutes–hours behind is normal (ETL/ELT) |
| Schema | Normalized operational schema | Star/snowflake, wide fact tables, denormalized dims |
| Compute | Row stores, indexes, transactions | Columnar, parallel query, partitions |
Typical architecture: OLTP system of record → CDC / batch export → warehouse → BI / ML features. Do not run giant aggregations on the primary without guardrails (replica, limits, or move to warehouse).
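A minimal sketch of the export hop, assuming a high-watermark cursor and sqlite standing in for both the OLTP source and the warehouse target (table names and the updated_at watermark are illustrative; real pipelines usually use CDC from the log or a managed ELT tool):

```python
import sqlite3

oltp = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

oltp.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER, updated_at INTEGER);
INSERT INTO orders VALUES (1, 1999, 100), (2, 4500, 105), (3, 700, 210);
""")
warehouse.executescript("""
CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, total_cents INTEGER, updated_at INTEGER);
CREATE TABLE export_state (name TEXT PRIMARY KEY, watermark INTEGER);
INSERT INTO export_state VALUES ('orders', 0);
""")

def export_batch():
    # Pull only rows changed since the last watermark; upsert so reruns are safe.
    (watermark,) = warehouse.execute(
        "SELECT watermark FROM export_state WHERE name = 'orders'").fetchone()
    rows = oltp.execute(
        "SELECT id, total_cents, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (watermark,)).fetchall()
    with warehouse:
        warehouse.executemany(
            "INSERT INTO fact_orders VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET total_cents = excluded.total_cents, "
            "updated_at = excluded.updated_at", rows)
        if rows:
            warehouse.execute("UPDATE export_state SET watermark = ? WHERE name = 'orders'",
                              (rows[-1][2],))

export_batch()
# Heavy aggregation now runs against the copy, not the OLTP primary.
print(warehouse.execute("SELECT COUNT(*), SUM(total_cents) FROM fact_orders").fetchone())  # (3, 7199)
```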
When to choose which database (decision checklist)
Answer in order:
- Consistency: money / inventory / uniqueness → need transactions + constraints? Default relational for that core.
- Query shape: heavy joins & ad-hoc filters → relational; get by id + bounded embed → document viable.
- Scale pattern: predictable partition key and tolerable eventual reads? Wide-column/KV class; hot key plan mandatory.
- Search / ranking: the primary store stays authoritative; Elasticsearch/OpenSearch etc. act as a secondary index with the sync lag understood.
- Analytics: warehouse for historical aggregates, not the OLTP primary.
- Team & ops: migration story, backup, observability—boring tech you operate well beats exotic tech you cannot recover.
Use cases at a glance
| Scenario | Often fits | Why |
|---|---|---|
| E-commerce orders + payments | Relational (core) | balances, inventory, idempotency keys (sketch below the table) |
| Product catalog with facets | Document + search index | flexible attrs + ranking |
| Session / feature flags | KV / Redis class | TTL, high churn |
| Metrics / IoT | Time-series | retention, compression |
| Social “friends of friends” | Graph (or relational at moderate scale) | path queries |
| Executive dashboards | Warehouse | joins across history, pre-aggregation |
| Full-text search | Search engine | inverted index; not sole ledger |
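For the first row, a minimal sketch of the idempotency-key pattern, again with sqlite3 and illustrative names: a unique constraint makes a retried payment request a no-op instead of a double charge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE payments (
    id              INTEGER PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,   -- client-supplied, one per logical attempt
    amount_cents    INTEGER NOT NULL
)""")

def charge(key: str, amount_cents: int) -> int:
    # First call inserts; a retry with the same key hits the unique constraint
    # and returns the existing row instead of charging twice.
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO payments (idempotency_key, amount_cents) VALUES (?, ?)",
                (key, amount_cents))
            return cur.lastrowid
    except sqlite3.IntegrityError:
        (payment_id,) = conn.execute(
            "SELECT id FROM payments WHERE idempotency_key = ?", (key,)).fetchone()
        return payment_id

print(charge("req-abc", 4500))  # 1
print(charge("req-abc", 4500))  # 1 again -- the retry is a no-op
```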
Interview questions & model answers
Q1: “SQL or NoSQL for our startup?”
Answer: Start from access paths and invariants. If you need foreign keys, multi-row transactions, and flexible reporting joins, a relational database is the default. Add NoSQL when a specific workload (e.g. document-shaped reads at huge scale with a clear partition story) justifies a second store—not the reverse.
Q2: “When would you use a document DB vs Postgres JSONB?”
Answer: Postgres JSONB when you still want transactions across tables, joins, and one engine to operate. A document-native store when horizontal partitioning by document id, embedded aggregate reads, and the operational model (replication, drivers) fit the team; always check cross-document consistency needs.
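A minimal sketch of the JSONB idea, using sqlite's JSON functions as a stand-in (assuming the bundled SQLite includes them); in Postgres you would reach for a jsonb column, @> containment, and a GIN index, but the schema below is illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    attrs TEXT NOT NULL              -- JSON document alongside fixed columns
);
INSERT INTO products VALUES
  (1, 'Mug',    '{"color": "red",  "capacity_ml": 350}'),
  (2, 'Kettle', '{"color": "red",  "volts": 230}'),
  (3, 'Mug',    '{"color": "blue", "capacity_ml": 500}');
""")

# Filter on a flexible JSON attribute while keeping joins and transactions in the
# same engine -- the appeal of the JSONB-in-Postgres approach.
rows = conn.execute(
    "SELECT id, name FROM products WHERE json_extract(attrs, '$.color') = ?",
    ("red",)).fetchall()
print(rows)  # [(1, 'Mug'), (2, 'Kettle')]
```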
Q3: “How do you keep search in sync with the database?”
Answer: Treat search as eventually consistent: dual writes are risky; prefer an outbox / CDC / queue with idempotent indexing, a version or cursor for replays, and monitoring of lag. User-facing reads that need freshness go to the primary or use read-after-write routing.
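A minimal sketch of the outbox half of that answer, with sqlite3 and a dict standing in for the search index (all names are illustrative): the row change and its outbox event commit together, and the indexer applies versioned upserts so reprocessing is safe.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL);
CREATE TABLE outbox   (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                       product_id INTEGER, name TEXT, version INTEGER, processed INTEGER DEFAULT 0);
""")
search_index = {}   # doc_id -> {"name": ..., "version": ...}

def update_product(product_id: int, name: str, version: int):
    # Data change and outbox event succeed or fail together -- no dual write.
    with db:
        db.execute("INSERT INTO products VALUES (?, ?, ?) "
                   "ON CONFLICT(id) DO UPDATE SET name = excluded.name, version = excluded.version",
                   (product_id, name, version))
        db.execute("INSERT INTO outbox (product_id, name, version) VALUES (?, ?, ?)",
                   (product_id, name, version))

def drain_outbox():
    # Re-running is safe: processed rows are skipped and stale versions are ignored.
    rows = db.execute("SELECT seq, product_id, name, version FROM outbox "
                      "WHERE processed = 0 ORDER BY seq").fetchall()
    for seq, product_id, name, version in rows:
        current = search_index.get(product_id)
        if current is None or current["version"] < version:
            search_index[product_id] = {"name": name, "version": version}
        with db:
            db.execute("UPDATE outbox SET processed = 1 WHERE seq = ?", (seq,))

update_product(1, "Red Mug", 1)
update_product(1, "Red Mug (350ml)", 2)
drain_outbox()
drain_outbox()   # second pass finds nothing unprocessed
print(search_index)  # {1: {'name': 'Red Mug (350ml)', 'version': 2}}
```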
Q4: “OLTP vs warehouse—why both?”
Answer: OLTP optimizes short consistent reads/writes. Warehouses optimize batch analytics without destroying OLTP with full scans. You replicate or export into the warehouse for BI and historical queries while keeping operational paths fast and safe.
Diagram (access path drives the model)
See also
- Storage engines, CAP & distributed data — B-tree vs LSM, CAP/PACELC, quorum, sharding internals (RDS / DynamoDB mental models)
- Indexing & query tuning — normalization, indexes, query optimization
- Transactions & isolation
- Replication, sharding & operations
- /databases