Indexing & query tuning

Topic mind map (ASCII)

Indexing & query tuning
├── Normalization ............. 1NF / 2NF / 3NF / BCNF (+ when to denorm)
├── Index design .............. B-tree, composite order, covering, partial
├── Planner ................... stats, estimates vs actual, seq vs index scan
├── Workload fixes ............ N+1, pagination, SELECT *, OR → UNION
├── Ops ....................... pools, vacuum/bloat (MVCC context), slow log loop
└── Link to storage engines ... clustered vs heap (see storage-engines topic)

The visual model below keeps the tuning loop honest: start from measured slow-query evidence, prove the plan with actual rows and IO, change the query or index deliberately, and validate both read latency and write-path cost before shipping.

Indexing and query tuning loop showing slow query evidence, EXPLAIN ANALYZE, query rewrites, composite index design, validation, and write-path guardrails.

Core knowledge

Topic	Carry one line
Index	A sorted/probed structure so the engine skips full table scans for matching predicates.
Composite index	Column order must match how filters and ORDER BY are written.
Covering index	Index contains all columns needed—index-only plan possible.
Normalization	Reduce redundancy and update anomalies; may split tables.
Denormalize	Duplicate data for read speed—pay on writes and consistency.
Query plan	How the engine joins, filters, sorts—verify with EXPLAIN (ANALYZE).

Normalization (1NF → 3NF → BCNF)

Goal: each fact stored once in the “right” place so updates stay consistent.

1NF — atomic columns

Each column holds one value per row; no repeating groups hidden in a single column.
Interview trap: stuffing comma-separated tags in one string → harder to index/query; prefer junction table or array type with eyes open on indexing.

2NF — no partial dependency on part of a composite key

For tables with composite primary keys, non-key columns must depend on the whole key—not only half.
Classic fix: split into tables so each non-key depends on one key.

3NF — no transitive dependency

Non-key columns must depend only on the key, not on other non-key columns.
Example: if employee_id → dept_id and dept_id → dept_name, dept_name belongs in departments, not employees.

BCNF (Boyce–Codd)

Stricter than 3NF: every determinant is a candidate key. Appears in exam-style edge cases; in production you recognize “odd functional dependencies” and refactor.

When interviews ask “why normalize?”
→ Fewer update anomalies (same fact updated in two places), cleaner constraints, smaller write amplification for duplicated blobs.

Loading diagram…

Denormalization — when and how

Reason	Pattern	Cost
Read hot path	Duplicate label on child rows (e.g. `product_name` on line items)	Writes must update duplicates or accept staleness
Reporting	Materialized summary tables fed by jobs	Lag + job failures to monitor
Wide events	Snapshot fields for immutable history	Storage; great for audit / analytics

Staff line: “Denormalize only with a measured hot query and a plan for staleness / rebuild.“

Index types (conceptual)

Type	Good for	Watch
B-tree (default)	equality, range, sort	write overhead per index
Hash	exact match (where supported)	no range on hash
Unique	enforce one row per key	NULL semantics per engine
Partial / filtered	index subset of rows	must match query predicates
Full-text / GIN / special	text search inside DB	often pair with external search at scale

Composite column order (repeat until automatic):

Equality columns in the where clause (a = ? AND b = ?).
Range column last among filter columns (c > ?).
INCLUDE / covering columns for SELECT list to enable index-only scans when the engine supports it.

Indexing interview questions & answers

Q1: “Why is my index not used?”

Answer: Check: function on column (WHERE lower(email)) prevents use; wrong column order vs composite; parameter sniffing / stale stats; selectivity so planner prefers seq scan; query not actually filtering on the leading column. Prove with EXPLAIN and actual row counts.

Measure: slow log, pg_stat_statements class, trace one representative query.
EXPLAIN (ANALYZE): estimated vs actual rows—huge gap ⇒ stats or rewrite.
Rewrite: narrower SELECT, push predicates, limit join width, batch instead of loop.
Index: match leading predicates; avoid index soup.
Re-run under production-like volume; check locks and pool wait.

Common rewrites (mental patterns)

Problem	Direction
Select *	Project only needed columns—less IO, better covering chances
OR explosion	UNION of two selective branches or redesign predicates
Offset deep pagination	Keyset pagination (`WHERE id > ? ORDER BY id LIMIT …`)
Correlated subquery	Join or lateral when planner struggles
Implicit conversion	Match types—index columns not wrapped in casts/functions

Symptom → first split

Symptom	First split
p95 up, low DB CPU	pool wait, network, app mutex
p95 up, high DB CPU	bad plan, missing index, seq scan surprise
Intermittent timeouts	stats, lock wait, contention

Diagram — slow query loop

Loading diagram…

Diagram — normalization vs denormalize (read vs write)

Loading diagram…

Indexing & query tuning

Topic mind map (ASCII)

Core knowledge

Normalization (1NF → 3NF → BCNF)

1NF — atomic columns

2NF — no partial dependency on part of a composite key

3NF — no transitive dependency

BCNF (Boyce–Codd)

Denormalization — when and how

Index types (conceptual)

Indexing interview questions & answers

Q1: “Why is my index not used?”

Q2: “Too many indexes—what breaks?”

Q3: “Covering index vs wider composite?”

Q4: “N+1 in production—how do you fix?”

Query optimization playbook

Workflow (always this order)

Common rewrites (mental patterns)

Symptom → first split

Diagram — slow query loop

Diagram — normalization vs denormalize (read vs write)

See also

On this page