THN Interview Prep

Indexing & query tuning

Topic mind map (ASCII)

Indexing & query tuning
├── Normalization ............. 1NF / 2NF / 3NF / BCNF (+ when to denorm)
├── Index design .............. B-tree, composite order, covering, partial
├── Planner ................... stats, estimates vs actual, seq vs index scan
├── Workload fixes ............ N+1, pagination, SELECT *, OR → UNION
├── Ops ....................... pools, vacuum/bloat (MVCC context), slow log loop
└── Link to storage engines ... clustered vs heap (see storage-engines topic)

Core knowledge

TopicCarry one line
IndexA sorted/probed structure so the engine skips full table scans for matching predicates.
Composite indexColumn order must match how filters and ORDER BY are written.
Covering indexIndex contains all columns needed—index-only plan possible.
NormalizationReduce redundancy and update anomalies; may split tables.
DenormalizeDuplicate data for read speed—pay on writes and consistency.
Query planHow the engine joins, filters, sorts—verify with EXPLAIN (ANALYZE).

Normalization (1NF → 3NF → BCNF)

Goal: each fact stored once in the “right” place so updates stay consistent.

1NF — atomic columns

  • Each column holds one value per row; no repeating groups hidden in a single column.
  • Interview trap: stuffing comma-separated tags in one string → harder to index/query; prefer junction table or array type with eyes open on indexing.

2NF — no partial dependency on part of a composite key

  • For tables with composite primary keys, non-key columns must depend on the whole key—not only half.
  • Classic fix: split into tables so each non-key depends on one key.

3NF — no transitive dependency

  • Non-key columns must depend only on the key, not on other non-key columns.
  • Example: if employee_id → dept_id and dept_id → dept_name, dept_name belongs in departments, not employees.

BCNF (Boyce–Codd)

  • Stricter than 3NF: every determinant is a candidate key. Appears in exam-style edge cases; in production you recognize “odd functional dependencies” and refactor.

When interviews ask “why normalize?”
→ Fewer update anomalies (same fact updated in two places), cleaner constraints, smaller write amplification for duplicated blobs.

Loading diagram…

Denormalization — when and how

ReasonPatternCost
Read hot pathDuplicate label on child rows (e.g. product_name on line items)Writes must update duplicates or accept staleness
ReportingMaterialized summary tables fed by jobsLag + job failures to monitor
Wide eventsSnapshot fields for immutable historyStorage; great for audit / analytics

Staff line: “Denormalize only with a measured hot query and a plan for staleness / rebuild.“


Index types (conceptual)

TypeGood forWatch
B-tree (default)equality, range, sortwrite overhead per index
Hashexact match (where supported)no range on hash
Uniqueenforce one row per keyNULL semantics per engine
Partial / filteredindex subset of rowsmust match query predicates
Full-text / GIN / specialtext search inside DBoften pair with external search at scale

Composite column order (repeat until automatic):

  1. Equality columns in the where clause (a = ? AND b = ?).
  2. Range column last among filter columns (c > ?).
  3. INCLUDE / covering columns for SELECT list to enable index-only scans when the engine supports it.

Indexing interview questions & answers

Q1: “Why is my index not used?”

Answer: Check: function on column (WHERE lower(email)) prevents use; wrong column order vs composite; parameter sniffing / stale stats; selectivity so planner prefers seq scan; query not actually filtering on the leading column. Prove with EXPLAIN and actual row counts.

Q2: “Too many indexes—what breaks?”

Answer: Every index slows inserts/updates/deletes and uses disk/memory. Too many also confuse optimizers. Start from top slow queries, add one index, re-profile writes, repeat.

Q3: “Covering index vs wider composite?”

Answer: Covering (including non-key columns or multi-column composite that matches filter + select) avoids heap lookups. Balance index size and write cost—not every query deserves a covering index.

Q4: “N+1 in production—how do you fix?”

Answer: Batch: join, IN (...) with chunking, or dataloader pattern. Prove reduction in round-trips with traces and query count metrics.


Query optimization playbook

Workflow (always this order)

  1. Measure: slow log, pg_stat_statements class, trace one representative query.
  2. EXPLAIN (ANALYZE): estimated vs actual rows—huge gap ⇒ stats or rewrite.
  3. Rewrite: narrower SELECT, push predicates, limit join width, batch instead of loop.
  4. Index: match leading predicates; avoid index soup.
  5. Re-run under production-like volume; check locks and pool wait.

Common rewrites (mental patterns)

ProblemDirection
Select *Project only needed columns—less IO, better covering chances
OR explosionUNION of two selective branches or redesign predicates
Offset deep paginationKeyset pagination (WHERE id > ? ORDER BY id LIMIT …)
Correlated subqueryJoin or lateral when planner struggles
Implicit conversionMatch types—index columns not wrapped in casts/functions

Symptom → first split

SymptomFirst split
p95 up, low DB CPUpool wait, network, app mutex
p95 up, high DB CPUbad plan, missing index, seq scan surprise
Intermittent timeoutsstats, lock wait, contention

Diagram — slow query loop

Loading diagram…

Diagram — normalization vs denormalize (read vs write)

Loading diagram…

See also

Last updated on

Spotted something unclear or wrong on this page?

On this page