Performance & Optimization

Performance engineering is measurement → hypothesis → single change → proof, tied to a user- or revenue-facing metric. Without that anchor, “optimization” becomes guesswork and resume bullet theater.

How to use this page

Use Core basics as a vocabulary and mental model checklist.
Use Profiling playbooks as literal tool menus (what you open first).
Use Recognition cues to avoid mis-classifying a queueing problem as “CPU.”
Run Study sessions with real traces or realistic invented waterfalls.

Topic study plan (deep pages)

Topic notes: /performance/topics/... — Core details → Understanding → Senior understanding → Diagram.

Topic	Focus
Tail latency & SLOs	Percentiles, error budgets, queueing intuition
Profiling the browser	Long tasks, layout, network waterfall
Profiling services & async	CPU vs wait, pools, traces
Database query path	Plans, stats, locks, replicas
Caching & retry amplification	Stampede, jitter, hedging cautions

Authoring template (not in sidebar): `content/core-docs/performance/topics/topic-page-template.mdx` (`publishDocs: false`).

Core basics — decompose wall time

For any hop (browser frame, service request, DB query), break latency into:

Bucket	What it usually is	First instruments
Queueing	waiting for thread, pool, partition lock, event-loop turn	pool depth, sched latency, queue depth
Compute	hot code, regex, (de)serialization	CPU flame / sample
IO	remote calls, disk, replication	span waterfall, iostat class tools
Serialization	JSON/protobuf size, encoding	payload metrics, alloc profile
Coordination	locks, barriers, chatty fan-out	contention profiles, trace spans

Interview line: “I’d prove which bucket dominates before tuning—otherwise you ‘speed up’ the wrong stage.”

Little’s Law (qualitative)

In stable systems: average items in flight ≈ arrival rate × average time in system.
Sudden backlog growth often signals that a queue, pool, or lock has saturated well before CPU reaches 100%.
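A rough worked example of the relation above. The traffic numbers, the pool size, and the function name `items_in_flight` are invented for illustration; this is a sketch, not a capacity model.

```python
def items_in_flight(arrival_rate_per_s: float, avg_time_in_system_s: float) -> float:
    """Little's Law: L = lambda * W, valid on average for a stable system."""
    return arrival_rate_per_s * avg_time_in_system_s


# Hypothetical service: 200 requests/s, 50 ms average time in system.
baseline = items_in_flight(200, 0.050)

# Same arrival rate, but a slow dependency pushes average time to 400 ms.
degraded = items_in_flight(200, 0.400)

# Assumed limit for illustration: a pool of 32 workers or connections.
POOL_SIZE = 32
for label, in_flight in (("baseline", baseline), ("degraded", degraded)):
    status = "fits in pool" if in_flight <= POOL_SIZE else "exceeds pool, backlog grows"
    print(f"{label}: about {in_flight:.0f} in flight, {status}")
```

Nothing about CPU appears in the arithmetic: time in system alone moved the in-flight count from roughly 10 to roughly 80, past the assumed pool size.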

Percentiles matter more than averages

Stat	Good for
p50 / median	typical feel, capacity planning shorthand
p90–p95	UX + many SLOs
high tails (p99+)	catastrophic retries / user abandonment

Optimize the percentile your product SLO names, and say explicitly when you intentionally traded tail latency for median (rare, but it must be deliberate).
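To make the table concrete, here is a small sketch with invented latencies in which the mean describes no real request. The nearest-rank `percentile` helper is one common definition among several; monitoring systems may interpolate differently.

```python
import math
import statistics


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Invented data (ms): 97 fast requests and 3 that hit a slow path.
latencies_ms = [20.0] * 97 + [2000.0] * 3

summary = {
    "mean": statistics.fmean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
}
print(summary)
```

With this data the mean lands near 79 ms, p50 and p95 stay at 20 ms, and p99 is 2000 ms, so a dashboard showing only the mean would hide the slow path entirely.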

Frontend performance (what to open first)

Chrome / Chromium Performance workflow

Start recording → reproduce interaction once cleanly.
Look for long tasks (main-thread blocks).
Bottom-up: sort JS functions by self time.
Layout markers: for each flagged reflow/layout event, map it to the interleaved geometry read/write pattern that forces it.
Network: check critical request priority, LCP candidate discovery.

Heap memory investigations:

Symptom	Tooling move
Detached DOM after route changes	Diff heap snapshots, then follow dominator paths
Steady climb	Allocation timeline / sampling profiler

Synthetic vs field:

Lighthouse (lab): reproducible, CI-friendly regression checks.
CrUX (field): validates behavior on low-end phones and flaky networks. You need both narratives.

Synergy: revisit /frontend for rendering & UX contracts.

Backend / services performance playbooks

CPU vs wall time split

Wall ≫ CPU ⇒ waiting (locks, pools, downstream). Profile async traces before micro-optimizing functions.
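A minimal sketch of the split above for a single process, using only the standard library. `measure`, `waiting_work`, and `compute_work` are illustrative names; the sleep stands in for any lock, pool, or downstream wait.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) for one call of fn in this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def waiting_work():
    time.sleep(0.2)  # stands in for a lock, pool, or downstream wait


def compute_work():
    sum(i * i for i in range(300_000))  # stands in for hot code


for name, fn in (("waiting", waiting_work), ("compute", compute_work)):
    wall, cpu = measure(fn)
    print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

For `waiting_work` the wall time is far larger than the CPU time, which is the signature that says "look at traces and waits, not at function bodies." For `compute_work` the two numbers should be close.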

Classic mistakes:

Mistake	What it looks like	Fix direction
Blocking the event loop (EL)	sharp latency spikes that hit unrelated requests at the same moment	offload / non-blocking libs
Unbounded parallelism	downstream brownout	semaphores, bulkheads
Retry storm	cascading 429/503	backoff + jitter + budgets
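The "semaphores, bulkheads" fix direction in the table can be sketched with `asyncio.Semaphore`. The downstream call is simulated with a sleep, and `call_downstream`, `fan_out`, and the limit of 5 are assumptions made for the example.

```python
import asyncio


async def call_downstream(i: int, limiter: asyncio.Semaphore, stats: dict) -> int:
    async with limiter:                      # wait here instead of flooding downstream
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)            # placeholder for the real network call
        stats["in_flight"] -= 1
        return i


async def fan_out(n_calls: int, max_concurrency: int) -> dict:
    limiter = asyncio.Semaphore(max_concurrency)
    stats = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(call_downstream(i, limiter, stats) for i in range(n_calls)))
    return stats


stats = asyncio.run(fan_out(n_calls=50, max_concurrency=5))
print(stats)
```

Fifty calls are issued, but the recorded peak concurrency never exceeds the limit. The excess waits in the caller, where it is visible as queueing, rather than browning out the dependency.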

Pools & saturation signals

Monitor the time spent waiting to acquire a DB connection, not only query duration. An exhausted pool looks like a slow database even when the queries themselves are fast.
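A toy illustration of recording acquire wait separately from query time. The "pool" here is just a `queue.Queue` of placeholder strings and the query is a sleep; real drivers and pool libraries expose their own metrics, so treat every name below as hypothetical.

```python
import queue
import threading
import time

POOL_SIZE = 2
pool = queue.Queue()
for n in range(POOL_SIZE):
    pool.put(f"conn-{n}")  # stand-ins for real connections

timings = []  # (acquire_wait_s, query_s) per request
timings_lock = threading.Lock()


def handle_request(query_s: float = 0.05) -> None:
    t0 = time.perf_counter()
    conn = pool.get()              # blocks while the pool is exhausted
    t1 = time.perf_counter()
    try:
        time.sleep(query_s)        # placeholder for the actual query
    finally:
        t2 = time.perf_counter()
        pool.put(conn)
    with timings_lock:
        timings.append((t1 - t0, t2 - t1))


threads = [threading.Thread(target=handle_request) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

worst_wait = max(w for w, _ in timings)
worst_query = max(q for _, q in timings)
print(f"worst acquire wait={worst_wait:.3f}s, worst query={worst_query:.3f}s")
```

Every simulated query takes about the same time, yet the last requests spend several query-lengths waiting for a connection. A single "DB latency" metric that starts the clock before acquisition would blame the database for that wait.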

Traces must carry a time budget: propagate the parent's deadline to child calls so they can abandon work that can no longer finish in time.
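One way to picture deadline propagation is an absolute deadline set once at the edge and checked by every child call. The service names, durations, and the `DeadlineExceeded` exception are invented for this sketch; real RPC frameworks carry the deadline in request metadata instead of a function argument.

```python
import time


class DeadlineExceeded(Exception):
    pass


def remaining(deadline: float) -> float:
    """Seconds left before the absolute deadline (monotonic clock)."""
    return deadline - time.monotonic()


def call_child(name: str, deadline: float, work_s: float) -> str:
    budget = remaining(deadline)
    if budget <= 0:
        raise DeadlineExceeded(f"{name}: no budget left, skipping work")
    # A real client would pass `budget` as its timeout; here we just cap the sleep.
    time.sleep(min(work_s, budget))
    if remaining(deadline) <= 0:
        raise DeadlineExceeded(f"{name}: ran out of budget mid-call")
    return f"{name} ok"


def handle_request(total_budget_s: float) -> list:
    deadline = time.monotonic() + total_budget_s   # set once at the edge
    results = []
    for name, work_s in (("auth", 0.02), ("search", 0.05), ("ranking", 0.20)):
        results.append(call_child(name, deadline, work_s))
    return results
```

With a generous budget such as `handle_request(1.0)` all three children complete. With a tight one such as `handle_request(0.05)` a later child raises `DeadlineExceeded` instead of doing work whose result nobody is still waiting for.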

Synergy: /backend, /databases.

Database performance fundamentals

Read path

Steps an interviewer expects you to articulate:

Step	Investigation
Find query text	pg_stat_statements class metrics / slow log
Get plan shape	EXPLAIN ANALYZE (vendor equivalent)
Check row estimates	cardinality / stats freshness story
Check access path	sequential vs index scan vs bitmap—why chosen
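The plan-shape and access-path steps can be rehearsed locally with SQLite from the Python standard library. This is a stand-in, not a PostgreSQL example: `EXPLAIN QUERY PLAN` reports the chosen plan without executing the query, unlike `EXPLAIN ANALYZE`, and the `orders` table and index name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"


def plan(sql: str) -> list:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return [row[-1] for row in rows]


before = plan(QUERY)                      # expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.execute("ANALYZE")                   # refresh planner statistics
after = plan(QUERY)                       # expect a search using the index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, but the shift from a scan of `orders` to a search via `idx_orders_customer` is the access-path change the table asks you to explain.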

Staff nuance: sometimes the problem is not a “wrong index” at all. The query needs a rewrite (covering projection, narrower SELECT, lateral batching patterns), not index soup.

Write path & amplification

Every index raises write cost; hot updates on wide indexes cause bloat and contention. State that trade-off explicitly.

Replication & consistency illusion

Stale reads from replicas surface as intermittent “bugs” unless user-visible freshness cues are engineered in. Misclassifying this consistency problem as a raw performance issue is a common error.

Caching layers & invalidation realism

Cache	Typical failure mode	Interview mitigation story
Browser HTTP	leaking auth’d HTML/CDN mishap	keyed URLs, surrogate keys
App local	stampedes after TTL	jitter + coalesce
Shared remote	stale business decisions	versioning + negative caching caution
ORM/session	phantom staleness layering	TTL + explicit invalidation events

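Thundering herd: be able to explain probabilistic early refresh conceptually (each reader may refresh slightly before expiry, with a probability that rises as expiry approaches), even if the vendor-specific implementation is deferred.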
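A conceptual sketch of probabilistic early refresh, in the style often described as probabilistic early expiration (sometimes called XFetch). The function name, the `beta` default, and the numbers are assumptions for illustration; no particular cache library is implied.

```python
import math
import random
import time
from typing import Callable, Optional


def should_refresh_early(expiry: float, recompute_cost_s: float, beta: float = 1.0,
                         now: Optional[float] = None,
                         rng: Callable[[], float] = random.random) -> bool:
    """Decide whether this caller should recompute before the entry expires."""
    now = time.time() if now is None else now
    # -log(u) is an exponential random variate; 1 - rng() keeps u in (0, 1].
    head_start = recompute_cost_s * beta * -math.log(1.0 - rng())
    return now + head_start >= expiry


random.seed(7)
expiry = 1000.0
for seconds_left in (30.0, 5.0, 1.0, 0.0):
    hits = sum(
        should_refresh_early(expiry, recompute_cost_s=1.0, now=expiry - seconds_left)
        for _ in range(10_000)
    )
    print(f"{seconds_left:>4}s before expiry: {hits / 10_000:.1%} of callers would refresh")
```

Far from expiry almost no caller refreshes; close to it a growing fraction does, so the recomputation is usually done by one early caller instead of by everyone at the instant the TTL lapses. Larger `beta` or a larger recompute cost shifts refreshes earlier.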

Load shapes & amplification

Pattern	Hazard	Controls
Retries without jitter	harmonic spikes	capped retries + backoff with jitter
Global fan-out timeouts	herd release	concurrency limits
Periodic cron alignment	spikes	jitter scheduling / rate smoothing
Autoscale lag	cold queue growth	provisioning / queue absorb strategy
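The retry row of the table can be sketched as capped exponential backoff with full jitter, where each delay is drawn uniformly between zero and the current ceiling. `backoff_delays` and its defaults are illustrative choices, not a recommendation for any specific client.

```python
import random
from typing import List, Optional


def backoff_delays(max_retries: int, base_s: float = 0.1, cap_s: float = 5.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delays (seconds) before each retry: exponential ceiling, capped, fully jittered."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        delays.append(rng.uniform(0.0, ceiling))
    return delays


# Two hypothetical clients that failed at the same moment.
client_a = backoff_delays(5, rng=random.Random(1))
client_b = backoff_delays(5, rng=random.Random(2))
print([round(d, 3) for d in client_a])
print([round(d, 3) for d in client_b])
```

Because the two schedules differ, clients that failed together do not retry together, which is what breaks up the harmonic spikes. `max_retries` is the cap on attempts; a retry budget shared across requests would be a separate control.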

Cold starts (serverless / JVM warmup): quantify how the cold penalty affects the tail, and tie it to the cost of provisioned concurrency.

Reliability interplay (error budgets)

Shrinking tails often costs money, complexity, or correctness windows. Reconcile that trade explicitly with the reliability error budget when relevant; not every perf win is unconditionally “good ops.”

Recognition cues (symptoms → drills)

Symptom	Split first	Drill
High CPU but low QPS	user vs syscall vs GC	segregated profiler views
Memory climb steady	leak vs caching	snapshot diff timelines
API slow intermittent	tails vs saturation	percentile trace overlays
DB CPU low but latency high	waiting / locks / replicas	waits / locks / replication dashboard
Hot key / skew	uneven shard load	resharding narrative + cache pad

Staff follow-ups: “What dashboard would pre-detect recurrence?” “What synthetic probe fails CI next time?”

Memory hooks

USE per bottlenecked resource (Utilization, Saturation, Errors).
One change hypothesis—no multi-variable mystery PRs pretending clarity.
Little’s Law intuition: queue depth follows from arrival rate × time in system, so a growing backlog means arrivals are outpacing service.

Pitfalls

Reporting mean latency only while p95/p99 regress: retries hide the user story until production screams.
Index theater: adding indexes without measuring write amplification, bloat, and maintenance windows.
Tuning micro-benchmarks on laptops while production traffic is skewed, bursty, and multi-tenant.
Shipping caches without staleness UX, especially for money, quotas, entitlements, and entitlement counts.
Fixing CPU while latency is dominated by queueing—you win flame graphs but lose users.

/frontend — rendering + main-thread discipline.
/backend — timeouts, retries, pools, saturation.
/databases (if present in nav) — plans, isolation, replication lag.
/dsa — algorithmic hotspots when profiling points to asymptotics.
