THN Interview Prep

Profiling services & async

Core details

Wall time ≫ CPU time ⇒ the service is waiting (IO, locks, pool, downstream, event-loop blocked)—not “slow functions.”

First instruments

| Signal | Tool class |
| --- | --- |
| Per-request waterfall | Distributed tracing (OpenTelemetry) |
| Pool wait | Metrics on acquire time, not only query duration |
| Event-loop lag (Node) | `perf_hooks`, APM lag histograms |
| Saturation | Queue depth, thread pool, goroutine scheduler (runtime-specific) |

Classic patterns

| Pattern | What you see | Direction |
| --- | --- | --- |
| Blocking event loop | CPU spikes + lag under load | Non-blocking libs, offload |
| N+1 downstream | Many short spans to the same dependency | Batch, cache |
| Retry storm | Error rate + latency spike together | Backoff, jitter, budgets |
| Cold pool | Timeouts after deploy | Pool sizing, RDS Proxy-class fixes |
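The retry-storm direction can be sketched as capped exponential backoff with full jitter plus a retry budget, so retries cannot amplify an outage. The names and budget mechanics below are illustrative, not from a specific library:

```javascript
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

// Hypothetical global budget, e.g. refilled elsewhere at ~10% of request rate.
let retryBudget = 10;

async function withRetries(fn, { attempts = 4, baseMs = 100, capMs = 2000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      // Stop retrying when out of attempts or out of budget.
      if (i === attempts - 1 || retryBudget <= 0) throw err;
      retryBudget--;
      const backoff = Math.min(capMs, baseMs * 2 ** i); // capped exponential
      await sleep(Math.random() * backoff); // full jitter de-correlates clients
    }
  }
}
```

Full jitter (a random delay up to the backoff cap) spreads retries out so synchronized clients don't re-stampede the dependency at the same instant.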

Understanding

Async does not mean "free": each await creates a continuation, and under load the memory and scheduling cost of those continuations matters. A parent's deadline should propagate to children so they don't do useless work after the client has already timed out.
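Deadline propagation can be sketched gRPC-style: pass an absolute deadline down the call tree and check the remaining budget before each hop. The function names here are illustrative:

```javascript
const remainingMs = (deadline) => deadline - Date.now();

async function handleRequest(deadline) {
  // Pass the same absolute deadline down, not a fresh per-hop timeout.
  return callDownstream(deadline);
}

async function callDownstream(deadline) {
  const budget = remainingMs(deadline);
  if (budget <= 0) throw new Error('deadline exceeded'); // skip useless work
  // In a real client, `budget` would become the per-call timeout here.
  await new Promise((r) => setTimeout(r, Math.min(10, budget))); // pretend IO
  return 'data';
}
```

Passing the absolute deadline (rather than a relative timeout) means every child sees how much time is actually left, no matter how deep the call tree is.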

Senior understanding

Staff narrative: "I'd split queueing vs compute with a trace before changing algorithms." Tie SLOs to tail percentiles (see Tail latency & SLOs).
