Backend Engineering
Senior backend scope spans interfaces people depend on, data that outlives deploys, and failure modes nobody rehearsed. Staff-level signal: state constraints first, keep abstract nouns to a minimum, make timeouts and budgets explicit, and quantify tradeoffs.
Node.js interview questions & fundamentals
Event loop, streams, buffers, ES vs CJS, async patterns, and curated question lists.
DevOps & Cloud
CI/CD, Docker, Kubernetes, AWS compute patterns, observability, and safe rollouts.
How to use this page
- Lock vocabulary in Core basics (at-least-once, saga, saturation).
- Drill Cue → architecture move repetitions, not buzzwords.
- Time-box study sessions to the durations given under Study pattern.
- Memorize failure stories using Incident script scaffolding.
Topic study plan (deep pages)
Each topic under /backend/topics/... follows: Core details → Understanding → Senior understanding → Diagram.
| Topic | Focus |
|---|---|
| Request lifecycle & timeouts | Hop budgets, deadlines, cancellation |
| Concurrency & async models | Pools, loops, blocking, backpressure |
| Idempotency & at-least-once | Dedupe keys, outbox, sagas, webhooks |
| Caching & consistency | Layers, stampedes, invalidation realism |
| Observability, SLOs & alerts | RED/USE, sampling, actionable paging |
| AuthN/Z, sessions & JWT | Object-level ACL, bearer vs cookie trade |
Node specialization: /backend/nodejs.
Read the deep pages in the table order. The order is intentional: request lifecycle gives the boundary of one user action, concurrency explains how work is scheduled, idempotency handles retries, caching handles speed versus correctness, observability proves behavior, and auth closes the security loop.
Core basics
1. Request lifecycle (articulate crisply)
Narratable chain:
Client ──► Edge (TLS, L7 rules) ──► AuthN ──► AuthZ ──► Validation
──► Handler ──► Domain logic ──► Repositories / Brokers ──► Response mapping
For each arrow you should be able to name, unprompted, the timeout, whether retries apply, the idempotency expectation, and the payload limit.
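One way to make "a timeout per arrow" concrete is a single request deadline that every hop draws from. The sketch below is illustrative only: `Deadline`, `timeout_for_hop`, and the per-hop caps are hypothetical names, not part of any framework.

```python
import time
from dataclasses import dataclass
from typing import Optional


class DeadlineExceeded(Exception):
    """Raised when the request budget is spent before a hop starts."""


@dataclass(frozen=True)
class Deadline:
    # Absolute expiry on the monotonic clock, shared by every hop of one request.
    expires_at: float

    @classmethod
    def after(cls, seconds: float, now: Optional[float] = None) -> "Deadline":
        start = time.monotonic() if now is None else now
        return cls(expires_at=start + seconds)

    def remaining(self, now: Optional[float] = None) -> float:
        current = time.monotonic() if now is None else now
        return max(0.0, self.expires_at - current)

    def timeout_for_hop(self, hop_cap: float, now: Optional[float] = None) -> float:
        """Timeout for the next call: the hop's own cap, never more than what is left."""
        left = self.remaining(now)
        if left <= 0.0:
            raise DeadlineExceeded("request budget exhausted")
        return min(hop_cap, left)
```

A handler would create one `Deadline` at the edge and pass `timeout_for_hop(...)` to each downstream client, so a slow early hop shrinks what later hops may spend instead of stacking timeouts.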
Idempotency & safety
| HTTP-ish operation | Typical idempotency | Notes |
|---|---|---|
| Read | naturally safe | still rate-limit abuse |
| Create | key or natural dedupe | double-submit races |
| Update if version matches | ETag / precondition | exposes conflicts clearly |
| Delete | often logically idempotent | clarify cascading side effects |
Interview line: “Retries make duplicates inevitable—effects must converge.”
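A minimal sketch of the "Create" row, assuming the client sends an idempotency key with each request. `ChargeService` and its in-memory dictionaries are hypothetical stand-ins; a real service would keep the key in a durable store with a unique constraint.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class ChargeService:
    # In-memory stand-ins (assumption); a real service would use a durable store
    # with a unique constraint on the idempotency key.
    results_by_key: dict = field(default_factory=dict)
    charges: list = field(default_factory=list)
    _lock: threading.Lock = field(
        default_factory=threading.Lock, repr=False, compare=False
    )

    def create_charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # The lock closes the double-submit race between the check and the write.
        with self._lock:
            if idempotency_key in self.results_by_key:
                return self.results_by_key[idempotency_key]  # replay: no new effect
            charge = {"id": len(self.charges) + 1, "amount_cents": amount_cents}
            self.charges.append(charge)
            self.results_by_key[idempotency_key] = charge
            return charge
```

Replaying the same key returns the stored result, so a retried request converges on one charge.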
2. Concurrency models
Brief clarity (language agnostic):
| Model | Wins | Costs / risks |
|---|---|---|
| Thread pool per request | isolation | bounded pool starvation |
| Event loop async I/O | high fan-in | accidental CPU blocking libs |
| Actors/isolation lanes | deterministic ordering | mailbox backlog |
| Green threads/virtual threads | ergonomic blocking code | pinning on native calls or resources needs care |
Be able to explain how you detect a blocked event loop and its usual causes: long CPU work, synchronous FS calls, catastrophic regex backtracking.
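One common detection technique is to measure how late a timer fires. The sketch below uses Python's `asyncio` as a stand-in for any event loop; `measure_loop_lag` and `demo` are hypothetical names, and the 0.2 s `time.sleep` simulates accidental synchronous work.

```python
import asyncio
import time
from typing import List


async def measure_loop_lag(interval: float, samples: int) -> List[float]:
    """Sleep for `interval` repeatedly and record how late each wake-up was."""
    lags = []
    for _ in range(samples):
        start = time.monotonic()
        await asyncio.sleep(interval)
        lags.append(max(0.0, time.monotonic() - start - interval))
    return lags


async def demo() -> float:
    monitor = asyncio.create_task(measure_loop_lag(interval=0.01, samples=5))
    await asyncio.sleep(0)  # yield once so the monitor starts its first sleep
    time.sleep(0.2)         # deliberately block the loop with synchronous work
    lags = await monitor
    return max(lags)        # worst observed lag, in seconds
```

Exported as a gauge, this lag is the kind of signal that distinguishes "the loop is blocked" from "a downstream is slow".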
3. Stateful vs stateless tiers
Stateful pitfalls: sticky sessions that complicate scaling; session store replication; revocation delays.
Prefer an externalized session/token strategy, backed by the documented trade-off matrix from earlier on this roadmap.
JWT talking points interviewers adore:
Audiences (aud), issuer (iss), exp/rotation, revocation story (opaque token + introspection vs short TTL + replay defense).
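A sketch of the claim checks behind those talking points, assuming the token signature has already been verified elsewhere and the payload decoded into a dict. `validate_claims` and the 30 second leeway are illustrative choices, not a library API.

```python
def validate_claims(claims: dict, expected_iss: str, expected_aud: str,
                    now: float, leeway: float = 30.0) -> list:
    """Return a list of problems; an empty list means the claims are acceptable.

    Assumes signature verification already happened (not shown here).
    """
    problems = []
    if claims.get("iss") != expected_iss:
        problems.append("unexpected issuer")
    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_aud not in audiences:
        problems.append("audience mismatch")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp + leeway:
        problems.append("expired or missing exp")
    return problems
```

Revocation is deliberately absent: with short TTLs the `exp` check bounds the damage window, while opaque tokens would replace this function with an introspection call.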
4. Messaging & async workflows
| Pattern | When | Hazard |
|---|---|---|
| Transactional outbox | DB write and publish must succeed together | ordering & poison messages |
| Saga / process manager | compensation spanning services | orphaned partial state |
| Inbox dedupe consumer | at-least-once delivery (Kafka/PubSub) with a once-only business effect | TTL + key design |
| Dead letter queues | salvage path | unattended growth without monitoring |
Discuss ordering: strict ordering per partition key versus relaxed ordering for higher throughput.
5. Data access discipline
Classic staff traps:
N+1 pattern
Be able to explain how to detect it from query logs, then fix it with joins, dataloaders/batching, or queries that project only the needed columns.
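A small illustration of the pattern and the batching fix, using SQLite and a hand-rolled statement log. The schema and the `run` helper are hypothetical; the `executed` list plays the role of the query log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'ada'), (2, 'lin');
INSERT INTO posts VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 1, 'c');
""")

executed = []  # stand-in for a query log


def run(sql, params=()):
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    posts = run("SELECT id, author_id, title FROM posts ORDER BY id")
    out = []
    for _id, author_id, title in posts:
        # One extra query per post: this is the "+N".
        (name,) = run("SELECT name FROM authors WHERE id = ?", (author_id,))[0]
        out.append((title, name))
    return out


def titles_batched():
    posts = run("SELECT id, author_id, title FROM posts ORDER BY id")
    if not posts:
        return []
    ids = sorted({author_id for _id, author_id, _title in posts})
    placeholders = ",".join("?" for _ in ids)
    rows = run(f"SELECT id, name FROM authors WHERE id IN ({placeholders})", tuple(ids))
    names = dict(rows)  # author id -> name, fetched in a single query
    return [(title, names[author_id]) for _id, author_id, title in posts]
```

Both functions return the same rows; the difference only shows up in `len(executed)`, which grows with the number of posts in the first version and stays at two in the second.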
Transactions & isolation (high-level story)
Explain how versioning prevents lost updates, and how phantom reads motivate higher isolation levels. Use them sparingly: not every endpoint needs SERIALIZABLE.
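Version-based lost-update prevention can be sketched as a conditional write. The `items` table and `update_stock` are hypothetical names; SQLite stands in for the real database.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO items VALUES (1, 10, 1)")
    conn.commit()
    return conn


def update_stock(conn, item_id, new_stock, expected_version) -> bool:
    """Write only if nobody else bumped the version since we read it."""
    with conn:
        cur = conn.execute(
            "UPDATE items SET stock = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_stock, item_id, expected_version),
        )
    return cur.rowcount == 1  # 0 rows means a concurrent writer won: re-read and retry
```

Two writers that both read version 1 cannot both succeed; the loser sees `False` and has to re-read, which is the same conflict an ETag precondition surfaces over HTTP.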
Schema evolution strategy
Additive-first migrations, then dual-writing phases, then backward-compatible reads: a timeline of steps, not a single jump.
6. Distributed systems vocabulary you must pronounce confidently
Practice short definitions:
At-least-once: duplicates are possible, so handlers must converge.
Basically available: degrade deliberately under partitions, according to product risk tolerance, rather than hand-waving about CAP without context.
Backpressure: shed load gracefully, for example a bounded queue that returns 429/503 with a consistent Retry-After.
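The backpressure definition can be sketched as a bounded intake queue. `Intake`, the 202/429 mapping, and the fixed Retry-After value are assumptions for illustration; the returned tuple stands in for an HTTP response.

```python
import queue


class Intake:
    def __init__(self, max_pending: int, retry_after_seconds: int):
        self.pending = queue.Queue(maxsize=max_pending)  # the bound is the whole point
        self.retry_after_seconds = retry_after_seconds

    def submit(self, job) -> tuple:
        """Return (status, headers): 202 when queued, 429 when the queue is full."""
        try:
            self.pending.put_nowait(job)
        except queue.Full:
            # Reject immediately instead of letting latency and memory grow unbounded.
            return 429, {"Retry-After": str(self.retry_after_seconds)}
        return 202, {}
```

Rejecting at the door keeps queue depth, and therefore worst-case wait time, bounded; the Retry-After value tells well-behaved clients when to come back.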
7. Caching strata
| Cache | Typical invalidation | Pain |
|---|---|---|
| Local L1 | TTL / pub-sub | staleness hotspots |
| Shared Redis/mem | explicit keys/events | thundering herds |
| HTTP intermediaries | header-driven, easy to mis-set | private data leaks |
Be ready to state cache stampede mitigations: probabilistic early expiration, request coalescing, TTL jitter.
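Two of those mitigations fit in a few lines. The sketch below shows one common formulation of probabilistic early expiration plus TTL jitter; the function names, `beta`, and the 10% spread are illustrative assumptions.

```python
import math
import random


def should_refresh_early(now, expires_at, recompute_seconds, beta=1.0,
                         rng=random.random):
    """True when this caller should recompute before the entry actually expires.

    The closer the entry is to expiry (and the costlier the recompute),
    the more likely a given caller volunteers to refresh it.
    """
    # 1 - rng() lies in (0, 1], so log() is defined and <= 0.
    head_start = -recompute_seconds * beta * math.log(1.0 - rng())
    return now + head_start >= expires_at


def jittered_ttl(base_ttl, spread=0.1, rng=random.random):
    """Scale the TTL by a random factor in [1 - spread, 1 + spread)."""
    return base_ttl * (1.0 + spread * (2.0 * rng() - 1.0))
```

Jitter keeps keys written together from expiring together; early refresh means one caller usually rebuilds a hot key before the crowd finds it missing. Request coalescing (one in-flight rebuild per key) is the complementary third option and is not shown here.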
8. Observability trilogy (actionable—not decoration)
Structured logs carry a trace id so entries can be correlated across hops.
Metric families:
| Family | Signals |
|---|---|
| RED | Rate, Errors, Duration per service golden path |
| USE | Utilization, Saturation, Errors per infra resource |
Tracing pitfalls: labels whose cardinality explodes; oversampling that drowns the budget in cost.
Alerts attach an explicit customer outcome: “p95 checkout dependency > 800ms breaches SLO impacting conversion guard.”
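The alert condition in that example reduces to a percentile check over a window of latency samples. This sketch uses the nearest-rank definition of a percentile; `breaches_slo` and its defaults are hypothetical, and production systems usually compute this from histograms rather than raw samples.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def breaches_slo(latencies_ms, threshold_ms=800.0, pct=95.0):
    """True when the chosen percentile of the window exceeds the threshold."""
    return percentile(latencies_ms, pct) > threshold_ms
```

With 100 samples, six slow requests push p95 over the line while five do not, which is a useful way to explain why a p95 alert tolerates a small tail.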
Backend answer template
When an interviewer asks a backend design or debugging question, keep the answer grounded:
- State the product path: checkout, login, upload, notification, search.
- Name the invariant: no double charge, no cross-tenant read, no silent data loss, bounded latency.
- Draw the request path: client, edge, auth, validation, handler, dependencies, async work.
- Place controls on the path: timeout, retry policy, idempotency key, rate limit, authorization check.
- Explain failure behavior: retry, degrade, enqueue, reject, or compensate.
- Prove it works: SLI/SLO, trace, log field, saturation metric, test, or rollout guard.
This shape prevents vague answers like “use Kafka and cache it.” It forces each component to earn its place.
Understanding
Partial failure dominates perfect happy paths—design degraded modes:
| Failure | Product behavior | Observability marker |
|---|---|---|
| DB read replica lag | staleness disclaimers | replication delay gauge |
| Downstream outage | graceful fallback/feature off | breaker open ratio |
| Queue backlog | shedding / delayed UX | lag depth alerting |
Operational excellence is preventative: progressive rollout, automatic rollback hooks, curated failure injection drills.
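The "breaker open ratio" marker in the table presumes a circuit breaker in front of the downstream. A minimal sketch, with hypothetical names and thresholds and no thread safety:

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: skip the dependency entirely
            # Otherwise half-open: let one trial call through below.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the breaker
        return result
```

The `fallback` callable is where the product behavior column lives: a cached value, a feature switched off, or an explicit "try later" response.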
Recognition cues
| Phrase | First mental model expansion |
|---|---|
| Idempotent webhook | signature verification, ordering, replay table, transactional commit boundary |
| Rate limit bursts | token bucket vs leaky bucket, global vs sticky keys, Retry-After contract |
| Hot key/cache stampede | key redesign, probabilistic TTL, layering |
| Saga compensation | reversible steps enumerated + manual intervention path |
| Exactly once | reframe honestly as at-least-once delivery plus idempotent, converging effects |
| Noisy neighbor multi-tenant | quotas, fairness queues, isolating the noisy tenant |
Staff follow-ups: dashboard proving fix, chaos experiment story, SLA math.
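For the rate-limit cue in the table above, a token bucket is the usual first sketch. `TokenBucket` and its parameters are illustrative; the second return value is what a server could send as Retry-After.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec   # steady refill rate
        self.capacity = capacity   # maximum burst size
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0.0
        # Time until one whole token is available again.
        return False, (1.0 - self.tokens) / self.rate
```

One bucket per key gives per-tenant limits; a single shared bucket gives a global limit, which is the "global vs sticky keys" choice from the table.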
Common interview prompts
| Prompt | What a strong answer must include |
|---|---|
| “Design a reliable payment webhook processor.” | signature verification, idempotency key, transactional boundary, at-least-once retries, DLQ, replay safety |
| “Your p99 latency doubled after a deploy.” | isolate path by trace, inspect pool waits/event loop lag/downstream latency, rollback guard, add regression metric |
| “How would you cache product pages?” | freshness class, public vs personalized data, TTL/invalidation, stampede control, correctness escape hatch |
| “How do you prevent cross-tenant data leaks?” | authZ near data, tenant predicates, object-level checks, integration tests, audit logs |
| “Queue backlog is growing.” | arrival vs service rate, consumer saturation, poison messages, partition skew, backpressure/shaping |
Memory hooks
- Queues without saturation metrics are blind luggage piles.
- Retries multiply traffic—budget them.
- Pick one heroic SLI per story.
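"Budget them" can mean two things at once: spacing retries out, and capping how many retries the fleet may add on top of real traffic. A sketch of both, with hypothetical names and ratios:

```python
import random


def backoff_delays(max_attempts, base=0.1, cap=2.0, rng=random.random):
    """Full-jitter exponential backoff: one delay before each retry (none before the first try)."""
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(max_attempts - 1)]


class RetryBudget:
    """Allow retries only while they stay within `ratio` of first-attempt requests."""

    def __init__(self, ratio=0.1):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        if self.retries + 1 > self.ratio * self.requests:
            return False  # budget spent: fail fast instead of amplifying load
        self.retries += 1
        return True
```

During an outage nearly every request wants to retry; the budget is what keeps a 10% retry allowance from turning into a multiple of normal traffic.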
Study pattern
A — Endpoint postmortem (12 min narrated)
Tell: the latency composition of the happy path; the degrade path; which status codes are authoritative and how they map to client retries.
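The status-code-to-retry mapping is worth writing down once. The policy below is one reasonable assumption, not a standard; `client_should_retry` is a hypothetical helper.

```python
def client_should_retry(status: int, request_is_idempotent: bool) -> bool:
    """Illustrative retry policy keyed on HTTP status."""
    if status in (408, 429, 503):
        return True  # transient by contract; honour Retry-After when present
    if 400 <= status < 500:
        return False  # caller error: the same payload cannot succeed on retry
    if 500 <= status < 600:
        return request_is_idempotent  # outcome unknown: only repeat idempotent requests
    return False  # 1xx/2xx/3xx: nothing to retry
```

This is why validation failures must surface as 4xx: a 422 stops the client, while an opaque 500 invites retries that can never succeed.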
B — Distributed diagram (35 min)
Client → edge → service → cache/db/queue. Annotate: per-hop timeouts and the budget math behind them; where the idempotency key lives; breaker thresholds.
C — Incident compression (15 min template)
Impact duration → Detection gap → Immediate mitigation → Permanent remedy → Automated guard that prevents recurrence.
Diagrams
Path + controls
Messaging reliability
Pitfalls
Synchronous retries that storm a struggling dependency and cascade its outage.
Returning an opaque 500 for validation mistakes, which teaches clients to retry requests that can never succeed.
Caches without a TTL story: stale data silently feeds financial decisions and surfaces as subtle bugs.
Skipping payload size guards, which opens an OOM amplification path.
Interview scripts
Timeout narrative:
“Every downstream gets an explicit deadline participating in distributed budget—if Postgres p95 climbs, we degrade non-critical enrichment rather than stalling checkout.”
Idempotent processor:
“With at-least-once delivery, duplicates exist by definition, so I store a deterministic idempotency key in a table that commits in the same transaction as the outbox; retries then converge without double-charging.”
Related depth
- /performance: cross-layer profiling
- /databases: plans & isolation
- /security: authZ pitfalls
- /dsa: large design plus coding synergy