# Interview Prep — Generative AI Systems

Companion to /gen-ai: that page carries reference depth; this one covers onsite cadence—opening script, retrieval detail, tooling, evaluation, economics, outages, drills, diagrams.
## Rehearsal cadence
| Slot | Activity |
|---|---|
| 10 min/day | Speak Opening frame (below) aloud |
| 25 min ×2 / week | End-to-end diagram + seven failure arrows each with metric |
| Weekly | Follow-ups verbally, no notes |
## Opening frame (~60 seconds, fixed order)
- Intent: conversational QA • summarization • extraction • routing • rewriting—be explicit if the output requires validated JSON.
- Risk band: low-stakes copy vs contractual / financial / regulated (selects review, abstain, retention, logging stance).
- Grounding contract: internal corpus only • web tolerated • citations required? explicit refusal behavior?
- SLO triad: qualitative accuracy/refusal stance, latency tail emphasis, coarse economics ($/successful outcome class).
Interview hinge sentence:
“I treat completions as unreliable until corroborated—grounding, tools, validators, eval, degrade paths all express that mistrust.”
## Retrieval & ingestion (don't hand-wave “we RAG”)

### Source hygiene
Call out ingestion risks by source:
| Source | Typical failure |
|---|---|
| SaaS connectors | partial sync / duplicate events |
| Web crawl | boilerplate/nav pollution |
| User uploads | adversarial documents |
| Warehouse exports | ambiguous staleness |
Isolation line:
“ACL metadata is attached before embedding; retrieval filter is mandatory per request—no ‘global vector search’ for multi-tenant products.”
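A minimal sketch of that contract, with illustrative names (`Chunk`, `search`, not any specific library): the ACL filter is a required argument, so a request arriving without tenant context fails instead of silently falling back to global search.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    tenant_id: str
    acl_groups: frozenset  # attached at ingestion, before embedding

def search(index: list[Chunk], query_groups: frozenset, tenant_id: str) -> list[Chunk]:
    """Refuse to run without an ACL context; no 'global vector search'."""
    if not tenant_id or not query_groups:
        raise ValueError("retrieval requires tenant + ACL context")
    # Mandatory per-request filter: tenant match AND at least one shared group.
    return [c for c in index
            if c.tenant_id == tenant_id and c.acl_groups & query_groups]
```

The point is structural: the filter cannot be forgotten, because the function signature will not accept a call without it.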
### Chunking talk track
- Justify chunk size/overlap with offline slices, not magic constants.
- Call out tables, code, multilingual rows as special cases.
### Retrieval strategies
Be ready to compare:
| Mode | Strength | Weakness |
|---|---|---|
| Dense vectors | fuzzy semantic match | rare tokens / SKUs |
| Sparse lexical | exact-ish hits | vocabulary mismatch |
| Hybrid + fusion | pragmatic default | tuning + latency |
Reranking: improves precision after recall is healthy—not a band-aid for empty/bad corpora.
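Hybrid fusion is commonly implemented as reciprocal rank fusion (RRF); a sketch, where the `k = 60` damping constant is a conventional default worth tuning against offline slices:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists (e.g. dense + sparse) by reciprocal rank."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1/(k + rank); high k flattens the curve.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in both lists outranks one that tops only a single list, which is exactly the pragmatic behavior the table promises.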
### Packing context
- Deduplicate overlapping chunks.
- Trim boilerplate headings that steal token budget.
- Keep user question visible without burying it under evidence spam.
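The three packing rules above can be sketched as one pass; word counts stand in for a real tokenizer, and all names are illustrative:

```python
def pack_context(question: str, chunks: list[str], budget: int = 100) -> str:
    """Dedupe and budget ranked chunks; keep the question at the top."""
    seen: set[str] = set()
    picked: list[str] = []
    used = len(question.split())          # question always gets budget first
    for chunk in chunks:                  # assume chunks arrive ranked
        key = chunk.strip().lower()
        if key in seen:
            continue                      # drop duplicate / overlapping evidence
        cost = len(chunk.split())
        if used + cost > budget:
            break                         # respect the token budget
        seen.add(key)
        picked.append(chunk)
        used += cost
    return "\n\n".join([question, *picked])
```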
## Structured outputs & tools
Frame tools as narrow RPCs:
| Control | One-liner |
|---|---|
| AuthZ | User identity authorizes side effects—not model confidence |
| Schema validation | Malformed tool JSON fails closed |
| Idempotency | Retries converge on same business effect |
| Timeouts | Model cannot wedge thread pools indefinitely |
| Audit | Log trace id, corpus version, retrieval ids, redacted args |
Bounded JSON repair: cap attempts; never infinite token burn fixing syntax.
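A sketch of the bounded-repair rule: `regenerate` stands in for a hypothetical model re-ask, and the attempt cap makes the fail-closed behavior explicit rather than implicit.

```python
import json

def parse_tool_call(raw: str, regenerate, max_attempts: int = 2) -> dict:
    """Parse tool JSON with a capped number of repair passes, then fail closed."""
    for _ in range(max_attempts):
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            raw = regenerate(raw)         # one bounded repair pass, no loops
    raise ValueError("tool JSON failed closed after bounded repair")
```

With `max_attempts=2` the worst case is one extra generation call, not an unbounded token burn on a syntactically cursed output.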
## Evaluation & monitoring

### Offline
Versioned golden sets with fingerprinted corpora + prompts; prevent train/eval leakage when iterating chunkers.
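One way to fingerprint an eval run, assuming per-document content hashes already exist from ingestion; the function and field names are illustrative:

```python
import hashlib
import json

def eval_fingerprint(corpus_doc_hashes: list[str], prompt_template: str,
                     chunker_version: str) -> str:
    """Hash the exact corpus snapshot + prompt + chunker version together."""
    payload = json.dumps(
        {"docs": sorted(corpus_doc_hashes),  # order-independent
         "prompt": prompt_template,
         "chunker": chunker_version},
        sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]
```

Any drift in corpus, prompt, or chunker changes the fingerprint, so two eval scores are only comparable when their fingerprints match.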
### Online (examples of metrics & actions)
| Metric | Escalation |
|---|---|
| retrieval_empty_rate | indexer / ACL regression |
| refusal_rate spike | safety filter / prompt drift |
| tool_error_rate | downstream outage / schema drift |
| cost_per_success | cache hit rate / router model / context bloat |
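The table above only becomes operational once each metric has a threshold and a routing decision; a sketch with purely illustrative numbers:

```python
# metric -> (alert threshold, escalation target); thresholds are illustrative.
ESCALATIONS = {
    "retrieval_empty_rate": (0.05, "page: indexer / ACL regression"),
    "refusal_rate":         (0.10, "page: safety filter / prompt drift"),
    "tool_error_rate":      (0.02, "page: downstream outage / schema drift"),
    "cost_per_success":     (0.25, "dashboard: cache / router / context bloat"),
}

def triage(metric: str, value: float):
    """Return the escalation action if the threshold is breached, else None."""
    threshold, action = ESCALATIONS[metric]
    return action if value > threshold else None
```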
### Human review
Schedule sampling for high-stakes misses—not only reactive firefighting.
## Cost & latency levers
Mention without brand flex:
- Route simple intents to smaller checkpoints.
- Parallelize independent retrieval shards (mind provider rate limits).
- Cache retrieve results (tenant-scoped keys) vs final answers (higher staleness risk—say it).
- Stream tokens for perceived responsiveness when allowed.
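A sketch of the tenant-scoped retrieval cache key; including the corpus version keeps stale-corpus hits out, and the key shape is an assumption, not a standard:

```python
import hashlib

def retrieval_cache_key(tenant_id: str, corpus_version: str, query: str) -> str:
    """Tenant- and corpus-scoped cache key for retrieval results."""
    normalized = " ".join(query.lower().split())   # cheap query normalization
    digest = hashlib.sha256(normalized.encode()).hexdigest()[:16]
    return f"retr:{tenant_id}:{corpus_version}:{digest}"
```

Because the tenant id and corpus version are part of the key, a cross-tenant hit or a stale-corpus hit is structurally impossible, and bumping the corpus version invalidates the cache for free.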
## Security & abuse (tight)
- Direct + indirect prompt injection (documents you “trust”).
- Tool sprawl = incident surface—principle of least capability.
- Log redaction for tokens/PII; abuse metrics (tool attempt spikes).
Cross-study: /security, deeper patterns in /gen-ai.
## Provider / model outage strategy
State a degraded path—pick the tier that matches product honesty:
- Retrieval-only answers with a freshness disclaimer.
- Feature-flag off the heavy path.
- A deterministic canned-response tier.
- Queue deeper generation asynchronously.
Avoid promising perfect continuity—credibility prefers explicit limitations.
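One possible degrade ladder, with illustrative tiers, flags, and messages:

```python
def answer(query: str, llm_healthy: bool, retrieved: list[str]) -> dict:
    """Pick the highest service tier the current outage state allows."""
    if llm_healthy:
        return {"tier": "full", "query": query}
    if retrieved:                       # retrieval-only, with a disclaimer
        return {"tier": "retrieval_only",
                "disclaimer": "Generated summary unavailable; showing sources.",
                "sources": retrieved}
    return {"tier": "canned",           # deterministic last resort
            "message": "Service degraded; please retry shortly."}
```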
## Understanding — staff-level tells
Interviewers reward:
- enumerated failure catalogs paired with telemetry
- humility on hallucinations (architecture bounds, not scolding completions)
- clear economics reasoning alongside architecture boxes
Avoid model-name tourism without system consequences.
## Recognition cue map
| Pressure question | Pivot |
|---|---|
| “Stop hallucinations” | grounding + abstain thresholds + factual tools |
| “Update knowledge reliably” | idempotent ingestion + corpus versioning |
| “500 ms budget” | parallel retrieve, omit rerank hot path, stream |
| “PII/regulated” | regional indexes, retrieval filters, retention |
| “Red team mindset” | injection matrix + tool quotas + escalation |
## Follow-up traps (answer <30 seconds)
- Nothing retrieved: user-visible degrade + telemetry + fallback suggestions (not hallucinated citations).
- Model vs tool conflict: deterministic rule—tool-derived facts beat prose inventions for structured fields.
- Cross-tenant leak fear: namespaces + retrieval contract tests—not vibes.
- LLM-as-judge bias: calibrated human anchors + disagreement surfacing cadence.
## Memory hooks
- Evidence → constrain → generate → validate.
- Economics gates architecture glamor.
- Broad tools widen incident cones.
## Study drills

### Drill 1 — Whiteboard (25 min)
Boxes: ingest → chunk/embed → hybrid retrieve → rerank? → policies → constrained decode → validators → UX. Annotate seven failure arrows (empty retrieval, bad ACL, rerank stalls, schema reject, provider 429, toxicity filter, stale corpus), pairing each with a metric name.
### Drill 2 — KPI triage (10 min)
Pick task_success_rate, p95_latency, cost_per_success—for each, give a threshold, paging vs. dashboard, and an owner.
### Drill 3 — Attacker trio (10 min)
Indirect PDF injection • tool-mediated exfil attempt • denial via expensive tool recursion—detect + mitigate each.
### Diagram — onsite story spine

### Diagram — RAG control-plane mental model
## Tie-ins
- Concepts & depth: /gen-ai
- Prep cadence & rounds: /interview, 12-week plan overview, behavioral STAR
- System-design skeleton: RESHADED framework