Interview Prep — Generative AI Systems

Companion to /gen-ai. That page gives reference depth; this one covers onsite cadence: opening script, retrieval detail, tooling, evaluation, economics, outages, drills, and diagrams.


Rehearsal cadence

| Slot | Activity |
| --- | --- |
| 10 min/day | Speak Opening frame (below) aloud |
| 25 min ×2 / week | End-to-end diagram + seven failure arrows, each with a metric |
| Weekly | Follow-ups verbally, no notes |

Opening frame (~60 seconds, fixed order)

  1. Intent: conversational QA • summarization • extraction • routing • rewriting; be explicit if the output requires validated JSON.
  2. Risk band: low-stakes copy vs contractual / financial / regulated (selects review, abstain, retention, logging stance).
  3. Grounding contract: internal corpus only • web tolerated • citations required? explicit refusal behavior?
  4. SLO triad: qualitative accuracy/refusal stance, latency tail emphasis, coarse economics ($/successful outcome class).

Interview hinge sentence:

“I treat completions as unreliable until corroborated—grounding, tools, validators, eval, degrade paths all express that mistrust.”


Retrieval & ingestion (don't hand-wave “we RAG”)

Source hygiene

Call out ingestion risks by source:

| Source | Typical failure |
| --- | --- |
| SaaS connectors | partial sync / duplicate events |
| Web crawl | boilerplate/nav pollution |
| User uploads | adversarial documents |
| Warehouse exports | ambiguous staleness |

Isolation line:

“ACL metadata is attached before embedding; retrieval filter is mandatory per request—no ‘global vector search’ for multi-tenant products.”
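A minimal sketch of that contract, assuming a generic vector store whose search call accepts a metadata filter (`index.search` and the `$in` filter syntax are illustrative, not a specific product's API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalRequest:
    tenant_id: str               # hard namespace, never optional
    principal_acls: frozenset    # groups resolved from the *user's* identity
    query_embedding: list
    top_k: int = 8

def retrieve(index, req: RetrievalRequest):
    # Fail closed: no tenant scope, no search.
    if not req.tenant_id:
        raise ValueError("refusing unscoped retrieval")
    # The ACL filter is built here, unconditionally, not left to callers.
    return index.search(
        vector=req.query_embedding,
        top_k=req.top_k,
        filter={
            "tenant_id": req.tenant_id,
            "acl": {"$in": sorted(req.principal_acls)},
        },
    )
```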

Chunking talk track

  • Justify chunk size/overlap with offline slices, not magic constants (see the sketch below).
  • Call out tables, code, multilingual rows as special cases.
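A minimal chunker for the talk track above; the `size`/`overlap` defaults are placeholders to be replaced by whatever your offline slices justify:

```python
def chunk(text: str, size: int = 800, overlap: int = 120) -> list:
    """Fixed-size chunking with overlap. The defaults are placeholders:
    derive size/overlap from offline slice evals, and route tables, code,
    and multilingual rows through specialized splitters instead."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```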

Retrieval strategies

Be ready to compare:

| Mode | Strength | Weakness |
| --- | --- | --- |
| Dense vectors | fuzzy semantic match | rare tokens / SKUs |
| Sparse lexical | exact-ish hits | vocabulary mismatch |
| Hybrid + fusion | pragmatic default | tuning + latency |

Reranking: improves precision after recall is healthy—not a band-aid for empty/bad corpora.
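For the hybrid row, reciprocal rank fusion (RRF) is one common, easily whiteboarded fusion choice; a minimal sketch:

```python
def rrf_fuse(result_lists, k: int = 60) -> list:
    """Reciprocal-rank fusion of ranked result lists (e.g. dense + sparse).
    k=60 is the commonly cited default; tune it like any other constant."""
    scores = {}
    for results in result_lists:          # each list: doc ids, best first
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # usage (names assumed): rrf_fuse([dense_ids, sparse_ids]) before any reranker
    return sorted(scores, key=scores.get, reverse=True)
```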

Packing context

  • Deduplicate overlapping chunks.
  • Trim boilerplate headings that steal token budget.
  • Keep the user question visible without burying it under evidence spam (see the packing sketch below).
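A greedy packing sketch covering the three bullets above; `count_tokens` stands in for whatever tokenizer matches your model, and duplicate detection is exact-match for brevity (real systems compare normalized text or embeddings):

```python
def pack_context(question: str, chunks: list, budget_tokens: int,
                 count_tokens) -> str:
    """Greedy packing: dedupe, then fill best-first until the budget is hit."""
    seen = set()
    packed = []
    used = count_tokens(question)           # the question is always reserved
    for c in chunks:                        # chunks arrive ranked best-first
        key = " ".join(c.split()).lower()   # cheap normalization
        if key in seen:
            continue                        # drop exact duplicates
        cost = count_tokens(c)
        if used + cost > budget_tokens:
            break                           # stop instead of silently truncating
        seen.add(key)
        packed.append(c)
        used += cost
    # Question goes last so it is not buried under evidence.
    return "\n\n".join(packed + [question])
```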

Structured outputs & tools

Frame tools as narrow RPCs:

| Control | One-liner |
| --- | --- |
| AuthZ | User identity authorizes side effects, not model confidence |
| Schema validation | Malformed tool JSON fails closed |
| Idempotency | Retries converge on the same business effect |
| Timeouts | Model cannot wedge thread pools indefinitely |
| Audit | Log trace id, corpus version, retrieval ids, redacted args |
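A sketch tying those controls together; `is_authorized` and `redact` are hypothetical callables you would supply, `fn` is expected to accept an idempotency key, and the thread-pool timeout only bounds the waiting request, not the worker itself:

```python
import concurrent.futures
import logging
import uuid

log = logging.getLogger("tool_audit")
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

def call_tool(fn, args: dict, *, user_id: str, trace_id: str,
              is_authorized, redact, timeout_s: float = 5.0):
    # AuthZ: the user's identity authorizes side effects, not model confidence.
    if not is_authorized(user_id, fn.__name__):
        raise PermissionError(f"{user_id} may not call {fn.__name__}")

    # Idempotency: a stable key so retries converge on one business effect.
    key = str(uuid.uuid5(uuid.NAMESPACE_URL,
                         f"{trace_id}:{fn.__name__}:{sorted(args.items())}"))

    # Timeout: bound how long this request waits. (The worker thread itself
    # keeps running; real systems also cancel at the I/O layer.)
    future = _pool.submit(fn, **args, idempotency_key=key)
    result = future.result(timeout=timeout_s)

    # Audit: trace id + redacted args; never log raw payloads.
    log.info("tool=%s trace=%s args=%s", fn.__name__, trace_id, redact(args))
    return result
```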

Bounded JSON repair: cap repair attempts; never burn unbounded tokens fixing syntax (sketch below).
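A minimal bounded-repair loop, assuming `model_call` is one LLM request and `validate` is your schema check (e.g. jsonschema or pydantic):

```python
import json

MAX_REPAIRS = 2  # hard cap: never burn unbounded tokens fixing syntax

def generate_json(model_call, prompt: str, validate) -> dict:
    last_error = None
    for attempt in range(1 + MAX_REPAIRS):
        raw = model_call(prompt if attempt == 0 else
                         f"{prompt}\n\nYour previous output was invalid "
                         f"({last_error}). Return corrected JSON only.")
        try:
            obj = json.loads(raw)
            validate(obj)            # raises on schema violation
            return obj
        except Exception as e:       # malformed JSON or schema reject
            last_error = e
    raise ValueError(f"fail closed after {MAX_REPAIRS} repairs: {last_error}")
```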


Evaluation & monitoring

Offline

Versioned golden sets with fingerprinted corpora + prompts; prevent train/eval leakage when iterating chunkers.
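One way to fingerprint what an eval run actually saw, so scores stay comparable across chunker iterations (a sketch, not a prescribed format):

```python
import hashlib

def eval_fingerprint(corpus_files: list, prompt_template: str,
                     chunker_version: str) -> str:
    """Content hash pinning exactly what an eval run saw; store it next
    to every golden-set score so runs stay comparable."""
    h = hashlib.sha256()
    for path in sorted(corpus_files):       # sorted: order-independent
        with open(path, "rb") as f:
            h.update(hashlib.sha256(f.read()).digest())
    h.update(prompt_template.encode())
    h.update(chunker_version.encode())      # bump when the chunker changes
    return h.hexdigest()[:16]
```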

Online (examples of metrics & actions)

| Metric | Escalation |
| --- | --- |
| retrieval_empty_rate | indexer / ACL regression |
| refusal_rate spike | safety filter / prompt drift |
| tool_error_rate | downstream outage / schema drift |
| cost_per_success | cache hit rate / router model / context bloat |
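A sketch of threshold-to-escalation routing; every number here is illustrative and should be tuned against your own baselines:

```python
# Illustrative thresholds only; tune against your own baselines.
ALERTS = {
    "retrieval_empty_rate": {"warn": 0.02, "page": 0.10},  # indexer / ACL regression
    "refusal_rate":         {"warn": 0.05, "page": 0.15},  # safety filter / prompt drift
    "tool_error_rate":      {"warn": 0.01, "page": 0.05},  # outage / schema drift
    "cost_per_success":     {"warn": 1.25, "page": 2.00},  # multiple of weekly baseline
}

def escalate(metric: str, value: float) -> str:
    levels = ALERTS[metric]
    if value >= levels["page"]:
        return "page"        # wake an owner up
    if value >= levels["warn"]:
        return "dashboard"   # owner reviews next business day
    return "ok"
```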

Human review

Schedule sampling for high-stakes misses—not only reactive firefighting.


Cost & latency levers

Mention without brand flex:

  • Route simple intents to smaller checkpoints.
  • Parallelize independent retrieval shards (mind provider rate limits).
  • Cache retrieval results (tenant-scoped keys) vs. final answers (higher staleness risk; call it out). See the sketch after this list.
  • Stream tokens for perceived responsiveness when allowed.
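Two of those levers as code, with hypothetical checkpoint names; the pattern (tenant-scoped keys, intent routing) is the point, not the constants:

```python
import hashlib

def retrieval_cache_key(tenant_id: str, corpus_version: str, query: str) -> str:
    # Tenant-scoped by construction; a shared key space is a leak waiting to happen.
    normalized = " ".join(query.split()).lower()
    digest = hashlib.sha256(normalized.encode()).hexdigest()[:16]
    return f"retrieval:{tenant_id}:{corpus_version}:{digest}"

def pick_model(intent: str) -> str:
    # Hypothetical checkpoint names; the pattern is the routing, not the models.
    SIMPLE_INTENTS = {"greeting", "faq_lookup", "formatting"}
    return "small-checkpoint" if intent in SIMPLE_INTENTS else "large-checkpoint"
```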

Security & abuse (tight)

  • Direct + indirect prompt injection (documents you “trust”).
  • Tool sprawl = incident surface—principle of least capability.
  • Log redaction for tokens/PII (sketch below); abuse metrics (tool attempt spikes).
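A minimal redaction sketch; regexes alone are not real PII detection, but even this keeps raw bearer tokens and emails out of logs:

```python
import re

PATTERNS = [
    (re.compile(r"(?i)bearer\s+[a-z0-9._\-]+"), "[REDACTED_TOKEN]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]

def redact(text: str) -> str:
    # Applied to tool args and traces before anything hits a log sink.
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```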

Cross-study: /security, deeper patterns in /gen-ai.


Provider / model outage strategy

State a degraded path and pick the tiers that match product honesty (a minimal ladder is sketched below):

  • retrieval-only answers with a freshness disclaimer
  • a feature flag to turn off the heavy generation path
  • a deterministic canned-responses tier
  • queuing deeper generation asynchronously

Avoid promising perfect continuity—credibility prefers explicit limitations.
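A sketch of such a ladder; every callable here is a stand-in to be wired to your own flag system, retrieval output, and canned-response tier:

```python
def answer(query: str, hits: list, flags, generate, sources_only, canned) -> str:
    """Fallback ladder: each tier trades capability for honesty
    instead of faking continuity."""
    if flags.enabled("primary_generation"):             # feature flag gate
        try:
            return generate(query, hits)                # full model path
        except Exception:                               # provider outage / 429 storm
            pass                                        # fall through, don't retry forever
    if hits:
        # Retrieval-only tier: show sources with an explicit freshness disclaimer.
        return sources_only(hits, note="Generation unavailable; showing matching "
                                       "sources as of the last index build.")
    # Deterministic tier: an honest canned response beats a hallucinated one.
    return canned(query)
```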


Understanding — staff-level tells

Interviewers reward:

  • enumerated failure catalogs paired with telemetry
  • humility on hallucinations (architecture bounds, not scolding completions)
  • clear economics reasoning alongside architecture boxes

Avoid model-name tourism without system consequences.


Recognition cue map

| Pressure question | Pivot |
| --- | --- |
| “Stop hallucinations” | grounding + abstain thresholds + factual tools |
| “Update knowledge reliably” | idempotent ingestion + corpus versioning |
| “500 ms budget” | parallel retrieve, omit rerank from hot path, stream |
| “PII/regulated” | regional indexes, retrieval filters, retention |
| “Red team mindset” | injection matrix + tool quotas + escalation |

Follow-up traps (answer <30 seconds)

  • Nothing retrieved: user-visible degrade + telemetry + fallback suggestions (not hallucinated citations).
  • Model vs tool conflict: deterministic rule—tool-derived facts beat prose inventions for structured fields.
  • Cross-tenant leak fear: namespaces + retrieval contract tests—not vibes.
  • LLM-as-judge bias: calibrated human anchors + disagreement surfacing cadence.

Memory hooks

  • Evidence → constrain → generate → validate.
  • Economics gates architecture glamor.
  • Broad tools widen incident cones.

Study drills

Drill 1 — Whiteboard (25 min)

Boxes: ingest → chunk/embed → hybrid retrieve → rerank? → policies → constrained decode → validators → UX. Annotate seven failures (empty retrieval, bad ACL, rerank stalls, schema reject, provider 429, toxicity filter, stale corpus), each paired with a metric name.

Drill 2 — KPI triage (10 min)

Pick task_success_rate, p95_latency, and cost_per_success; for each, give a threshold, paging vs. dashboard routing, and an owner.

Drill 3 — Attacker trio (10 min)

Indirect PDF injection • tool-mediated exfil attempt • denial via expensive tool recursion—detect + mitigate each.


Diagram — onsite story spine


Diagram — RAG control-plane mental model

