Interview Prep — Generative AI Systems

Companion to /gen-ai. That page gives reference depth; this one covers onsite cadence: opening script, retrieval detail, tooling, evaluation, economics, outages, drills, and diagrams.


Rehearsal cadence

| Slot | Activity |
| --- | --- |
| 10 min/day | Speak Opening frame (below) aloud |
| 25 min ×2 / week | End-to-end diagram + seven failure arrows, each with a metric |
| Weekly | Follow-ups verbally, no notes |

Opening frame (~60 seconds, fixed order)

  1. Intent: conversational QA • summarization • extraction • routing • rewriting; be explicit if the output requires validated JSON.
  2. Risk band: low-stakes copy vs contractual / financial / regulated (selects review, abstain, retention, logging stance).
  3. Grounding contract: internal corpus only • web tolerated • citations required? explicit refusal behavior?
  4. SLO triad: qualitative accuracy/refusal stance, latency tail emphasis, coarse economics ($/successful outcome class).

Interview hinge sentence:

“I treat completions as unreliable until corroborated—grounding, tools, validators, eval, degrade paths all express that mistrust.”


Retrieval & ingestion (don't hand-wave “we RAG”)

Source hygiene

Call out ingestion risks by source:

| Source | Typical failure |
| --- | --- |
| SaaS connectors | partial sync / duplicate events |
| Web crawl | boilerplate/nav pollution |
| User uploads | adversarial documents |
| Warehouse exports | ambiguous staleness |

Isolation line:

“ACL metadata is attached before embedding; retrieval filter is mandatory per request—no ‘global vector search’ for multi-tenant products.”
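A minimal sketch of that contract, assuming a generic vector store whose search call accepts a metadata filter (`index.search` and the `$in` filter syntax are illustrative, not a specific product's API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalRequest:
    tenant_id: str               # hard namespace, never optional
    principal_acls: frozenset    # groups resolved from the *user's* identity
    query_embedding: list
    top_k: int = 8

def retrieve(index, req: RetrievalRequest):
    # Fail closed: no tenant scope, no search.
    if not req.tenant_id:
        raise ValueError("refusing unscoped retrieval")
    # The ACL filter is built here, unconditionally, not left to callers.
    return index.search(
        vector=req.query_embedding,
        top_k=req.top_k,
        filter={
            "tenant_id": req.tenant_id,
            "acl": {"$in": sorted(req.principal_acls)},
        },
    )
```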

Chunking talk track

  • Justify chunk size/overlap with offline slices, not magic constants (see the sketch below).
  • Call out tables, code, multilingual rows as special cases.
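A minimal chunker for the talk track above; the `size`/`overlap` defaults are placeholders to be replaced by whatever your offline slices justify:

```python
def chunk(text: str, size: int = 800, overlap: int = 120) -> list:
    """Fixed-size chunking with overlap. The defaults are placeholders:
    derive size/overlap from offline slice evals, and route tables, code,
    and multilingual rows through specialized splitters instead."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```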

Retrieval strategies

Be ready to compare:

| Mode | Strength | Weakness |
| --- | --- | --- |
| Dense vectors | fuzzy semantic match | rare tokens / SKUs |
| Sparse lexical | exact-ish hits | vocabulary mismatch |
| Hybrid + fusion | pragmatic default | tuning + latency |

Reranking: improves precision after recall is healthy—not a band-aid for empty/bad corpora.
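For the hybrid row, reciprocal rank fusion (RRF) is one common, easily whiteboarded fusion choice; a minimal sketch:

```python
def rrf_fuse(result_lists, k: int = 60) -> list:
    """Reciprocal-rank fusion of ranked result lists (e.g. dense + sparse).
    k=60 is the commonly cited default; tune it like any other constant."""
    scores = {}
    for results in result_lists:          # each list: doc ids, best first
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # usage (names assumed): rrf_fuse([dense_ids, sparse_ids]) before any reranker
    return sorted(scores, key=scores.get, reverse=True)
```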

Packing context

  • Deduplicate overlapping chunks.
  • Trim boilerplate headings that steal token budget.
  • Keep the user question visible without burying it under evidence spam (see the packing sketch below).
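A greedy packing sketch covering the three bullets above; `count_tokens` stands in for whatever tokenizer matches your model, and duplicate detection is exact-match for brevity (real systems compare normalized text or embeddings):

```python
def pack_context(question: str, chunks: list, budget_tokens: int,
                 count_tokens) -> str:
    """Greedy packing: dedupe, then fill best-first until the budget is hit."""
    seen = set()
    packed = []
    used = count_tokens(question)           # the question is always reserved
    for c in chunks:                        # chunks arrive ranked best-first
        key = " ".join(c.split()).lower()   # cheap normalization
        if key in seen:
            continue                        # drop exact duplicates
        cost = count_tokens(c)
        if used + cost > budget_tokens:
            break                           # stop instead of silently truncating
        seen.add(key)
        packed.append(c)
        used += cost
    # Question goes last so it is not buried under evidence.
    return "\n\n".join(packed + [question])
```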

Structured outputs & tools

Frame tools as narrow RPCs:

| Control | One-liner |
| --- | --- |
| AuthZ | User identity authorizes side effects, not model confidence |
| Schema validation | Malformed tool JSON fails closed |
| Idempotency | Retries converge on the same business effect |
| Timeouts | Model cannot wedge thread pools indefinitely |
| Audit | Log trace id, corpus version, retrieval ids, redacted args |
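A sketch tying those controls together; `is_authorized` and `redact` are hypothetical callables you would supply, `fn` is expected to accept an idempotency key, and the thread-pool timeout only bounds the waiting request, not the worker itself:

```python
import concurrent.futures
import logging
import uuid

log = logging.getLogger("tool_audit")
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

def call_tool(fn, args: dict, *, user_id: str, trace_id: str,
              is_authorized, redact, timeout_s: float = 5.0):
    # AuthZ: the user's identity authorizes side effects, not model confidence.
    if not is_authorized(user_id, fn.__name__):
        raise PermissionError(f"{user_id} may not call {fn.__name__}")

    # Idempotency: a stable key so retries converge on one business effect.
    key = str(uuid.uuid5(uuid.NAMESPACE_URL,
                         f"{trace_id}:{fn.__name__}:{sorted(args.items())}"))

    # Timeout: bound how long this request waits. (The worker thread itself
    # keeps running; real systems also cancel at the I/O layer.)
    future = _pool.submit(fn, **args, idempotency_key=key)
    result = future.result(timeout=timeout_s)

    # Audit: trace id + redacted args; never log raw payloads.
    log.info("tool=%s trace=%s args=%s", fn.__name__, trace_id, redact(args))
    return result
```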

Bounded JSON repair: cap repair attempts; never burn unbounded tokens fixing syntax (sketch below).
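A minimal bounded-repair loop, assuming `model_call` is one LLM request and `validate` is your schema check (e.g. jsonschema or pydantic):

```python
import json

MAX_REPAIRS = 2  # hard cap: never burn unbounded tokens fixing syntax

def generate_json(model_call, prompt: str, validate) -> dict:
    last_error = None
    for attempt in range(1 + MAX_REPAIRS):
        raw = model_call(prompt if attempt == 0 else
                         f"{prompt}\n\nYour previous output was invalid "
                         f"({last_error}). Return corrected JSON only.")
        try:
            obj = json.loads(raw)
            validate(obj)            # raises on schema violation
            return obj
        except Exception as e:       # malformed JSON or schema reject
            last_error = e
    raise ValueError(f"fail closed after {MAX_REPAIRS} repairs: {last_error}")
```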


Evaluation & monitoring

Offline

Versioned golden sets with fingerprinted corpora + prompts; prevent train/eval leakage when iterating chunkers.
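One way to fingerprint what an eval run actually saw, so scores stay comparable across chunker iterations (a sketch, not a prescribed format):

```python
import hashlib

def eval_fingerprint(corpus_files: list, prompt_template: str,
                     chunker_version: str) -> str:
    """Content hash pinning exactly what an eval run saw; store it next
    to every golden-set score so runs stay comparable."""
    h = hashlib.sha256()
    for path in sorted(corpus_files):       # sorted: order-independent
        with open(path, "rb") as f:
            h.update(hashlib.sha256(f.read()).digest())
    h.update(prompt_template.encode())
    h.update(chunker_version.encode())      # bump when the chunker changes
    return h.hexdigest()[:16]
```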

Online (examples of metrics & actions)

| Metric | Escalation |
| --- | --- |
| retrieval_empty_rate | indexer / ACL regression |
| refusal_rate spike | safety filter / prompt drift |
| tool_error_rate | downstream outage / schema drift |
| cost_per_success | cache hit rate / router model / context bloat |
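A sketch of threshold-to-escalation routing; every number here is illustrative and should be tuned against your own baselines:

```python
# Illustrative thresholds only; tune against your own baselines.
ALERTS = {
    "retrieval_empty_rate": {"warn": 0.02, "page": 0.10},  # indexer / ACL regression
    "refusal_rate":         {"warn": 0.05, "page": 0.15},  # safety filter / prompt drift
    "tool_error_rate":      {"warn": 0.01, "page": 0.05},  # outage / schema drift
    "cost_per_success":     {"warn": 1.25, "page": 2.00},  # multiple of weekly baseline
}

def escalate(metric: str, value: float) -> str:
    levels = ALERTS[metric]
    if value >= levels["page"]:
        return "page"        # wake an owner up
    if value >= levels["warn"]:
        return "dashboard"   # owner reviews next business day
    return "ok"
```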

Human review

Schedule sampling for high-stakes misses—not only reactive firefighting.


Cost & latency levers

Mention without brand flex:

  • Route simple intents to smaller checkpoints.
  • Parallelize independent retrieval shards (mind provider rate limits).
  • Cache retrieval results (tenant-scoped keys) vs. final answers (higher staleness risk; call it out). See the sketch after this list.
  • Stream tokens for perceived responsiveness when allowed.
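Two of those levers as code, with hypothetical checkpoint names; the pattern (tenant-scoped keys, intent routing) is the point, not the constants:

```python
import hashlib

def retrieval_cache_key(tenant_id: str, corpus_version: str, query: str) -> str:
    # Tenant-scoped by construction; a shared key space is a leak waiting to happen.
    normalized = " ".join(query.split()).lower()
    digest = hashlib.sha256(normalized.encode()).hexdigest()[:16]
    return f"retrieval:{tenant_id}:{corpus_version}:{digest}"

def pick_model(intent: str) -> str:
    # Hypothetical checkpoint names; the pattern is the routing, not the models.
    SIMPLE_INTENTS = {"greeting", "faq_lookup", "formatting"}
    return "small-checkpoint" if intent in SIMPLE_INTENTS else "large-checkpoint"
```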

Security & abuse (tight)

  • Direct + indirect prompt injection (documents you “trust”).
  • Tool sprawl = incident surface—principle of least capability.
  • Log redaction for tokens/PII (sketch below); abuse metrics (tool attempt spikes).
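A minimal redaction sketch; regexes alone are not real PII detection, but even this keeps raw bearer tokens and emails out of logs:

```python
import re

PATTERNS = [
    (re.compile(r"(?i)bearer\s+[a-z0-9._\-]+"), "[REDACTED_TOKEN]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
]

def redact(text: str) -> str:
    # Applied to tool args and traces before anything hits a log sink.
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```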

Cross-study: /security, deeper patterns in /gen-ai.


Provider / model outage strategy

State a degraded path and pick the tiers that match product honesty (a minimal ladder is sketched below):

  • retrieval-only answers with a freshness disclaimer
  • a feature flag to turn off the heavy generation path
  • a deterministic canned-responses tier
  • queuing deeper generation asynchronously

Avoid promising perfect continuity—credibility prefers explicit limitations.
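A sketch of such a ladder; every callable here is a stand-in to be wired to your own flag system, retrieval output, and canned-response tier:

```python
def answer(query: str, hits: list, flags, generate, sources_only, canned) -> str:
    """Fallback ladder: each tier trades capability for honesty
    instead of faking continuity."""
    if flags.enabled("primary_generation"):             # feature flag gate
        try:
            return generate(query, hits)                # full model path
        except Exception:                               # provider outage / 429 storm
            pass                                        # fall through, don't retry forever
    if hits:
        # Retrieval-only tier: show sources with an explicit freshness disclaimer.
        return sources_only(hits, note="Generation unavailable; showing matching "
                                       "sources as of the last index build.")
    # Deterministic tier: an honest canned response beats a hallucinated one.
    return canned(query)
```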


Understanding — staff-level tells

Interviewers reward:

  • enumerated failure catalogs paired with telemetry
  • humility on hallucinations (architecture bounds, not scolding completions)
  • clear economics reasoning alongside architecture boxes

Avoid model-name tourism without system consequences.


Recognition cue map

| Pressure question | Pivot |
| --- | --- |
| “Stop hallucinations” | grounding + abstain thresholds + factual tools |
| “Update knowledge reliably” | idempotent ingestion + corpus versioning |
| “500 ms budget” | parallel retrieve, omit rerank from hot path, stream |
| “PII/regulated” | regional indexes, retrieval filters, retention |
| “Red team mindset” | injection matrix + tool quotas + escalation |

Follow-up traps (answer <30 seconds)

  • Nothing retrieved: user-visible degrade + telemetry + fallback suggestions (not hallucinated citations).
  • Model vs tool conflict: deterministic rule—tool-derived facts beat prose inventions for structured fields.
  • Cross-tenant leak fear: namespaces + retrieval contract tests—not vibes.
  • LLM-as-judge bias: calibrated human anchors + disagreement surfacing cadence.

Memory hooks

  • Evidence → constrain → generate → validate.
  • Economics gates architecture glamor.
  • Broad tools widen incident cones.

Study drills

Drill 1 — Whiteboard (25 min)

Boxes: ingest → chunk/embed → hybrid retrieve → rerank? → policies → constrained decode → validators → UX. Annotate seven failures (empty retrieval, bad ACL, rerank stalls, schema reject, provider 429, toxicity filter, stale corpus), each paired with a metric name.

Drill 2 — KPI triage (10 min)

Pick task_success_rate, p95_latency, and cost_per_success; for each, give a threshold, paging vs. dashboard routing, and an owner.

Drill 3 — Attacker trio (10 min)

Indirect PDF injection • tool-mediated exfil attempt • denial via expensive tool recursion—detect + mitigate each.


Diagram — onsite story spine


Diagram — RAG control-plane mental model

