THN Interview Prep

GenAI mock interview drill

Use this page after finishing the GenAI topic path. It ties together LLM contracts, RAG, structured outputs, tools, agents, evals, safety, cost, latency, and rollout.


Mock prompt

Design a customer-support GenAI assistant for a SaaS product.

Requirements:

  • Answer from company docs, release notes, account settings, and prior support tickets.
  • Cite sources for factual claims.
  • Create refund-review tickets, but do not issue refunds directly.
  • Support multiple tenants and enterprise accounts.
  • Keep p95 latency under 2.5 seconds for common questions.
  • Keep cost per resolved ticket visible to product and support leadership.
  • Handle malicious uploaded PDFs or emails.

30-second answer

"I would build this as a grounded support workflow, not a free-form chatbot. The request goes through auth and risk classification, then either a simple FAQ path, RAG over ACL-filtered docs/tickets, or a guarded tool path for refund tickets. The model can draft answers or request tools, but the app validates evidence, permissions, schemas, idempotency, and safety. I would evaluate retrieval, citations, tool args, refusals, latency, and cost before rollout."

Use this when the interviewer asks for a quick framing.
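
To make "risk classification" concrete if the interviewer probes, here is a minimal first-pass router sketch. Everything in it is illustrative: the route names, the refund keywords, and the question-length heuristic are assumptions, and a production router would more likely be a small trained classifier behind the same interface.

```python
from enum import Enum

class Route(Enum):
    FAQ = "faq"    # cheap cached/templated path for common questions
    RAG = "rag"    # grounded answer over ACL-filtered evidence
    TOOL = "tool"  # guarded path with side effects (refund-review tickets)

# Assumed trigger terms, for illustration only.
REFUND_TERMS = ("refund", "chargeback", "money back")

def classify(question: str, authenticated: bool) -> Route:
    """First-pass routing by task and risk, before any large model call."""
    q = question.lower()
    if any(term in q for term in REFUND_TERMS):
        # Side effects always go through the tool gateway; unauthenticated
        # users fall back to a grounded answer plus escalation.
        return Route.TOOL if authenticated else Route.RAG
    if len(q.split()) <= 6:
        return Route.FAQ  # short, common questions take the cheapest path
    return Route.RAG
```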


2-minute answer

  1. Clarify risk: normal support answers are medium risk; refund tickets are side effects and need approval/audit.
  2. Data path: ingest docs, release notes, tickets, and account metadata with tenant, ACL, source version, freshness, and deletion metadata.
  3. Retrieval: hybrid search plus rerank when needed; pack minimal evidence with citation ids.
  4. Model contract: answer only from evidence for factual claims; refuse or ask for clarification when evidence is missing.
  5. Tooling: expose narrow tools such as get_account_plan and create_refund_review_ticket; validate user, tenant, order id, policy, and idempotency key server-side (a gateway sketch follows this list).
  6. Safety: uploaded docs and emails are untrusted; prompt injection must never be able to change tool policy or authorization.
  7. Evaluation: top-k evidence recall, citation support, unsupported-claim rate, tool-route correctness, safe refusal, latency, and cost per resolved ticket.
  8. Rollout: canary, shadow evals, trace redaction, feature flags, and rollback.
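
To make item 5 concrete, here is a minimal sketch of a server-side tool gate. The tool name matches the requirement, but the in-memory order store, the policy threshold, and the idempotency log are stand-ins for real services; the point is that the model only proposes arguments while the app enforces authorization, policy, and idempotency.

```python
import uuid
from dataclasses import dataclass

@dataclass
class Caller:
    user_id: str
    tenant_id: str

@dataclass
class Order:
    order_id: str
    user_id: str
    tenant_id: str
    amount: float

# Hypothetical stand-ins for the order store, policy, and idempotency log.
ORDERS = {"o-1": Order("o-1", "u-1", "t-1", 120.00)}
SEEN_KEYS: set[str] = set()
MAX_AUTO_REVIEW_AMOUNT = 500.00  # assumed policy threshold

def create_refund_review_ticket(caller: Caller, args: dict) -> dict:
    """Server-side gate: the model proposes args; the app decides."""
    order = ORDERS.get(args["order_id"])
    # Authorization: caller identity comes from the session, never the model.
    if order is None or order.tenant_id != caller.tenant_id \
            or order.user_id != caller.user_id:
        return {"status": "denied", "reason": "order not visible to caller"}
    # Policy: large amounts need human approval before a ticket is opened.
    if order.amount > MAX_AUTO_REVIEW_AMOUNT and not args.get("approved_by"):
        return {"status": "denied", "reason": "human approval required"}
    # Idempotency: a retried model call must not open a duplicate ticket.
    if args["idempotency_key"] in SEEN_KEYS:
        return {"status": "duplicate"}
    SEEN_KEYS.add(args["idempotency_key"])
    # Review ticket only; issuing the refund itself stays a human action.
    return {"status": "created", "ticket_id": str(uuid.uuid4())}
```

Note that every check reads session or server state; nothing the model writes into `args` can widen what the caller is allowed to touch.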

5-minute answer outline

[Diagram: ingress and identity → task/risk routing → RAG evidence path or guarded tool gateway → validation → response, with traces, evals, and rollout controls alongside.]

Walk the diagram in this order:

  1. Ingress and identity.
  2. Routing by task and risk.
  3. RAG for evidence (packing is sketched after this list).
  4. Tool gateway for source-of-truth data or side effects.
  5. Validation and response.
  6. Traces, evals, and rollout controls.
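
Steps 3 through 5 are where most candidates stay vague, so a small packing sketch helps. The `Chunk` shape and the character budget below are assumptions for illustration; the invariant to call out is that ACL filtering happens before anything enters the context window.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    tenant_id: str
    acl: set[str]  # groups allowed to read this chunk
    text: str
    score: float   # hybrid-search (or rerank) score

def pack_evidence(chunks: list[Chunk], tenant_id: str, user_groups: set[str],
                  budget_chars: int = 2000) -> tuple[str, list[str]]:
    """Filter by tenant and ACL first, then pack top evidence with citation ids."""
    visible = [c for c in chunks
               if c.tenant_id == tenant_id and c.acl & user_groups]
    visible.sort(key=lambda c: c.score, reverse=True)
    lines, cited, used = [], [], 0
    for i, c in enumerate(visible, start=1):
        if used + len(c.text) > budget_chars:
            break  # context trimming: respect the evidence budget
        lines.append(f"[{i}] ({c.doc_id}) {c.text}")  # [i] is the citation id
        cited.append(c.doc_id)
        used += len(c.text)
    return "\n".join(lines), cited
```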

Expected follow-ups

Follow-up | Strong answer direction
"Why not just put all docs in the prompt?" | Cost, latency, context dilution, ACL risk, freshness, and injection risk.
"How do you stop hallucinations?" | Grounding, citations, unsupported-claim checks, refusal when evidence is missing, evals with negative cases.
"How do you handle malicious PDFs?" | Treat retrieved text as data, not instructions; strip or flag injection-like content; restrict tools; require approval (a flagging sketch follows this table).
"What if retrieval returns nothing?" | Say evidence is missing, ask for clarification, route to another source, or escalate; do not invent citations.
"How do you lower p95 latency?" | Route simple tasks, trim context, skip rerank when safe, parallelize independent retrieval, cache tenant-safely.
"How do you evaluate this?" | Retrieval recall, citation support, tool route/args, refusal correctness, cost, latency, human labels, canary telemetry.
"How do you prevent tenant leaks?" | ACL before packing, tenant-scoped caches, source metadata, authorization in tools, trace redaction.

Common bad answers

Bad answer | Why it fails
"Use a vector DB and GPT to answer." | Skips ACLs, source versions, hybrid search, citations, evals, and safety.
"Tell the model not to leak data." | Prompt-only safety cannot enforce permissions or tool policy.
"Use the biggest model for quality." | Ignores latency, cost, routing, evals, and cost per successful outcome.
"Let the agent call support APIs directly." | Skips tool gateway, idempotency, approval, audit, and least privilege.
"Measure with user thumbs-up only." | Too coarse; misses retrieval, grounding, tool route, safety, latency, and cost regressions.

Scoring rubric

Area | 10/10 answer includes
Task framing | Clear product goal, user types, risk tiers, and refusal behavior.
Architecture | Ingress, auth, routing, RAG, model, tools, validation, tracing, rollout.
RAG | Ingestion, chunking, metadata, ACLs, hybrid search, packing, citations, evals.
Tools | Narrow schemas, user-scoped auth, idempotency, quotas, approval, audit.
Safety | Direct/indirect injection, PII, tenant isolation, output checks, red-team tests.
Evals | Offline golden sets, online monitors, human calibration, release gates.
Cost/latency | p95 budget, route selection, caching, context trimming, step budgets (a tenant-safe caching sketch follows this table).
Communication | Clear tradeoffs, explicit failure paths, no vendor-name tourism.
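
One rubric line worth expanding: "caching" only scores points if it is tenant-safe. Below is a minimal sketch of a cache key that cannot leak across tenants or ACL boundaries. The key fields are assumptions; the principle is that anything affecting which evidence a user can see must be part of the key.

```python
import hashlib
import json

def cache_key(tenant_id: str, acl_groups: frozenset[str],
              model_id: str, normalized_question: str) -> str:
    """Tenant- and ACL-scoped cache key: entries can never be shared across
    tenants, or across users with different document visibility."""
    payload = json.dumps({
        "tenant": tenant_id,
        "acl": sorted(acl_groups),
        "model": model_id,
        "q": normalized_question,
    }, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```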

Self-check

You are ready for this interview if you can explain:

  • Why RAG is more than a vector database.
  • Why structured output does not guarantee truth.
  • Why tool authorization must be server-side.
  • How to debug a hallucinated answer with citations.
  • How to reduce p95 latency without destroying quality.
  • How to evaluate an agent trajectory, not only the final text (a scoring sketch follows this list).
  • How prompt injection becomes worse when tools and memory exist.
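
For the trajectory point, here is a sketch of scoring an agent run step by step against a hand-labeled golden trace. The trace format (a `route`, optional `tool`, and `args` per step) is an assumption; what matters is that routing, tool choice, tool arguments, and step budget are graded separately instead of only the final text.

```python
def score_trajectory(trace: list[dict], golden: list[dict]) -> dict:
    """Grade each step of an agent run, not just its final answer."""
    route_hits = sum(t.get("route") == g.get("route")
                     for t, g in zip(trace, golden))
    tool_steps = [(t, g) for t, g in zip(trace, golden) if g.get("tool")]
    tool_hits = sum(t.get("tool") == g["tool"] and t.get("args") == g.get("args")
                    for t, g in tool_steps)
    return {
        "route_accuracy": route_hits / max(len(golden), 1),
        "tool_accuracy": tool_hits / max(len(tool_steps), 1),
        "within_step_budget": len(trace) <= len(golden) + 2,  # assumed slack
    }
```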

Related pages: Generative AI hub · RAG · Safety & injection · Cost & latency · Agentic production
