THN Interview Prep

Generative AI for Engineers

Generative AI systems are not "a model call with a prompt." A production system is a regular software system wrapped around a probabilistic model: it controls context, retrieval, tools, schemas, safety, evaluation, latency, cost, and fallback behavior.

The practical mindset is:

Treat the model as useful but untrusted. Ground it with evidence, constrain it with contracts, validate its outputs, and measure the whole workflow.


How to study this section

Use this path if you are learning the topic for interviews or production design:

Step | Page | What you should be able to explain
1 | LLM contracts, context & tools | How tokens, context, hallucination, structured outputs, and tool calls fit together.
2 | RAG: ingest -> retrieve -> pack | How external knowledge flows from documents into answers.
3 | Structured outputs & guardrails | How to turn model output into safe application behavior.
4 | Evaluations | How to know whether a prompt, model, retriever, or agent change is better.
5 | Safety & injection | Why prompt injection is a privilege-boundary problem.
6 | Cost & latency routing | How to keep GenAI systems economically and operationally realistic.
7 | Agentic AI track | When to use agents, how to design stateful workflows, and how to ship them safely.

Interview cadence companion: /dsa/interview-prep/generative-ai.


Core mental model

Read the diagram as a control system:

  • The application owns identity, permissions, budgets, routing, retries, and user experience.
  • The model proposes language, structured data, or tool calls.
  • The retrieval/tool layers provide evidence and actions.
  • The validation/eval layers decide whether the result is usable.

The seven interview pillars

1. LLM contract

An LLM generates output from its learned parameters and the request context. It does not automatically know your private database, current policies, or tenant permissions.

Good interview answer:

"I separate the model prior from runtime context. The application supplies instructions, retrieved evidence, tool schemas, and validation rules. The model can propose an answer or tool call, but the app decides what is allowed."

Common mistake: saying "the model knows" instead of explaining context, grounding, and validation.

2. Context and token budget

Context is expensive and limited. More context can increase cost, latency, privacy risk, and confusion.

Useful packing order:

  1. Stable system/developer policy.
  2. Task-relevant memory.
  3. Permission-filtered retrieved evidence.
  4. Few-shot examples only when they change behavior.
  5. Current user request near the end.
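A minimal sketch of packing in that order under a token budget. It assumes a crude whitespace token count (real systems use the provider's tokenizer) and treats the policy and the user request as never droppable; `Section`, `pack_context`, and the labels are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Section:
    label: str      # hypothetical labels: "policy", "memory", "evidence", ...
    text: str
    required: bool  # required sections (policy, user request) are never dropped


def count_tokens(text: str) -> int:
    # Assumption: whitespace split stands in for the provider's real tokenizer.
    return len(text.split())


def pack_context(sections: list, budget: int) -> list:
    """Keep sections in packing order; skip optional ones that would overflow the budget."""
    required_cost = sum(count_tokens(s.text) for s in sections if s.required)
    if required_cost > budget:
        raise ValueError("required sections alone exceed the token budget")
    remaining = budget - required_cost
    packed = []
    for s in sections:  # input is already in packing order: policy first, request last
        cost = count_tokens(s.text)
        if s.required:
            packed.append(s)
        elif cost <= remaining:
            packed.append(s)
            remaining -= cost
        # otherwise skip it; a later, smaller optional section may still fit
    return packed


sections = [
    Section("policy", "Answer only from evidence. Cite sources.", True),
    Section("memory", "User prefers short answers.", False),
    Section("evidence", "Refunds are allowed within 30 days of delivery.", False),
    Section("examples", "Q: ... A: ... " * 50, False),
    Section("request", "Can I get a refund for order 123?", True),
]
print([s.label for s in pack_context(sections, budget=40)])
# ['policy', 'memory', 'evidence', 'request']
```

Greedy first-fit keeps the order stable and easy to debug. Under a tight budget, earlier optional sections win, so a production packer would usually rank and dedupe evidence by relevance before this step.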

Interview follow-up to expect: "Why not use a larger context window?" Answer: a larger window only raises the ceiling on what fits. Ranking, ACL filtering, freshness, dedupe, and injection risk still matter, and every extra token still adds cost and latency.

3. RAG

RAG is an evidence pipeline, not just a vector database.

The full path is:

ingest -> parse -> chunk -> label -> embed/index -> retrieve -> rerank -> pack -> answer -> evaluate
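The chunk and label steps are where lineage is cheapest to capture. A minimal sketch, assuming fixed-size word windows with overlap (a simple stand-in for structure-aware chunking) and hypothetical metadata field names:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str
    source_version: str
    tenant_id: str
    acl_groups: frozenset
    start_word: int   # lineage: offset of this chunk in the parsed document
    text: str


def chunk_document(doc_id, version, tenant_id, acl_groups, text, size=50, overlap=10):
    """Split parsed text into overlapping word windows, labeling each one before indexing."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    chunks = []
    for start in range(0, len(words), size - overlap):
        window = " ".join(words[start:start + size])
        # Deterministic ID: re-ingesting the same version yields the same chunk IDs,
        # so traces and eval results stay joinable across re-indexing runs.
        raw = f"{tenant_id}:{doc_id}:{version}:{start}".encode()
        chunk_id = hashlib.sha256(raw).hexdigest()[:16]
        chunks.append(Chunk(chunk_id, doc_id, version, tenant_id,
                            frozenset(acl_groups), start, window))
        if start + size >= len(words):
            break  # the last window already reaches the end of the document
    return chunks
```

Because each ID hashes tenant, document, version, and offset, a wrong answer can be traced back to the exact version and position of the chunk that supported it.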

Strong answer points:

  • Attach tenant, ACL, source version, and freshness metadata before indexing.
  • Use hybrid retrieval when exact identifiers matter.
  • Preserve chunk lineage so wrong answers can be debugged.
  • Refuse or degrade when no evidence is found.
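A minimal sketch combining these points: permission filtering before ranking, a hybrid merge, and an explicit no-evidence result. It assumes in-memory chunks, a toy exact-term scorer, vector similarities precomputed by the caller, and reciprocal rank fusion (RRF) as the merge; all names are hypothetical, and a real system would push the ACL filter into the index query itself.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    chunk_id: str
    tenant_id: str
    acl_groups: set
    text: str


def keyword_rank(docs, query):
    """Rank by how many exact query terms appear; keeps identifiers like SKU-123 matchable."""
    terms = set(query.lower().split())
    scored = [(sum(t in d.text.lower().split() for t in terms), d) for d in docs]
    return [d for s, d in sorted(scored, key=lambda x: -x[0]) if s > 0]


def rrf(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, d in enumerate(ranking, start=1):
            scores[d.chunk_id] = scores.get(d.chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: -scores[cid])


def retrieve(docs, query, tenant_id, user_groups, vector_sims, top_k=3):
    # 1. Permission filter BEFORE ranking, so unauthorized text never reaches the prompt.
    allowed = [d for d in docs
               if d.tenant_id == tenant_id and d.acl_groups & user_groups]
    # 2. Two rankings: exact-term matches and (assumed precomputed) vector similarity.
    kw = keyword_rank(allowed, query)
    vec = sorted((d for d in allowed if vector_sims.get(d.chunk_id, 0) > 0),
                 key=lambda d: -vector_sims[d.chunk_id])
    fused = rrf([kw, vec])[:top_k]
    # 3. Degrade explicitly when nothing survives the filters.
    if not fused:
        return {"status": "no_evidence", "chunks": []}
    return {"status": "ok", "chunks": fused}
```

The caller can map `no_evidence` to a refusal or an escalation instead of letting the model answer from its prior.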

4. Structured outputs and tools

Structured output reduces shape errors. It does not guarantee truth.

Use:

  • Structured outputs when the model must return typed data.
  • Tool calling when the system needs external data or side effects.
  • Guardrails when policy must block, transform, or escalate behavior.
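To make "reduces shape errors, does not guarantee truth" concrete, here is a minimal sketch. It assumes the model returned JSON for a refund extraction and that an order record is available as the source of truth; the field names and the `ORDERS` lookup are hypothetical.

```python
import json

ORDERS = {"A100": {"total_cents": 4999, "currency": "USD"}}  # hypothetical source of truth
SCHEMA = {"order_id": str, "amount_cents": int, "currency": str}


def check_shape(raw: str) -> dict:
    """Schema check: valid JSON object, required keys, right types. Says nothing about truth."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, typ in SCHEMA.items():
        value = data.get(key)
        if isinstance(value, bool) or not isinstance(value, typ):  # bool is an int subclass
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")
    return data


def check_truth(data: dict) -> list:
    """Business validation: cross-check the well-typed fields against the order record."""
    order = ORDERS.get(data["order_id"])
    if order is None:
        return ["unknown order_id"]
    problems = []
    if data["currency"] != order["currency"]:
        problems.append("currency does not match order")
    if not 0 < data["amount_cents"] <= order["total_cents"]:
        problems.append("amount outside order total")
    return problems


raw = '{"order_id": "A100", "amount_cents": 9999, "currency": "USD"}'
print(check_truth(check_shape(raw)))  # shape is fine, truth is not
# ['amount outside order total']
```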

Rule for tools:

The model requests a tool call. The server validates identity, schema, quotas, idempotency, and authorization before executing anything.
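The rule above can be sketched as a server-side gateway that treats the model's tool call as a request, not a command. This is a minimal in-memory sketch; `ToolSpec`, `ToolGateway`, the role names, and the quota numbers are hypothetical, and a real gateway would persist usage and idempotency records.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    params: dict          # parameter name -> expected Python type
    allowed_roles: set    # roles allowed to invoke this tool
    daily_quota: int      # max successful calls per user per day (reset not shown)


@dataclass
class ToolGateway:
    tools: dict                                   # only tools listed here are exposed
    usage: dict = field(default_factory=dict)     # (user, tool) -> calls so far
    results: dict = field(default_factory=dict)   # (user, tool, key) -> prior result

    def execute(self, user, role, tool_name, args, idempotency_key, impl):
        """Run `impl(**args)` only after every server-side check passes.

        `user` and `role` come from the authenticated session, never from model output.
        """
        spec = self.tools.get(tool_name)
        if spec is None:
            raise PermissionError(f"tool {tool_name!r} is not exposed")
        if set(args) != set(spec.params) or not all(
                isinstance(args[name], typ) for name, typ in spec.params.items()):
            raise ValueError("arguments do not match the tool schema")
        if role not in spec.allowed_roles:
            raise PermissionError(f"role {role!r} may not call {tool_name!r}")
        replay_key = (user, tool_name, idempotency_key)
        if replay_key in self.results:
            return self.results[replay_key]       # retry of a completed call: no new side effect
        used = self.usage.get((user, tool_name), 0)
        if used >= spec.daily_quota:
            raise PermissionError("quota exceeded")
        result = impl(**args)                     # only now touch the real system
        self.usage[(user, tool_name)] = used + 1
        self.results[replay_key] = result
        return result
```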

5. Evaluation

Demos do not prove reliability. You need:

Layer | Purpose
Offline evals | Catch regressions before release.
Online telemetry | Detect drift and track latency, cost, refusal rate, and tool errors.
Human review | Calibrate ambiguous quality judgments and review high-risk misses.
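The offline layer is often wired as a release gate. A minimal sketch, assuming each eval case has already been scored pass/fail for both the baseline and the candidate (how cases are scored, whether exact match, rubric, or judge model, is out of scope here); `eval_gate` and its thresholds are hypothetical.

```python
def eval_gate(baseline: dict, candidate: dict, max_drop=0.02, critical=frozenset()):
    """Compare per-case pass/fail results (case ID -> bool) and decide whether to ship."""
    if not baseline or set(baseline) != set(candidate):
        raise ValueError("baseline and candidate must cover the same non-empty case set")
    n = len(baseline)
    base_rate = sum(baseline.values()) / n
    cand_rate = sum(candidate.values()) / n
    # Cases that passed before and fail now: the first traces worth reading.
    regressions = sorted(c for c in baseline if baseline[c] and not candidate[c])
    # Some cases (e.g. "must refuse", "must not call a tool") block release on their own.
    critical_failures = sorted(c for c in critical if not candidate.get(c, False))
    ok = cand_rate >= base_rate - max_drop and not critical_failures
    return {"ok": ok, "baseline_pass_rate": base_rate, "candidate_pass_rate": cand_rate,
            "regressions": regressions, "critical_failures": critical_failures}
```

Reporting the individual regressions, not just the aggregate rate, matters because a small net change can hide a swap of fixed and newly broken cases.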

For agents, evaluate the whole trajectory: retrieval, state transitions, tool calls, approvals, guardrails, and final answer.
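Trajectory checks can be plain assertions over a recorded trace. A minimal sketch, assuming a trace is a list of step dicts with a `type` field; the step types, tool names, and the rule that side-effecting tools need a prior approval step are hypothetical.

```python
SIDE_EFFECT_TOOLS = {"create_refund_ticket"}   # hypothetical high-risk tools
ALLOWED_TOOLS = {"search_docs", "get_order", "create_refund_ticket"}


def check_trajectory(trace, max_steps=8):
    """Return rule violations for one agent run; an empty list means the path is acceptable."""
    problems = []
    approved = False
    if len(trace) > max_steps:
        problems.append(f"too many steps: {len(trace)} > {max_steps}")
    for i, step in enumerate(trace):
        if step["type"] == "approval":
            approved = True
        elif step["type"] == "tool_call":
            name = step["tool"]
            if name not in ALLOWED_TOOLS:
                problems.append(f"step {i}: tool {name!r} not allowed")
            if name in SIDE_EFFECT_TOOLS and not approved:
                problems.append(f"step {i}: {name!r} ran before approval")
    if not trace or trace[-1]["type"] != "final_answer":
        problems.append("run did not end with a final answer")
    return problems
```

A run can produce a correct final sentence and still fail these checks, which is exactly the case final-answer-only evals miss.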

6. Safety

Prompt injection is untrusted input trying to cross a privilege boundary.

Important distinction:

  • Direct injection: user says "ignore instructions."
  • Indirect injection: a PDF, email, webpage, ticket, or tool output contains hidden instructions.
  • Tool abuse: injected content tries to cause an action, not just a wrong answer.

Prompt-only defense is insufficient. Use narrow tools, server-side authorization, output checks, trace review, and human approval for high-risk actions.
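One way to enforce this outside the prompt is to track where each tool argument came from and gate on that server-side. A minimal sketch, assuming the orchestrator labels every value with its provenance; the provenance labels, risk tiers, and tool names are hypothetical. It does not detect injection; it limits what injected text can cause.

```python
from dataclasses import dataclass

TRUSTED = {"user_session", "system"}            # provenance the server vouches for
HIGH_RISK = {"create_refund_ticket", "send_email"}


@dataclass(frozen=True)
class Value:
    data: str
    source: str   # e.g. "user_session", "retrieved_doc", "tool_output"


def authorize_tool_call(tool: str, args: dict, allowed_tools: set) -> str:
    """Return 'allow', 'needs_approval', or 'deny' without reading any model instructions."""
    if tool not in allowed_tools:
        return "deny"
    tainted = any(v.source not in TRUSTED for v in args.values())
    if tool in HIGH_RISK and tainted:
        return "deny"            # untrusted text may not steer a side effect
    if tool in HIGH_RISK:
        return "needs_approval"  # human in the loop for high-risk actions
    return "allow"               # low-risk reads may use untrusted values
```

Denying tainted high-risk calls outright is a strict choice; a softer variant routes them to human approval with the provenance shown to the reviewer.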

7. Cost, latency, and rollout

Senior designs discuss economics early. Cost and latency come from model tokens, retrieval, reranking, tool IO, retries, tracing, and agent loops.

Levers:

  • Route simple tasks to cheaper paths.
  • Trim context without dropping required evidence.
  • Cache with tenant-scoped keys so one tenant's results are never served to another.
  • Parallelize independent retrieval/tool calls.
  • Stream for perceived latency.
  • Cap steps, retries, and repair loops.
  • Roll out with canaries, eval gates, and rollback flags.
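Three of these levers fit in a few lines: routing, a tenant-safe cache key, and step caps. A minimal sketch, assuming a hypothetical two-tier setup ("small" and "large") and a length-and-keyword routing heuristic; real routers are usually a trained classifier or eval-calibrated rules, and all names here are hypothetical.

```python
import hashlib


def route(request: str) -> str:
    """Hypothetical heuristic: short, low-risk requests take the cheaper "small" path."""
    risky = any(word in request.lower() for word in ("refund", "legal", "delete"))
    return "small" if len(request.split()) < 20 and not risky else "large"


def cache_key(tenant_id, acl_scope, prompt_version, model, request):
    # Tenant and ACL scope are part of the key, so a cached answer built from one
    # tenant's (or one permission group's) evidence is never served to another.
    # Prompt version and model are included so a change invalidates stale entries.
    parts = [tenant_id, acl_scope, prompt_version, model, " ".join(request.lower().split())]
    return hashlib.sha256("\x1f".join(parts).encode()).hexdigest()


class StepBudget:
    """Hard caps on agent steps and repair retries, checked before each model call."""

    def __init__(self, max_steps=6, max_repairs=2):
        self.max_steps, self.max_repairs = max_steps, max_repairs
        self.steps = 0
        self.repairs = 0

    def take_step(self, is_repair=False):
        self.steps += 1
        if is_repair:
            self.repairs += 1
        if self.steps > self.max_steps or self.repairs > self.max_repairs:
            raise RuntimeError("budget exhausted: stop and return a fallback response")
```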

Agentic AI track

Study agents after the basics above. An agent is a stateful workflow where the model may choose the next action under application control.

Recommended order:

Page | Use it for
Agentic AI overview | Track map and high-level lifecycle.
Agentic fundamentals | Agent, tool, state, observation, transition, termination.
Agentic architecture workflow | End-to-end production architecture.
Agent memory, state & storage | Checkpoints, memory, source-of-truth separation.
LangChain | Primitives and tool schemas.
LangGraph | Explicit graphs, routers, checkpoints, interrupts.
LangSmith | Tracing, datasets, evals, release gates.
Agentic production | Shipping, operations, security, and incident response.

Interview sound bite:

"Agents are state machines with stochastic transitions. I define legal states, allowed tools, stop conditions, budgets, and observability before I add autonomy."
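The sound bite translates directly into code: the application owns the transition table and the stop conditions, and the model only proposes the next state. A minimal sketch, with a scripted stub standing in for the model call; the state names and `run_agent` are hypothetical.

```python
LEGAL = {                      # hypothetical states and their allowed next states
    "start": {"retrieve"},
    "retrieve": {"retrieve", "answer"},
    "answer": {"done"},
}
TERMINAL = {"done"}


def run_agent(propose, max_steps=5):
    """The app owns legal transitions and stop conditions; `propose` is the model's suggestion."""
    state, history = "start", ["start"]
    for _ in range(max_steps):
        proposed = propose(state, history)
        if proposed not in LEGAL[state]:
            # Illegal transition: stop and hand off instead of trusting the model.
            return history + [f"rejected:{proposed}", "escalated_to_human"]
        state = proposed
        history.append(state)
        if state in TERMINAL:
            return history
    return history + ["stopped_by_budget"]  # stop condition: step cap reached


def scripted(steps):
    """Stub model that returns a fixed sequence of proposals (stands in for an LLM call)."""
    it = iter(steps)
    return lambda state, history: next(it)


print(run_agent(scripted(["retrieve", "retrieve", "answer", "done"])))
# ['start', 'retrieve', 'retrieve', 'answer', 'done']
print(run_agent(scripted(["retrieve", "delete_all_tickets"])))
# ['start', 'retrieve', 'rejected:delete_all_tickets', 'escalated_to_human']
```

The returned history doubles as a minimal trace, which is what the trajectory evals and debugging steps in this guide inspect.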


Interview answer structure

When asked to design a GenAI system, answer in this order:

  1. Clarify task and risk: QA, summarization, extraction, routing, support, code, legal, finance, medical, internal tooling.
  2. Define grounding contract: internal corpus, web, tools, citations, refusal behavior.
  3. Sketch request path: ingress -> auth -> retrieval/tools -> model -> validation -> response.
  4. Explain data lifecycle: ingest, chunking, metadata, ACLs, freshness, deletion.
  5. Add safety: prompt injection, tool scope, PII, audit, human review.
  6. Add evals: offline dataset, online monitors, human calibration.
  7. Add cost/latency: routing, caching, parallelism, streaming, budgets.
  8. Add rollout: canary, shadow eval, prompt/model versioning, rollback.

Example interview prompt

Prompt: Design a customer-support assistant that answers from company docs and can create refund tickets.

Strong answer outline:

Area | Good answer
Risk | Refund creation is a side effect, so answers and actions need different controls.
RAG | Ingest help docs, policies, and tickets; attach tenant/product/version metadata; use hybrid retrieval; cite evidence.
Tooling | Expose create_refund_ticket, not arbitrary API access. Validate the user, order ID, refund policy, and idempotency key.
Safety | Treat uploaded screenshots and retrieved docs as untrusted. Block instructions inside documents from changing tool policy.
Eval | Test top-k evidence, citation support, correct refusal, tool arguments, and refund-policy edge cases.
Ops | Track p95 latency, retrieval empty rate, tool error rate, cost per resolved ticket, and refund escalation rate.
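The Tooling row can be made concrete with a narrow tool whose checks run on the server. A minimal sketch, assuming a hypothetical in-memory order store, a hypothetical 30-day refund window, and a hypothetical auto-approval limit; none of these numbers come from a real policy. `user_id` and `today` are expected to come from the server, not from the model.

```python
from datetime import date

# Hypothetical order store and policy numbers, for illustration only.
ORDERS = {
    "A100": {"owner": "u1", "total_cents": 4999, "delivered": date(2024, 5, 1)},
    "A200": {"owner": "u1", "total_cents": 12000, "delivered": date(2024, 5, 10)},
}
REFUND_WINDOW_DAYS = 30
AUTO_APPROVE_LIMIT_CENTS = 5000   # larger refunds are escalated to a human
_tickets = {}                     # (user_id, idempotency_key) -> ticket


def create_refund_ticket(user_id, order_id, amount_cents, idempotency_key, today):
    """Narrow tool: checks ownership and policy, and creates at most one ticket per key."""
    key = (user_id, idempotency_key)
    if key in _tickets:
        return _tickets[key]          # client retry: return the original ticket
    order = ORDERS.get(order_id)
    if order is None or order["owner"] != user_id:
        # Same message for "missing" and "not yours", so order IDs cannot be probed.
        return {"status": "rejected", "reason": "order not found for this user"}
    if (today - order["delivered"]).days > REFUND_WINDOW_DAYS:
        return {"status": "rejected", "reason": "outside refund window"}
    if not 0 < amount_cents <= order["total_cents"]:
        return {"status": "rejected", "reason": "invalid amount"}
    status = "created" if amount_cents <= AUTO_APPROVE_LIMIT_CENTS else "escalated"
    ticket = {"status": status, "ticket_id": f"RT-{len(_tickets) + 1}", "order_id": order_id}
    _tickets[key] = ticket
    return ticket
```

Each rejection reason is also a natural eval case for the "refund policy edge cases" in the Eval row.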

Common bad answer:

"Put all support docs in a vector DB and ask the LLM to answer and call APIs."

Why it is weak: it skips ACLs, freshness, hybrid retrieval, citations, tool authorization, idempotency, evals, and escalation.


Debugging map

Symptom | First place to inspect | Likely fix
Confident wrong answer | Retrieved chunks and packed prompt | Improve retrieval, citation rules, or the refusal threshold.
Correct source, wrong field | Structured output and semantic validation | Add business validation or a tool cross-check.
Slow p95 | Reranker, provider latency, serial IO | Parallelize independent work, route, cache, trim context.
Cost spike | Agent loops, retries, long context | Add step budgets, cap repairs, route cheaper tasks.
Data leak | ACL filters, trace payloads, answer cache | Disable the path, audit access, tighten tenant-scoped keys.
High refusal rate | Policy classifier or prompt version | Compare traces, then roll back or recalibrate.

Memory hooks

  • Evidence first, generation second.
  • Schema checks shape; tools and evidence check truth.
  • Prompt safety helps, server-side policy enforces.
  • Evaluate the workflow, not only the final sentence.
  • Every agent needs stop conditions, budgets, and traces.

Self-check

You are ready to move through the topic pages if you can answer:

  • What does the application control that the model should not control?
  • What is the difference between model prior, retrieved context, and tool output?
  • Why does a larger context window not remove the need for RAG?
  • Which failure is worse: a wrong answer or a wrong side effect?
  • Which metrics prove quality, safety, latency, and cost are acceptable?
