THN Interview Prep

LLM contracts, context & tools

An LLM is a probabilistic next-token model wrapped by an application contract. It receives tokens, attends to context, predicts useful continuations, and returns text or structured actions. The engineering job is to turn that flexible generator into a reliable system with instructions, retrieval, schemas, validators, tools, and evaluations.

Do not explain LLM systems as "the model knows the answer." Explain them as: model prior + supplied context + decoding constraints + application checks.
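
To make that contract concrete, here is a minimal sketch of the pipeline. `call_model` is a hypothetical stand-in for whatever provider SDK you use; the prompt, field names, and checks are illustrative assumptions, not a specific API.

```python
import json

SYSTEM_PROMPT = "Answer only from the provided evidence; say 'unknown' otherwise."

def call_model(messages: list[dict], temperature: float = 0.0) -> str:
    """Hypothetical stand-in for a provider SDK call."""
    raise NotImplementedError("wire up your LLM client here")

def answer(question: str, evidence: list[str]) -> dict:
    # Supplied context: the app decides which facts the model can see.
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(evidence))
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Evidence:\n{numbered}\n\nQuestion: {question}"},
    ]
    raw = call_model(messages, temperature=0.0)  # decoding constraint: low randomness
    result = json.loads(raw)                     # shape check only, not a truth check
    # Application check: citations must point at evidence we actually supplied.
    bad = [c for c in result.get("citations", []) if not (0 <= c < len(evidence))]
    if bad:
        raise ValueError(f"unknown evidence ids cited: {bad}")
    return result
```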


Simple mental model

[Diagram: learned parameters + supplied context -> next-token prediction -> application checks]

Explanation: the model does not store your runtime database. It predicts from learned parameters plus the context you provide. If the context is missing, stale, poisoned, or too long to fit well, the answer can be wrong even when the model is strong.


Core concepts

| Concept | Meaning | Production implication |
|---|---|---|
| Token | Unit the model reads/writes: a word, word piece, punctuation, or bytes. | Cost, latency, and context limits are all token-based. |
| Context window | Maximum tokens the request can carry. | You need ranking, truncation, memory compression, and source selection. |
| System/developer instructions | High-priority behavior contract. | Keep short, versioned, and testable. |
| User message | Current task/request. | Keep close to the final model call so intent is not buried. |
| Retrieved context | External facts injected at runtime. | Must be permission-filtered and cited. |
| Decoding | How next tokens are selected. | Low randomness helps consistency but does not guarantee correctness. |
| Structured output | Model response constrained to a schema. | Makes integration safer, but values can still be logically wrong. |
| Tool call | Model asks the app to run a named function. | The app validates and executes; the model should not touch systems directly. |
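
Because cost and limits are token-based, the application should budget tokens before the call. A sketch, assuming a hypothetical `count_tokens` (swap in your real tokenizer; the 4-characters-per-token fallback is a rough heuristic, not exact):

```python
def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer; ~4 characters per token is a rough heuristic.
    return max(1, len(text) // 4)

def fit_to_budget(chunks: list[str], budget: int) -> list[str]:
    """Keep the highest-ranked chunks until the token budget is spent."""
    kept, used = [], 0
    for chunk in chunks:  # chunks are assumed pre-ranked, best first
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```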

Context packing order

[Diagram: system/developer instructions -> retrieved evidence (ranked, budgeted) -> user message]

Rule: include the smallest context that can answer the task. More context can dilute attention, increase cost, leak data, and make prompt injection harder to reason about.
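
Reusing the budgeting helpers above, a sketch of that packing order: instructions first, ranked evidence trimmed to budget, and the user message last so intent sits closest to generation. Names and layout are illustrative:

```python
def pack_context(system: str, evidence: list[str], user_msg: str,
                 budget: int) -> list[dict]:
    # Reserve room for the fixed parts, then spend the rest on ranked evidence.
    fixed = count_tokens(system) + count_tokens(user_msg)
    kept = fit_to_budget(evidence, budget - fixed)
    return [
        {"role": "system", "content": system},
        {"role": "user",
         "content": "Evidence:\n" + "\n".join(kept) + "\n\nTask: " + user_msg},
    ]
```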


Hallucination

Hallucination is unsupported output: the model states something not grounded in reliable context or tool results. It is not only a model issue; it is often an architecture issue.

| Cause | Example | Mitigation |
|---|---|---|
| Missing evidence | User asks about a private invoice; no retrieval ran. | Retrieve by tenant, cite evidence ids, say when data is unavailable. |
| Bad retrieval | A similar but wrong policy document ranked first. | Hybrid search, reranking, freshness checks, chunk lineage. |
| Over-broad prompt | "Answer confidently" encourages guessing. | Require uncertainty and evidence-backed claims. |
| Schema-only confidence | JSON is valid but a value is wrong. | Validate business rules and cross-check with tools. |
| Tool observation mismatch | Tool returns partial data; the model fills gaps. | Return explicit status, missing fields, and a refusal/degrade path. |
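
A sketch of the refusal/degrade path, assuming the model is asked to return claims tagged with evidence ids (the field names are hypothetical):

```python
def check_grounding(result: dict, evidence_ids: set[str]) -> dict:
    # A claim is supported only if it cites at least one id we actually sent.
    unsupported = [c for c in result.get("claims", [])
                   if not set(c.get("evidence_ids", [])) & evidence_ids]
    if unsupported:
        # Degrade instead of guessing: say what is unsupported, do not assert it.
        return {"status": "partial",
                "unsupported_claims": [c["text"] for c in unsupported]}
    return {"status": "ok", "answer": result.get("answer")}
```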

Interview phrase: "Structured output prevents shape errors, not truth errors."


Tool calling lifecycle

[Diagram: model proposes tool call -> app validates and authorizes -> app executes -> observation returned to model]

The model requests a tool call. The application owns validation, authorization, execution, retries, idempotency, and audit.
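
A sketch of that ownership, with a hypothetical tool registry; the names, auth model, and example tool are illustrative, not a specific framework:

```python
import logging

# Illustrative registry: narrow, typed capabilities the app chooses to expose.
TOOLS = {
    "get_invoice": {
        "fn": lambda args, user: {"invoice_id": args["invoice_id"], "total": 42.0},
        "required_args": {"invoice_id"},
        "allowed_roles": {"billing"},
    },
}

def run_tool_call(name: str, args: dict, user: dict) -> dict:
    tool = TOOLS.get(name)
    if tool is None:
        return {"status": "error", "reason": "unknown tool"}       # validation
    if not tool["required_args"] <= args.keys():
        return {"status": "error", "reason": "missing arguments"}  # validation
    if user.get("role") not in tool["allowed_roles"]:
        return {"status": "denied", "reason": "not authorized"}    # authorization
    logging.info("tool=%s user=%s args=%s", name, user.get("id"), args)  # audit
    observation = tool["fn"](args, user)                           # execution
    return {"status": "ok", "observation": observation}            # explicit status
```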


Interview questions

1. How does an LLM work at inference time?

  • Text becomes tokens, tokens pass through the transformer, the model scores likely next tokens, decoding selects output tokens, and the application validates the result.

2. Why does a larger context window not solve RAG?

  • You still need permission filtering, ranking, dedupe, freshness, and context packing. A bigger window can carry more noise and more injected instructions.

3. What is the difference between JSON mode and structured outputs?

  • JSON mode targets valid JSON. Structured outputs target a supplied schema. Both still require semantic validation for business correctness (see the sketch after this list).

4. Why is tool calling safer than asking the model to produce SQL or shell commands?

  • Typed tools expose narrow capabilities. The server can enforce identity, arguments, limits, and audit before touching real systems.

5. How do you reduce hallucination in production?

  • Ground answers in retrieved evidence or tools, require citations, expose uncertainty, block unsupported claims where possible, and evaluate with negative cases.
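
As referenced in question 3, here is a sketch of semantic validation that runs after schema validation passes; the `Refund` fields and business rules are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Refund:  # hypothetical domain object parsed from schema-valid output
    order_id: str
    amount: float
    currency: str

def validate_refund(refund: Refund, order_total: float) -> list[str]:
    """Business-rule checks that no JSON schema can express."""
    errors = []
    if refund.amount <= 0:
        errors.append("refund amount must be positive")
    if refund.amount > order_total:
        errors.append("refund exceeds order total")  # valid shape, wrong value
    if refund.currency not in {"USD", "EUR"}:
        errors.append("unsupported currency")
    return errors
```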

Related: RAG: ingest -> retrieve -> pack · Structured outputs & guardrails · Agentic architecture workflow
