THN Interview Prep

Agentic production & serving

Shipping agents means combining latency SLOs, financial guardrails, trust architecture, and recoverable execution: per-tool authZ, quotas, human escalation, deterministic tests, trace redaction (LangSmith), and rollback hooks. If any of these concepts are fuzzy, skim the fundamentals first.


Process — hardened request lifecycle


Scoped tool gateway: a central choke point that verifies args, quotas, tenancy, and cryptographic proof of invocation context.
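
A minimal sketch of such a gateway in Python, assuming an HMAC over the invocation context and an in-memory quota map; all names here are illustrative, not a LangChain/LangGraph API:

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class ToolCall:
    tenant_id: str
    user_id: str
    tool_name: str
    args: dict
    signature: str  # HMAC over the invocation context, minted upstream

@dataclass(frozen=True)
class ToolSpec:
    handler: Callable[..., Any]
    allowed_tenants: frozenset        # tenancy scope for this tool
    validate: Callable[[dict], dict]  # raises ValueError on bad args

class ToolGateway:
    """Central choke point: every tool invocation passes through here."""

    def __init__(self, secret: bytes, registry: dict, quotas: dict):
        self._secret = secret
        self._registry = registry
        self._quotas = quotas  # remaining calls keyed by "tenant:tool"

    def _expected_sig(self, call: ToolCall) -> str:
        msg = f"{call.tenant_id}|{call.user_id}|{call.tool_name}".encode()
        return hmac.new(self._secret, msg, hashlib.sha256).hexdigest()

    def invoke(self, call: ToolCall) -> Any:
        spec = self._registry.get(call.tool_name)
        if spec is None:
            raise PermissionError(f"unknown tool {call.tool_name!r}")
        if not hmac.compare_digest(call.signature, self._expected_sig(call)):
            raise PermissionError("invocation context signature mismatch")
        if call.tenant_id not in spec.allowed_tenants:
            raise PermissionError("tenant not scoped to this tool")
        key = f"{call.tenant_id}:{call.tool_name}"
        if self._quotas.get(key, 0) <= 0:
            raise RuntimeError("quota exhausted; escalate to a human")
        clean_args = spec.validate(call.args)  # deterministic refusal on bad args
        self._quotas[key] -= 1
        return spec.handler(tenant_id=call.tenant_id, **clean_args)
```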


Runtime state transitions


Operational meaning: production code needs metrics and handlers for every terminal path, not only the happy Completed path.
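
A sketch of that discipline, with five assumed terminal states and a Counter standing in for a real metrics client:

```python
from collections import Counter
from enum import Enum

class Terminal(Enum):
    COMPLETED = "completed"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    REFUSED = "refused"       # guardrail or policy refusal
    CANCELLED = "cancelled"   # user- or budget-initiated stop

terminal_counts = Counter()   # stand-in for a real metrics client

def on_completed(run_id): pass                                        # nothing to repair
def on_failed(run_id): print(f"{run_id}: retry or dead-letter")
def on_timed_out(run_id): print(f"{run_id}: reconcile partial side effects")
def on_refused(run_id): print(f"{run_id}: sample for refusal dashboards")
def on_cancelled(run_id): print(f"{run_id}: roll back in-flight writes")

HANDLERS = {
    Terminal.COMPLETED: on_completed,
    Terminal.FAILED: on_failed,
    Terminal.TIMED_OUT: on_timed_out,
    Terminal.REFUSED: on_refused,
    Terminal.CANCELLED: on_cancelled,
}

def finish(run_id: str, state: Terminal) -> None:
    terminal_counts[state.value] += 1  # every terminal path is measured...
    HANDLERS[state](run_id)            # ...and every terminal path has a handler

finish("run-42", Terminal.TIMED_OUT)
```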


Defense-in-depth layering


Best practices recap

| Area | Principle |
| --- | --- |
| Budget | Caps on model turns, retries, and parallelism; degrade gracefully. |
| Side effects | Idempotency tokens; never trust raw model strings for destructive ops; map them to deterministic handles first (see the sketch after this table). |
| Schema | Invalid tool args → deterministic refusal path; capped repair attempts only. |
| Testing | Golden multi-hop traces asserting calls + args, not flaky creative-prose parity. |
| Observability | Correlate orchestration traces with infra metrics (rate-limit spikes, queue depth). |
| Data policy | Minimize payloads sent to LangSmith; mask PII consistently. |
| Identity | Carry tenant/user identity into tools; never let the model impersonate a broader service account. |
| Memory | Separate user memory, task scratchpad, and audit logs; apply retention and deletion policies explicitly. |
| Rollout | Use canaries, shadow evals, prompt/model versioning, and rollback flags. |
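
To make the Side effects row concrete, a sketch combining deterministic handles with idempotency keys; the invoice names and helpers are hypothetical:

```python
import uuid

# Deterministic handles: the model picks an opaque id from a list the
# system enumerated; it never supplies the raw target string itself.
open_invoices = {"inv_7f3a": "ACME March invoice", "inv_9c21": "ACME April invoice"}

def resolve_handle(model_choice: str) -> str:
    if model_choice not in open_invoices:
        raise ValueError("model referenced a handle we never offered")
    return model_choice

# Idempotency: retried writes with the same key apply exactly once.
_applied: dict = {}

def delete_invoice(handle: str, idempotency_key: str) -> str:
    if idempotency_key in _applied:
        return _applied[idempotency_key]   # replay: no double delete
    open_invoices.pop(handle)              # the destructive action
    _applied[idempotency_key] = f"deleted {handle}"
    return _applied[idempotency_key]

key = str(uuid.uuid4())
print(delete_invoice(resolve_handle("inv_7f3a"), key))
print(delete_invoice("inv_7f3a", key))     # safe retry returns the cached result
```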

Security controls mapped to agent risk

| Risk | Control |
| --- | --- |
| Prompt injection | Treat retrieved text and tool output as untrusted; the instruction hierarchy must not be overridden by documents. |
| Sensitive data disclosure | Redact traces, minimize context, enforce ACLs before retrieval, and filter final output. |
| Improper output handling | Validate model output before passing it to HTML, SQL, shell, email, workflow engines, or APIs (see the sketch after this table). |
| Excessive agency | Reduce tool count, permissions, and autonomy; require approval for irreversible actions. |
| Vector weakness | ACL-filter before ranking, dedupe chunks, detect stale indexes, and monitor citation quality. |
| Misinformation | Require evidence for factual claims, expose uncertainty, and degrade when sources are missing. |
| Unbounded consumption | Enforce per-user and per-tenant step, token, wall-clock, concurrency, and spend limits. |
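
A sketch of the Improper output handling row for SQL: structural parts of the query are allow-listed and values are parameter-bound, so model text never reaches SQL directly. The table and columns are illustrative:

```python
import sqlite3

ALLOWED_SORT_COLUMNS = {"created_at", "amount"}

def query_orders(conn, model_filter: str, model_sort: str):
    if model_sort not in ALLOWED_SORT_COLUMNS:   # structural parts: allow-list
        raise ValueError("refusing unexpected sort column")
    sql = f"SELECT id, amount FROM orders WHERE customer = ? ORDER BY {model_sort}"
    return conn.execute(sql, (model_filter,)).fetchall()  # values: bound parameters

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, created_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'acme', 10.0, '2024-01-01')")
print(query_orders(conn, "acme", "amount"))
print(query_orders(conn, "x'; DROP TABLE orders; --", "amount"))  # injection attempt is inert
```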

Failure playbook (elevator-ready)

| Symptom | Likely systemic cause | Immediate mitigation |
| --- | --- | --- |
| Cost spike | Recursive tool chatter | Tighten step clamp + escalate (see the sketch after this table) |
| Stale factual answers | RAG ingestion drift | Reindex + degrade message |
| Weird tool payloads | Injection via documents | Sanitization boundary + verifier model |
| Slow tail on all responses | Sequential waterfall | Parallelize orthogonal I/O + prefetch |
| Wrong tenant data | Retrieval/tool authZ gap | Disable tool, rotate traces, audit access, add ACL prefilter |
| Users see partial actions | Missing idempotency/rollback | Stop writes, reconcile side effects, add transaction boundary |
| High refusals after deploy | Prompt/model/evaluator regression | Roll back version, compare traces, inspect input classifier |
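
A sketch of the step clamp from the Cost spike row; MAX_STEPS, MAX_SPEND_USD, and the call_model/run_tool hooks are assumed placeholders:

```python
MAX_STEPS = 8           # hard ceiling on agent turns
MAX_SPEND_USD = 0.50    # hard ceiling on model spend per task

def run_agent(task: str, call_model, run_tool) -> str:
    spend = 0.0
    for step in range(MAX_STEPS):
        action, cost = call_model(task)   # placeholder: returns (action, $cost)
        spend += cost
        if spend > MAX_SPEND_USD:
            raise RuntimeError(f"spend clamp hit at step {step}; escalating")
        if action["type"] == "final":
            return action["answer"]
        task = run_tool(action)           # feed the tool result back into the loop
    raise RuntimeError("step clamp hit; recursive tool chatter suspected")
```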

For a deeper security discussion, see /security; for Gen AI ingestion notes, see /gen-ai.


TypeScript parity

Agents on LangGraph.js use the same graph patterns (START/END, ToolNode equivalents, streaming). The operational concerns are identical even though the runtime differs (Node async ergonomics vs Python Gunicorn workers).


Interview questions — production

1. Outline an on-call playbook for exploding tool spend.

  • Freeze feature flags, tighten budgets, pinpoint the anomalous trace cluster, and roll back to the prompt version that last passed the dataset gate.

2. Why separate tenancy at orchestration vs embeddings index?

  • Different threat surfaces: retrieval leakage still surfaces via ranking even when orchestration is tenant-scoped, so unify both under one policy engine.

3. Describe circuit breaker triggering when provider flaps.

  • Short-circuit synchronous calls to cached canned responses and disclaim freshness.
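
A minimal circuit-breaker sketch under those assumptions; the thresholds and the cached fallback are illustrative:

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_s=30.0):
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.opened_at = None

    def call(self, provider_fn, fallback_fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                return fallback_fn()      # open: skip the provider entirely
            self.opened_at = None         # half-open: allow one real attempt
        try:
            result = provider_fn()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            return fallback_fn()

breaker = CircuitBreaker()
# answer = breaker.call(lambda: llm_call(prompt),
#                       lambda: cached_answer + " (may be stale)")
```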

4. What contract tests validate before deploy?

  • Schema conformance, tool latency percentiles on replayed shadow traffic, and evaluator delta budgets.
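
For example, a contract test over a golden trace might look like this sketch; the trace shape and the run_agent_on fixture are assumptions:

```python
GOLDEN = {
    "input": "refund order 42",
    "expected_calls": [
        {"tool": "lookup_order", "args": {"order_id": 42}},
        {"tool": "create_refund", "args": {"order_id": 42}},
    ],
}

def test_refund_trace_contract(run_agent_on):
    trace = run_agent_on(GOLDEN["input"])    # assumed fixture: deterministic replay
    calls = [(c["tool"], c["args"]) for c in trace["tool_calls"]]
    expected = [(c["tool"], c["args"]) for c in GOLDEN["expected_calls"]]
    assert calls == expected                 # assert calls + args, never prose parity
```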

5. Incident where logs looked green but users angry.

  • Only happy-path instrumentation existed; you need negative-path spans plus refusal-distribution sampling.

6. How do you safely let an agent send emails or create tickets?

  • Draft first, show recipient/body/action summary, require approval for external sends, use idempotency keys, and audit the final payload.
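
A sketch of that draft-then-approve flow; the approve/send_email/audit_log hooks are placeholders:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class EmailDraft:
    recipient: str
    subject: str
    body: str

    def idempotency_key(self) -> str:
        raw = f"{self.recipient}|{self.subject}|{self.body}".encode()
        return hashlib.sha256(raw).hexdigest()

_sent: set = set()

def send_with_approval(draft: EmailDraft, approve, send_email, audit_log):
    summary = f"To: {draft.recipient}\nSubject: {draft.subject}\n{draft.body[:200]}"
    if not approve(summary):          # a human sees exactly what will go out
        return "rejected"
    key = draft.idempotency_key()
    if key in _sent:
        return "duplicate suppressed"  # retry-safe external send
    send_email(draft)
    _sent.add(key)
    audit_log({"action": "email.send", "key": key, "to": draft.recipient})
    return "sent"
```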

7. What is the difference between a guardrail and an evaluator?

  • A guardrail acts at runtime to block or transform a request or output; an evaluator usually runs offline or in CI to decide whether a model/prompt/tool change should ship.

8. What should be in a production trace?

  • User/tenant hash, model version, prompt/tool version, state transition, tool name, validated args, result status, latency, token/spend, guardrail decisions, and final outcome.
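
Collected as one record, such a trace might look like this sketch; the field names are illustrative, not a LangSmith schema:

```python
from dataclasses import dataclass, field

@dataclass
class TraceRecord:
    tenant_hash: str            # hashed, never raw tenant/user identifiers
    model_version: str
    prompt_version: str
    tool_version: str
    state_transition: str       # e.g. "planning -> tool_call"
    tool_name: str
    validated_args: dict        # post-validation, post-redaction args only
    result_status: str          # completed / failed / refused / timed_out
    latency_ms: float
    tokens_in: int
    tokens_out: int
    spend_usd: float
    guardrail_decisions: list = field(default_factory=list)
    final_outcome: str = ""
```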
