Agentic production & serving
Shipping agents means combining latency SLOs, financial guardrails, trust architecture, and recoverable execution: per-tool authZ, quotas, human escalation, deterministic tests, trace redaction, and rollback hooks (LangSmith). If any of these concepts are fuzzy, skim the fundamentals first.
Process — hardened request lifecycle
Scoped tool gateway: a central choke point that verifies arguments, enforces quotas and tenancy, and carries cryptographic proof of the invocation context.
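A minimal sketch of such a gateway. Everything here is illustrative: `ToolGateway` and its fields are hypothetical names, not a LangChain API, and real deployments would back the quota store and audit trail with durable storage.

```python
from dataclasses import dataclass, field

@dataclass
class ToolGateway:
    """Central choke point: every tool call passes through here."""
    allowed_tools: dict               # tool name -> set of required arg names
    quotas: dict                      # tenant id -> remaining calls
    calls: list = field(default_factory=list)

    def invoke(self, tenant: str, tool: str, args: dict):
        if tool not in self.allowed_tools:          # scoping check
            raise PermissionError(f"tool {tool!r} not allowed")
        missing = self.allowed_tools[tool] - args.keys()
        if missing:                                  # argument verification
            raise ValueError(f"missing args: {sorted(missing)}")
        if self.quotas.get(tenant, 0) <= 0:          # per-tenant quota
            raise RuntimeError(f"quota exhausted for tenant {tenant!r}")
        self.quotas[tenant] -= 1
        self.calls.append((tenant, tool, args))      # audit trail
        return {"status": "ok", "tool": tool}
```

Because every tool invocation funnels through one object, quota, tenancy, and audit logic cannot be bypassed by any single agent node.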
Runtime state transitions
Operational meaning: production code needs metrics and handlers for every terminal path, not only the happy-path Completed state.
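The point above can be sketched as a counter per terminal state; the state names are illustrative, not a fixed LangGraph vocabulary.

```python
from enum import Enum, auto

class RunState(Enum):
    COMPLETED = auto()          # happy path
    BUDGET_EXCEEDED = auto()    # step/token/spend clamp fired
    TOOL_ERROR = auto()         # tool raised or schema repair gave up
    GUARDRAIL_BLOCKED = auto()  # input or output guardrail refused
    TIMED_OUT = auto()          # wall-clock limit hit

metrics: dict = {}

def record_terminal(state: RunState) -> None:
    # One counter per terminal path; alerting keys off non-COMPLETED rates.
    metrics[state.name] = metrics.get(state.name, 0) + 1
```

If a run can end in a state with no counter and no handler, that gap is invisible until an incident, which is exactly the failure mode the text warns about.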
Defense-in-depth layering
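One way to picture the layering: independent checks stacked so that no single one is trusted alone. A toy sketch under simplifying assumptions (the banned-phrase list, grant map, and redaction rule are placeholders for real classifiers, policy engines, and DLP filters):

```python
def input_guardrail(text: str) -> str:
    banned = ("ignore previous instructions",)       # placeholder heuristic
    if any(b in text.lower() for b in banned):
        raise ValueError("blocked by input guardrail")
    return text

def authz(tenant: str, tool: str, grants: dict) -> None:
    if tool not in grants.get(tenant, set()):        # tenant-scoped grants
        raise PermissionError("tenant lacks grant for tool")

def output_filter(text: str) -> str:
    return text.replace("SECRET", "[REDACTED]")      # placeholder redaction

def handle(tenant: str, tool: str, prompt: str, grants: dict) -> str:
    # Each layer can fail independently; a bypass of one still hits the others.
    safe = input_guardrail(prompt)
    authz(tenant, tool, grants)
    raw = f"result for {safe}"                       # stand-in for the model/tool call
    return output_filter(raw)
```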
Best practices recap
| Area | Principle |
|---|---|
| Budget | Caps on model turns, retries, parallelism; degrade gracefully. |
| Side effects | Idempotency tokens; never trust model raw strings for destructive ops—map to deterministic handles first. |
| Schema | Invalid tool args → deterministic refusal path; capped repair attempts only. |
| Testing | Golden multi-hop traces asserting calls + args, not flaky creative prose parity. |
| Observability | Correlate orchestration traces with infra metrics (rate-limit spikes, queue depth). |
| Data policy | Minimize payloads to LangSmith—mask PII consistently. |
| Identity | Carry tenant/user identity into tools; never let the model impersonate a broader service account. |
| Memory | Separate user memory, task scratchpad, and audit logs; apply retention and deletion policy explicitly. |
| Rollout | Use canaries, shadow evals, prompt/model versioning, and rollback flags. |
Security controls mapped to agent risk
| Risk | Control |
|---|---|
| Prompt injection | Treat retrieved text and tool output as untrusted; instruction hierarchy must not be overridden by documents. |
| Sensitive data disclosure | Redact traces, minimize context, enforce ACL before retrieval, and filter final output. |
| Improper output handling | Validate model output before passing it to HTML, SQL, shell, email, workflow engines, or APIs. |
| Excessive agency | Reduce tool count, permissions, and autonomy; require approval for irreversible actions. |
| Vector weakness | ACL-filter before ranking, dedupe chunks, detect stale indexes, and monitor citation quality. |
| Misinformation | Require evidence for factual claims, expose uncertainty, and degrade when sources are missing. |
| Unbounded consumption | Enforce per-user and per-tenant step, token, wall-clock, concurrency, and spend limits. |
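The unbounded-consumption row can be made concrete with a per-request budget object; the limit values and class shape are illustrative, not a library API.

```python
import time

class Budget:
    """Per-request clamp on steps, tokens, and wall-clock time."""
    def __init__(self, max_steps=10, max_tokens=4000, max_seconds=30.0):
        self.max_steps, self.max_tokens, self.max_seconds = (
            max_steps, max_tokens, max_seconds)
        self.steps = self.tokens = 0
        self.start = time.monotonic()

    def charge(self, tokens: int) -> None:
        # Called once per model turn / tool hop before doing any work.
        self.steps += 1
        self.tokens += tokens
        if self.steps > self.max_steps:
            raise RuntimeError("step limit exceeded")
        if self.tokens > self.max_tokens:
            raise RuntimeError("token limit exceeded")
        if time.monotonic() - self.start > self.max_seconds:
            raise TimeoutError("wall-clock limit exceeded")
```

In production the same pattern extends to per-tenant concurrency and spend counters held in shared storage; the exception becomes the `BUDGET_EXCEEDED` terminal path with its own degrade message.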
Failure playbook (elevator-ready)
| Symptom | Likely systemic cause | Immediate mitigation |
|---|---|---|
| Cost spike | Recursive tool chatter | tighten step clamp + escalate |
| Stale factual answers | RAG ingestion drift | reindex + degrade message |
| Weird tool payloads | Injection via documents | sanitization boundary + verifier model |
| All responses slow tail | Sequential waterfall | parallelize orthogonal IO + prefetch |
| Wrong tenant data | Retrieval/tool authZ gap | disable tool, rotate traces, audit access, add ACL prefilter |
| Users see partial actions | Missing idempotency/rollback | stop writes, reconcile side effects, add transaction boundary |
| High refusals after deploy | Prompt/model/evaluator regression | rollback version, compare traces, inspect input classifier |
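Several mitigations above (canned responses when a provider flaps, stopping writes on partial actions) follow the circuit-breaker shape. A minimal sketch, with threshold and cooldown values chosen arbitrarily:

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures; serve a fallback until cooldown passes."""
    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                return fallback()            # open: short-circuit to canned response
            self.opened_at = None            # half-open: probe the provider again
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        return result
```

The fallback should disclaim freshness (per the interview answer below on provider flaps) rather than silently impersonate a live answer.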
For deeper security coverage, see /security; for Gen AI ingestion notes, see /gen-ai.
TypeScript parity
Agents on LangGraph.js use the same graph patterns (START/END, ToolNode equivalents, streaming). Operational concerns are identical even though the runtime differs (Node async ergonomics vs. Python Gunicorn workers).
Interview questions — production
1. Outline an on-call playbook for exploding tool spend.
- Freeze feature flags, tighten budgets, pinpoint the anomalous trace cluster, then roll back the prompt version using the dataset gate as the reference point.
2. Why separate tenancy at orchestration vs embeddings index?
- Different threat surfaces: tenant data can still leak via ranking even when orchestration is isolated, so unify both under one policy engine.
3. Describe circuit breaker triggering when provider flaps.
- Short-circuit synchronous calls → cached canned responses + disclaim freshness.
4. What contract tests validate before deploy?
- Schema conformance, tool latency percentiles on replayed shadow traffic, and evaluator delta budgets.
5. Incident where logs looked green but users angry.
- Only happy-path instrumentation existed; add negative-path spans and refusal-distribution sampling.
6. How do you safely let an agent send emails or create tickets?
- Draft first, show recipient/body/action summary, require approval for external sends, use idempotency keys, and audit the final payload.
7. What is the difference between a guardrail and an evaluator?
- A guardrail acts at runtime to block or transform a request or output; an evaluator usually runs offline or in CI to decide whether a model/prompt/tool change should ship.
8. What should be in a production trace?
- User/tenant hash, model version, prompt/tool version, state transition, tool name, validated args, result status, latency, token/spend, guardrail decisions, and final outcome.
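The trace fields listed in the last answer can be captured as one structured record per span; the field names below are illustrative, not a LangSmith schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class TraceSpan:
    tenant_hash: str        # hashed, never raw identity
    model_version: str
    prompt_version: str
    state: str              # state transition, e.g. "tool_call"
    tool: str
    args_valid: bool        # schema validation outcome
    status: str             # "ok" | "refused" | "error"
    latency_ms: int
    tokens: int
    spend_usd: float
    guardrail_decision: str

span = TraceSpan("a1b2", "model-2024-06", "v7", "tool_call", "search",
                 True, "ok", 420, 312, 0.004, "pass")
record = asdict(span)       # ship as a flat dict, PII already hashed upstream
```

Keeping the span flat and typed makes the correlation work from the observability row above practical: every field is a dimension you can group or alert on.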