Agentic architecture workflow
A production agent is an orchestrated state machine around an LLM. The LLM helps decide what to do next, but the application owns control: state, permissions, tools, storage, retries, budgets, human review, and audit.
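As a minimal sketch of that split, the loop below lets a model callback propose actions while the application enforces the step budget and the tool allowlist. All names here (`RunState`, `run_agent`, `fake_model`, `lookup_order`) are hypothetical, and `fake_model` stands in for a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    goal: str
    steps_left: int = 5                  # budget owned by the application, not the model
    history: list = field(default_factory=list)
    status: str = "running"


ALLOWED_TOOLS = {"lookup_order"}         # hypothetical allowlist


def run_agent(state: RunState, propose: Callable[[RunState], dict], tools: dict) -> RunState:
    """Drive the loop: the model proposes, the application decides."""
    while state.status == "running":
        if state.steps_left <= 0:
            state.status = "fallback"    # terminal state once the budget is spent
            break
        state.steps_left -= 1
        action = propose(state)          # stand-in for an LLM call
        if action["type"] == "final":
            state.history.append(("final", action["answer"]))
            state.status = "done"
        elif action["type"] == "tool" and action["name"] in ALLOWED_TOOLS:
            observation = tools[action["name"]](**action["args"])
            state.history.append((action["name"], observation))
        else:
            state.history.append(("rejected", action))  # recorded, never executed
    return state


def fake_model(state: RunState) -> dict:
    """Illustrative stub: call one tool, then answer."""
    if not state.history:
        return {"type": "tool", "name": "lookup_order", "args": {"order_id": "A1"}}
    return {"type": "final", "answer": "Order A1 has shipped."}


result = run_agent(
    RunState(goal="Where is my order?"),
    fake_model,
    {"lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"}},
)
```

A model that keeps proposing a tool outside the allowlist never executes anything here; it exhausts the budget and the run ends in `fallback`.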
Complete architecture
How to explain this in an interview: the agent loop sits at the center, but the systems that make it production-ready surround it, namely state, memory, retrieval, tool gateway, policy, human approval, and observability.
State-to-state workflow
Every transition should carry a reason, draw on a budget, emit a trace event, and have a terminal fallback so a run cannot stall in an intermediate state.
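One way to make this concrete is a small transition function that checks an edge table, spends budget, and appends a trace event. The state names and allowed edges below are assumptions chosen for illustration, not a prescribed graph.

```python
from dataclasses import dataclass, field

# Hypothetical workflow graph: which states may follow which.
ALLOWED = {
    "classify": {"retrieve", "refuse"},
    "retrieve": {"plan"},
    "plan": {"validate", "finalize"},
    "validate": {"execute", "human_review", "plan"},
    "human_review": {"execute", "refuse"},
    "execute": {"plan"},
}
TERMINAL = {"finalize", "refuse", "fallback"}


@dataclass
class Run:
    state: str = "classify"
    budget: int = 6                      # maximum number of legal transitions
    trace: list = field(default_factory=list)


def transition(run: Run, target: str, reason: str) -> Run:
    """Move to `target` if the edge is legal and budget remains, else fall back."""
    if run.state in TERMINAL:
        raise ValueError(f"run already ended in {run.state!r}")
    legal = target in ALLOWED.get(run.state, set())
    if not legal or run.budget <= 0:
        reason = f"fallback from {run.state} -> {target}: {reason}"
        target = "fallback"              # terminal state, never left again
    else:
        run.budget -= 1
    run.trace.append(
        {"from": run.state, "to": target, "reason": reason, "budget_left": run.budget}
    )
    run.state = target
    return run
```

Because illegal edges and exhausted budgets both route to `fallback`, every run ends in a terminal state, and the trace shows why.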
Step-by-step request path
| Step | What happens | Owner |
|---|---|---|
| 1. Intake | Authenticate, rate-limit, attach tenant/user context. | Edge/API |
| 2. Classify | Detect intent, risk, required capabilities, policy scope. | Orchestrator |
| 3. Load state | Get thread checkpoint, task facts, counters, approvals. | State store |
| 4. Retrieve | Query vector/keyword indexes with ACL and freshness filters. | RAG service |
| 5. Plan/reason | Ask model for next action under schema and budget. | LLM + orchestrator |
| 6. Validate action | Check tool name, args, tenant, policy, idempotency, spend. | Tool gateway |
| 7. Human review | Pause for approval when action is risky. | HITL workflow |
| 8. Execute | Run allowed tool, return observation, record audit. | Tool service |
| 9. Update state | Append observation, counters, decisions, checkpoint. | Orchestrator |
| 10. Finalize | Validate answer, citations, output schema, and refusal rules. | Orchestrator |
| 11. Observe | Trace run, label outcome, feed eval dataset. | Observability |
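
Step 6 is where the application, not the model, decides whether a proposed call may run. The sketch below shows one possible ordering of those checks. The registry, tool names, cost units, and verdict strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolSpec:
    required_args: frozenset
    max_cost: float                      # spend ceiling per call, hypothetical units
    needs_approval: bool = False


# Hypothetical tool registry.
REGISTRY = {
    "get_invoice": ToolSpec(frozenset({"invoice_id"}), max_cost=0.01),
    "issue_refund": ToolSpec(
        frozenset({"invoice_id", "amount"}), max_cost=0.05, needs_approval=True
    ),
}


@dataclass
class Context:
    tenant: str
    allowed_tools: set
    spend_left: float
    seen_keys: set = field(default_factory=set)   # idempotency keys already executed


def validate_action(action: dict, ctx: Context) -> str:
    """Return 'allow', 'needs_approval', or 'deny:<reason>' for a proposed tool call."""
    spec = REGISTRY.get(action.get("tool"))
    if spec is None:
        return "deny:unknown_tool"
    if action["tool"] not in ctx.allowed_tools:
        return "deny:policy"
    if action.get("tenant") != ctx.tenant:
        return "deny:tenant_mismatch"
    if set(action.get("args", {})) != spec.required_args:
        return "deny:bad_args"
    key = action.get("idempotency_key")
    if not key or key in ctx.seen_keys:
        return "deny:idempotency"
    if spec.max_cost > ctx.spend_left:
        return "deny:budget"
    return "needs_approval" if spec.needs_approval else "allow"
```

In this sketch the caller is expected to add the key to `seen_keys` and deduct spend only after the tool actually runs, so validation itself has no side effects.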
Common architecture patterns
| Pattern | Use when | Watch out |
|---|---|---|
| Single agent + tools | One domain, small tool set, simple policy. | Tool sprawl over time. |
| Planner/executor | Task decomposition is useful but execution must be controlled. | Planner invents impossible steps. |
| Router + specialist agents | Distinct domains like billing, support, code, legal. | Cross-agent prompt injection and duplicated state. |
| Graph workflow | Loops, human approvals, long-running tasks, replay. | Poorly typed state becomes hard to debug. |
| Deterministic chain + LLM nodes | Mostly fixed path with small reasoning pockets. | Calling it an agent when no autonomy is needed. |
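
The last pattern can be sketched as an ordinary function pipeline with a single pluggable model call. The ticket categories, queue names, and `keyword_stub` classifier below are hypothetical. In practice `classify` would wrap an LLM request.

```python
from typing import Callable

CATEGORIES = {"billing", "bug", "other"}            # closed set the model must pick from
QUEUES = {"billing": "billing-queue", "bug": "eng-queue"}


def normalize(ticket: str) -> str:
    """Deterministic step: collapse whitespace and lowercase."""
    return " ".join(ticket.split()).lower()


def route(category: str) -> str:
    """Deterministic step: map a category to a queue, defaulting to a human."""
    return QUEUES.get(category, "human-triage")


def handle_ticket(ticket: str, classify: Callable[[str], str]) -> dict:
    text = normalize(ticket)
    category = classify(text)                       # the only LLM node in the chain
    if category not in CATEGORIES:
        category = "other"                          # never trust free-form model output
    return {"text": text, "category": category, "queue": route(category)}


def keyword_stub(text: str) -> str:
    """Illustrative stand-in for a model; may return an out-of-set label."""
    return "billing" if "invoice" in text else "surprise"
```

The path through the code never depends on the model, only the value of `category` does, which keeps the chain easy to test and replay.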
Failure modes by layer
| Layer | Failure | Control |
|---|---|---|
| Ingress | Abuse or prompt injection. | Input guardrails, abuse limits, policy classifier. |
| RAG | Wrong, stale, or unauthorized context. | ACL filters, versioning, hybrid search, citations. |
| LLM | Unsupported claim or wrong action. | Evidence rules, structured outputs, evaluations. |
| Tools | Side-effecting call runs with bad arguments. | Schema validation, authZ, idempotency, approval. |
| State | Agent forgets or loops. | Checkpoints, counters, duplicate-action detection. |
| Memory | Sensitive or stale facts reused. | TTL, scopes, user deletion, memory review. |
| Observability | Incident has no useful trace. | Trace transitions, tool args, versions, outcomes. |
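
For the State row, counters and duplicate-action detection can be as small as the guard below. This is a sketch with assumed thresholds and a hypothetical `LoopGuard` class. It fingerprints each proposed call so that repeated identical calls are caught even if argument order differs.

```python
import hashlib
import json


class LoopGuard:
    """Stops a run that exceeds a step budget or repeats the same tool call."""

    def __init__(self, max_steps: int = 10, max_repeats: int = 2):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen = {}                   # fingerprint -> times proposed

    def check(self, tool: str, args: dict) -> str:
        self.steps += 1
        if self.steps > self.max_steps:
            return "stop:step_budget"
        # sort_keys makes the fingerprint independent of argument order
        payload = json.dumps([tool, args], sort_keys=True).encode()
        fingerprint = hashlib.sha256(payload).hexdigest()
        self.seen[fingerprint] = self.seen.get(fingerprint, 0) + 1
        if self.seen[fingerprint] > self.max_repeats:
            return "stop:duplicate_action"
        return "continue"
```

The orchestrator would call `check` before every tool execution and move the run to a terminal fallback on any `stop:` verdict.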
Interview questions
1. Draw a production agent architecture.
- Start with ingress/auth, then orchestrator, LLM, state store, RAG, tool gateway, human review, audit, and tracing. Make clear that tools and storage are outside the LLM.
2. What is the orchestrator responsible for?
- State, prompts, model calls, routing, validation, budgets, retries, checkpoints, and final response handling.
3. Why separate tool gateway from the model?
- The gateway enforces real permissions and safety. The model only proposes actions.
4. What makes an agent recoverable?
- Durable checkpoints, idempotent tools, audit logs, human resume paths, and terminal fallback states.
5. How do you keep the architecture simple?
- Prefer deterministic chains where possible, expose fewer tools, use graphs only when loops or approvals matter, and evaluate the actual workflow.
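The recoverability answer (question 4) can be illustrated with a checkpointed step runner. This is a minimal sketch assuming a local JSON file as the durable store and a hypothetical `execute(step, idempotency_key)` callback. A real system would more likely use a database and pass the key through to the tool service.

```python
import json
import os


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path: str) -> dict:
    if not os.path.exists(path):
        return {"next_step": 0, "results": {}}
    with open(path) as f:
        return json.load(f)


def run_steps(path: str, steps: list, execute) -> dict:
    """Resume from the last checkpoint; each step gets a stable idempotency key."""
    state = load_checkpoint(path)
    for i in range(state["next_step"], len(steps)):
        key = f"step-{i}"                # same key on retry, so the tool can dedupe
        state["results"][key] = execute(steps[i], key)
        state["next_step"] = i + 1
        save_checkpoint(path, state)     # checkpoint after every completed step
    return state
```

If the process dies between `execute` and `save_checkpoint`, the step runs again on resume with the same key, which is why the checkpoint alone is not enough and the tool itself must be idempotent.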
Related
Agentic fundamentals · Agent memory, state & storage · Agentic production