Agentic architecture workflow

A production agent is an orchestrated state machine around an LLM. The LLM helps decide what to do next, but the application owns control: state, permissions, tools, storage, retries, budgets, human review, and audit.

Complete architecture


How to explain this in an interview: the agent loop sits at the center, but the important production systems surround it, namely state, memory, retrieval, the tool gateway, policy, human approval, and observability.

State-to-state workflow


Every transition should record a reason, check a budget, emit a trace event, and have a terminal fallback.
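A minimal Python sketch of that rule, assuming a hypothetical set of states (`State`), a hand-written transition table (`ALLOWED`), and a simple step budget; none of these names come from a specific framework. Illegal moves and budget exhaustion both fall through to a terminal `FAILED` state instead of crashing the run, and every move leaves a trace event with its reason.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    CLASSIFY = "classify"
    RETRIEVE = "retrieve"
    DECIDE = "decide"
    VALIDATE = "validate"
    EXECUTE = "execute"
    FINAL = "final"
    ESCALATE = "escalate"
    FAILED = "failed"


TERMINAL = {State.FINAL, State.ESCALATE, State.FAILED}

# Hypothetical transition table: the next states each non-terminal state may move to.
ALLOWED = {
    State.CLASSIFY: {State.RETRIEVE, State.DECIDE, State.ESCALATE},
    State.RETRIEVE: {State.DECIDE, State.ESCALATE},
    State.DECIDE: {State.VALIDATE, State.FINAL, State.ESCALATE},
    State.VALIDATE: {State.EXECUTE, State.DECIDE, State.ESCALATE},
    State.EXECUTE: {State.DECIDE, State.FAILED},
}


@dataclass
class Run:
    state: State = State.CLASSIFY
    steps_left: int = 10                        # step budget for the whole run
    trace: list = field(default_factory=list)   # one event per transition


def transition(run: Run, target: State, reason: str) -> State:
    """Move to `target` if it is allowed and budget remains; otherwise fall back to FAILED."""
    if run.state in TERMINAL:
        raise ValueError(f"run already ended in {run.state.value}")
    if run.steps_left <= 0:
        target, reason = State.FAILED, "step budget exhausted"
    elif target not in ALLOWED[run.state]:
        target, reason = State.FAILED, f"illegal transition to {target.value}"
    run.trace.append({"from": run.state.value, "to": target.value, "reason": reason})
    run.steps_left -= 1
    run.state = target
    return target


# Usage: the model asks to execute a tool straight from RETRIEVE, which the table forbids.
run = Run(steps_left=3)
transition(run, State.RETRIEVE, "question needs documents")
transition(run, State.EXECUTE, "model proposed a tool call")
print(run.state, run.trace[-1]["reason"])
```

Keeping the table as data rather than scattered `if` statements makes it easy to review which moves the model can trigger and to replay a trace against it.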

Step-by-step request path

Step	What happens	Owner
1. Intake	Authenticate, rate-limit, attach tenant/user context.	Edge/API
2. Classify	Detect intent, risk, required capabilities, policy scope.	Orchestrator
3. Load state	Get thread checkpoint, task facts, counters, approvals.	State store
4. Retrieve	Query vector/keyword indexes with ACL and freshness filters.	RAG service
5. Plan/reason	Ask model for next action under schema and budget.	LLM + orchestrator
6. Validate action	Check tool name, args, tenant, policy, idempotency, spend.	Tool gateway
7. Human review	Pause for approval when action is risky.	HITL workflow
8. Execute	Run allowed tool, return observation, record audit.	Tool service
9. Update state	Append the observation, update counters and decisions, write a checkpoint.	Orchestrator
10. Finalize	Validate answer, citations, output schema, and refusal rules.	Orchestrator
11. Observe	Trace run, label outcome, feed eval dataset.	Observability
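Step 4 is easy to get wrong because an unfiltered index can put another tenant's or an unauthorized user's documents into the prompt. A minimal sketch of the ACL and freshness filter, assuming each candidate already carries a tenant ID, allowed groups, an update timestamp, and an index score (the `Doc` fields and `filter_and_rank` are hypothetical). Filtering after retrieval like this can return fewer than `k` results; where the index supports metadata filters, pushing the same conditions into the query avoids that.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Doc:
    doc_id: str
    tenant_id: str
    allowed_groups: frozenset   # groups permitted to read this document
    updated_at: datetime
    score: float                # relevance score returned by the vector/keyword index


def filter_and_rank(candidates, tenant_id, user_groups, max_age, now, k=3):
    """Keep docs from the caller's tenant that the user may read and that are fresh; return top-k by score."""
    visible = [
        d for d in candidates
        if d.tenant_id == tenant_id           # tenant isolation
        and d.allowed_groups & user_groups    # ACL: user shares at least one group
        and now - d.updated_at <= max_age     # freshness window
    ]
    return sorted(visible, key=lambda d: d.score, reverse=True)[:k]


# Usage with made-up candidates: only "a" and "e" survive the filters.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
candidates = [
    Doc("a", "t1", frozenset({"support"}), now - timedelta(days=2), 0.90),
    Doc("b", "t2", frozenset({"support"}), now - timedelta(days=1), 0.95),    # other tenant
    Doc("c", "t1", frozenset({"legal"}), now - timedelta(days=1), 0.99),      # no shared group
    Doc("d", "t1", frozenset({"support"}), now - timedelta(days=400), 0.97),  # stale
    Doc("e", "t1", frozenset({"support", "billing"}), now - timedelta(days=10), 0.50),
]
top = filter_and_rank(candidates, "t1", {"support"}, timedelta(days=90), now)
print([d.doc_id for d in top])
```

Note that the highest-scoring candidates are exactly the ones removed: relevance alone says nothing about whether a document may be shown.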

Common architecture patterns

Pattern	Use when	Watch out
Single agent + tools	One domain, small tool set, easy policy.	Tool sprawl over time.
Planner/executor	Task decomposition is useful but execution must be controlled.	Planner invents impossible steps.
Router + specialist agents	Distinct domains like billing, support, code, legal.	Cross-agent prompt injection and duplicated state.
Graph workflow	Loops, human approvals, long-running tasks, replay.	Poorly typed state becomes hard to debug.
Deterministic chain + LLM nodes	Mostly fixed path with small reasoning pockets.	Calling it an agent when no autonomy is needed.
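A minimal sketch of the router + specialist agents pattern, assuming hypothetical specialist names, keyword rules, and tool names. What it illustrates is that routing also scopes tools: each specialist receives only its own allowlist, so text that reaches the billing agent cannot invoke code tools. A production router might use a classifier or an LLM call instead of keywords; the fallback to human triage with no tools stays the same.

```python
# Hypothetical specialists, each with its own narrow tool allowlist.
SPECIALISTS = {
    "billing": frozenset({"lookup_invoice", "issue_refund"}),
    "support": frozenset({"search_kb", "create_ticket"}),
    "code": frozenset({"search_repo", "run_tests"}),
}

# Hypothetical keyword rules, checked in order; the first match wins.
ROUTES = [
    ("billing", ("invoice", "refund", "charge")),
    ("code", ("stack trace", "bug", "repository")),
    ("support", ("password", "login", "account")),
]


def route(message: str):
    """Return (specialist, allowed_tools); unmatched messages go to human triage with no tools."""
    text = message.lower()
    for name, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return name, SPECIALISTS[name]
    return "human_triage", frozenset()


print(route("Why was I charged twice on my invoice?"))
print(route("What's the weather tomorrow?"))
```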

Failure modes by layer

Layer	Failure	Control
Ingress	Abuse or prompt injection.	Input guardrails, abuse limits, policy classifier.
RAG	Wrong, stale, or unauthorized context.	ACL filters, versioning, hybrid search, citations.
LLM	Unsupported claim or wrong action.	Evidence rules, structured outputs, evaluations.
Tools	Side effect runs with bad args.	Schema validation, authZ, idempotency, approval.
State	Agent forgets or loops.	Checkpoints, counters, duplicate-action detection.
Memory	Sensitive or stale facts reused.	TTL, scopes, user deletion, memory review.
Observability	Incident has no useful trace.	Trace transitions, tool args, versions, outcomes.
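For the State row, a minimal sketch of duplicate-action detection plus a tool-call budget. `LoopGuard` and its default limits are hypothetical; it fingerprints each proposed call as canonical JSON (via `json.dumps(..., sort_keys=True)`) so that reordering arguments does not hide a repeat.

```python
import hashlib
import json
from collections import Counter
from typing import Optional


class LoopGuard:
    """Flags a run that repeats the same tool call too often or exceeds its tool-call budget."""

    def __init__(self, max_calls: int = 8, max_repeats: int = 2):
        self.max_calls = max_calls
        self.max_repeats = max_repeats
        self.calls = 0
        self.seen = Counter()

    @staticmethod
    def fingerprint(tool: str, args: dict) -> str:
        # sort_keys makes {"q": "x", "k": 3} and {"k": 3, "q": "x"} hash identically.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def check(self, tool: str, args: dict) -> Optional[str]:
        """Return None if the call may proceed, otherwise a reason to stop the run."""
        self.calls += 1
        if self.calls > self.max_calls:
            return "tool-call budget exceeded"
        key = self.fingerprint(tool, args)
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return f"repeated call to {tool}"
        return None


guard = LoopGuard(max_calls=5, max_repeats=2)
print(guard.check("search", {"q": "refund policy", "k": 3}))
print(guard.check("search", {"k": 3, "q": "refund policy"}))
print(guard.check("search", {"q": "refund policy", "k": 3}))
```

The orchestrator would treat a non-`None` result as a transition to a terminal state and write the reason into the trace.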

Interview questions

1. Draw a production agent architecture.

Start with ingress/auth, then orchestrator, LLM, state store, RAG, tool gateway, human review, audit, and tracing. Make clear that tools and storage are outside the LLM.

Follow-up: What are the most important boxes?

Orchestrator, state store, tool gateway, RAG service, policy checks, human review, and trace/eval store. Together, these are what make the model controllable.

2. What is the orchestrator responsible for?

State, prompts, model calls, routing, validation, budgets, retries, checkpoints, and final response handling.
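As one example of what "retries" and "budgets" mean in code, a minimal sketch of a bounded retry wrapper around a model call. `ProviderTimeout` is a hypothetical exception standing in for whatever your client raises, and `call_with_retries` is an illustrative helper, not a library API. The `sleep` parameter is injectable so tests do not actually wait, and when attempts run out it returns a caller-supplied fallback instead of retrying forever.

```python
import random
import time


class ProviderTimeout(Exception):
    """Hypothetical error a model client raises when a call times out."""


def call_with_retries(call, max_attempts=3, base_delay=0.5, sleep=time.sleep, fallback=None):
    """Retry `call` on ProviderTimeout; return `fallback` once attempts run out.

    Any other exception propagates unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ProviderTimeout:
            if attempt == max_attempts - 1:
                break
            # Exponential backoff (base_delay * 2**attempt) scaled by random jitter in [0.5, 1.0].
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    return fallback


# Usage: a stub that times out twice, then succeeds; delays are recorded instead of slept.
attempts = []

def flaky_model_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise ProviderTimeout()
    return "model answer"

delays = []
print(call_with_retries(flaky_model_call, sleep=delays.append))
```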

3. Why separate tool gateway from the model?

The model only proposes actions. The gateway enforces real permissions and safety checks in application code, so a mistaken or prompt-injected tool call cannot bypass them.
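A minimal sketch of gateway checks that mirror step 6 of the request path (tool name, argument types, tenant, spend) plus the approval gate from step 7; idempotency is left out here. The registry, tool names, and costs are hypothetical, and real argument validation would more likely use a JSON Schema validator than `isinstance` checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    arg_types: dict       # argument name -> required Python type
    risky: bool = False   # risky tools pause for human approval
    cost: float = 0.0     # hypothetical spend charged per call


# Hypothetical registry; real tools, costs, and risk labels are application-specific.
REGISTRY = {
    "lookup_order": ToolSpec({"order_id": str}),
    "issue_refund": ToolSpec({"order_id": str, "amount": float}, risky=True, cost=0.01),
}


def authorize(tool, args, *, caller_tenant, resource_tenant, spent, budget, approved=False):
    """Return (decision, reason); decision is 'allow', 'deny', or 'needs_approval'."""
    spec = REGISTRY.get(tool)
    if spec is None:
        return "deny", f"unknown tool {tool!r}"
    if set(args) != set(spec.arg_types):
        return "deny", "missing or unexpected arguments"
    for name, expected in spec.arg_types.items():
        if not isinstance(args[name], expected):
            return "deny", f"{name} must be {expected.__name__}"
    if caller_tenant != resource_tenant:
        return "deny", "cross-tenant access"
    if spent + spec.cost > budget:
        return "deny", "spend budget exceeded"
    if spec.risky and not approved:
        return "needs_approval", "risky action requires human review"
    return "allow", "ok"


ctx = dict(caller_tenant="t1", resource_tenant="t1", spent=0.0, budget=1.0)
print(authorize("issue_refund", {"order_id": "A1", "amount": 20.0}, **ctx))
print(authorize("issue_refund", {"order_id": "A1", "amount": 20.0}, approved=True, **ctx))
```

The model never sees `caller_tenant` or `approved` as arguments it can set; the orchestrator fills them from the authenticated request and the HITL workflow.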

4. What makes an agent recoverable?

Durable checkpoints, idempotent tools, audit logs, human resume paths, and terminal fallback states.
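A minimal sketch of two of those ingredients together, durable checkpoints and idempotent execution, assuming a local JSON file as a stand-in for a real durable store and a hypothetical `issue_refund` tool; `IdempotentExecutor` is illustrative, not a library class. Each side effect is keyed (here by run ID, action, and target), so a resumed run replays the recorded result instead of refunding twice. This only narrows the window: a crash after the tool runs but before the checkpoint is written would still repeat the call, which is why the key should also be passed to the downstream API when it accepts an idempotency key.

```python
import json
import tempfile
from pathlib import Path


class IdempotentExecutor:
    """Runs each side-effecting call at most once per key, persisting results to a JSON file.

    Tool results must be JSON-serializable to be checkpointed.
    """

    def __init__(self, path: Path):
        self.path = path
        self.results = json.loads(path.read_text()) if path.exists() else {}

    def run(self, key: str, tool, *args):
        if key in self.results:
            return self.results[key]                    # already done: replay, do not re-execute
        result = tool(*args)
        self.results[key] = result
        self.path.write_text(json.dumps(self.results))  # checkpoint right after the side effect
        return result


# Usage: a second executor on the same file simulates a process restart.
calls = []

def issue_refund(order_id):  # hypothetical side-effecting tool
    calls.append(order_id)
    return {"refunded": order_id}

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "run-42.json"
    first = IdempotentExecutor(checkpoint).run("run-42:refund:A1", issue_refund, "A1")
    resumed = IdempotentExecutor(checkpoint).run("run-42:refund:A1", issue_refund, "A1")

print(calls)
```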

5. How do you keep the architecture simple?

Prefer deterministic chains where possible, expose fewer tools, use graphs only when loops or approvals matter, and evaluate the actual workflow.
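For contrast with a looping agent, a minimal sketch of a deterministic chain with a single LLM node: retrieval, one model call, and a citation check always run in the same order, and the model never chooses tools. `answer_faq` is illustrative, and `search` and `llm` are hypothetical callables injected by the caller; the stubs below stand in for a real index and model client.

```python
from typing import Callable


def answer_faq(question: str, search: Callable, llm: Callable) -> dict:
    """Fixed path: retrieve -> one LLM call -> citation check. No loop, no model-chosen tools."""
    docs = search(question)
    if not docs:
        return {"status": "escalate", "reason": "no supporting documents"}
    draft = llm(question, docs)  # the only reasoning step
    cited = set(draft.get("citations", []))
    if not cited or not cited <= {d["id"] for d in docs}:
        return {"status": "escalate", "reason": "citations missing or not among retrieved documents"}
    return {"status": "final", "answer": draft["text"], "citations": sorted(cited)}


# Usage with stubs in place of a real index and model.
def stub_search(question):
    kb = [{"id": "kb-1", "text": "Refund policy text (placeholder)."}]
    return kb if "refund" in question.lower() else []

def stub_llm(question, docs):
    return {"text": docs[0]["text"], "citations": [docs[0]["id"]]}

print(answer_faq("How do refunds work?", stub_search, stub_llm))
print(answer_faq("What is the weather?", stub_search, stub_llm))
```

Because the path is fixed, evaluating it means testing three functions in sequence rather than exploring every route a planner might take.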

Interview answer template

For "Design a production agent", answer:

1. Clarify goal, risk, allowed actions, and stop conditions.
2. Draw ingress/auth, orchestrator, state store, model, RAG, tool gateway, human review, audit, and tracing.
3. Explain the request flow: classify -> retrieve/tool -> decide -> validate -> final/escalate.
4. Add controls: ACLs, schemas, budgets, idempotency, approvals, redaction.
5. Add failure paths: empty retrieval, provider timeout, bad args, loop, policy block.
6. Add evals and rollout: golden traces, canaries, rollback flags.
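The failure paths in this template can be made explicit as a table the orchestrator consults. A minimal sketch, where the retry allowances and terminal outcomes are illustrative assumptions rather than recommended values; the useful property is that every failure kind ends in a named terminal outcome, and the module-level `assert` fails fast if a new failure kind is added without a handler.

```python
from enum import Enum


class Failure(Enum):
    EMPTY_RETRIEVAL = "empty_retrieval"
    PROVIDER_TIMEOUT = "provider_timeout"
    BAD_ARGS = "bad_args"
    LOOP = "loop"
    POLICY_BLOCK = "policy_block"


# Illustrative policy: failure -> (retries allowed, terminal outcome once retries are spent).
HANDLERS = {
    Failure.EMPTY_RETRIEVAL: (0, "say_not_found"),
    Failure.PROVIDER_TIMEOUT: (2, "escalate"),
    Failure.BAD_ARGS: (1, "escalate"),
    Failure.LOOP: (0, "stop"),
    Failure.POLICY_BLOCK: (0, "refuse"),
}
assert set(HANDLERS) == set(Failure), "every failure kind needs a handler"


def next_step(failure: Failure, retries_so_far: int) -> str:
    """Return 'retry' while the allowance lasts, then the failure's terminal outcome."""
    max_retries, terminal = HANDLERS[failure]
    return "retry" if retries_so_far < max_retries else terminal


print(next_step(Failure.BAD_ARGS, 0), next_step(Failure.BAD_ARGS, 1))
```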

Common bad answers

Bad answer	Why it is weak
"The LLM is the brain, everything else is plugins."	The orchestrator, state, tools, policies, and traces are the control plane.
"Put tools directly in the model."	The model should propose actions; the app enforces permissions and execution.
"Add agents before defining the workflow."	You need task, risk, state, and stop conditions first.

Self-check

You are ready if you can explain:

The boxes in a production agent architecture.
What the orchestrator owns.
Why tools and storage are outside the model.
How state transitions map to failure handling.
How to keep the architecture simpler when autonomy is not needed.
