Agent memory, state & storage
Agents need storage because an LLM call is stateless: the model retains nothing between requests, so each request must carry all the context needed for the next step. Good systems separate short-term state, long-term memory, retrieval knowledge, tool data, and audit traces.
Do not call every stored thing "memory." Different storage has different consistency, privacy, and deletion requirements.
Storage map
State vs memory vs knowledge
| Item | Scope | Example | Storage |
|---|---|---|---|
| Thread state | Current conversation/run. | Messages, step count, pending tool call. | Checkpointer / DB. |
| Scratchpad | Current task only. | Plan, intermediate observations, duplicate-action detector. | Checkpoint state, often not shown to user. |
| User memory | Cross-session user facts. | "Prefers concise answers", locale, saved preference. | User memory table/vector store with consent controls. |
| Task memory | Project/workflow facts. | Current incident id, selected repo, approval status. | Relational/document DB. |
| Knowledge base | Shared external facts. | Policies, docs, tickets, product manuals. | RAG stores: object storage, metadata DB, vector/index. |
| Business source of truth | Real systems. | Orders, payments, inventory, deployments. | Existing databases/APIs. |
| Audit trace | Compliance/debug. | Who approved action, tool args, result status. | Append-only audit/log store. |
Context assembly
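The context selector is the step that decides what goes into each prompt. For every candidate item it should answer four questions: is it relevant, is the caller allowed to see it, is it fresh, and is it small enough to fit the token budget?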
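The four selector questions can be sketched as a filter followed by a greedy pack. This is a minimal illustration, not a reference implementation: `Candidate`, `select_context`, the relevance threshold, and the four-characters-per-token estimate are all assumptions made for the example.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    relevance: float         # hypothetical retriever score in [0, 1]
    allowed_roles: set[str]  # roles permitted to see this item
    age_days: int            # days since the item was last verified


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)


def select_context(candidates, role, max_age_days, token_budget, min_relevance=0.5):
    """Keep items that are relevant, allowed, and fresh, then pack by relevance."""
    eligible = [
        c for c in candidates
        if c.relevance >= min_relevance   # relevant
        and role in c.allowed_roles       # allowed
        and c.age_days <= max_age_days    # fresh
    ]
    eligible.sort(key=lambda c: c.relevance, reverse=True)

    chosen, used = [], 0
    for c in eligible:
        cost = estimate_tokens(c.text)
        if used + cost > token_budget:    # small enough
            continue                      # skip it; a smaller item may still fit
        chosen.append(c)
        used += cost
    return chosen
```

Filtering on permissions before ranking means a disallowed item can never reach the prompt, however relevant it scores.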
Checkpoints and durable execution
A checkpoint stores workflow state after each meaningful step. It enables:
- Resume after model/provider/tool failure.
- Human approval pauses.
- Time-travel debugging.
- Replay for evaluation.
- Idempotent recovery after duplicate requests.
Checkpoint state should include enough to resume, but not raw secrets or full documents.
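As a rough sketch of the idea, the following uses Python's standard `sqlite3` module to save one row per step and resume from the latest one. The table layout and the function names (`open_store`, `save_checkpoint`, `load_latest`) are assumptions for illustration; agent frameworks ship their own checkpointer interfaces.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS checkpoints (
               thread_id TEXT NOT NULL,
               step      INTEGER NOT NULL,
               state     TEXT NOT NULL,
               PRIMARY KEY (thread_id, step)
           )"""
    )
    return conn


def save_checkpoint(conn, thread_id, step, state):
    # INSERT OR IGNORE makes a retried save of the same step a no-op,
    # so a duplicate request cannot overwrite an earlier checkpoint.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )


def load_latest(conn, thread_id):
    row = conn.execute(
        "SELECT step, state FROM checkpoints "
        "WHERE thread_id = ? ORDER BY step DESC LIMIT 1",
        (thread_id,),
    ).fetchone()
    if row is None:
        return None
    return row[0], json.loads(row[1])


conn = open_store()
# State holds a document reference (doc_id), not the document body or any secret.
save_checkpoint(conn, "thread-1", 1, {"node": "plan", "doc_id": "doc-42"})
save_checkpoint(conn, "thread-1", 2, {"node": "await_approval", "doc_id": "doc-42"})
resumed = load_latest(conn, "thread-1")
```

Keeping every step, rather than overwriting a single row, is what allows replay and time-travel debugging later.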
Memory lifecycle
Storage safety rules
| Rule | Why |
|---|---|
| Separate state from memory | Run checkpoints are not long-term user facts. |
| Separate memory from audit | User deletion may apply to memory; audit may have legal retention. |
| Do not put secrets in prompts | Prompts and traces spread data across systems. |
| Store references, not blobs, in state | State stays small and resumable. |
| Use TTL and scopes | Stale memory causes wrong personalization. |
| Version memory writes | You need to know which model/tool wrote a fact. |
| Let users correct/delete memory | Builds trust and supports compliance. |
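Several of these rules (scopes, TTL, versioned writes, user deletion) can be combined in one small record type. The sketch below is an in-memory illustration only; `MemoryRecord`, `UserMemory`, and the field names are hypothetical, and a production store would sit on a database with access controls.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class MemoryRecord:
    user_id: str
    scope: str            # e.g. "preferences"; reads are limited to one scope
    fact: str
    written_by: str       # which model/tool version wrote the fact
    created_at: datetime
    ttl: timedelta        # how long the fact may be reused

    def is_live(self, now: datetime) -> bool:
        return now < self.created_at + self.ttl


class UserMemory:
    def __init__(self):
        self._records = []

    def write(self, record: MemoryRecord) -> None:
        self._records.append(record)

    def read(self, user_id: str, scope: str, now: datetime):
        # Expired records are never returned, so stale facts cannot personalize.
        return [
            r for r in self._records
            if r.user_id == user_id and r.scope == scope and r.is_live(now)
        ]

    def delete_user(self, user_id: str) -> int:
        # User-requested deletion; the audit store is separate and unaffected.
        before = len(self._records)
        self._records = [r for r in self._records if r.user_id != user_id]
        return before - len(self._records)


memory = UserMemory()
memory.write(MemoryRecord(
    user_id="u1",
    scope="preferences",
    fact="Prefers concise answers",
    written_by="summarizer-v2",  # hypothetical writer id
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    ttl=timedelta(days=30),
))
```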
Interview questions
1. What is state management for agents?
- Tracking the current workflow: messages, node, tool calls, observations, approvals, counters, and terminal status.
2. Why is long-term memory risky?
- It can store sensitive, stale, or incorrect facts and reapply them later in the wrong context.
3. What belongs in a vector memory store?
- Searchable summaries or facts with metadata and permissions. Not secrets, raw credentials, or authoritative records.
4. How do you prevent context overflow?
- Rank, summarize, dedupe, trim by token budget, preserve high-priority instructions, and retrieve only relevant facts.
5. What is the source of truth: memory or tools?
- Tools/business systems are the source of truth. Memory is a convenience cache unless explicitly designed otherwise.
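The answer to question 5 can be sketched in a few lines: the agent reads the authoritative value through a tool, and uses memory only to notice that something changed. Here `fetch_order` stands in for a hypothetical tool call to the order system, and the plain `dict` stands in for a memory store.

```python
def answer_order_status(order_id, memory, fetch_order):
    """Return the authoritative status; treat memory as a cache, not a record."""
    remembered = memory.get(order_id)
    current = fetch_order(order_id)   # the business system is the source of truth
    memory[order_id] = current        # refresh the cached value
    return {
        "status": current,
        "changed_since_memory": remembered is not None and remembered != current,
    }
```

If the tool call fails, a cautious design reports the failure rather than answering from the remembered value.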