Agent memory, state & storage

Agents need storage because each LLM call is stateless: every request must carry the context the model needs for the next step. Good systems separate short-term state, long-term memory, retrieval knowledge, tool data, and audit traces.

Do not call every stored thing "memory." Different storage has different consistency, privacy, and deletion requirements.

Storage map

State vs memory vs knowledge

Item	Scope	Example	Storage
Thread state	Current conversation/run.	Messages, step count, pending tool call.	Checkpointer / DB.
Scratchpad	Current task only.	Plan, intermediate observations, duplicate-action detector.	Checkpoint state, often not shown to user.
User memory	Cross-session user facts.	"Prefers concise answers", locale, saved preference.	User memory table/vector store with consent controls.
Task memory	Project/workflow facts.	Current incident id, selected repo, approval status.	Relational/document DB.
Knowledge base	Shared external facts.	Policies, docs, tickets, product manuals.	RAG stores: object storage, metadata DB, vector/index.
Business source of truth	Systems of record.	Orders, payments, inventory, deployments.	Existing databases/APIs.
Audit trace	Compliance/debug.	Who approved action, tool args, result status.	Append-only audit/log store.

Context assembly

The selector should answer: what is relevant, allowed, fresh, and small enough?
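
The four questions can be read as a filter pipeline. The following is a minimal sketch, assuming a hypothetical `Candidate` record that already carries a relevance score, a tenant label, an age, and a token count; these names are illustrative and do not come from any particular framework.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    relevance: float     # 0..1 score from an upstream retriever (assumed)
    tenant: str          # who is allowed to see this item
    age_seconds: float   # time since the item was last verified
    tokens: int          # pre-computed token count


def select_context(candidates, tenant, max_age_seconds, token_budget,
                   min_relevance=0.5):
    """Keep items that are relevant, allowed, fresh, and fit the budget."""
    eligible = [
        c for c in candidates
        if c.relevance >= min_relevance         # relevant
        and c.tenant == tenant                  # allowed
        and c.age_seconds <= max_age_seconds    # fresh
    ]
    # Most relevant first, then greedily fill the token budget.
    eligible.sort(key=lambda c: c.relevance, reverse=True)
    selected, used = [], 0
    for c in eligible:
        if used + c.tokens <= token_budget:     # small enough
            selected.append(c)
            used += c.tokens
    return selected
```

The permission and freshness checks run before ranking, so an item the caller may not see never competes for budget.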

Example scenario

Scenario: A support agent helps a user resolve a billing issue across several turns.

Storage concern	Good design
Current conversation	Keep thread state, pending tool calls, step count, and latest observations in a checkpoint.
User preference	Store "prefers email follow-up" only with consent and a clear deletion path.
Billing facts	Read from billing tools or databases, not from memory.
Policy docs	Retrieve from the RAG knowledge base with source version and tenant/product metadata.
Audit trail	Record approved refund actions, validated args, result status, and actor identity.

Bad design: storing "user is eligible for refund" as long-term memory. Eligibility is a business fact that can change and must come from the source of truth.
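
To make the contrast concrete, here is a small sketch in which eligibility is read from the billing system at decision time and memory only influences the follow-up channel. `FakeBillingAPI`, `UserMemory`, and `plan_refund_reply` are hypothetical stand-ins, not a real billing interface.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBillingAPI:
    """Stand-in for the real billing system (hypothetical interface)."""
    eligible_orders: set = field(default_factory=set)

    def is_refund_eligible(self, order_id: str) -> bool:
        return order_id in self.eligible_orders


@dataclass
class UserMemory:
    """Long-term memory holds preferences only, never business facts."""
    preferences: dict = field(default_factory=dict)


def plan_refund_reply(order_id, billing, memory):
    # Eligibility is fetched now, from the source of truth.
    eligible = billing.is_refund_eligible(order_id)
    # Memory only shapes how the agent follows up.
    channel = memory.preferences.get("follow_up_channel", "in-app")
    return {"order_id": order_id, "eligible": eligible, "follow_up": channel}
```

If the billing system later changes its answer, the next call reflects it immediately, which a remembered "user is eligible" fact would not.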

Checkpoints and durable execution

A checkpoint stores workflow state after each meaningful step. It enables:

Resume after model/provider/tool failure.
Human approval pauses.
Time-travel debugging.
Replay for evaluation.
Idempotent recovery after duplicate requests.

Checkpoint state should include enough to resume, but not raw secrets or full documents.
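
The resume and time-travel behaviour can be sketched with a toy in-memory store. This is an illustration under assumptions, not the API of any checkpointing library: `InMemoryCheckpointer`, `run`, and the `doc://policy/v3` reference are all made up for the example.

```python
import json


class InMemoryCheckpointer:
    """Toy checkpoint store keyed by thread id (a real one would use a DB)."""

    def __init__(self):
        self._history = {}  # thread_id -> list of JSON snapshots

    def save(self, thread_id, state):
        # Serialize so later mutation of `state` cannot alter the snapshot.
        self._history.setdefault(thread_id, []).append(json.dumps(state))

    def latest(self, thread_id):
        snapshots = self._history.get(thread_id)
        return json.loads(snapshots[-1]) if snapshots else None

    def at_step(self, thread_id, index):
        # Time travel: load an earlier snapshot for debugging or replay.
        return json.loads(self._history[thread_id][index])


def run(thread_id, steps, checkpointer, fail_at=None):
    """Run named steps in order, resuming after the last completed one."""
    state = checkpointer.latest(thread_id) or {
        "completed": [],
        "doc_ref": "doc://policy/v3",  # a reference, not the document body
    }
    for name in steps:
        if name in state["completed"]:
            continue  # a retry skips work that already finished
        if name == fail_at:
            raise RuntimeError(f"step {name} failed")
        state["completed"].append(name)
        checkpointer.save(thread_id, state)
    return state
```

Calling `run` again with the same thread id after a failure picks up from the last saved snapshot instead of repeating completed steps.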

Memory lifecycle

Storage safety rules

Rule	Why
Separate state from memory	Run checkpoints are not long-term user facts.
Separate memory from audit	User deletion may apply to memory; audit may have legal retention.
Do not put secrets in prompts	Prompts and traces spread data across systems.
Store references, not blobs, in state	State stays small and resumable.
Use TTL and scopes	Stale or out-of-scope memory causes wrong personalization.
Version memory writes	You need to know which model/tool wrote a fact.
Let users correct/delete memory	Builds trust and supports compliance.
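
Several of these rules can be combined in one small store. The sketch below assumes a hypothetical `MemoryRecord` shape and `UserMemoryStore` class; the field names and the in-memory dict are illustrative only.

```python
from dataclasses import dataclass
import time


@dataclass
class MemoryRecord:
    user_id: str
    scope: str          # e.g. "support" (hypothetical scope label)
    key: str
    value: str
    written_by: str     # model/tool identifier and version that wrote the fact
    created_at: float   # seconds since epoch
    ttl_seconds: float


class UserMemoryStore:
    def __init__(self):
        self._records = {}  # (user_id, scope, key) -> MemoryRecord

    def write(self, record, consent_given):
        if not consent_given:
            return False  # no consent, no long-term write
        self._records[(record.user_id, record.scope, record.key)] = record
        return True

    def read(self, user_id, scope, now=None):
        now = time.time() if now is None else now
        return [
            r for (uid, sc, _), r in self._records.items()
            if uid == user_id and sc == scope
            and now - r.created_at <= r.ttl_seconds  # expired facts are hidden
        ]

    def delete_user(self, user_id):
        # User-initiated deletion removes every record for that user.
        doomed = [k for k in self._records if k[0] == user_id]
        for k in doomed:
            del self._records[k]
        return len(doomed)
```

Audit records are deliberately absent from this store: they would live in a separate append-only system with its own retention rules.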

Interview questions

1. What is state management for agents?

Tracking the current workflow: messages, current node, tool calls, observations, approvals, counters, and terminal status.

Follow-up: What should not be stored in state?

Raw secrets, full documents, large blobs, and authoritative business records. Store references and fetch from controlled systems.
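
One way to enforce this is a check that runs before each checkpoint write. The sketch below is an assumption-laden illustration: the size limit, the list of secret-like key names, and the `validate_state` helper are all hypothetical and would need tuning for a real system.

```python
import json

MAX_STATE_BYTES = 4096                             # assumed limit for illustration
FORBIDDEN_KEYS = {"api_key", "password", "token"}  # assumed secret-like field names


def validate_state(state):
    """Return a list of problems that should block checkpointing this state."""
    problems = []

    def walk(value, path):
        if isinstance(value, dict):
            for k, v in value.items():
                if k.lower() in FORBIDDEN_KEYS:
                    problems.append(f"secret-like key at {path}{k}")
                walk(v, f"{path}{k}.")
        elif isinstance(value, list):
            for i, v in enumerate(value):
                walk(v, f"{path}{i}.")

    walk(state, "")
    # Large payloads usually mean a blob was inlined instead of referenced.
    if len(json.dumps(state).encode("utf-8")) > MAX_STATE_BYTES:
        problems.append("state exceeds size limit; store a reference instead")
    return problems
```

A state such as `{"step": 3, "doc_ref": {"id": "policy-12", "version": 4}}` passes, while one that inlines a full document or a credential field is rejected.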

2. Why is long-term memory risky?

It can store sensitive, stale, or incorrect facts and reapply them later in the wrong context.

Follow-up: How do you make memory safe?

Use consent, scopes, TTL, validation, source attribution, and user correction/deletion.

3. What belongs in a vector memory store?

Searchable summaries or facts with metadata and permissions. Not secrets, raw credentials, or authoritative records.

4. How do you prevent context overflow?

Rank, summarize, dedupe, trim by token budget, preserve high-priority instructions, and retrieve only relevant facts.
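
A minimal sketch of the dedupe and trim steps, assuming items arrive as `(priority, text)` pairs where a lower number means more important. The whitespace token counter is a crude stand-in for a real tokenizer, and summarization is left out.

```python
def count_tokens(text):
    # Crude whitespace count; a real system would use the model's tokenizer.
    return len(text.split())


def fit_to_budget(items, budget):
    """items: list of (priority, text); lower priority number = more important.

    Drops duplicate text (ignoring case and extra whitespace), then keeps
    items in priority order until the budget is used up. Returns the kept
    texts in their original order.
    """
    seen = set()
    unique = []
    for index, (priority, text) in enumerate(items):
        norm = " ".join(text.lower().split())
        if norm not in seen:
            seen.add(norm)
            unique.append((priority, index, text))

    kept, used = [], 0
    for priority, index, text in sorted(unique):
        cost = count_tokens(text)
        if used + cost <= budget:
            kept.append((index, text))
            used += cost
    return [text for _, text in sorted(kept)]
```

Giving system instructions priority 0 means they are admitted first, so lower-priority history is what gets dropped when the budget is tight.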

5. What is the source of truth: memory or tools?

Tools/business systems are the source of truth. Memory is a convenience cache unless explicitly designed otherwise.

Interview answer template

For "How would you design memory for an agent?", answer:

Separate thread state, scratchpad, user memory, task memory, knowledge base, business source of truth, and audit trace.
Explain what is retrieved into context and what stays behind references.
Apply permissions, TTL, retention, and deletion rules.
Use checkpoints for resumability and human approval pauses.
Treat tools/databases as source of truth for business facts.

Common bad answers

Bad answer	Why it is weak
"Store everything in memory so the agent remembers."	This creates privacy, staleness, deletion, and context-bloat problems.
"Use vector memory as source of truth."	Memory is not an authoritative business record.
"Put full documents in state."	State becomes large, leaky, hard to resume, and hard to delete.

Self-check

You are ready if you can explain:

Thread state vs user memory vs task memory vs knowledge.
Why checkpoints matter.
What belongs in state versus external storage.
Why memory needs TTL, consent, and deletion.
Why tools/databases beat memory for business facts.
