LangGraph for agents
LangGraph models agents as explicit graphs: nodes mutate shared state, and edges encode control flow, including loops that simple chains cannot express cleanly. Checkpointing enables resume, replay, human interrupts, and durable execution for long-running workflows.
Prerequisites: messages + tools (LangChain for agents).
Process — compiled graph lifecycle
Reducer example: add_messages merges message lists deterministically rather than overwriting.
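The merge-by-id semantics can be sketched in plain Python. This is a simplified illustration of the behavior, not LangGraph's actual `add_messages` implementation; `merge_messages` is a hypothetical name.

```python
# Simplified sketch of add_messages-style merge semantics: new messages
# append, while a message whose id matches an existing one replaces it
# instead of duplicating. Not LangGraph's real implementation.
def merge_messages(existing: list[dict], updates: list[dict]) -> list[dict]:
    merged = list(existing)
    index_by_id = {m["id"]: i for i, m in enumerate(merged)}
    for msg in updates:
        if msg["id"] in index_by_id:
            merged[index_by_id[msg["id"]]] = msg  # deterministic replace
        else:
            merged.append(msg)                    # deterministic append
    return merged

state = [{"id": "1", "content": "hi"}]
state = merge_messages(state, [{"id": "2", "content": "tool result"}])
state = merge_messages(state, [{"id": "1", "content": "hi (edited)"}])
# two messages total; id "1" was replaced, not duplicated
```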
State-to-state graph anatomy
Why interviewers like this: it makes failure handling visible. You can point to where evidence is verified, where tools are gated, and where a human can interrupt before an irreversible transition.
Classic ReAct loop as a graph shape
Termination = router returns the END path when the latest AI message has no tool_calls.
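The termination rule above can be sketched as a tiny router function. This is a hedged illustration, not LangGraph's implementation: `route` and `FakeAIMessage` are hypothetical names, and in real graphs the prebuilt `tools_condition` plays this role.

```python
# Sketch of the ReAct termination router. FakeAIMessage stands in for an
# AI message object; LangGraph's prebuilt tools_condition does this in practice.
END = "__end__"

class FakeAIMessage:
    def __init__(self, tool_calls=None):
        self.tool_calls = tool_calls or []

def route(messages: list) -> str:
    last = messages[-1]
    # Pending tool_calls -> loop back to the tool node; none -> terminate.
    return "tools" if getattr(last, "tool_calls", None) else END

print(route([FakeAIMessage(tool_calls=[{"name": "kb_lookup"}])]))  # tools
print(route([FakeAIMessage()]))  # __end__
```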
Pattern A — preset ReAct (create_react_agent)
Fastest sane loop; swaps model & tool list cleanly.
```python
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def kb_lookup(query: str) -> str:
    """Hybrid search stub—replace with ACL-aware retrieval."""
    return f"[stub]{query}: policy snippet..."

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
graph = create_react_agent(model=model, tools=[kb_lookup])

state = graph.invoke(
    {"messages": [HumanMessage("Summarize PII logging stance from KB")]},
    config={"configurable": {"thread_id": "tenant-42-thread-9"}},
)
print(state["messages"][-1].content)
```

`thread_id` pairs with eventual checkpointing for conversational memory & recovery.
Pattern B — explicit router edges
Customize branching (retrieve → rerank gate → reply) while sharing one state shape.
```python
from typing import Annotated

from langchain_core.messages import AnyMessage, HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from typing_extensions import TypedDict

class AgentState(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]

@tool
def add_ints(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

llm_tools = ChatOpenAI(model="gpt-4o-mini", temperature=0).bind_tools([add_ints])
tool_runner = ToolNode([add_ints])

def call_model(state: AgentState):
    return {"messages": [llm_tools.invoke(state["messages"])]}

builder = StateGraph(AgentState)
builder.add_node("agent", call_model)
builder.add_node("tools", tool_runner)
builder.add_edge(START, "agent")
builder.add_conditional_edges(
    "agent",
    tools_condition,
    {"tools": "tools", END: END},
)
builder.add_edge("tools", "agent")
compiled = builder.compile()

out = compiled.invoke(
    {"messages": [HumanMessage("Use tools: compute 19+23")]},
)
```

Add checkpointing:

```python
from langgraph.checkpoint.memory import MemorySaver

compiled_ckpt = builder.compile(checkpointer=MemorySaver())
compiled_ckpt.invoke(..., config={"configurable": {"thread_id": "abc"}})
```

(MemorySaver is illustrative; use a Postgres/SQL-backed store for durable deployment.)
Production notes
| Concern | LangGraph practice |
|---|---|
| Durability | Use a real checkpointer for production so interrupted runs can resume from the last saved state. |
| Human-in-the-loop | Pause at approval nodes before destructive, financial, legal, or external-message actions. |
| Replay | Re-run from checkpoints to debug why a route or tool call happened. |
| State design | Keep state typed and small: messages, task facts, counters, approvals, evidence ids, and tool results. |
| Determinism | Keep node side effects isolated; do not hide network writes inside "planning" nodes. |
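The human-in-the-loop and durability rows above share one mechanic: save state before a gated node, return control, and resume from the checkpoint after approval. A minimal pure-Python sketch of that pause/resume loop, with entirely hypothetical names (`run`, `saver`; this is not the LangGraph API):

```python
# Toy linear "graph": pause before any node listed in interrupt_before,
# persist a checkpoint keyed by thread_id, and resume from it on re-run.
def run(nodes, state, saver, thread_id, interrupt_before=()):
    ckpt = saver.get(thread_id, {"step": 0, "state": state})
    i, state = ckpt["step"], ckpt["state"]
    while i < len(nodes):
        name, fn = nodes[i]
        if name in interrupt_before and not ckpt.get("resumed"):
            # Save and hand control to a human before the gated node runs.
            saver[thread_id] = {"step": i, "state": state, "resumed": True}
            return ("paused", state)
        ckpt["resumed"] = False          # only skip the gate once, on resume
        state = fn(state)
        i += 1
        saver[thread_id] = {"step": i, "state": state}
    return ("done", state)

nodes = [
    ("plan", lambda s: s + ["planned"]),
    ("tools", lambda s: s + ["tool ran"]),   # gated: destructive action
    ("reply", lambda s: s + ["replied"]),
]
saver: dict = {}
status1, _ = run(nodes, [], saver, "t1", interrupt_before={"tools"})
# ... operator approves out-of-band ...
status2, final = run(nodes, [], saver, "t1", interrupt_before={"tools"})
```

The second `run` call resumes from the saved step rather than replaying earlier nodes, which is the essence of durable execution.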
Interview questions — LangGraph
1. LangGraph vs a simple while-loop around chat completions?
- A graph gives explicit node boundaries, reproducible checkpoints, concurrency patterns, and streaming of partial states.
2. Thread vs checkpointer?
- Thread: the logical continuity key (`thread_id`) shared across invocations.
- Checkpointer: saves intermediate state checkpoints keyed by `(thread_id, step)`.
3. How encode human approvals?
- Interrupt nodes pause before irreversible transitions; resume with operator decision—maps to approvals & compliance audits.
4. Where does the max-steps guard live best?
- In a router wrapper counting iterations, or a reducer tracking a `step` counter in state; refuse additional tool hops once the budget is spent.
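A minimal sketch of the router-wrapper option, with illustrative names (`make_guarded_router`, `MAX_TOOL_HOPS`; not a LangGraph API):

```python
# A router wrapper that enforces a tool-hop budget: after MAX_TOOL_HOPS
# routes to "tools", it forces the END path regardless of the inner router.
END = "__end__"
MAX_TOOL_HOPS = 5

def make_guarded_router(inner_router):
    def guarded(state: dict) -> str:
        if state.get("step", 0) >= MAX_TOOL_HOPS:
            return END                              # refuse further tool hops
        route = inner_router(state)
        if route == "tools":
            state["step"] = state.get("step", 0) + 1
        return route
    return guarded

# A pathological inner router that always wants another tool hop:
router = make_guarded_router(lambda state: "tools")
state = {"step": 0}
routes = [router(state) for _ in range(7)]
# first 5 calls route to "tools", then the guard returns END
```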
5. Debugging infinite tool oscillation?
- Check whether the last two AI tool calls are equivalent; widen schema constraints where validation failures drive retries; escalate on a duplicate pattern.
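The duplicate-call check can be as simple as comparing the last two tool calls. `is_oscillating` below is a hypothetical helper for illustration, not part of LangGraph:

```python
# Detect the oscillation signature: the same tool called twice in a row
# with identical arguments, suggesting the loop is stuck.
def is_oscillating(tool_calls: list[dict]) -> bool:
    if len(tool_calls) < 2:
        return False
    a, b = tool_calls[-2], tool_calls[-1]
    return a["name"] == b["name"] and a["args"] == b["args"]

stuck = [
    {"name": "kb_lookup", "args": {"query": "pii policy"}},
    {"name": "kb_lookup", "args": {"query": "pii policy"}},
]
progressing = [
    {"name": "kb_lookup", "args": {"query": "pii policy"}},
    {"name": "kb_lookup", "args": {"query": "pii logging"}},
]
print(is_oscillating(stuck))        # True
print(is_oscillating(progressing))  # False
```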
6. Why is durable execution useful for agents?
- A long run can pause for approval, survive failures, and resume without replaying every prior model/tool step.
7. What belongs in graph state vs external storage?
- State carries workflow facts and references. Large documents, secrets, raw files, and long-term records belong in storage with access control.
8. Where should tool authorization live?
- In the tool gateway or tool implementation, not only in the graph prompt. The graph can route to a guard node, but the server still enforces identity and scope.
Next
Observe runs in LangSmith · Harden rollout in Agentic production.