Stateful agents with LangGraph

TL;DR

Stateless chains lose all intermediate work on crash. LangGraph persists state as a typed object after every node, enabling resume-from-checkpoint on any failure.
State is a Python TypedDict that flows through the graph. Every node reads the full state and returns only the fields it updates. The graph merges updates using reducer functions.
Conditional edges are Python functions routing to different nodes based on the current state. This replaces spaghetti if-else logic with explicit, testable routing functions.
Attach SQLite, PostgreSQL, or Redis checkpointers to persist state after each node. Resume a crashed run by passing the same thread_id.
interrupt_before pauses the graph before a specified node and freezes state in the checkpointer until a human resumes it. This is the production pattern for approval gates on irreversible actions.
LangGraph deployments at Klarna, LinkedIn, and Uber handle customer service, research automation, and code review workflows in production.

The problem it solves

You build a 12-step research pipeline with steps such as fetching documents, extracting entities, clustering themes, generating summaries, cross-referencing sources, and producing a final report. The pipeline runs correctly on the first 10 jobs. On job 11, a network timeout kills the process at step 7. All 6 completed steps are gone. The job restarts from step 1.

With a stateless chain, there is no other option. State exists only in memory. When the process dies, the state dies with it. Every restart costs the full run time and compute of the failed job.

I have watched teams burn entire sprints rewriting pipeline state management by hand: serializing intermediate results to JSON files, loading them on restart, writing corrupt-state recovery logic. It works, but it is brittle infrastructure that the team maintains forever.

LangGraph makes state a first-class concept. Every node reads from and writes to a typed state object. The checkpointer persists that state to a database after every node completes. When the pipeline crashes at step 7, you restart with the same thread_id. LangGraph loads the last checkpoint, written after step 6, and resumes at step 7 instead of starting over.

What is it?

LangGraph is a Python library for building stateful, graph-structured agent workflows. You define a graph where nodes are Python functions and edges connect them. A TypedDict state object flows through the graph, read and updated by each node. A checkpointer persists the state after every node to a database.

Think of a relay race. Each runner (node) receives the baton (state), runs their leg, and hands off. The baton accumulates context from every leg. If a runner stumbles (node fails), the race does not restart from the starting line. You resume from the last successful handoff. LangGraph is the relay race coordinator.

How it works

State as a TypedDict

Every LangGraph graph defines a State class. It is a TypedDict (or Pydantic model) that holds everything the workflow needs to track: inputs, intermediate outputs, control flags, accumulated results, and the message history.

# LangGraph state definition for a document research pipeline
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

class ResearchState(TypedDict):
    # Input
    query: str
    # Accumulated retrieval results
    retrieved_docs: list[str]
    # Generated output
    draft_answer: str
    # Control flags
    needs_verification: bool
    verified: bool
    # Final output
    final_answer: str
    # Message history (add_messages reducer appends, not replaces)
    messages: Annotated[list, add_messages]

The Annotated[list, add_messages] annotation is a reducer. When a node returns {"messages": [new_message]}, LangGraph calls add_messages(existing_list, [new_message]) rather than replacing the list. Reducers are how you accumulate data across nodes without managing merge logic manually.

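Every node receives the full current state as a Python dict and returns only the fields it changes. LangGraph merges the returned dict into the state automatically: a field annotated with a reducer is combined through that reducer, and a field without one is overwritten by the new value.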
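To make the merge rule concrete, here is a minimal sketch of reducer-based merging in plain Python. It is an illustration only, not LangGraph's implementation: merge_update, append_items, and DemoState are hypothetical names, and the real library handles many more cases (message IDs, channels, parallel branches).

```python
from typing import Annotated, TypedDict, get_type_hints


def append_items(existing: list, new: list) -> list:
    """Hypothetical reducer: concatenate instead of replacing."""
    return existing + new


class DemoState(TypedDict):
    draft_answer: str                               # no reducer: overwritten
    messages: Annotated[list, append_items]         # reducer: appended


def merge_update(state: dict, update: dict, schema: type) -> dict:
    """Merge a node's partial update into the state (illustrative only)."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] hints carry their extra arguments in __metadata__
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = next((m for m in metadata if callable(m)), None)
        if reducer is not None:
            merged[key] = reducer(merged.get(key, []), value)
        else:
            merged[key] = value
    return merged


state = {"draft_answer": "", "messages": ["user: what is LangGraph?"]}
state = merge_update(state, {"messages": ["ai: a graph library"]}, DemoState)
state = merge_update(state, {"draft_answer": "v1"}, DemoState)
```

After the two updates, messages holds both entries while draft_answer holds only the latest value, which is the behavior the add_messages annotation gives the messages field in ResearchState.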

Node functions

A node is a Python function that takes the current state and returns a dict of updates. Nodes can call LLMs, execute tools, run database queries, send API requests, or do pure Python computation. Any function that reads state and returns a partial update is a valid node.

# LangGraph node functions
# Assumes vector_store, llm, and verifier_llm are configured elsewhere
# (for example, a LangChain vector store and two chat models).
def retrieve_docs(state: ResearchState) -> dict:
    """Retrieve relevant documents from vector store."""
    docs = vector_store.similarity_search(state["query"], k=5)
    return {"retrieved_docs": [doc.page_content for doc in docs]}

def generate_answer(state: ResearchState) -> dict:
    """Generate a draft answer using retrieved context."""
    context = "\n\n".join(state["retrieved_docs"])
    response = llm.invoke(
        f"Context:\n{context}\n\nQuery: {state['query']}\n\nAnswer:"
    )
    answer = response.content
    # Flag long answers for verification
    needs_verification = len(answer.split()) > 200
    return {
        "draft_answer": answer,
        "needs_verification": needs_verification,
        # Short answers skip verification and go straight to END, so finalize them here
        "final_answer": "" if needs_verification else answer
    }

def verify_answer(state: ResearchState) -> dict:
    """Cross-check the draft answer against source documents."""
    sources = "\n\n".join(state["retrieved_docs"])
    verdict = verifier_llm.invoke(
        f"Claim: {state['draft_answer']}\nSources:\n{sources}\n"
        "Is this claim fully supported by the sources? Answer yes or no."
    )
    # Evaluate the verdict once and reuse it for both fields
    is_supported = verdict.content.strip().lower().startswith("yes")
    return {
        "verified": is_supported,
        "final_answer": state["draft_answer"] if is_supported else ""
    }

Nodes should do one thing. Combining retrieval and generation into a single node makes the graph harder to test, harder to checkpoint at the right granularity, and impossible to branch conditionally between those two operations.
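One payoff of single-purpose nodes is that each can be tested with no graph, LLM, or vector store running. The sketch below assumes a small refactor: make_retrieve_node is a hypothetical factory that binds the store to the node, and FakeStore and FakeDoc are hypothetical test doubles that mimic only the similarity_search(query, k) call and the page_content attribute used above. None of these names come from LangGraph or LangChain.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical document stub exposing only page_content."""
    page_content: str


class FakeStore:
    """Hypothetical stand-in for a vector store."""

    def __init__(self, texts):
        self.texts = texts

    def similarity_search(self, query: str, k: int = 5):
        # Naive substring match; a real store ranks by embedding similarity
        hits = [t for t in self.texts if query.lower() in t.lower()]
        return [FakeDoc(t) for t in hits[:k]]


def make_retrieve_node(store):
    """Hypothetical factory: bind a store and return a node function."""

    def retrieve_docs(state: dict) -> dict:
        docs = store.similarity_search(state["query"], k=5)
        # Return only the field this node updates
        return {"retrieved_docs": [d.page_content for d in docs]}

    return retrieve_docs


node = make_retrieve_node(FakeStore(["LangGraph persists state", "Unrelated note"]))
update = node({"query": "langgraph"})
```

Because the node returns a plain dict, the test needs nothing more than an equality check on update. A combined retrieve-and-generate node would force the same test to stub an LLM as well.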

Conditional edges

Conditional edges route the graph to different nodes based on the current state. The routing function is a plain Python function: it receives the state and returns the name of the next node (or END to terminate the graph).

from langgraph.graph import StateGraph, END

workflow = StateGraph(ResearchState)

# Register nodes
workflow.add_node("retrieve_docs", retrieve_docs)
workflow.add_node("generate_answer", generate_answer)
workflow.add_node("verify_answer", verify_answer)

# Set entry point
workflow.set_entry_point("retrieve_docs")

# Unconditional edge: retrieve always leads to generation
workflow.add_edge("retrieve_docs", "generate_answer")

# Conditional edge: generation routes based on needs_verification flag
def route_after_generation(state: ResearchState) -> str:
    if state["needs_verification"]:
        return "verify_answer"
    return END

workflow.add_conditional_edges("generate_answer", route_after_generation)
workflow.add_edge("verify_answer", END)

The routing function is pure Python operating on state. It has no side effects, is easy to unit test, and makes the branching logic explicit and readable.
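As a rough illustration of that testability, the routing function can be exercised with plain dicts and assertions. The END constant below is a local stand-in so the sketch runs without LangGraph installed; in the real graph you would import END from langgraph.graph as shown above.

```python
# Local stand-in for langgraph.graph.END (an assumption for this sketch)
END = "__end__"


def route_after_generation(state: dict) -> str:
    """Same routing logic as above, typed against a plain dict."""
    if state["needs_verification"]:
        return "verify_answer"
    return END


def test_flagged_answers_go_to_verification():
    assert route_after_generation({"needs_verification": True}) == "verify_answer"


def test_unflagged_answers_end_the_graph():
    assert route_after_generation({"needs_verification": False}) == END


# Run the tests directly; a test runner such as pytest would discover them by name
test_flagged_answers_go_to_verification()
test_unflagged_answers_end_the_graph()
```

Each test passes only the field the router reads, so a change elsewhere in the state schema does not break the routing tests.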

Figure: LangGraph execution. State flows through the retrieve_docs and generate_answer nodes, the conditional edge branches on the needs_verification flag, and the checkpointer saves state after every node.

Persistence and checkpointing

To persist state across crashes and sessions, attach a checkpointer when compiling the graph. Checkpointer implementations are available for SQLite (development), PostgreSQL (multi-instance production), and Redis (high-throughput, short-lived workflows); they are installed as separate packages alongside LangGraph.
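The mechanism behind resume-from-checkpoint can be sketched with the standard library alone. The code below is a simplified illustration, not LangGraph's checkpointer: run_pipeline, make_step, the checkpoints table layout, and the fail_at parameter are all hypothetical. In LangGraph itself the corresponding steps are passing checkpointer= to workflow.compile() and supplying a thread_id in the run config; the sketch does not call those APIs.

```python
import json
import sqlite3


def run_pipeline(conn, thread_id, steps, state=None, fail_at=None):
    """Run steps in order, checkpointing after each; resume if a checkpoint exists."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(thread_id TEXT PRIMARY KEY, next_step INTEGER, state TEXT)"
    )
    row = conn.execute(
        "SELECT next_step, state FROM checkpoints WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()
    if row:
        # A checkpoint exists for this thread: continue where it left off
        start, state = row[0], json.loads(row[1])
    else:
        start, state = 0, dict(state or {})
    for index in range(start, len(steps)):
        if index == fail_at:
            raise TimeoutError(f"simulated failure in step {index}")
        state.update(steps[index](state))
        # Persist the state and the index of the next step to run
        conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, index + 1, json.dumps(state)),
        )
        conn.commit()
    return state


calls = []  # records which steps actually executed


def make_step(name):
    def step(state):
        calls.append(name)
        return {name: "done"}
    return step


steps = [make_step(n) for n in ("fetch", "extract", "summarize")]
conn = sqlite3.connect(":memory:")

try:
    run_pipeline(conn, "job-11", steps, {"query": "q"}, fail_at=2)
except TimeoutError:
    pass  # the process "crashed" before the third step ran

# Same thread_id: the first two steps are skipped, only the failed one runs
final_state = run_pipeline(conn, "job-11", steps)
```

The calls list shows each step executing exactly once across both runs, which is the property that saves the run time and compute of the completed steps. Using a different thread_id on the second call would start a fresh run from the first step.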

