Filesystem-based agent state

TL;DR

Filesystem-based agent state writes all working state (plans, progress, context, scratchpad) to plain files on disk instead of holding it in memory or in the context window.
Files are human-readable (Markdown for plans, JSON for structured data, plain text for logs), so developers can inspect, debug, and modify agent state at any point during execution.
Crash recovery comes free: if the agent process dies, state survives on disk. Restart the agent and it resumes from the last checkpoint instead of starting from scratch.
Production agents using this pattern report zero data loss from process crashes and 60-80% faster recovery compared to context-only state management.
Git-friendly by design: since state is files, you get version control for free. Track how agent state evolved, diff between runs, and bisect to find where reasoning went wrong.
Limitation: filesystem I/O is slower than in-memory access (microseconds vs. nanoseconds), and concurrent multi-agent access requires file locking or directory-per-agent isolation.

The Problem It Solves

Your coding agent is 45 minutes into refactoring a large codebase. It has processed 30 of 50 files, built up a mental model of the dependency graph, and identified three circular imports to fix. Then the process crashes. An out-of-memory error, a network timeout, a model API rate limit, whatever the cause. All that intermediate work was stored in the context window and Python dictionaries. It's gone.

The agent restarts. It has no idea which 30 files it already processed, what the dependency graph looked like, or which circular imports it found. It starts over from file 1 and burns through the same API calls, the same tokens, the same 45 minutes of work. I've watched this happen on a real production agent that was processing a 500-file migration. The team lost $400 in API costs and four hours because the agent had no persistent memory of its progress.

The deeper issue: context windows are volatile and bounded. Everything inside the context window vanishes when the session ends or the process restarts. For tasks that run for hours, span multiple sessions, or exceed context window limits, keeping state only in memory is building on sand.

The cost isn't just the lost API spend. It's the lost reasoning. The agent had built a nuanced understanding of the codebase's dependency structure, discovered three subtle bugs, and developed a fix strategy. None of that was written down. On restart, the agent might reach different conclusions because LLM outputs are non-deterministic. You're not just repeating work; you're potentially getting different (and possibly worse) results.

What Is It?

Filesystem-based agent state externalizes all working state to files on disk, treating the filesystem as the agent's persistent scratchpad rather than relying on in-memory variables or the context window. The agent reads what it needs, processes it, writes results to files, and moves on. The context window stays small and task-focused while the filesystem holds the full picture.

The pattern is deceptively simple: write files, read files, use the filesystem as a database. But the simplicity is the point. Every developer knows how to inspect, edit, diff, and version-control files. No one needs to learn a new tool or API to understand what the agent is doing. The filesystem is the most universal, lowest-common-denominator persistence layer in computing.

Think of it as the difference between a chef who memorizes the entire recipe and every step's result (and forgets everything if they leave the kitchen), versus a chef who writes each completed step on a clipboard hanging on the wall. The second chef can leave, come back tomorrow, glance at the clipboard, and pick up exactly where they left off. The first chef has to start from scratch every time.

Filesystem-based state is the clipboard.

How It Works

Why files, not databases or in-memory caches?

The obvious question: why plain files? Why not SQLite, Redis, or a proper database?

Files win on three dimensions that matter most for agent workflows. Inspectability: you can cat .agent/plan.md and immediately understand what the agent is doing. Try that with a Redis cache or SQLite blob. Tooling: every developer has grep, diff, git, and a text editor. No special clients, no connection strings, no schema browsers.

Simplicity of setup: files require zero infrastructure. No database server, no connection pool, no migrations. The agent creates a directory and starts writing. For single-machine agent workloads (which is 90%+ of current usage), this simplicity is a massive advantage.

The tradeoff: files don't support concurrent writes well, don't have built-in query capabilities, and don't scale to distributed multi-machine agents. When you need those capabilities, graduate to a database. But start with files and only add complexity when the constraints actually bite.

I've built agent state on both files and databases. Files win for 90% of use cases because agent workflows are single-machine, single-process operations. The database option becomes necessary only when you have multiple machines running agents that need shared state.
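Before graduating to a database, the concurrent-write limitation can sometimes be sidestepped by giving each agent its own directory. The sketch below is one hedged way to do that in Python; the function name agent_state_dir and the .agent/<agent_id>/ layout are assumptions made for illustration, not a convention from this article.

```python
from pathlib import Path


def agent_state_dir(root: Path, agent_id: str) -> Path:
    """Return a private state directory for one agent, creating it if needed."""
    # Reject IDs that are empty or could escape the shared root (e.g. "../other").
    if not agent_id or Path(agent_id).name != agent_id or agent_id in {".", ".."}:
        raise ValueError(f"invalid agent id: {agent_id!r}")
    # Hypothetical layout: <root>/.agent/<agent_id>/
    state_dir = root / ".agent" / agent_id
    state_dir.mkdir(parents=True, exist_ok=True)
    return state_dir
```

Because no two agents ever write the same file, no locking is needed. State that genuinely has to be shared between agents still needs file locking or a database.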

The file-per-concern pattern

The core organizing principle: one file per type of state. Don't dump everything into a single state.json. Separate concerns into distinct files so the agent can read only the state it needs for the current step, keeping context window usage minimal.

A typical agent state directory:

.agent/
├── plan.md              # Task decomposition and step ordering
├── progress.json        # Which steps are done, in-progress, or pending
├── context.md           # Accumulated knowledge and decisions
├── scratchpad.md        # Temporary notes and working hypotheses
├── errors.log           # Failures and retries for debugging
└── output/              # Intermediate artifacts
    ├── step-01-analysis.md
    ├── step-02-refactor.md
    └── step-03-tests.md

plan.md holds the task decomposition: what needs to happen, in what order, with what dependencies. The agent writes this at the start and updates it if the plan evolves. This file is Markdown because plans are naturally hierarchical (headers, nested lists) and humans need to read them.

progress.json tracks execution state as structured data: which steps are complete, which are in-progress, which failed and why. JSON because progress needs to be machine-parseable (the agent reads it on restart to know where to resume).

scratchpad.md is the agent's working memory for the current task. Notes, hypotheses, observations that don't fit neatly into other files. This is the most volatile file, frequently overwritten.

I've seen teams try to use a single state.json for everything. It works until the file hits 10K lines and the agent spends half its context window just reading state. File-per-concern keeps each read small and focused.
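A minimal sketch of the file-per-concern layout in Python, assuming the directory shown above. The helper names (init_state_dir, read_concern), the initial file contents, and the progress.json schema are illustrative assumptions, not a prescribed format.

```python
import json
from pathlib import Path

# File names follow the layout above; the initial contents are assumptions.
DEFAULT_FILES = {
    "plan.md": "# Plan\n",
    "progress.json": json.dumps({"steps": {}}, indent=2),
    "context.md": "# Context\n",
    "scratchpad.md": "",
    "errors.log": "",
}


def init_state_dir(root: Path) -> Path:
    """Create .agent/ with one file per concern, leaving existing files untouched."""
    state_dir = root / ".agent"
    (state_dir / "output").mkdir(parents=True, exist_ok=True)
    for name, initial in DEFAULT_FILES.items():
        path = state_dir / name
        if not path.exists():  # never clobber state from an earlier run
            path.write_text(initial, encoding="utf-8")
    return state_dir


def read_concern(state_dir: Path, name: str) -> str:
    """Read a single state file so only that concern enters the context window."""
    return (state_dir / name).read_text(encoding="utf-8")
```

On restart the agent might call read_concern(state_dir, "progress.json") to find its place and load plan.md only when it actually needs to re-plan, which keeps each read small.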

The checkpoint-and-resume loop

The checkpoint-and-resume loop is the execution model that makes filesystem state work. After every meaningful step, the agent writes a checkpoint. On restart, it reads the latest checkpoint and continues.

The critical rule: write state before moving to the next step, not after. If the agent processes step 5, moves to step 6, and then tries to write the step 5 checkpoint, a crash during step 6 means step 5's work is lost even though it completed. Always checkpoint before advancing.
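The loop can be sketched roughly as follows. This is a simplified illustration, not a reference implementation: the function names, the {"done": [...]} schema, and the work callback are all assumptions.

```python
import json
from pathlib import Path
from typing import Callable


def load_done(progress_path: Path) -> set[str]:
    """Return the IDs of steps already recorded as complete (empty on first run)."""
    if not progress_path.exists():
        return set()
    return set(json.loads(progress_path.read_text(encoding="utf-8"))["done"])


def run_steps(steps: list[str], work: Callable[[str], None], progress_path: Path) -> None:
    """Run each pending step, recording it on disk before advancing to the next."""
    done = load_done(progress_path)
    for step in steps:
        if step in done:
            continue  # finished in an earlier run; skip on resume
        work(step)
        done.add(step)
        # Checkpoint now, before the loop moves on. A plain write is used here
        # for brevity; a crash mid-write could still truncate the file.
        progress_path.write_text(json.dumps({"done": sorted(done)}), encoding="utf-8")
```

If work raises on step 6, steps 1 through 5 are already on disk, so the next call to run_steps skips them and retries step 6.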

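Atomic writes prevent corruption. Write to a temporary file first (progress.json.tmp), then rename it to the target (progress.json). Renaming within the same filesystem is atomic on POSIX systems, so a reader sees either the old checkpoint or the new one, never a half-written file. Keep the temporary file in the same directory as the target, and flush it to disk before the rename if the checkpoint must survive a power loss as well as a process crash.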
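A hedged Python sketch of temp-file-then-rename. The function name atomic_write_json is an assumption, and the temporary file gets a random name from tempfile.mkstemp rather than the fixed progress.json.tmp, so two writers cannot collide on the temp file.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(target: Path, data: dict) -> None:
    """Write data so readers see the old file or the new one, never a partial one."""
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, target)  # swap the new file in over the target
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

os.replace is used rather than os.rename because it overwrites an existing target on every platform Python supports.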

For your interview: "atomic checkpoint writes using temp-file-then-rename" is a one-liner that demonstrates real systems experience. Every engineer who has built crash-resilient systems knows this trick.

State format choices: when to use what

Choosing the right file format for each type of state matters for both human readability and machine parseability.

State Type	Format	Why
Plans, reasoning	Markdown (.md)	Human-readable, hierarchical, good for LLM consumption
Progress tracking	JSON (.json)	Machine-parseable, structured, easy to query programmatically
Logs, audit trails	Plain text (.log)	Append-only, line-oriented, searchable with standard tools
Configurations	YAML (.yaml) or JSON	Structured, human-editable, widely supported
Intermediate artifacts	Markdown or code files	Matches the output domain (analysis in .md, code in .py/.ts)
Binary data	Avoid in agent state	Use references (paths) to external binary files instead

Markdown is the default for anything the agent or human needs to read and reason about. JSON for anything the agent needs to parse programmatically. Never store large binary blobs (images, compiled outputs) inline in state files. Store them separately and reference them by path.
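One possible way to reference a binary artifact by path, sketched in Python. The function name artifact_reference and the record fields are assumptions; the SHA-256 digest in particular is an optional extra, not something this article prescribes.

```python
import hashlib
import os
from pathlib import Path


def artifact_reference(artifact: Path, state_dir: Path) -> dict:
    """Build a small JSON-safe record that points at a binary file instead of embedding it."""
    return {
        # A path relative to the state directory keeps the record portable.
        "path": os.path.relpath(artifact, state_dir),
        "bytes": artifact.stat().st_size,
        # Assumed extra: a digest lets the agent detect a changed or missing artifact.
        "sha256": hashlib.sha256(artifact.read_bytes()).hexdigest(),
    }
```

The returned dict can be stored in progress.json or another JSON state file, while the image or compiled output itself stays on disk outside the state the agent reads into context.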

The format choice also affects how well the agent can consume its own state. When the agent reads plan.md back into its context window, Markdown reads like ordinary prose with light structure, which LLMs handle naturally. JSON is also interpreted reliably, but its braces, quotes, and repeated keys typically cost more tokens per unit of information. Use the format that minimizes tokens while keeping the state unambiguous to the agent.

The .agent/ directory convention
