Action caching and replay

TL;DR

Action caching stores tool call results keyed by (tool_name, hash(normalized_args)) and returns cached results on identical future calls, skipping redundant API calls, database queries, and web requests.
Production agents executing retry loops or multi-run workflows see 40-70% reduction in API costs and 10-100x faster replays when caching is enabled.
Replay mode re-runs an entire agent session from recorded logs using cached results instead of live tool calls, enabling deterministic debugging, regression testing, and offline evaluation.
Cache key design is the hardest part: normalize arguments (sort keys, strip timestamps, canonicalize paths) before hashing. Non-deterministic arguments (random IDs, current timestamps) bust the cache if not handled.
Determinism test: a tool is cacheable if identical inputs always produce identical outputs. Side-effecting tools (send_email, write_file, deploy) are never cacheable.
Limitation: stale caches return outdated results. TTL policies must match the volatility of each tool's data source, or you'll debug phantom bugs that only exist in the cache.

The Problem It Solves

Your coding agent is refactoring a 200-file codebase. It calls get_file_contents on the same 15 utility files across 40 different refactoring steps. Each call hits the filesystem API, adds latency, and burns context-building tokens. The agent has already read utils/auth.ts eight times in the past hour, and the file hasn't changed once.

Now multiply this across retries. The agent hits a rate limit on step 37, backs off, and restarts from step 30. Steps 30-36 re-execute identical tool calls: the same file reads, the same linter runs, the same test executions. You pay for every one of them again.

I've watched production agents burn through $200 in a single day purely on redundant tool calls during retry-heavy workflows. The agent wasn't doing more work. It was doing the same work over and over, paying full price each time.

What Is It?

Action caching stores the results of deterministic tool calls in a key-value cache, keyed by the tool name and a hash of normalized arguments. When the agent makes the same call again, the cache returns the stored result instantly instead of executing the tool. Replay mode takes this further: it records an entire session's tool calls and results, then plays them back without any live execution.

Think of it as a court stenographer. During the first trial (agent run), the stenographer records every question asked and every answer given. When the trial is rehearsed for appeal preparation (replay), no one needs to bring the witnesses back. The attorneys read from the transcript. Same questions, same answers, zero witness fees.
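To make the definition concrete, here is a minimal in-memory sketch of the lookup-then-execute flow. The ActionCache class, its hits and misses counters, and the stand-in get_file_contents tool are assumptions made for illustration, not any framework's API, and the key here skips the argument normalization discussed later.

```python
import hashlib
import json
from typing import Any, Callable


class ActionCache:
    """In-memory action cache keyed by tool name plus a hash of the arguments."""

    def __init__(self) -> None:
        self._store: dict = {}
        self.hits = 0
        self.misses = 0

    def _key(self, tool_name: str, args: dict) -> str:
        # sort_keys makes the serialization independent of argument order
        canonical = json.dumps(args, sort_keys=True, default=str)
        return hashlib.sha256(f"{tool_name}:{canonical}".encode()).hexdigest()

    def call(self, tool_name: str, tool_fn: Callable[..., Any], args: dict) -> Any:
        key = self._key(tool_name, args)
        if key in self._store:
            self.hits += 1
            return self._store[key]      # cache hit: the tool is not executed
        self.misses += 1
        result = tool_fn(**args)         # cache miss: run the tool once
        self._store[key] = result
        return result


# Hypothetical tool that records every real execution.
executed_paths = []

def get_file_contents(path: str) -> str:
    executed_paths.append(path)
    return f"<contents of {path}>"


cache = ActionCache()
first = cache.call("get_file_contents", get_file_contents, {"path": "utils/auth.ts"})
second = cache.call("get_file_contents", get_file_contents, {"path": "utils/auth.ts"})
```

The second call returns the stored result, so the tool body runs only once even though the agent asked twice.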

How It Works

Cache key design: the make-or-break decision

The cache key determines whether your cache hits or misses. A good key maximizes hits on truly identical calls while never returning one call's result for a different call. The standard formula: hash(tool_name + canonical(args)).

Canonicalization is where teams get it wrong. Consider these two calls:

# Call A
get_file(path="./src/utils/auth.ts", encoding="utf-8")

# Call B
get_file(path="src/utils/auth.ts", encoding="utf-8")

These are semantically identical but produce different hashes if you hash raw arguments. Canonicalization resolves paths to absolute form, sorts dictionary keys alphabetically, strips default values, and normalizes whitespace. Without it, your cache hit rate drops from 85% to under 40%.

Non-deterministic arguments need special handling. If a tool call includes timestamp: Date.now() or request_id: uuid(), those values change every call and bust the cache. The solution: define a per-tool argument filter that strips non-deterministic fields before hashing.

# Cache key generation with argument normalization
import hashlib
import json
import os
from typing import Optional

def make_cache_key(tool_name: str, args: dict, strip_keys: Optional[set] = None) -> str:
    """Generate a deterministic cache key from a tool call."""
    clean_args = dict(args)  # shallow copy so the caller's dict is not mutated

    # Strip non-deterministic fields (timestamps, request IDs)
    for key in (strip_keys or set()):
        clean_args.pop(key, None)

    # Normalize paths to absolute form (resolved against the current working directory)
    for key in ("path", "file", "directory"):
        if key in clean_args and isinstance(clean_args[key], str):
            clean_args[key] = os.path.abspath(clean_args[key])

    # Sort keys for deterministic serialization
    canonical = json.dumps(clean_args, sort_keys=True, default=str)
    content = f"{tool_name}:{canonical}"
    # Truncating to 16 hex characters keeps 64 bits of the digest
    return hashlib.sha256(content.encode()).hexdigest()[:16]
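The function above covers stripped keys, paths, and key ordering, but not the "strip default values" step mentioned earlier. One way that step could be layered on is a per-tool spec. In the sketch below, ToolSpec, canonical_args, cache_key, and the /workspace root are hypothetical names chosen for illustration. Paths are resolved against a fixed root rather than the current working directory, on the assumption that keys should stay stable across processes.

```python
import hashlib
import json
import posixpath
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical per-tool description of how to canonicalize arguments."""
    defaults: dict = field(default_factory=dict)   # values the tool assumes when omitted
    volatile_keys: frozenset = frozenset()         # fields that change on every call
    path_keys: frozenset = frozenset()             # fields that hold file paths


def canonical_args(spec: ToolSpec, args: dict, root: str = "/workspace") -> dict:
    out = {}
    for key, value in args.items():
        if key in spec.volatile_keys:
            continue                               # drop request IDs, timestamps
        if key in spec.defaults and spec.defaults[key] == value:
            continue                               # explicit default equals omitted
        if key in spec.path_keys and isinstance(value, str):
            # "./src/a.ts" and "src/a.ts" both become "<root>/src/a.ts"
            value = posixpath.normpath(posixpath.join(root, value))
        out[key] = value
    return out


def cache_key(tool_name: str, spec: ToolSpec, args: dict) -> str:
    canonical = json.dumps(canonical_args(spec, args), sort_keys=True, default=str)
    return hashlib.sha256(f"{tool_name}:{canonical}".encode()).hexdigest()


get_file_spec = ToolSpec(
    defaults={"encoding": "utf-8"},
    volatile_keys=frozenset({"request_id"}),
    path_keys=frozenset({"path"}),
)

key_a = cache_key("get_file", get_file_spec,
                  {"path": "./src/utils/auth.ts", "encoding": "utf-8", "request_id": "r-1"})
key_b = cache_key("get_file", get_file_spec,
                  {"path": "src/utils/auth.ts", "request_id": "r-2"})
key_c = cache_key("get_file", get_file_spec,
                  {"path": "src/utils/auth.ts", "encoding": "latin-1"})
```

Here key_a and key_b match because the path spelling, the explicit default, and the request ID no longer influence the hash, while key_c differs because a non-default encoding is a genuinely different call.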

The cacheability test: deterministic vs. side-effecting tools

Not every tool call can be cached. The rule is simple: cache reads, never cache writes.

A tool is cacheable if the same inputs always produce the same outputs and the call has no side effects. Reading a file is cacheable. Sending an email is not. Running a linter on unchanged code is cacheable. Deploying to production is not.

I've seen teams make the mistake of caching web_search results with no TTL. The agent searched for "current stock price of AAPL" three hours ago, got $185, and now it's returning that cached result while the actual price has moved to $192. The cache turned a reliable tool into a time-delayed one.
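The "cache reads, never cache writes" rule can be enforced in code rather than by convention. The sketch below assumes a hypothetical allowlist (TOOL_POLICY) and an execute helper. The tool names come from the examples in this article, and unknown tools default to never being cached, which is the conservative choice.

```python
from enum import Enum
from typing import Any, Callable


class Cacheability(Enum):
    CACHEABLE = "cacheable"   # deterministic read, no side effects
    NEVER = "never"           # side-effecting: must execute every time


# Hypothetical registry; in practice this would live next to the tool definitions.
TOOL_POLICY = {
    "get_file": Cacheability.CACHEABLE,
    "run_linter": Cacheability.CACHEABLE,
    "send_email": Cacheability.NEVER,
    "write_file": Cacheability.NEVER,
    "deploy": Cacheability.NEVER,
}


def is_cacheable(tool_name: str) -> bool:
    # Tools missing from the registry are treated as side-effecting.
    return TOOL_POLICY.get(tool_name, Cacheability.NEVER) is Cacheability.CACHEABLE


def execute(tool_name: str, tool_fn: Callable[..., Any], args: dict, cache: dict) -> Any:
    if not is_cacheable(tool_name):
        return tool_fn(**args)                       # always run, never store
    # Simplified key: assumes flat arguments with hashable values.
    key = (tool_name, tuple(sorted(args.items())))
    if key not in cache:
        cache[key] = tool_fn(**args)
    return cache[key]


# Hypothetical tools that record each real execution.
sent, reads = [], []

def send_email(to: str) -> str:
    sent.append(to)
    return "sent"

def get_file(path: str) -> str:
    reads.append(path)
    return "data"

cache: dict = {}
for _ in range(2):
    execute("send_email", send_email, {"to": "a@example.com"}, cache)
    execute("get_file", get_file, {"path": "a.txt"}, cache)
```

After the loop, the email tool has run twice and the file read has run once, which is the behavior the rule asks for.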

TTL policies: matching cache lifetime to data volatility

Different tools need different cache durations. A database schema changes once a week. A web search result is stale after 15 minutes. A compiled binary is valid until the source changes. Using a single TTL for all tools either causes stale results (TTL too long) or cache thrashing (TTL too short).

Tool Category	TTL	Rationale
Static config / schema	24 hours	Changes require deploys
File contents (read-only)	Until file modified (inotify or mtime check)	Invalidate on mutation
Linter / compiler output	Until source changes	Deterministic for same input
Web search	10-30 minutes	Results drift with time
API responses (third-party)	5-15 minutes	External data changes
Directory listings	5 minutes	Files may be added/removed
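The time-based rows of the table can be expressed as a per-category TTL lookup. This is a sketch under assumptions: the category names, the specific values picked from within the table's ranges, and the TTLCache class are illustrative. The clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable

# Hypothetical per-category TTLs in seconds, chosen from within the ranges above.
TTL_SECONDS = {
    "static_config": 24 * 60 * 60,
    "web_search": 15 * 60,
    "third_party_api": 10 * 60,
    "directory_listing": 5 * 60,
}

MISS = object()  # sentinel so a cached None is distinguishable from a miss


class TTLCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: dict = {}

    def put(self, key: str, value: Any, category: str) -> None:
        expires_at = self._clock() + TTL_SECONDS[category]
        self._store[key] = (expires_at, value)

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return MISS
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]         # expired: evict and report a miss
            return MISS
        return value


# Fake clock for demonstration: a mutable cell holding "now" in seconds.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("web_search:aapl price", "$185", "web_search")
fresh = cache.get("web_search:aapl price")
now[0] += 16 * 60                        # sixteen minutes later
stale = cache.get("web_search:aapl price")
```

With a 15-minute TTL, the lookup at minute 16 reports a miss, so the agent would re-run the search instead of returning the old price.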

Event-based invalidation is stronger than time-based TTL for file operations. If you watch the filesystem for changes (inotify, FSEvents), you can keep file caches valid indefinitely until the underlying file actually changes. This pushes hit rates above 90% for file-heavy agent workflows.
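Filesystem watchers such as inotify or FSEvents need platform-specific libraries, so the sketch below shows the simpler mtime check from the table instead: each read revalidates the cached entry against the file's modification time and size. The FileReadCache class and its disk_reads counter are assumptions for illustration. A stat call is still made per read, which is cheaper than re-reading the file but not free.

```python
import os
import tempfile


class FileReadCache:
    """Caches file contents and revalidates against (mtime_ns, size) on each read."""

    def __init__(self) -> None:
        self._entries: dict = {}
        self.disk_reads = 0

    def read(self, path: str) -> str:
        abs_path = os.path.abspath(path)
        stat = os.stat(abs_path)
        stamp = (stat.st_mtime_ns, stat.st_size)
        cached = self._entries.get(abs_path)
        if cached is not None and cached[0] == stamp:
            return cached[1]                 # file unchanged since it was cached
        with open(abs_path, "r", encoding="utf-8") as fh:
            content = fh.read()
        self.disk_reads += 1
        self._entries[abs_path] = (stamp, content)
        return content


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "auth.ts")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("export const v = 1;")

    cache = FileReadCache()
    first = cache.read(path)
    second = cache.read(path)                # served from cache
    reads_before_edit = cache.disk_reads

    with open(path, "w", encoding="utf-8") as fh:
        fh.write("export const v = 2; // edited")
    third = cache.read(path)                 # stamp changed, so the file is re-read
    reads_after_edit = cache.disk_reads
```

Including the size in the stamp is a hedge against filesystems with coarse timestamp resolution, where two quick writes can share the same mtime.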

Replay mode: deterministic debugging without live calls

Replay mode is the second major capability built on action caching. Instead of caching individual tool calls, replay records an entire session (every tool call, every result, every agent reasoning step) and plays it back later using cached results instead of live execution.

This solves three problems. First, debugging: when an agent misbehaves, you replay the session step-by-step to find exactly where reasoning went wrong, without paying for another full run. Second, regression testing: after changing the model or prompt, replay historical sessions against the new configuration and compare outputs. Third, evaluation: replay recorded sessions with different models to benchmark quality without re-running tools.
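A record-then-replay loop can be sketched in a few lines. The version below assumes strictly ordered replay and records only tool calls and results, not the agent's reasoning steps. SessionRecorder, SessionReplayer, ReplayMismatch, and the run_linter stand-in are hypothetical names, and tool results are assumed to be JSON-serializable.

```python
import json
from typing import Any, Callable


class ReplayMismatch(Exception):
    """Raised when the replayed run asks for a call the recording does not contain."""


class SessionRecorder:
    def __init__(self) -> None:
        self.events: list = []

    def call(self, tool_name: str, tool_fn: Callable[..., Any], args: dict) -> Any:
        result = tool_fn(**args)             # live execution during recording
        self.events.append({"tool": tool_name, "args": args, "result": result})
        return result

    def dump(self) -> str:
        return json.dumps(self.events)


class SessionReplayer:
    def __init__(self, log_json: str) -> None:
        self._events = json.loads(log_json)
        self._cursor = 0

    def call(self, tool_name: str, args: dict) -> Any:
        if self._cursor >= len(self._events):
            raise ReplayMismatch(f"no recorded call left for {tool_name}")
        event = self._events[self._cursor]
        if event["tool"] != tool_name or event["args"] != args:
            raise ReplayMismatch(
                f"step {self._cursor}: expected {event['tool']}, got {tool_name}"
            )
        self._cursor += 1
        return event["result"]               # recorded result, no live execution


# Hypothetical tool that records each real execution.
live_calls = []

def run_linter(path: str) -> dict:
    live_calls.append(path)
    return {"path": path, "warnings": 2}


recorder = SessionRecorder()
recorder.call("run_linter", run_linter, {"path": "src/app.py"})
log = recorder.dump()

replayer = SessionReplayer(log)
replayed = replayer.call("run_linter", {"path": "src/app.py"})

try:
    SessionReplayer(log).call("run_linter", {"path": "src/other.py"})
    diverged = False
except ReplayMismatch:
    diverged = True
```

Raising on a mismatch rather than falling back to live execution is a deliberate choice in this sketch: when a new prompt or model makes the agent diverge from the recording, the replay reports that step instead of quietly running a live tool call.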
