Self-modifying system prompts

TL;DR

Self-modifying system prompts let agents write learned rules to their own instruction files (claude.md, agents.md, gemini.md) so corrections persist across sessions without any external database.
The error correction loop is simple: agent reads prompt file at session start, makes a mistake, receives correction, appends a numbered rule, and never repeats that mistake in future sessions.
Teams report a 40-70% reduction in repeated errors after 10-15 sessions, following a power-law decay curve where most gains happen in the first 5 sessions.
The two-tier structure (global user-wide rules + local project-specific rules) keeps preferences portable while letting project conventions stay scoped.
Practical ceiling: most users accumulate 50-200 high-value rules before diminishing returns. Beyond that, rules start conflicting and the file needs periodic pruning.

You spend 20 minutes explaining to your coding agent that this project uses PostgreSQL, not MySQL. It gets it right for the rest of the session. Next morning, you open a new session and the agent generates MySQL migrations again. You correct it. Again.

Every session boundary is a hard reset. The agent forgets your formatting preferences, your deployment targets, your team's naming conventions, and the five architectural decisions you explained last week. You become a human memory bank, repeating the same corrections to a system that should be learning from them.

I've watched engineers burn 15-30 minutes per day on re-explaining preferences that haven't changed in months. That's 5-10 hours per month of pure waste, not counting the errors that slip through when the engineer forgets to re-state a correction.

What Is It?

Self-modifying system prompts turn agent correction into persistent memory by writing learned rules directly to the agent's instruction file. Instead of storing preferences in a database or vector store, the agent appends plain-text rules to a file that gets loaded at the top of every conversation.

Think of it as a restaurant kitchen's correction board. Every time a dish comes back ("customer is allergic to shellfish," "table 4 wants extra crispy"), the chef writes it on a whiteboard that the whole kitchen reads at the start of each shift. The board grows over time. New cooks read it and avoid known mistakes without anyone explaining them verbally. The knowledge lives in the environment, not in any single person's head.

The key insight: the file serves as both the agent's instructions and its memory. There is no separate storage layer or retrieval pipeline. The agent reads the file, follows the rules, and writes new rules when it learns something. The entire memory mechanism is a text file and an append operation.

How It Works

The self-correcting loop

Every session follows the same five-phase cycle:

Load: The agent reads the system prompt file (claude.md, CLAUDE.md, agents.md, or gemini.md depending on the platform). The file's contents are prepended to the conversation context before any user message.
Execute: The user assigns a task. The agent works within the constraints established by the loaded rules.
Correct: The user spots an error or provides a preference. "Always use tabs, not spaces" or "This project deploys to AWS, not GCP."
Write: The agent appends a new numbered rule to the "Learned Rules" section of the prompt file. The rule follows a strict format: [category] Always/Never do X because Y.
Persist: The file is saved to disk. When the next session starts, the agent loads the updated file and the new rule is active from the first message.

The correction doesn't need to be explicit. If the agent detects its own mistake (a test fails, a build breaks, an output doesn't match requirements), it can self-correct by writing a rule without user intervention.

The two-tier hierarchy

Not all rules belong in the same file. Self-modifying systems use a two-tier structure to separate concerns:

Global rules live at ~/.claude/claude.md (or the platform equivalent). These capture your personal preferences: "I prefer concise responses," "Always use TypeScript over JavaScript," "Never generate code without types." They travel with you across every project.

Local rules live at the project root (e.g., ./claude.md, ./CLAUDE.md, ./agents.md). These capture project-specific knowledge: "This repo uses PostgreSQL 16," "Tests run with pytest, not unittest," "The API follows REST conventions with snake_case fields." They stay with the project.

I've found that separating global from local rules is the single most important structural decision. Without it, you end up explaining personal preferences in every project or polluting a shared project file with individual quirks.

The meta-prompt: instructions for self-modification

The system prompt file isn't just a list of rules. It contains a meta-prompt: instructions telling the agent HOW to modify itself. This is the self-referential core of the pattern.

A typical meta-prompt section looks like this:

# System Prompt — Project X

## Instructions for this file
Before starting any task, read this entire file.
When the user corrects you or you make a mistake, immediately
append a new numbered rule to the Learned Rules section below.
Format: [category] Always/Never do X because Y.
Do not remove or modify existing rules without explicit permission.

## Project Context
- Language: TypeScript 5.4
- Framework: Next.js 15
- Database: PostgreSQL 16
- Deploy target: Vercel

## Learned Rules
1. [style] Always use named exports, never default exports,
   because this project's ESLint config enforces it.
2. [testing] Always mock database calls in unit tests because
   the CI pipeline has no database access.
3. [deploy] Never use edge runtime for API routes because
   this project requires Node.js crypto module.

The meta-prompt creates a closed loop: the file tells the agent to modify the file. Each modification makes the file more comprehensive, which makes the agent more accurate, which reduces the frequency of future modifications.

Rule format and categorization

Effective rules follow a consistent structure that the agent can parse reliably:

Format: [category] Always/Never do X because Y.

The "because Y" clause is critical. Without it, rules become opaque commands that the agent follows blindly. The reason gives the agent context to generalize: if the reason is "because the CI pipeline has no database access," the agent can infer that other integration-dependent operations might also need mocking.

Common categories:

Category	Example rule
`[style]`	Always use 2-space indentation because the .editorconfig enforces it.
`[architecture]`	Never create circular dependencies between modules because the build tool fails silently.
`[testing]`	Always write integration tests for payment flows because unit tests missed a Stripe webhook edge case in March.
`[deployment]`	Never use environment variables in client-side code because they leak to the browser bundle.
`[communication]`	Always explain changes before making them because the user prefers reviewing plans first.
`[security]`	Never log request bodies containing authentication tokens because the log aggregator is not PII-safe.

TL;DR

Self-modifying system prompts let agents write learned rules to their own instruction files (claude.md, agents.md, gemini.md) so corrections persist across sessions without any external database.
The error correction loop is simple: agent reads prompt file at session start, makes a mistake, receives correction, appends a numbered rule, and never repeats that mistake in future sessions.
Teams report a 40-70% reduction in repeated errors after 10-15 sessions, following a power-law decay curve where most gains happen in the first 5 sessions.
The two-tier structure (global user-wide rules + local project-specific rules) keeps preferences portable while letting project conventions stay scoped.
Practical ceiling: most users accumulate 50-200 high-value rules before diminishing returns. Beyond that, rules start conflicting and the file needs periodic pruning.

The Problem It Solves

What Is It?

How It Works

The self-correcting loop

Every session follows the same five-phase cycle:

Load: The agent reads the system prompt file (claude.md, CLAUDE.md, agents.md, or gemini.md depending on the platform). The file's contents are prepended to the conversation context before any user message.
Execute: The user assigns a task. The agent works within the constraints established by the loaded rules.
Correct: The user spots an error or provides a preference. "Always use tabs, not spaces" or "This project deploys to AWS, not GCP."
Write: The agent appends a new numbered rule to the "Learned Rules" section of the prompt file. The rule follows a strict format: [category] Always/Never do X because Y.
Persist: The file is saved to disk. When the next session starts, the agent loads the updated file and the new rule is active from the first message.

The two-tier hierarchy

Not all rules belong in the same file. Self-modifying systems use a two-tier structure to separate concerns:

The meta-prompt: instructions for self-modification

The system prompt file isn't just a list of rules. It contains a meta-prompt: instructions telling the agent HOW to modify itself. This is the self-referential core of the pattern.

A typical meta-prompt section looks like this:

# System Prompt — Project X

## Instructions for this file
Before starting any task, read this entire file.
When the user corrects you or you make a mistake, immediately
append a new numbered rule to the Learned Rules section below.
Format: [category] Always/Never do X because Y.
Do not remove or modify existing rules without explicit permission.

## Project Context
- Language: TypeScript 5.4
- Framework: Next.js 15
- Database: PostgreSQL 16
- Deploy target: Vercel

## Learned Rules
1. [style] Always use named exports, never default exports,
   because this project's ESLint config enforces it.
2. [testing] Always mock database calls in unit tests because
   the CI pipeline has no database access.
3. [deploy] Never use edge runtime for API routes because
   this project requires Node.js crypto module.

Rule format and categorization

Effective rules follow a consistent structure that the agent can parse reliably:

Format: [category] Always/Never do X because Y.

Common categories:

Category	Example rule
`[style]`	Always use 2-space indentation because the .editorconfig enforces it.
`[architecture]`	Never create circular dependencies between modules because the build tool fails silently.
`[testing]`	Always write integration tests for payment flows because unit tests missed a Stripe webhook edge case in March.
`[deployment]`	Never use environment variables in client-side code because they leak to the browser bundle.
`[communication]`	Always explain changes before making them because the user prefers reviewing plans first.
`[security]`	Never log request bodies containing authentication tokens because the log aggregator is not PII-safe.

Self-modifying system prompts

TL;DR

The Problem It Solves

What Is It?

How It Works

The self-correcting loop

The two-tier hierarchy

The meta-prompt: instructions for self-modification

Rule format and categorization

Continue Reading with Premium

Comments

Self-modifying system prompts

TL;DR

The Problem It Solves

What Is It?

How It Works

The self-correcting loop

The two-tier hierarchy

The meta-prompt: instructions for self-modification

Rule format and categorization

Continue Reading with Premium

Comments