Background agent with CI

TL;DR

A background agent with CI runs a coding agent asynchronously on a separate branch, triggered by an issue, PR comment, or ticket assignment, not by a developer sitting in an IDE waiting for output.
CI pipelines (build, test, lint, security scan) act as automated quality gates. The agent's work must pass CI before a human even looks at it, filtering out broken code automatically.
The developer assigns a task and walks away. The agent works like a junior engineer who codes overnight: you assign the ticket at 5 PM, and there's a PR waiting at 9 AM.
Background agents can use cheaper models and more retry cycles because they're not time-constrained. A 15-minute solve cycle is fine when nobody is waiting.
Costs can drop by 40-60% compared to synchronous agent workflows, thanks to off-peak compute, cheaper model selection, and batched CI runs.
Limitation: context staleness. The agent works on a branch that may drift from main. Rebase before the final CI check or risk merge conflicts on review.

The Problem It Solves

You open an issue: "Add input validation to the /users endpoint." It's a well-specified, isolated change. You know exactly what needs to happen. But you also have three design docs to review, a production incident to investigate, and a sprint planning meeting in 20 minutes.

So the ticket sits. For hours. Sometimes days. Not because it's hard, but because no human has the uninterrupted 30-minute block to pick it up, write the code, run the tests, fix the lint errors, and push the PR. The bottleneck isn't intelligence or skill. It's attention.

Synchronous coding agents (the kind that run in your IDE while you watch) solve part of this problem, but they create a new one: you're still babysitting. You prompt the agent, wait for output, review it, run CI, wait for results, prompt again. The agent works fast, but your attention is pinned for the entire cycle. I've seen developers spend 40 minutes "supervising" an agent that did 10 minutes of actual work.

The background agent model flips this entirely. You assign the task, the agent works on its own branch, CI validates the output, and you get a notification when there's a PR ready for review. Your attention cost drops from 40 minutes to 5 minutes of code review.

What Is It?

A background agent with CI is an asynchronous coding agent that receives task assignments (GitHub issues, Linear tickets, PR comments), works independently on a separate branch, uses CI pipelines as its quality feedback loop, and produces a ready-to-review pull request. The human re-enters the loop only for final code review, not during the agent's work cycle.

Think of it as the difference between a restaurant where you stand at the counter watching your food being made (synchronous) versus one where you place your order, sit down, and get notified when it's ready (asynchronous). The kitchen (CI) checks the dish before it reaches you. If it's wrong, the kitchen fixes it before you even know there was a problem.

The key difference from a synchronous coding agent's CI feedback loop is that the developer is not present during the agent's work cycle. There's no interactive prompt-and-wait. The agent operates autonomously until it either succeeds or escalates to a human.

How It Works

Task ingestion and prioritization

The background agent starts with a task queue. Events arrive from multiple sources: GitHub issue assignments, PR review comments, webhook triggers from project management tools, or scheduled cron jobs. Each event gets parsed into a structured task specification.
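As a rough illustration of that parsing step, the sketch below normalizes a raw event into a task specification. The TaskSpec fields, the shape of the event dictionary, and the parse_event name are all assumptions made for this example; real webhook payloads from an issue tracker look different and would need their own mapping.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Structured task specification (field names are hypothetical)."""
    source: str                      # e.g. "github_issue", "pr_comment", "cron"
    title: str
    body: str
    labels: set[str] = field(default_factory=set)


def parse_event(event: dict) -> TaskSpec:
    """Normalize a raw event dict into a TaskSpec.

    The keys read here ("source", "title", "body", "labels") are an assumed,
    simplified shape, not a real webhook payload.
    """
    return TaskSpec(
        source=event.get("source", "unknown"),
        title=(event.get("title") or "").strip(),
        body=(event.get("body") or "").strip(),
        labels={label.lower() for label in event.get("labels", [])},
    )


spec = parse_event({
    "source": "github_issue",
    "title": " Add input validation to the /users endpoint ",
    "body": "Steps to reproduce: POST /users with an empty email.",
    "labels": ["Bug"],
})
print(spec.title, sorted(spec.labels))
```

Lowercasing labels and trimming whitespace at ingestion keeps every downstream check (classification, routing) from having to repeat that cleanup.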

Not all tasks are suitable for background agents. The system needs a task classifier that filters for well-specified, isolated changes. Good candidates: bug fixes with reproduction steps, test additions for uncovered code paths, dependency updates, documentation updates, lint/formatting fixes. Bad candidates: architectural refactors, features requiring product decisions, changes that touch 20 files.

I've found the sweet spot is tasks that a senior engineer could spec in one paragraph. If the issue description needs a design doc, it's not a background agent task.

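from enum import Enum


class TaskSuitability(Enum):
    HUMAN_ONLY = "human_only"
    MEDIUM_CONFIDENCE = "medium_confidence"
    HIGH_CONFIDENCE = "high_confidence"


def classify_task(issue) -> TaskSuitability:
    """Classify whether a task is suitable for background agent execution."""
    # Normalize labels to a lowercase set so the set operations below also work on lists.
    labels = {label.lower() for label in issue.labels}
    body = (issue.body or "").lower()
    signals = {
        "has_reproduction_steps": "steps to reproduce" in body,
        "single_file_scope": len(issue.mentioned_files) <= 3,  # tolerates up to 3 files
        "has_test_criteria": bool(labels & {"bug", "test", "docs", "deps"}),  # not used in the decision below
        "no_design_decision": not (labels & {"rfc", "design"}),
        "estimated_size": estimate_change_size(issue),  # small/medium/large; helper assumed to exist elsewhere
    }

    # Large changes and anything that needs a design decision stay with humans.
    if signals["estimated_size"] == "large" or not signals["no_design_decision"]:
        return TaskSuitability.HUMAN_ONLY
    if signals["has_reproduction_steps"] and signals["single_file_scope"]:
        return TaskSuitability.HIGH_CONFIDENCE
    return TaskSuitability.MEDIUM_CONFIDENCE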

Branch isolation and execution

Once a task is accepted, the agent creates an isolated environment. It clones the repository (or uses a cached copy), creates a feature branch from the latest main, and begins work. This branch-per-task isolation is critical: it prevents the agent's work from interfering with other development, and it makes the output reviewable through standard PR workflows.
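A minimal sketch of the branch-per-task step, assuming git is installed, the remote is called origin, and the base branch is main. The agent/ prefix and the branch_name_for and create_task_branch helpers are illustrative names, not part of any particular agent framework.

```python
import re
import subprocess


def branch_name_for(task_id: str, title: str) -> str:
    """Build a branch name such as 'agent/123-add-input-validation' (naming scheme is an assumption)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:50].rstrip("-")
    return f"agent/{task_id}-{slug}"


def create_task_branch(repo_dir: str, branch: str, base: str = "main") -> None:
    """Fetch the latest base branch and create the task branch from it."""
    subprocess.run(["git", "fetch", "origin", base], cwd=repo_dir, check=True)
    # --no-track keeps the new branch from tracking origin/<base>,
    # so a plain 'git push' cannot target the base branch by accident.
    subprocess.run(
        ["git", "checkout", "--no-track", "-b", branch, f"origin/{base}"],
        cwd=repo_dir,
        check=True,
    )


print(branch_name_for("123", "Add input validation to the /users endpoint"))
```

Branching from origin/main right after a fetch, rather than from whatever the cached copy has checked out, is what makes "latest main" true in practice.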

The agent works in a sandboxed environment (Docker container, cloud VM, or ephemeral codespace). It has read access to the full repo and write access only to its feature branch. No access to production credentials, deployment pipelines, or other branches. I've seen teams learn this the hard way when an early agent prototype accidentally pushed to main.
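One way to picture the write restriction is a small guard that the agent's tooling consults before any push. This is only a sketch: the PROTECTED_BRANCHES set and the is_push_allowed function are hypothetical, and a client-side check like this complements, rather than replaces, server-side controls such as branch protection rules and narrowly scoped credentials.

```python
PROTECTED_BRANCHES = {"main", "master", "release"}  # assumed list; adjust per repository


def is_push_allowed(target_ref: str, task_branch: str) -> bool:
    """Allow pushes only to the agent's own task branch."""
    branch = target_ref.removeprefix("refs/heads/")
    # Defense in depth: refuse protected branches even if the task was misconfigured.
    if branch in PROTECTED_BRANCHES:
        return False
    return branch == task_branch


print(is_push_allowed("refs/heads/agent/123-fix", "agent/123-fix"))
print(is_push_allowed("refs/heads/main", "agent/123-fix"))
```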

CI as quality gate

The CI pipeline is the background agent's objective feedback loop. Unlike self-critique (where the agent evaluates its own work), CI provides deterministic, verifiable signals. Tests either pass or they don't. The linter either flags issues or it doesn't. There's no ambiguity, no sycophancy, no "looks good to me" bias.

The agent's CI pipeline should include at minimum: compilation/syntax check, unit tests, integration tests (if fast), linting, type checking, and a security scan. Each stage produces structured output that the agent parses into actionable feedback.
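To make "structured output parsed into actionable feedback" concrete, here is a sketch that extracts failures from a JUnit-style XML test report, a format many test runners can emit. It assumes the common testsuite/testcase/failure layout; the parse_junit_failures name and the shape of the returned dictionaries are choices made for this example.

```python
import xml.etree.ElementTree as ET


def parse_junit_failures(xml_text: str) -> list[dict]:
    """Return one dict per failed or errored test case in a JUnit-style report."""
    root = ET.fromstring(xml_text.strip())
    failures = []
    for case in root.iter("testcase"):
        for kind in ("failure", "error"):
            node = case.find(kind)
            if node is not None:
                failures.append({
                    "test": f"{case.get('classname', '')}.{case.get('name', '')}",
                    "kind": kind,
                    "message": node.get("message", ""),
                    "details": (node.text or "").strip(),
                })
    return failures


report = """
<testsuite name="users" tests="2" failures="1">
  <testcase classname="tests.test_users" name="test_valid_email"/>
  <testcase classname="tests.test_users" name="test_empty_email">
    <failure message="expected 400, got 500">AssertionError: expected 400, got 500</failure>
  </testcase>
</testsuite>
"""

for failure in parse_junit_failures(report):
    print(failure["test"], "-", failure["message"])
```

Feeding the agent a short list of failing test names and messages, instead of the raw CI log, keeps each retry focused on what actually broke.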

One critical detail: the CI pipeline for background agents should be fast. Under 5 minutes is ideal, under 10 is acceptable. Beyond that, the retry cycles become expensive and slow. Some teams run a "fast CI" subset (unit tests + lint) for the agent's iteration cycles, then a "full CI" suite on the final version before opening the PR.
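The fast/full split can be as simple as choosing a stage list per run. The stage names below are placeholders I picked for illustration; they would map to whatever jobs your CI system actually defines.

```python
# Hypothetical stage names; map these to real CI jobs.
FAST_STAGES = ["lint", "unit_tests"]
FULL_STAGES = [
    "build",
    "lint",
    "type_check",
    "unit_tests",
    "integration_tests",
    "security_scan",
]


def stages_for(is_final_run: bool) -> list[str]:
    """Pick the fast subset while iterating and the full suite for the final check."""
    return list(FULL_STAGES if is_final_run else FAST_STAGES)


print(stages_for(is_final_run=False))
print(stages_for(is_final_run=True))
```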

The retry and escalation loop

The agent doesn't just run CI once. It enters a retry loop: generate code, run CI, parse failures, fix code, run CI again. This loop continues until either all checks pass or the agent hits its retry budget.

The retry budget is a critical safety mechanism. Without it, an agent can enter infinite loops where fixing one test breaks another, burning compute and tokens indefinitely. A typical budget is 3-5 attempts with a 30-minute wall clock limit.
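The loop and its budget might look something like the sketch below. The attempt_fix and run_ci callables stand in for the agent's code generation and the CI run, and RetryBudget, run_with_budget, and the "passed"/"escalate" return values are names invented for this example. The defaults follow the 3-5 attempt and 30-minute figures above.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryBudget:
    max_attempts: int = 5             # upper end of the 3-5 attempt range
    max_seconds: float = 30 * 60      # 30-minute wall clock limit


def run_with_budget(
    attempt_fix: Callable[[list[str]], None],
    run_ci: Callable[[], list[str]],
    budget: RetryBudget,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Loop generate -> CI -> fix until CI passes or the budget is spent.

    run_ci returns a list of failure descriptions (empty means all checks passed).
    Returns "passed" or "escalate".
    """
    start = clock()
    failures: list[str] = []
    for _ in range(budget.max_attempts):
        if clock() - start > budget.max_seconds:
            break  # wall clock budget exhausted
        attempt_fix(failures)  # first call receives an empty list: initial generation
        failures = run_ci()
        if not failures:
            return "passed"
    return "escalate"


# Simulated run: CI fails twice, then passes on the third attempt.
ci_results = iter([["test_a failed"], ["lint error"], []])
print(run_with_budget(lambda failures: None, lambda: next(ci_results), RetryBudget()))
```

Returning "escalate" instead of raising makes the out-of-budget case an ordinary outcome that the surrounding system can route to a human.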
