Multi-agent browser automation
Learn how to orchestrate multiple AI agents controlling parallel Chrome instances to perform simultaneous web tasks like form filling, research, data extraction, and complex multi-site workflows.
TL;DR
- Multi-agent browser automation spawns N independent AI agents, each controlling its own Chrome instance via MCP, to perform parallel web tasks like form filling, data scraping, research aggregation, and workflow execution across multiple sites simultaneously.
- The orchestrator (Claude Opus 4.6) plans and distributes subtasks. Sub-agents (Sonnet 4.6) each get an isolated browser profile, a focused task description, and their own Chrome DevTools MCP connection, preventing session/cookie cross-contamination.
- Each Chrome instance consumes 200-500MB RAM. A modern workstation supports 10-20 parallel browser agents; cloud instances handle 50-100. Token costs scale linearly with agent count.
- The three core distribution patterns are fan-out/fan-in (split URL list among agents), pipeline stages (discovery, extraction, enrichment), and competitive search (same query across different platforms, best result wins).
- Limitation: bot detection systems flag parallel browser activity aggressively. Mitigation requires staggered start times, proxy rotation, human-like interaction patterns, and strict rate limit compliance.
The Problem It Solves
You need to fill out 20 job applications across LinkedIn, Indeed, Glassdoor, and five company career portals. Each site has a different form structure, different required fields, and different authentication flows. With a single-browser agent, this takes 3-4 hours of sequential clicking, typing, and waiting for page loads. Most of that time is wasted on network latency and page rendering, not actual decision-making.
This is the serial bottleneck. Humans naturally open multiple browser tabs but operate them one at a time. Click, wait, read, click, switch tab, wait, read, click. Even the fastest human processes tabs sequentially. A single AI agent controlling one browser has the same constraint: it acts in one browser context at a time, blocking on every page load and DOM render.
I watched a team spend an entire afternoon using a single-browser agent to scrape apartment listings from 8 sites. The agent navigated perfectly but took 4 hours to finish. Running 8 agents in parallel (one per site) would have taken about 30 minutes. The automation logic was identical; the only difference was parallelism.
The deeper problem is that web tasks are embarrassingly parallel. Searching Zillow and searching Apartments.com share zero state. Filling a LinkedIn application and filling a Glassdoor application have no data dependency between them. Yet single-browser automation forces sequential execution on inherently independent workloads.
What Is It?
Multi-agent browser automation is an orchestration pattern where a central planning agent spawns multiple sub-agents, each controlling its own isolated Chrome browser instance, to execute independent web tasks in true parallel.
Think of it as a team of research assistants in a library, each at their own computer terminal. The team lead (orchestrator) hands each assistant a specific task: "You search this database, you search that archive, you check this government website." Each assistant works independently, at their own pace, on their own machine. When they finish, they bring their findings back to the lead, who combines them into a single report.
The key design decision: each sub-agent gets its own full Chrome process (not just a tab). Separate processes, each launched with its own profile directory, mean separate cookies, sessions, localStorage, and network stacks. One agent logging into LinkedIn doesn't affect another agent logged into Indeed. This isolation is what makes true parallelism safe.
How It Works
The orchestrator plans and distributes
Everything starts with the orchestrator agent (Claude Opus 4.6). It receives a high-level task from the user, decomposes it into independent subtasks, and assigns each subtask to a sub-agent. The orchestrator never touches a browser itself. Its job is planning, distributing, monitoring, and merging.
The decomposition strategy depends on the task type. For "search 8 apartment sites," the orchestrator creates 8 identical subtasks with different target URLs. For "build a competitive analysis of 5 companies," each subtask includes a different company plus specific data points to extract (pricing, features, team size, recent blog posts).
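Both decomposition strategies can be sketched in a few lines. This is a minimal illustration assuming a hypothetical `Subtask` record; the field names and the `split_by_target` / `split_by_entity` helpers are invented for the example and do not come from any specific orchestration framework.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    """One unit of work for a single sub-agent (hypothetical schema)."""
    agent_id: int
    target: str          # a site or an entity name, depending on the task
    fields: list[str]    # data points the sub-agent must extract


def split_by_target(targets: list[str], fields: list[str]) -> list[Subtask]:
    # Identical subtasks that differ only in target ("search 8 apartment sites").
    return [Subtask(i, t, list(fields)) for i, t in enumerate(targets, start=1)]


def split_by_entity(plan: dict[str, list[str]]) -> list[Subtask]:
    # One subtask per entity, each with its own data points ("analyze 5 companies").
    return [
        Subtask(i, name, list(points))
        for i, (name, points) in enumerate(plan.items(), start=1)
    ]
```

Copying `fields` per subtask keeps each sub-agent's spec independent, so the orchestrator can adjust one agent's extraction list without silently changing the others.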
I've found that the orchestrator should give each sub-agent the minimum context it needs. Sending the full master plan to every sub-agent wastes tokens and introduces confusion. Agent 3 doesn't need to know what Agent 1 is doing on Zillow. It just needs to know: "Search Craigslist for apartments in Austin under $2,000, extract address, price, bedrooms, square footage, and listing URL."
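One way to enforce "minimum context" is structural: build each sub-agent's prompt from that agent's own parameters only, so the master plan never reaches it. The sketch below is a hypothetical prompt builder that reproduces the Craigslist example; the function names and parameters are illustrative assumptions.

```python
def human_join(items: list[str]) -> str:
    """Join items as 'a, b, and c' (Oxford comma), as in the example prompt."""
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + ", and " + items[-1]


def build_subagent_prompt(site: str, city: str, max_rent: int, fields: list[str]) -> str:
    """Build a focused prompt from this agent's parameters only.

    The function never receives the master plan or other agents' sites,
    so that context cannot leak into the sub-agent's window.
    """
    return (
        f"Search {site} for apartments in {city} under ${max_rent:,}, "
        f"extract {human_join(fields)}."
    )
```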
Sub-agent isolation model
Each sub-agent is a self-contained unit with four components:
- Its own LLM context window. The sub-agent receives a focused task prompt, not the full orchestrator plan. This keeps context usage small (typically 2,000-4,000 input tokens per sub-agent) and prevents cross-task confusion.
- Its own Chrome DevTools MCP connection. Each sub-agent connects to a dedicated Chrome instance through the Chrome DevTools Protocol via MCP. This gives it full programmatic control: navigate, click, type, take screenshots, inspect DOM, execute JavaScript.
- Its own browser profile. Cookies, session storage, cache, and authentication tokens are isolated per instance. Agent A can be logged into LinkedIn while Agent B browses Glassdoor anonymously.
- Its own error boundary. If Agent C crashes (CAPTCHA wall, blocked IP, JavaScript error), Agents A, B, and D continue unaffected. The orchestrator handles Agent C's failure independently.
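The error-boundary property can be sketched with Python's asyncio, assuming each sub-agent is wrapped in a coroutine. `run_subagent` is a hypothetical stand-in that fails when it hits a CAPTCHA; `asyncio.gather(..., return_exceptions=True)` returns that failure as a value, so the orchestrator still receives every other agent's result and can decide what to do about Agent C on its own.

```python
import asyncio


async def run_subagent(agent_id: str, task: str) -> dict:
    # Hypothetical stand-in for a real sub-agent loop (LLM calls + browser tools).
    await asyncio.sleep(0.01)
    if "captcha" in task:
        raise RuntimeError(f"agent {agent_id}: blocked by a CAPTCHA wall")
    return {"agent": agent_id, "task": task, "status": "done"}


async def run_with_error_boundaries(tasks: dict[str, str]) -> tuple[list[dict], dict[str, str]]:
    # return_exceptions=True returns each failure as a value, so one crashed
    # agent does not stop the orchestrator from collecting the other results.
    outcomes = await asyncio.gather(
        *(run_subagent(agent_id, task) for agent_id, task in tasks.items()),
        return_exceptions=True,
    )
    successes, failures = [], {}
    for agent_id, outcome in zip(tasks, outcomes):
        if isinstance(outcome, Exception):
            failures[agent_id] = str(outcome)  # handled per agent: retry, reassign, or report
        else:
            successes.append(outcome)
    return successes, failures
```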
Chrome MCP: the browser control interface
The Chrome DevTools MCP server exposes browser capabilities as structured tool calls. Each sub-agent uses these tools without needing to write raw CDP (Chrome DevTools Protocol) commands. The typical tool surface includes the following (the names are illustrative; each MCP server defines its own):
- browser.navigate(url) for page navigation
- browser.click(selector) for clicking elements
- browser.type(selector, text) for form filling
- browser.screenshot() for visual state capture
- browser.evaluate(js) for JavaScript execution in page context
- browser.waitForSelector(selector, timeout) for handling async page loads
- browser.getContent(selector) for DOM text extraction
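To make "structured tool calls" concrete, here is a sketch of the sequence a sub-agent might emit to fill one form. It assumes the illustrative tool names listed above and shapes each call loosely after MCP's `tools/call` parameters (`name`, `arguments`). The `tool_call` validator, the `form_fill_calls` helper, and the millisecond timeout are hypothetical, not any real server's schema.

```python
from typing import Any

# Illustrative tool names from the list above, mapped to their parameter names.
# Real MCP servers define their own names and argument schemas.
TOOL_SURFACE: dict[str, tuple[str, ...]] = {
    "browser.navigate": ("url",),
    "browser.click": ("selector",),
    "browser.type": ("selector", "text"),
    "browser.screenshot": (),
    "browser.evaluate": ("js",),
    "browser.waitForSelector": ("selector", "timeout"),
    "browser.getContent": ("selector",),
}


def tool_call(name: str, **arguments: Any) -> dict[str, Any]:
    """Build one structured call, rejecting unknown tools or wrong arguments."""
    expected = TOOL_SURFACE.get(name)
    if expected is None:
        raise ValueError(f"unknown tool: {name}")
    if set(arguments) != set(expected):
        raise ValueError(f"{name} expects {expected}, got {tuple(arguments)}")
    return {"name": name, "arguments": arguments}


def form_fill_calls(url: str, fields: dict[str, str], submit: str) -> list[dict[str, Any]]:
    """Sequence of calls a sub-agent might issue to fill and submit one form."""
    calls = [
        tool_call("browser.navigate", url=url),
        # Assumption: timeout is in milliseconds.
        tool_call("browser.waitForSelector", selector="form", timeout=10_000),
    ]
    calls += [tool_call("browser.type", selector=sel, text=value) for sel, value in fields.items()]
    calls += [tool_call("browser.click", selector=submit), tool_call("browser.screenshot")]
    return calls
```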
Each Chrome instance launches with a unique user-data directory, ensuring complete profile isolation. The MCP server spawns Chrome with flags like --user-data-dir=/tmp/agent-3-profile and --remote-debugging-port=9224 (each agent gets a different port).
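The per-agent flags follow a simple rule. The sketch below assumes agents are numbered from 1 and ports start at 9222, which reproduces the example above (agent 3 gets port 9224). The binary name `google-chrome` is an assumption that varies by OS, and `launch_chrome` is a hypothetical helper; in the setup described here, the MCP server does the spawning.

```python
import subprocess

BASE_PORT = 9222  # assumption: agent 1 -> 9222, so agent 3 -> 9224


def chrome_args(agent_id: int, chrome_binary: str = "google-chrome") -> list[str]:
    """Command line for one isolated Chrome instance (binary name varies by OS)."""
    return [
        chrome_binary,
        f"--user-data-dir=/tmp/agent-{agent_id}-profile",       # private cookies, storage, cache
        f"--remote-debugging-port={BASE_PORT + agent_id - 1}",  # private CDP endpoint
    ]


def launch_chrome(agent_id: int) -> subprocess.Popen:
    """Hypothetical direct launch; normally the agent's MCP server spawns Chrome."""
    return subprocess.Popen(chrome_args(agent_id))
```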
Work distribution patterns
Three distribution patterns cover most real-world use cases:
Fan-out / fan-in is the most common. The orchestrator splits a URL list across N agents, each agent processes its share, and the results are merged. Example: 40 company websites split across 8 agents, 5 per agent. Each agent visits its 5 sites, extracts contact info, and returns structured data to the orchestrator for deduplication and merging.
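A minimal fan-out / fan-in sketch in asyncio, assuming `extract_contacts` stands in for a real sub-agent (it returns placeholder records rather than scraped data). The split keeps shares within one item of each other, so 40 URLs over 8 agents gives exactly 5 each; the merge step deduplicates by site.

```python
import asyncio


def split_evenly(items: list[str], n_agents: int) -> list[list[str]]:
    """Contiguous shares whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_agents)
    shares, start = [], 0
    for i in range(n_agents):
        end = start + size + (1 if i < extra else 0)
        shares.append(items[start:end])
        start = end
    return shares


async def extract_contacts(agent_id: int, urls: list[str]) -> list[dict]:
    # Hypothetical sub-agent: visit each site and return structured contact info.
    await asyncio.sleep(0)
    return [{"site": url, "agent": agent_id, "contact": None} for url in urls]


async def fan_out_fan_in(urls: list[str], n_agents: int) -> list[dict]:
    shares = split_evenly(urls, n_agents)
    # Fan out: one coroutine per agent, all running concurrently.
    per_agent = await asyncio.gather(
        *(extract_contacts(i, share) for i, share in enumerate(shares, start=1))
    )
    # Fan in: merge every agent's records, keeping the first record per site.
    merged: dict[str, dict] = {}
    for records in per_agent:
        for record in records:
            merged.setdefault(record["site"], record)
    return list(merged.values())
```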
Pipeline stages work when tasks have sequential dependencies but the stages themselves can parallelize. Agent pool A discovers URLs (search engines, directories). Agent pool B extracts structured data from discovered URLs. Agent pool C validates and enriches the extracted data. Each pool runs in parallel internally, with data flowing sequentially between pools.
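Pipeline stages map naturally onto worker pools connected by queues. In this sketch, assuming three hypothetical stage functions (`discover`, `extract`, `enrich`) that stand in for agent pools A, B, and C, each stage runs several workers concurrently and items flow forward through `asyncio.Queue`s. Pool sizes and the placeholder URLs are illustrative.

```python
import asyncio


# Hypothetical stage functions; each takes one item and returns zero or more outputs.
async def discover(query: str) -> list[str]:
    await asyncio.sleep(0)  # stand-in for a search-engine or directory agent
    return [f"https://listings.example/{query}/{i}" for i in range(2)]


async def extract(url: str) -> list[dict]:
    await asyncio.sleep(0)  # stand-in for a page-extraction agent
    return [{"url": url}]


async def enrich(record: dict) -> list[dict]:
    await asyncio.sleep(0)  # stand-in for a validation/enrichment agent
    return [{**record, "validated": True}]


async def worker(stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    while True:
        item = await inbox.get()
        try:
            outputs = await stage(item)
        except Exception:
            outputs = []  # sketch: drop failed items; a real pipeline would record them
        for out in outputs:
            await outbox.put(out)
        inbox.task_done()  # only after outputs are queued downstream


async def run_pipeline(seeds: list[str], pool_sizes=(2, 4, 2)) -> list[dict]:
    stages = [discover, extract, enrich]
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]  # last queue holds results
    workers = [
        asyncio.create_task(worker(stage, queues[i], queues[i + 1]))
        for i, (stage, size) in enumerate(zip(stages, pool_sizes))
        for _ in range(size)
    ]
    for seed in seeds:
        await queues[0].put(seed)
    # Drain stage by stage: when a stage's inbox is fully processed, all of its
    # outputs are already waiting in the next stage's inbox.
    for q in queues[:-1]:
        await q.join()
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get_nowait())
    return results
```

Because each worker marks an item done only after pushing its outputs downstream, joining the queues in order is enough to know the whole pipeline has drained.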
Competitive search dispatches the same query to multiple agents on different platforms. "Find the cheapest flight from Austin to Denver on June 15." One agent checks Google Flights, another checks Kayak, another checks the airlines' own websites. The orchestrator collects all results and picks the best option.
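Competitive search reduces to "run every platform agent concurrently, drop the ones that fail, keep the best." The sketch assumes each platform agent is exposed as a hypothetical async `search(query)` callable returning `Offer` records, and that "best" means lowest price. The `Offer` fields and the 120-second per-platform timeout are illustrative choices.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class Offer:
    platform: str
    price_usd: float
    details: str


# Hypothetical wrapper around one platform sub-agent: query -> list of offers.
Searcher = Callable[[str], Awaitable[list[Offer]]]


async def competitive_search(
    query: str, searchers: dict[str, Searcher], timeout_s: float = 120.0
) -> Optional[Offer]:
    async def guarded(search: Searcher) -> list[Offer]:
        try:
            return await asyncio.wait_for(search(query), timeout_s)
        except Exception:
            return []  # a blocked or slow platform simply contributes no offers

    batches = await asyncio.gather(*(guarded(s) for s in searchers.values()))
    offers = [offer for batch in batches for offer in batch]
    # "Best result wins": here, the cheapest offer across all platforms.
    return min(offers, key=lambda o: o.price_usd, default=None)
```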