Multi-agent orchestration

TL;DR

A single agent is capped by one context window, and it can't parallelize work or specialize by domain. Multi-agent systems break through both limits.
The supervisor pattern: one orchestrator decomposes tasks and dispatches to specialized workers. The orchestrator coordinates; it doesn't do the work.
Parallel workers: independent subtasks run simultaneously. Deep research agents spawn 5-10 subagents at once. Latency is bounded by the slowest task, not the sum of all tasks.
Communication failure is the most common source of multi-agent bugs. Agents must communicate via structured messages with defined schemas, not free-text strings.
State management for shared work must be explicit: a database, a message queue, or a shared store. Passing state from agent to agent through context chains is brittle.

The problem it solves

Suppose you're building a research agent that needs to search for recent news on a topic, find academic papers, summarize each source, compare findings, and write a final synthesis. If one agent does this sequentially, it takes 60+ seconds and must hold every intermediate result in a single context window that fills up quickly.

More fundamentally, the model that is good at searching academic databases isn't necessarily the best one for writing synthesis reports. A generalist model handles both but may excel at neither.

Multi-agent systems solve both problems simultaneously: parallel execution collapses the sequential latency, and specialist agents with purpose-built system prompts and targeted tools outperform generalists on narrow tasks.

What is it?

A multi-agent system is a collection of individual AI agents that collaborate to complete tasks that a single agent cannot handle well alone. Each agent has its own system prompt, tools, and context window. A coordination layer (the orchestrator) manages the flow of work between them.

The key architectural decision is how agents communicate and how the orchestrator manages state. Get this wrong and you end up with a system that's harder to debug than a single agent, not easier.

How it works

Supervisor (orchestrator + worker) pattern

The supervisor pattern is the standard architecture for most multi-agent systems. The orchestrator agent receives the top-level task, breaks it down into subtasks, routes each subtask to the appropriate specialist worker, collects the results, and synthesizes the final output.

Figure: Supervisor orchestration: decompose, dispatch parallel workers (Search Agent, Code Agent, Analysis Agent), collect, synthesize

The orchestrator's job is coordination, not execution. A common mistake is loading the orchestrator with both task decomposition logic and substantive work. When the orchestrator does both, it becomes a bottleneck and its context window fills faster.
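The flow described above (decompose, route, collect, synthesize) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `Subtask`, `WORKERS`, `decompose`, and `synthesize` are hypothetical names, and the worker functions are plain stand-ins for what would be LLM calls with their own system prompts and tools. It runs sequentially to keep the routing logic visible.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Subtask:
    task_id: str
    task_type: str  # which specialist should handle it
    payload: str


# Hypothetical workers: plain functions standing in for LLM-backed agents.
def search_worker(payload: str) -> str:
    return f"search results for {payload!r}"


def analysis_worker(payload: str) -> str:
    return f"analysis of {payload!r}"


WORKERS: dict[str, Callable[[str], str]] = {
    "search": search_worker,
    "analysis": analysis_worker,
}


def decompose(task: str) -> list[Subtask]:
    # Stand-in for an LLM planning call; the split is hard-coded here.
    return [Subtask("t1", "search", task), Subtask("t2", "analysis", task)]


def synthesize(results: dict[str, str]) -> str:
    # Stand-in for a final LLM synthesis call.
    return "\n".join(f"{tid}: {out}" for tid, out in sorted(results.items()))


def orchestrate(task: str) -> str:
    results: dict[str, str] = {}
    for sub in decompose(task):
        worker = WORKERS[sub.task_type]  # route by type; no substantive work here
        results[sub.task_id] = worker(sub.payload)
    return synthesize(results)


print(orchestrate("agent frameworks"))
```

Note that `orchestrate` only plans, routes, and merges. All substantive work lives in the workers, which keeps the orchestrator's own context small.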

Parallel execution

Workers that handle independent subtasks should run in parallel. This requires an async execution model: dispatch all workers at once and await their results together, with asyncio.gather() in Python or an equivalent concurrency primitive in other languages.

Deep research products (Perplexity Deep Research, OpenAI Deep Research) run 5-10 search+summarize subagents in parallel. What would take 50-70 seconds sequentially takes 10-15 seconds in parallel. The latency is bounded by the slowest subtask, not by their sum.
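To make the latency claim concrete, here is a small sketch using `asyncio.gather()`. The subagent names and durations are made up, and `asyncio.sleep` stands in for real search and summarize calls; the durations are scaled down so the example finishes quickly.

```python
import asyncio
import time


async def run_subagent(name: str, seconds: float) -> str:
    # Hypothetical subagent: sleep stands in for search + summarize calls.
    await asyncio.sleep(seconds)
    return f"{name} done"


async def run_all() -> tuple[list[str], float]:
    # Simulated durations in seconds (assumed values for illustration).
    durations = {"news": 0.05, "papers": 0.20, "blogs": 0.10}
    start = time.perf_counter()
    # gather runs the coroutines concurrently and returns results in input order.
    results = await asyncio.gather(
        *(run_subagent(name, secs) for name, secs in durations.items())
    )
    elapsed = time.perf_counter() - start
    return list(results), elapsed


results, elapsed = asyncio.run(run_all())
print(results, f"{elapsed:.2f}s")
```

The elapsed time should land close to the longest simulated duration (0.20 s) and well under the 0.35 s sum of all three.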

Parallelism requires that subtasks are genuinely independent. If Worker B needs Worker A's output before starting, those two workers must run sequentially. Design the task decomposition to maximize independence.
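One way to respect dependencies while still parallelizing everything else is to run subtasks in stages: each stage contains every subtask whose inputs are already complete. The sketch below uses the standard library's `graphlib.TopologicalSorter` for the ordering. The dependency map and subtask names are hypothetical, and `run_subtask` is a stand-in for a real worker call.

```python
import asyncio
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists the subtasks it must wait for.
DEPENDS_ON: dict[str, set[str]] = {
    "search_news": set(),
    "search_papers": set(),
    "summarize": {"search_news", "search_papers"},
    "synthesize": {"summarize"},
}


async def run_subtask(name: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a worker call
    return name


async def run_in_stages(graph: dict[str, set[str]]) -> list[list[str]]:
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    stages: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # subtasks whose inputs are complete
        finished = await asyncio.gather(*(run_subtask(n) for n in ready))
        sorter.done(*ready)  # unlocks subtasks that depended on this stage
        stages.append(list(finished))
    return stages


stages = asyncio.run(run_in_stages(DEPENDS_ON))
print(stages)
# [['search_news', 'search_papers'], ['summarize'], ['synthesize']]
```

The two searches run together in the first stage; `summarize` and `synthesize` each wait for their inputs. The fewer edges in the dependency map, the wider each stage and the lower the total latency.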

Hierarchical agents

For complex tasks, the supervisor pattern can nest: an orchestrator dispatches to sub-orchestrators, which each manage their own workers. This creates a tree of agents.

The practical limit is 2-3 levels of hierarchy. Each level adds communication latency, error propagation complexity, and debugging difficulty. Deeper trees compound these costs. I default to a flat supervisor + workers structure and only add hierarchy when a subtask domain is genuinely complex enough to warrant its own decomposition logic.
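A nested supervisor tree, with a guard on depth, might look like the following sketch. `AgentNode`, `MAX_DEPTH`, and the node names are hypothetical; the cap of 3 simply mirrors the 2-3 level guidance above, and leaf workers return a placeholder string instead of doing real work.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 3  # assumed cap, following the 2-3 level guidance


@dataclass
class AgentNode:
    name: str
    children: list["AgentNode"] = field(default_factory=list)


def depth(node: AgentNode) -> int:
    # A lone worker has depth 1; each orchestration layer adds one.
    return 1 + max((depth(c) for c in node.children), default=0)


def run(node: AgentNode, task: str, level: int = 1) -> str:
    if level > MAX_DEPTH:
        raise ValueError(f"{node.name} exceeds max hierarchy depth {MAX_DEPTH}")
    if not node.children:
        return f"{node.name}({task})"  # leaf worker: stand-in for real work
    # Orchestrator or sub-orchestrator: delegate, then combine child outputs.
    parts = [run(child, task, level + 1) for child in node.children]
    return f"{node.name}[{', '.join(parts)}]"


tree = AgentNode("root", [
    AgentNode("research", [AgentNode("news"), AgentNode("papers")]),
    AgentNode("writer"),
])
print(depth(tree))     # 3
print(run(tree, "q"))  # root[research[news(q), papers(q)], writer(q)]
```

Here only the `research` branch earns a sub-orchestrator; `writer` stays a direct worker of the root, which matches the flat-by-default approach.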

Communication protocols

Structured messages, not free text, are non-negotiable for maintainable multi-agent systems. Define a message schema for every agent handoff.

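from typing import Any, Literal

from pydantic import BaseModel


class WorkerTask(BaseModel):
    """Message the orchestrator sends to a worker."""

    task_id: str
    task_type: Literal["search", "code", "analysis"]  # selects the worker
    input_data: dict[str, Any]
    context: str
    priority: int


class WorkerResult(BaseModel):
    """Message a worker sends back to the orchestrator."""

    task_id: str  # same id as the WorkerTask this result answers
    status: Literal["success", "failure", "partial"]
    output: dict[str, Any]
    error: str | None = None  # None when there is nothing to report
    token_usage: int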
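The payoff of a schema is that a malformed reply is rejected at the handoff instead of silently corrupting later steps. The sketch below shows that boundary check using only the standard library so it runs without dependencies; the dataclass mirrors the `WorkerResult` fields, and `parse_worker_result` is a hypothetical helper. With the pydantic model itself, the same check is typically a single call such as `WorkerResult.model_validate_json(raw)` in pydantic v2.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_STATUS = {"success", "failure", "partial"}
REQUIRED_FIELDS = {"task_id", "status", "output", "token_usage"}


@dataclass
class WorkerResult:
    task_id: str
    status: str
    output: dict
    error: Optional[str]
    token_usage: int


def parse_worker_result(raw: str) -> WorkerResult:
    """Validate a worker's JSON reply before the orchestrator trusts it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    status = data["status"]
    if not isinstance(status, str) or status not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {status!r}")
    if not isinstance(data["output"], dict):
        raise ValueError("output must be a JSON object")
    return WorkerResult(
        task_id=str(data["task_id"]),
        status=status,
        output=data["output"],
        error=data.get("error"),
        token_usage=int(data["token_usage"]),
    )


good = '{"task_id": "t1", "status": "success", "output": {"n": 3}, "token_usage": 42}'
print(parse_worker_result(good).status)

try:
    parse_worker_result("done, found 3 papers")  # free-text reply
except ValueError as exc:
    print("rejected:", exc)
```

A free-text reply fails loudly at the boundary, so the orchestrator can retry or mark the subtask as failed instead of passing an unparseable string downstream.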
