Multi-agent orchestration
Learn how to architect multi-agent systems with supervisor patterns, how parallel subagents improve throughput, and what makes agent communication fail at scale.
TL;DR
- Single agents hit two limits: they run work sequentially inside one context window, and they can't specialize by domain. Multi-agent systems address both.
- The supervisor pattern: one orchestrator decomposes tasks and dispatches to specialized workers. The orchestrator coordinates; it doesn't do the work.
- Parallel workers: independent subtasks run simultaneously. Deep research agents spawn 5-10 subagents at once. Latency is bounded by the slowest task, not the sum of all tasks.
- Communication failure is the most common source of multi-agent bugs. Agents must communicate via structured messages with defined schemas, not free-text strings.
- State management for shared work must be explicit: a database, a message queue, or a shared store. Passing state from agent to agent through chained contexts is brittle.
The problem it solves
Say you're building a research agent that must search for recent news on a topic, find academic papers, summarize each source, compare findings, and write a final synthesis. If one agent does all of this sequentially, it takes 60+ seconds and has to hold every intermediate result in a single context window that fills up quickly.
More fundamentally, the same LLM that searches academic databases isn't necessarily the best model for writing synthesis reports. A generalist model handles both but doesn't excel at either.
Multi-agent systems solve both problems simultaneously: parallel execution collapses the sequential latency, and specialist agents with purpose-built system prompts and targeted tools outperform generalists on narrow tasks.
What is it?
A multi-agent system is a collection of individual AI agents that collaborate to complete tasks that a single agent cannot handle well alone. Each agent has its own system prompt, tools, and context window. A coordination layer (the orchestrator) manages the flow of work between them.
The key architectural decision is how agents communicate and how the orchestrator manages state. Get this wrong and you end up with a system that's harder to debug than a single agent, not easier.
How it works
Supervisor (orchestrator + worker) pattern
The supervisor pattern is the standard architecture for most multi-agent systems. The orchestrator agent receives the top-level task, breaks it down into subtasks, routes each subtask to the appropriate specialist worker, collects the results, and synthesizes the final output.
The orchestrator's job is coordination, not execution. A common mistake is loading the orchestrator with both task decomposition logic and substantive work. When the orchestrator does both, it becomes a bottleneck and its context window fills faster.
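The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Orchestrator`, `Subtask`, `search_worker`, and `analysis_worker` names are hypothetical, the workers are plain functions standing in for LLM-backed agents, and the decomposition is hard-coded where a real orchestrator would plan with a model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical worker type: a plain function standing in for a specialist
# agent that would normally have its own system prompt, tools, and context.
Worker = Callable[[str], str]


def search_worker(query: str) -> str:
    return f"search results for '{query}'"


def analysis_worker(text: str) -> str:
    return f"analysis of '{text}'"


@dataclass
class Subtask:
    worker: str   # key into the orchestrator's worker registry
    payload: str  # input handed to that worker


class Orchestrator:
    def __init__(self, workers: dict[str, Worker]) -> None:
        self.workers = workers

    def decompose(self, task: str) -> list[Subtask]:
        # Assumption: hard-coded plan. A real system would ask an LLM to plan.
        return [
            Subtask("search", f"{task} news"),
            Subtask("search", f"{task} papers"),
            Subtask("analysis", task),
        ]

    def run(self, task: str) -> str:
        # Route each subtask to its worker, then assemble the outputs.
        # The orchestrator never performs a search or analysis itself.
        results = [self.workers[s.worker](s.payload) for s in self.decompose(task)]
        return "\n".join(results)


orchestrator = Orchestrator({"search": search_worker, "analysis": analysis_worker})
report = orchestrator.run("solid-state batteries")
print(report)
```

Note that `run` contains only routing and assembly. If substantive work starts to appear inside the orchestrator, that is usually the signal to move it into a new worker.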
Parallel execution
Workers that handle independent subtasks should run in parallel. This requires an async execution model: dispatch all workers at once, then await their results together with asyncio.gather() in Python or the equivalent concurrency primitive in another language.
Deep research products (Perplexity Deep Research, OpenAI Deep Research) run 5-10 search+summarize subagents in parallel. What would take 50-70 seconds sequentially takes 10-15 seconds in parallel. The latency is bounded by the slowest subtask, not by their sum.
Parallelism requires that subtasks are genuinely independent. If Worker B needs Worker A's output before starting, those two workers must run sequentially. Design the task decomposition to maximize independence.
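As a rough sketch of the dispatch-and-await pattern, the example below runs three independent workers concurrently and then one dependent step after them. The `run_worker` coroutine and the task names are hypothetical, and `asyncio.sleep` stands in for a network-bound search or model call.

```python
import asyncio
import time


async def run_worker(name: str, seconds: float) -> str:
    # asyncio.sleep is a placeholder for an I/O-bound LLM or search request.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()

    # Independent subtasks: dispatched together, awaited together.
    # gather() returns results in the order the awaitables were passed in.
    independent = await asyncio.gather(
        run_worker("news_search", 0.2),
        run_worker("paper_search", 0.2),
        run_worker("patent_search", 0.2),
    )

    # Dependent subtask: it needs the results above, so it runs afterwards.
    synthesis = await run_worker(f"synthesis of {len(independent)} results", 0.05)

    elapsed = time.perf_counter() - start
    return [*independent, synthesis], elapsed


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The three searches finish in about the time of one, and only the synthesis step adds to the total. In a real system it is worth considering `asyncio.gather(..., return_exceptions=True)` so that one failed worker does not discard the results of the others.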
Hierarchical agents
For complex tasks, the supervisor pattern can nest: an orchestrator dispatches to sub-orchestrators, which each manage their own workers. This creates a tree of agents.
The practical limit is 2-3 levels of hierarchy. Each level adds communication latency, error propagation complexity, and debugging difficulty. Deeper trees compound these costs. I default to a flat supervisor + workers structure and only add hierarchy when a subtask domain is genuinely complex enough to warrant its own decomposition logic.
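One way to picture the tree, and to enforce a depth budget on it, is the small sketch below. `AgentNode` and `MAX_DEPTH` are hypothetical names introduced for illustration, and the leaf "work" is just string formatting standing in for a real agent call.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 3  # assumption: mirrors the 2-3 level guideline above


@dataclass
class AgentNode:
    name: str
    children: list["AgentNode"] = field(default_factory=list)

    def depth(self) -> int:
        # A leaf worker counts as one level; each layer of orchestration adds one.
        return 1 + max((child.depth() for child in self.children), default=0)

    def run(self, task: str) -> str:
        if not self.children:
            # Leaf worker: does the actual work (placeholder output here).
            return f"{self.name}({task})"
        # Orchestrator or sub-orchestrator: delegate, then merge child outputs.
        parts = [child.run(task) for child in self.children]
        return f"{self.name}[{', '.join(parts)}]"


tree = AgentNode("root", [
    AgentNode("research", [AgentNode("news"), AgentNode("papers")]),
    AgentNode("writer"),
])

# Fail fast if someone nests the tree deeper than the budget allows.
assert tree.depth() <= MAX_DEPTH
output = tree.run("q")
print(output)
```

Checking depth at construction time keeps the hierarchy from growing silently as new sub-orchestrators are added.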
Communication protocols
Structured messages, not free text, are non-negotiable for maintainable multi-agent systems. Define a message schema for every agent handoff.
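from typing import Literal

from pydantic import BaseModel


class WorkerTask(BaseModel):
    """Message the orchestrator sends to a worker."""
    task_id: str
    task_type: Literal["search", "code", "analysis"]
    input_data: dict
    context: str
    priority: int


class WorkerResult(BaseModel):
    """Message a worker sends back to the orchestrator."""
    task_id: str  # echoes WorkerTask.task_id so results can be matched to tasks
    status: Literal["success", "failure", "partial"]
    output: dict
    error: str | None = None  # defaults to None when there is nothing to report
    token_usage: int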