A single AI agent is remarkably capable, but it has one body and one desk. Everything it's working on has to fit inside its single context window, and it does its tasks one after another. Hand it a job that's genuinely wide — "audit every file in this repo for hardcoded secrets" — and you feel the limits: the relevant material won't all fit at once, and reading it serially is slow. The fix is to stop thinking of one agent and start thinking of a team.
From one worker to a crew
Multi-agent orchestration introduces a lead agent — the orchestrator — whose job is not to do the work itself but to divide and delegate. It breaks the task into independent pieces and spins up a subagent for each one. The subagents run in parallel, each tackling its slice at the same time. When they finish, the orchestrator collects their findings and synthesizes them into a single result for you. It's the manager-and-team pattern, applied to AI.
The quietly important detail is that each subagent gets its own fresh context window. A subagent searching the billing module isn't carrying around everything the auth-module subagent read. That isolation is what makes the approach scale: instead of one window straining to hold the whole repo, you have many windows each holding just one focused chunk. The orchestrator only needs the summaries the subagents send back, not their raw working memory.
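The isolation described above can be made concrete with a toy model. This is a minimal sketch, not how any real agent framework stores context: the `Subagent` class, its `read` and `summarize` methods, and the file names are all hypothetical, and the "summary" is just a size count standing in for a model-written report.

```python
from dataclasses import dataclass, field


@dataclass
class Subagent:
    """Hypothetical worker whose context is a private list of text chunks."""
    name: str
    context: list[str] = field(default_factory=list)

    def read(self, text: str) -> None:
        # Everything a subagent reads lands only in its own context.
        self.context.append(text)

    def summarize(self) -> str:
        # Stand-in for a model-written summary: report sizes, not raw text.
        total = sum(len(chunk) for chunk in self.context)
        return f"{self.name}: read {len(self.context)} chunks ({total} chars)"


billing = Subagent("billing")
auth = Subagent("auth")

billing.read("invoice.py contents ...")
billing.read("refund.py contents ...")
auth.read("login.py contents ...")

# The orchestrator holds only the short summaries, never the raw chunks.
orchestrator_context = [billing.summarize(), auth.summarize()]
```

The point of the sketch is the shape of the data: nothing the billing subagent read appears in the auth subagent's context, and the orchestrator's context grows by one short line per subagent rather than by everything each one read.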
How it works: fan out, then synthesize
The shape is always the same. The orchestrator fans out — launching N subagents, each pointed at one part of the problem — and then fans in — reading every subagent's report and merging them. Two situations make this genuinely better than a lone agent. The first is breadth: searching, reading, or reviewing across many files at once, where parallelism turns a long serial slog into a quick sweep. The second is independent verification: spinning up a separate agent to check the first one's work, with no shared context to bias it. The diagram below traces one fan-out-and-synthesize cycle.
- Orchestrator: The lead agent. It splits the task, hands pieces to subagents, and merges what comes back.
- Subagent: A worker agent with its own fresh context window, focused on one slice of the job.
- Synthesis: The orchestrator reads every subagent's result and combines them into one answer.
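The fan-out-then-synthesize shape can be sketched in a few lines using Python's standard `concurrent.futures` module. This is an illustration under loud assumptions: `run_subagent` and `orchestrate` are hypothetical names, and the "subagent" here is a plain function that flags file names containing the word `secret`, standing in for a real model call that would inspect file contents.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(slice_name: str, files: list[str]) -> dict:
    """Hypothetical stand-in for a subagent working on one slice of the job."""
    # Toy check: a real subagent would read and reason about each file.
    hits = [path for path in files if "secret" in path]
    return {"slice": slice_name, "checked": len(files), "hits": hits}


def orchestrate(slices: dict[str, list[str]]) -> dict:
    """Fan out one subagent per slice, then merge their reports."""
    # Fan out: submit every slice so the subagents run concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        futures = {
            name: pool.submit(run_subagent, name, files)
            for name, files in slices.items()
        }
        # Collecting results blocks until every subagent has finished.
        reports = [futures[name].result() for name in slices]

    # Fan in: synthesize the individual reports into a single answer.
    return {
        "checked": sum(report["checked"] for report in reports),
        "hits": sorted(hit for report in reports for hit in report["hits"]),
    }


slices = {
    "billing": ["billing/invoice.py", "billing/secret_key.py"],
    "auth": ["auth/login.py"],
}
result = orchestrate(slices)
```

With the sample input, `result` reports three files checked and one hit, `billing/secret_key.py`. Note that the orchestrator only ever touches the small report dictionaries, which mirrors the idea that it needs summaries rather than each subagent's working memory.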
In our stack, Claude Code can act as an orchestrator and launch subagents to work in parallel. Each subagent is a separate Claude instance with its own fresh context window. A common use is breadth: spin up several subagents to comb different parts of a large codebase at once, then have the lead agent synthesize their reports. Another is an independent reviewer subagent that checks the main agent's output without inheriting its context. Each subagent runs on one of Anthropic's Claude models, and you can mix model sizes: a lighter model for wide search, a stronger one for synthesis.
The cost: coordination and tokens
Orchestration isn't free, and treating it as a default is a mistake. Every subagent burns its own tokens, so a fan-out of five agents can cost several times what a single agent would. There's coordination overhead too: the orchestrator has to write clear sub-tasks, wait on the slowest subagent, and stitch together results that might disagree. For a small, linear task — fix one function, rename one variable — all of that is pure waste, and a single agent is faster and cheaper.
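A back-of-the-envelope model makes the trade-off easier to see. The sketch below is only arithmetic under stated assumptions: the function names are hypothetical, and every token count and duration is a made-up illustrative number, not a measurement of any real system.

```python
def fan_out_tokens(subagent_tokens: list[int], orchestrator_tokens: int) -> int:
    """Total tokens: every subagent pays for its own context, plus the lead."""
    return orchestrator_tokens + sum(subagent_tokens)


def fan_out_wall_clock(subagent_seconds: list[float]) -> float:
    """Parallel subagents finish when the slowest one does."""
    return max(subagent_seconds)


# Illustrative numbers only (assumptions, not benchmarks).
subagent_tokens = [20_000] * 5          # five subagents, 20k tokens each
orchestrator_tokens = 10_000            # writing sub-tasks and merging reports
single_agent_tokens = 30_000            # hypothetical lone-agent run
subagent_seconds = [40.0, 55.0, 90.0, 35.0, 60.0]

total_tokens = fan_out_tokens(subagent_tokens, orchestrator_tokens)
token_ratio = total_tokens / single_agent_tokens
parallel_seconds = fan_out_wall_clock(subagent_seconds)
serial_seconds = sum(subagent_seconds)
```

Under these assumed numbers the fan-out spends 110,000 tokens, a little under four times the hypothetical single-agent run, while finishing in 90 seconds instead of the 280 seconds the same slices would take back to back. The token bill is the sum of all workers; the wait is set by the slowest one.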
So the rule of thumb is simple: reach for multiple agents when the task is wide or needs an independent check, and stick with one agent when it's narrow and sequential. Used well, orchestration buys you breadth and a second, unbiased opinion — which is exactly what you want when you move on to verifying AI output.