Part
6
  |  
MCP and the Agent Frontier
  |  
Chapter
21

Multi-Agent Collaboration

A single agent can reason. Multiple agents can specialize, delegate, and finish faster — if you know when to split the work.
Reading Time
12
mins
BACK TO CLAUDE MASTERCLASS

The trap is believing that a smarter model eliminates the need for multiple agents. It doesn't. A single agent handling a complex workflow is like a single engineer doing product management, frontend, backend, code review, and QA on the same task in the same cognitive session. Every context switch costs accuracy. Every role change muddies the output. The model doesn't run out of intelligence — it runs out of focus.

I've watched single-agent systems deteriorate in real time. The first task in the chain is pristine. The second is good. By the fourth, the model is mixing up which role it's playing, referencing outputs from step two while trying to produce step five, and generating work that's technically correct but structurally incoherent. The problem isn't the model's capability ceiling. The problem is that a single context window is being asked to hold too many distinct responsibilities at once.

The model doesn't run out of intelligence — it runs out of focus. Every role change inside a single agent muddies the output.

Multi-agent systems solve this by splitting responsibilities across specialized agents, each with its own system prompt, its own tool access, and its own context. The planner plans. The researcher researches. The implementer implements. The reviewer reviews. No agent has to switch roles, which means no agent pays the cognitive tax of context switching.

When One Agent Isn't Enough

Not every task needs multiple agents. A single agent is fine when the task is linear, the reasoning is homogeneous, and the output is a single artifact. Writing a function, answering a factual question, summarizing a document — these are single-agent tasks.

You need multiple agents when three conditions are present:

  1. The task requires different types of thinking. Research requires breadth and exploration. Implementation requires depth and precision. Review requires skepticism and attention to edge cases. These are different cognitive modes, and a single system prompt can't optimize for all of them simultaneously.
  2. The work has natural parallelism. If the documentation and the code don't depend on each other, building them sequentially wastes time. Two agents working simultaneously finish in the time of the slower one, not the sum of both.
  3. Quality requires separation of concerns. An agent that writes code and then reviews its own code is performing theater. The review will always be generous because the reviewer and the author share the same context — and the same blind spots.
Framework · The Cognitive Mode Test · CMT

Before adding agents, ask: does this task require the system to switch between fundamentally different types of thinking? If the answer is yes — if research, implementation, and review each demand a different mindset — split them into separate agents. If the task is a single type of thinking applied repeatedly, keep it in one agent.

The Three Roles: Planner, Worker, Reviewer

The canonical multi-agent architecture assigns three roles. Not every system needs all three, but understanding them gives you the vocabulary to design any multi-agent workflow.

The planner decomposes a goal into actionable steps. It looks at the user's request, considers available context, and produces a structured plan — what to build, in what order, using which tools. The planner never executes. Its output is a specification that another agent can follow without needing to re-derive the reasoning.

The worker takes the specification and builds. It has access to tools — file creation, code execution, API calls — and its system prompt is focused entirely on implementation quality. The worker doesn't question the plan. It executes it with precision.

The reviewer inspects the worker's output against the planner's specification. It checks for completeness, correctness, and quality. Its system prompt is deliberately skeptical — it's looking for what's missing, not what's present.

✕ Single agent
  • Plans and executes in the same context
  • Reviews its own work with the same biases
  • Context window fills with mixed responsibilities
  • Quality degrades as task complexity grows
✓ Multi-agent
  • Each agent has a focused role and system prompt
  • Reviewer has fresh context, no author bias
  • Each context window is clean and purpose-built
  • Quality is consistent regardless of complexity

Building a Two-Agent Pipeline

Here's a working implementation of the most practical multi-agent pattern: a research agent that analyzes and plans, followed by an implementation agent that builds. This is the pipeline I reach for on any task where the research and the output are different types of work.

import os
import asyncio
from claude_sdk import query, ClaudeAgentOptions
from dotenv import load_dotenv

load_dotenv()

async def research_agent(goal: str, context: str = "") -> str:
    """Analyze a request and produce an actionable specification."""
    
    research_prompt = f"""Goal: {goal}

Context from knowledge base:
{context}

Analyze this request and create a concise specification:
1. What exactly needs to be built
2. Which files should be created
3. Implementation guidelines and constraints
4. Edge cases to handle"""

    options = ClaudeAgentOptions(
        system_prompt=(
            "You are a research agent. Your job is to analyze requests "
            "and produce clear, actionable specifications. Never implement — "
            "only specify. Be concise. Be specific."
        ),
        allowed_tools=["read", "grep"],
        model="claude-sonnet-4-20250514"
    )

    specification = ""
    async for message in query(research_prompt, options):
        if hasattr(message, "content"):
            for block in message.content:
                if hasattr(block, "text"):
                    specification += block.text
                    print(block.text, end="")
    
    return specification


async def implementation_agent(
    spec: str, 
    goal: str, 
    workspace: str = "./workspace"
) -> list[str]:
    """Take a specification and build it."""
    
    impl_prompt = f"""Specification:
{spec}

Original goal: {goal}

Implement this specification completely:
- Create every file described in the spec
- Use proper error handling and type hints
- Verify each file after creation
- Summarize what was built"""

    options = ClaudeAgentOptions(
        system_prompt=(
            "You are an implementation agent. Your job is to build exactly "
            "what the specification describes. Use the write tool to create "
            "files. Use the read tool to verify them. Do not deviate from "
            "the specification."
        ),
        allowed_tools=["write", "read", "edit"],
        permission_mode="accept_edits",
        cwd=workspace,
        model="claude-sonnet-4-20250514"
    )

    files_created = []
    async for message in query(impl_prompt, options):
        if hasattr(message, "content"):
            for block in message.content:
                if hasattr(block, "text"):
                    print(block.text, end="")
    
    return files_created


async def run_pipeline(goal: str, context: str = ""):
    """Orchestrate the two-agent workflow."""
    print("=" * 60)
    print("RESEARCH AGENT: Starting analysis")
    print("=" * 60)
    
    spec = await research_agent(goal, context)
    
    print("\n" + "=" * 60)
    print("IMPLEMENTATION AGENT: Starting build")
    print("=" * 60)
    
    files = await implementation_agent(spec, goal)
    
    print("\n" + "=" * 60)
    print("Pipeline complete.")
    print("=" * 60)


if __name__ == "__main__":
    goal = "Create a login form component with email validation"
    asyncio.run(run_pipeline(goal))

The critical design choice is what each agent is not allowed to do. The research agent has read and grep — it can inspect existing files, but it can't create or modify anything. The implementation agent has write, read, and edit — it can build, but its system prompt tells it to follow the specification, not re-derive it. The separation of permissions mirrors the separation of concerns.

Why permission separation matters

Giving every agent full tool access defeats the purpose of specialization. If the research agent can write files, it will. If the implementation agent can search broadly, it will spend time re-researching instead of building. Constrained tool access forces each agent to stay in its lane.

Routing: Letting an Agent Choose the Agent

The next level of sophistication is a router — an agent whose only job is to read the incoming request and decide which specialist should handle it. The router doesn't do the work. It classifies the work and delegates.

async def router_agent(request: str, context: str = "") -> dict:
    """Decide which specialist agent should handle this request."""
    
    routing_prompt = f"""Request: {request}

Available agents:
- CODE: Writes code, creates components, builds features
- DOCS: Writes documentation, guides, README files

Analyze the request and decide:
1. Which agent should handle it (CODE or DOCS)
2. Why that agent is the right choice

Respond with exactly:
AGENT: [CODE or DOCS]
REASON: [one sentence]"""

    options = ClaudeAgentOptions(
        system_prompt=(
            "You are a routing agent. Read the request, choose the "
            "best specialist, and explain your decision in one line. "
            "Do not do the work yourself."
        ),
        allowed_tools=[],  # Router has no tools — it only decides
        model="claude-sonnet-4-20250514"
    )

    response = ""
    async for message in query(routing_prompt, options):
        if hasattr(message, "content"):
            for block in message.content:
                if hasattr(block, "text"):
                    response += block.text

    # Parse the routing decision
    agent_type = "code"  # default
    reason = ""
    for line in response.strip().split("\n"):
        if line.startswith("AGENT:"):
            agent_type = line.split(":")[1].strip().lower()
        elif line.startswith("REASON:"):
            reason = line.split(":", 1)[1].strip()

    return {"agent": agent_type, "reason": reason}

Notice that the router has zero tools. No read, no write, no grep. Its only capability is reasoning about text. This is intentional — the router is a pure decision function. It takes input, produces a classification, and passes control. The moment you give a router tools, it stops being a router and becomes a worker that sometimes routes.

The moment you give a router tools, it stops being a router and becomes a worker that sometimes routes.

Parallel Execution: Agents Working Simultaneously

The most powerful pattern in multi-agent systems is parallel execution — two or more agents working on independent tasks at the same time. The coordinator decomposes the request into parallel-safe subtasks, and asyncio.gather runs them concurrently.

async def parallel_workflow(request: str, context: str = ""):
    """Run code and docs agents simultaneously."""
    
    # Step 1: Coordinator decomposes the task
    coordinator = await coordinator_agent(request, context)
    code_task = coordinator["code_task"]
    docs_task = coordinator["docs_task"]
    
    # Step 2: Both agents start at the same time
    results = await asyncio.gather(
        code_agent(code_task, context),
        docs_agent(docs_task, context)
    )
    
    code_result, docs_result = results
    
    print(f"Code agent: {len(code_result['files'])} files "
          f"in {code_result['duration']:.1f}s")
    print(f"Docs agent: {len(docs_result['files'])} files "
          f"in {docs_result['duration']:.1f}s")

The line asyncio.gather is where the magic happens. Without it, the code agent finishes and then the docs agent starts — sequential execution. With it, both agents start at the same moment and run simultaneously. The total time is the duration of the slower agent, not the sum of both.

When not to parallelize

Parallel execution only works when the tasks are truly independent. If the docs agent needs to reference the code the code agent produces, they can't run simultaneously — the docs agent would be writing documentation for code that doesn't exist yet. Always verify that parallel subtasks have no data dependencies before using asyncio.gather.

The Orchestration Hierarchy

Every multi-agent system I've built follows the same hierarchy, and understanding it saves you from over-engineering:

  1. Sequential pipeline — Agent A produces output, Agent B consumes it. Use when the second agent depends on the first agent's work. This is the research-then-implement pattern.
  2. Router — An agent classifies the task and delegates to one specialist. Use when requests vary in type and different types need different skills.
  3. Parallel execution — A coordinator splits the task into independent subtasks and runs multiple agents concurrently. Use when parts of the work are independent and speed matters.
  4. Full orchestration — Combines routing, sequential handoffs, and parallel execution. A coordinator routes the task, some agents run in parallel, their outputs feed into a sequential review. Use only when the problem genuinely requires this complexity.
Key takeaway

Multi-agent systems are not about having more agents. They're about having the right agents with focused responsibilities. A two-agent pipeline that separates research from implementation will outperform a five-agent system where every agent has overlapping tool access and vague system prompts.

Most teams start at level four and wonder why their system is fragile. Start at level one. Add a router when you genuinely have different task types. Add parallel execution when timing benchmarks prove the bottleneck is sequential processing. The simplest architecture that solves the problem is always the right architecture.

What to Do Monday Morning

Build a two-agent pipeline for your next complex task

Pick a task you'd normally give a single agent — building a component, writing a report, analyzing a dataset. Split it into a research agent that produces a specification and an implementation agent that builds from the spec. Compare the output quality against a single-agent approach.

Enforce tool separation between agents

Give each agent only the tools it needs for its role. The researcher gets read-only access. The implementer gets write access. The router gets no tools at all. Run the pipeline and verify that no agent exceeds its designated responsibilities.

Add a router to handle mixed-type requests

Build a router agent that classifies incoming requests as "code" or "documentation" and delegates to the appropriate specialist. Test it with ten requests of varying types and verify it routes correctly at least eight times.

Benchmark sequential vs. parallel execution

Take two independent subtasks and time them running sequentially, then in parallel with asyncio.gather. Measure the wall-clock difference. If parallel execution isn't at least 30% faster, the tasks may not be independent enough to benefit.

Resist the urge to add more agents

Before adding a third agent, prove that two agents are insufficient for the quality bar you need. The most common multi-agent mistake is over-decomposition — splitting work so finely that the coordination overhead exceeds the specialization benefit.

The simplest architecture that solves the problem is always the right architecture. Start with two agents. Add more only when you can prove the third one earns its coordination cost.