Swarm mode: building teams of agents at A0X

Four agents (Planner, Writer, Auditor, Verifier), one autonomous build pipeline.

The test for using multiple agents: are there subtasks that need different context windows, that need different prompting strategies, or that can usefully run in parallel? If yes, use a swarm. If no, a single agent is simpler.

At A0X, we hit all three of those conditions at once.

The problem with a single agent

A0X agents handle conversations that span multiple contexts: user history (memory), current product knowledge (knowledge base), task execution (tool calls), and social dynamics (platform-specific response style).

A single agent prompted to handle all of this ends up with a system prompt that’s 6,000 tokens before the conversation starts.

Long prompts have two problems:

They are expensive. Every token in the system prompt is billed at input-token rates on every call.
They degrade. The model’s attention is spread across too many instructions, and the less prominent ones get underweighted. In production, memory retrieval instructions were being ignored 30% of the time in heavily loaded sessions.
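To make the cost point concrete, here is a back-of-the-envelope sketch. Only the 6,000-token prompt size comes from the text above. The price per million input tokens and the daily call volume are hypothetical placeholders, not A0X figures, and the calculation ignores any prompt caching a provider might offer.

```python
# Hypothetical numbers for illustration only; substitute your own pricing and traffic.
SYSTEM_PROMPT_TOKENS = 6_000            # prompt size quoted in the post
PRICE_PER_MILLION_INPUT_TOKENS = 3.00   # assumed USD price, not a real quote
CALLS_PER_DAY = 10_000                  # assumed traffic


def daily_system_prompt_cost(prompt_tokens: int, price_per_million: float, calls: int) -> float:
    """USD per day spent re-sending the same system prompt on every call."""
    return prompt_tokens * calls * price_per_million / 1_000_000


cost = daily_system_prompt_cost(
    SYSTEM_PROMPT_TOKENS, PRICE_PER_MILLION_INPUT_TOKENS, CALLS_PER_DAY
)
print(f"System prompt alone: ${cost:.2f} per day")
```

The cost scales linearly with prompt size, so a focused 1,500-token agent prompt would cost a quarter as much per call under the same assumptions.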

The other problem was specialization. “How to be jessexbt on Telegram” is a different prompting problem from “how to search the knowledge base” — which is a different problem from “how to review and reject a proposed memory before storing it.” These benefit from different prompting strategies.

Team roles

Planner. Breaks the task into steps and defines the interfaces between subtasks.
Writer(s). Execute individual steps and run in parallel where possible.
Auditor. Reviews output against the plan. Terse, decisive, and mandatory.
Verifier. Runs tests and emits PASS/FAIL. Deterministic and schema-bound.

Multi-agent topology

flowchart TD
    UM["User message"] --> CO["Coordinator\n(thin router)"]
    CO -->|"parallel"| MA["Memory agent\nretrieve past context"]
    CO -->|"parallel"| KA["Knowledge agent\nsearch KB by domain"]
    CO --> PA["Platform agent\n(format for channel)"]
    MA --> MERGE["Merge context"]
    KA --> MERGE
    MERGE --> PA
    PA --> RES["Response to user"]

The coordinator is a thin router — it doesn’t do reasoning, it does dispatch. Its prompt is essentially a routing table: given this message type, which agents need to run, and in what order.
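A routing table of this kind might look like the sketch below. The message types, the agent names, and the `Route` structure are all hypothetical; the post does not show the real coordinator prompt or its categories. The point is only that dispatch is a lookup, not a reasoning step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    parallel: tuple[str, ...]    # agents that can run concurrently
    sequential: tuple[str, ...]  # agents that run afterwards, in order


# Hypothetical message types and agent names, loosely mirroring the diagram above.
ROUTING_TABLE: dict[str, Route] = {
    "smalltalk": Route(parallel=(), sequential=("platform",)),
    "product_question": Route(parallel=("knowledge",), sequential=("platform",)),
    "follow_up": Route(parallel=("memory", "knowledge"), sequential=("platform",)),
}

# Unknown message types fall back to the full swarm.
DEFAULT_ROUTE = Route(parallel=("memory", "knowledge"), sequential=("platform",))


def route(message_type: str) -> Route:
    """Pure dispatch: look the message type up, no reasoning involved."""
    return ROUTING_TABLE.get(message_type, DEFAULT_ROUTE)


print(route("product_question"))
```

Keeping the table as data makes it cheap to audit which agents run for which message type.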

Handoff state machine

stateDiagram-v2
    [*] --> Planner : Task arrives
    Planner --> Writers : defines steps
    Writers --> Auditor : output
    Auditor --> Writers : FAIL (retry up to 3x)
    Auditor --> Verifier : PASS
    Verifier --> Writers : FAIL (loop back)
    Verifier --> DONE : PASS
    DONE --> [*]
    Verifier --> Planner : 3 consecutive failures → revise plan

The DONE gate matters. Tasks don’t complete until Verifier returns PASS. If it fails, the task re-enters the loop. The Planner only re-activates if the loop fails three consecutive times — it can then revise the plan.
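The loop can be sketched as a small driver function. This is a simplified illustration, not the A0X implementation: the four roles are passed in as plain callables with hypothetical signatures, Writers run sequentially rather than in parallel, Auditor and Verifier failures share one counter, and `max_plans` is an added safety bound the post does not mention.

```python
from typing import Callable

MAX_CONSECUTIVE_FAILURES = 3  # from the diagram: three failed loops trigger a re-plan


def run_pipeline(
    task: str,
    plan: Callable[[str, int], list[str]],          # Planner: task, revision -> steps
    write: Callable[[str], str],                    # Writer: step -> output
    audit: Callable[[list[str], list[str]], bool],  # Auditor: steps, outputs -> PASS?
    verify: Callable[[list[str]], bool],            # Verifier: outputs -> PASS?
    max_plans: int = 2,                             # assumed cap on plan revisions
) -> list[str]:
    """Return outputs only once the Verifier passes; otherwise loop or re-plan."""
    for revision in range(max_plans):
        steps = plan(task, revision)
        for _attempt in range(MAX_CONSECUTIVE_FAILURES):
            outputs = [write(step) for step in steps]
            if not audit(steps, outputs):
                continue  # Auditor FAIL: back to the Writers
            if verify(outputs):
                return outputs  # DONE gate: Verifier PASS
            # Verifier FAIL: loop back to the Writers
        # Three consecutive failures: fall through so the Planner revises the plan.
    raise RuntimeError("task did not reach DONE within the allowed plan revisions")
```

Because the only `return` sits behind `verify(outputs)`, there is no code path that completes a task without a Verifier PASS, and no path that reaches the Verifier without passing the Auditor first.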

This is the architecture we used to build the Event Tickets NFT marketplace on Flow Cadence — zero human code authorship, fully autonomous multi-agent build from spec to deployment.

Swarm spawn logic

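import asyncio

# coordinator, single_agent, memory_agent, knowledge_agent and platform_agent
# are assumed to be agent clients defined elsewhere in the module.
async def dispatch_swarm(message: UserMessage, context: ConversationContext):
    """Route one user message through the fast path or the full swarm."""
    classification = await coordinator.classify(message)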
    
    if classification.is_trivial:
        # Fast path: bypass swarm for simple lookups
        return await single_agent.respond(message, context)
    
    # Parallel stage: memory + knowledge run concurrently
    memory_task = asyncio.create_task(
        memory_agent.retrieve(message.text, namespace=context.agent_id)
    )
    knowledge_task = asyncio.create_task(
        knowledge_agent.search(message.text, domain=classification.domain)
    )
    memory_ctx, knowledge_ctx = await asyncio.gather(memory_task, knowledge_task)
    
    # Sequential stage: platform agent uses both contexts
    return await platform_agent.respond(
        message=message,
        memory_context=memory_ctx,
        knowledge_context=knowledge_ctx,
        channel=context.channel,  # telegram / xmtp / twitter / farcaster
    )

Single agent vs. swarm comparison

Single agent

One 6,000-token system prompt before conversation starts
All context types co-mingled — memory, KB, platform style, tools
Memory retrieval instructions ignored 30% of the time under load
Simpler to debug — one trace, one context window
Good for: trivial queries, single-context tasks

Swarm (A0X production)

Coordinator routes to specialized agents with focused prompts
Memory agent + Knowledge agent run in parallel (lower latency)
Auditor step mandatory — plan adherence enforced
Verifier gate — task doesn’t complete until PASS
Fast path: trivial queries bypass swarm entirely
Good for: multi-context conversations, long-horizon builds

What broke in practice

Coordination overhead is real. For trivial queries (single KB lookup), the coordinator added latency. We added a fast-path classifier: if the query is classified “trivial,” bypass swarm dispatch entirely.

Agent communication format matters more than you’d expect. We tried natural language summaries between agents — it caused ambiguity downstream. Switched to structured JSON with typed schemas. Inter-agent communication errors dropped significantly.
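As an illustration of what a typed inter-agent message could look like, here is a minimal sketch using only the standard library. The `ContextHandoff` name and its fields are hypothetical; the post does not publish the actual schemas.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ContextHandoff:
    """Hypothetical payload passed from a retrieval agent to the platform agent."""

    source_agent: str
    domain: str
    snippets: list[str]
    confidence: float

    def __post_init__(self) -> None:
        # Reject malformed payloads at the boundary instead of downstream.
        if not isinstance(self.source_agent, str) or not self.source_agent:
            raise ValueError("source_agent must be a non-empty string")
        if not isinstance(self.domain, str):
            raise ValueError("domain must be a string")
        if not isinstance(self.snippets, list) or not all(
            isinstance(s, str) for s in self.snippets
        ):
            raise ValueError("snippets must be a list of strings")
        if not isinstance(self.confidence, (int, float)) or not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be a number between 0 and 1")


def encode(handoff: ContextHandoff) -> str:
    """Serialize a handoff to JSON for the receiving agent."""
    return json.dumps(asdict(handoff))


def decode(raw: str) -> ContextHandoff:
    """Parse and validate; missing or unexpected keys raise TypeError."""
    return ContextHandoff(**json.loads(raw))
```

A receiving agent that calls `decode` either gets a well-formed object or an exception, which removes the room for interpretation that free-text summaries leave.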

The auditor role is underweighted if you don’t force it. Initially, the Auditor was optional — the Planner could skip it for “simple” tasks. The Auditor catches plan adherence errors that Writers never catch. We made it mandatory.

The collective brain refactor

Swarm mode revealed that the knowledge base architecture didn’t support parallel queries efficiently. Multiple agents hitting a single Pinecone index with different namespaces caused rate limit issues.

This kicked off the collective brain refactor — from a single knowledge-base-shared Pinecone index to domain-partitioned indices. Each domain (Flow, A0X platform, general crypto) has its own index. Agents query the relevant domain index, not the global one. The RAG migration story is in the next post.
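The partitioning idea reduces to resolving a domain to its own index before querying. The sketch below shows only that lookup. The index names are hypothetical, and no vector-database client calls are shown, since the post does not describe them.

```python
# Hypothetical index names; the three domains are the ones listed in the post.
DOMAIN_INDEXES: dict[str, str] = {
    "flow": "kb-flow",
    "a0x_platform": "kb-a0x-platform",
    "general_crypto": "kb-general-crypto",
}


def index_for(domain: str) -> str:
    """Resolve a domain to its dedicated index; fail loudly on unknown domains."""
    try:
        return DOMAIN_INDEXES[domain]
    except KeyError:
        raise ValueError(f"no index configured for domain {domain!r}") from None


print(index_for("flow"))
```

Failing on an unknown domain, rather than silently falling back to a global index, keeps agents from recreating the shared-index contention the refactor was meant to remove.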

The takeaway on multi-agent

Multi-agent systems are not inherently better. They’re better when the problem genuinely decomposes into specialized subtasks that benefit from isolation — different prompting strategies, parallelism, or context window management.

The question isn’t “should we use agents?” It’s “what’s the minimum number of agents that makes this task meaningfully better?”