Swarm mode: building teams of agents at A0X
Why multi-agent over single-agent, how we orchestrate parallel specialized agents at A0X, and the lessons from running planner/writer/auditor teams on production tasks.
When to use multiple agents: are there subtasks that need different context windows, different prompting strategies, or that can usefully run in parallel? If yes — swarm. If no — single agent, simpler.
At A0X, we hit all three of those conditions at once.
The problem with a single agent
A0X agents handle conversations that span multiple contexts: user history (memory), current product knowledge (knowledge base), task execution (tool calls), and social dynamics (platform-specific response style).
A single agent prompted to handle all of this ends up with a system prompt that’s 6,000 tokens before the conversation starts.
Long prompts have two problems:
- They are expensive. Every token in the system prompt is billed at the input-token rate on every call.
- They degrade. The model’s attention is spread across too many instructions, and the less prominent ones get underweighted. In production, memory retrieval instructions were ignored 30% of the time in heavily loaded sessions.
The other problem was specialization. “How to be jessexbt on Telegram” is a different prompting problem from “how to search the knowledge base” — which is a different problem from “how to review and reject a proposed memory before storing it.” These benefit from different prompting strategies.
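To put the cost point above in numbers, here is a back-of-the-envelope sketch. The 6,000-token figure comes from this post; the price per million input tokens is a placeholder argument, not a real rate, and the sketch ignores any prompt caching a provider might offer.

```python
SYSTEM_PROMPT_TOKENS = 6_000  # figure quoted in this post


def system_prompt_cost(calls: int, usd_per_million_input_tokens: float) -> float:
    """USD spent resending the system prompt alone, once per call.

    The price is a caller-supplied placeholder, not a quoted rate.
    """
    total_tokens = calls * SYSTEM_PROMPT_TOKENS
    return total_tokens * usd_per_million_input_tokens / 1_000_000
```

At a placeholder rate of 1 USD per million input tokens, `system_prompt_cost(10_000, 1.0)` comes to 60 USD before a single conversation token is counted.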
Team roles
Multi-agent topology
The coordinator is a thin router — it doesn’t do reasoning, it does dispatch. Its prompt is essentially a routing table: given this message type, which agents need to run, and in what order.
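A routing table of this kind could look roughly like the sketch below. The message types and agent names are hypothetical; the only thing taken from the text is the idea that each message type maps to an ordered list of agents, with no reasoning in the router itself.

```python
# Hypothetical message types and agent names, for illustration only.
# Each entry is an ordered list of stages; agents inside one stage
# have no dependency on each other and could run in parallel.
ROUTING_TABLE: dict[str, list[list[str]]] = {
    "trivial_lookup": [["single_agent"]],
    "conversation": [["memory", "knowledge"], ["platform"]],
}


def route(message_type: str) -> list[list[str]]:
    """Return the stages for a message type; unknown types get the full pipeline."""
    return ROUTING_TABLE.get(message_type, ROUTING_TABLE["conversation"])
```

Keeping the table as plain data means a routing change is a one-line edit rather than a prompt rewrite.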
Handoff state machine
The DONE gate matters. A task doesn’t complete until the Verifier returns PASS. If verification fails, the task re-enters the loop. The Planner re-activates only if the loop fails three consecutive times, at which point it can revise the plan.
This is the architecture we used to build the Event Tickets NFT marketplace on Flow Cadence — zero human code authorship, fully autonomous multi-agent build from spec to deployment.
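The DONE gate and the three-failure replan rule can be sketched as a small loop. This is a minimal illustration, not the production state machine: the callables are hypothetical stand-ins for the agents, running the Auditor before the Verifier is an assumption, and `max_replans` is an added cap so the sketch always terminates.

```python
from typing import Callable

MAX_CONSECUTIVE_FAILURES = 3  # from the text: Planner re-activates after three failed loops


def run_task(
    plan: str,
    write: Callable[[str], str],
    audit: Callable[[str, str], bool],
    verify: Callable[[str], bool],
    replan: Callable[[str], str],
    max_replans: int = 2,  # assumption: bound replanning so the loop cannot run forever
) -> str:
    """Loop Writer -> Auditor -> Verifier until the Verifier passes."""
    failures = 0
    replans = 0
    while True:
        draft = write(plan)
        # The Auditor is mandatory: a draft that fails the audit never reaches DONE.
        if audit(plan, draft) and verify(draft):
            return draft  # DONE gate: only a PASS exits the loop
        failures += 1
        if failures >= MAX_CONSECUTIVE_FAILURES:
            if replans >= max_replans:
                raise RuntimeError("task did not pass verification")
            plan = replan(plan)  # Planner revises the plan
            replans += 1
            failures = 0
```

The Planner stays out of the hot path: it is called once up front (not shown) and again only after three consecutive failed loops.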
Swarm spawn logic
import asyncio

async def dispatch_swarm(message: UserMessage, context: ConversationContext):
    # The coordinator only classifies and dispatches; it does no reasoning itself.
    classification = await coordinator.classify(message)
    if classification.is_trivial:
        # Fast path: bypass swarm for simple lookups
        return await single_agent.respond(message, context)

    # Parallel stage: memory + knowledge run concurrently
    memory_task = asyncio.create_task(
        memory_agent.retrieve(message.text, namespace=context.agent_id)
    )
    knowledge_task = asyncio.create_task(
        knowledge_agent.search(message.text, domain=classification.domain)
    )
    memory_ctx, knowledge_ctx = await asyncio.gather(memory_task, knowledge_task)

    # Sequential stage: platform agent uses both contexts
    return await platform_agent.respond(
        message=message,
        memory_context=memory_ctx,
        knowledge_context=knowledge_ctx,
        channel=context.channel,  # telegram / xmtp / twitter / farcaster
    )
Single agent vs. swarm comparison
Single agent
- One 6,000-token system prompt before conversation starts
- All context types co-mingled — memory, KB, platform style, tools
- Memory retrieval instructions ignored 30% of the time under load
- Simpler to debug — one trace, one context window
- Good for: trivial queries, single-context tasks
Swarm (A0X production)
- Coordinator routes to specialized agents with focused prompts
- Memory agent + Knowledge agent run in parallel (lower latency)
- Auditor step mandatory — plan adherence enforced
- Verifier gate — task doesn’t complete until PASS
- Fast path: trivial queries bypass swarm entirely
- Good for: multi-context conversations, long-horizon builds
What broke in practice
Coordination overhead is real. For trivial queries (single KB lookup), the coordinator added latency. We added a fast-path classifier: if the query is classified “trivial,” bypass swarm dispatch entirely.
Agent communication format matters more than you’d expect. We first tried natural-language summaries between agents, which caused ambiguity downstream. We then switched to structured JSON with typed schemas, and inter-agent communication errors dropped significantly.
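As an illustration of what a typed handoff can look like, here is a minimal sketch using only the standard library. The `MemoryContext` shape and its field names are hypothetical, not the A0X schema; the point is that the receiving agent rejects a malformed payload instead of guessing at it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MemoryContext:
    """Hypothetical payload handed from a memory agent to a platform agent."""
    agent_id: str
    facts: list[str]
    confidence: float


_EXPECTED_TYPES = {"agent_id": str, "facts": list, "confidence": (int, float)}


def encode(ctx: MemoryContext) -> str:
    """Serialize the payload to JSON for the next agent."""
    return json.dumps(asdict(ctx))


def decode(raw: str) -> MemoryContext:
    """Parse and validate a payload; raise instead of passing ambiguity downstream."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(_EXPECTED_TYPES):
        raise ValueError("payload does not match the MemoryContext schema")
    for name, expected in _EXPECTED_TYPES.items():
        if not isinstance(data[name], expected):
            raise TypeError(f"field {name!r} has type {type(data[name]).__name__}")
    return MemoryContext(**data)
```

A schema library would do the same job with less code; the hand-rolled check is used here only to keep the sketch dependency-free.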
The auditor role is underweighted if you don’t force it. Initially, the Auditor was optional: the Planner could skip it for “simple” tasks. But the Auditor catches plan adherence errors that Writers never catch, so we made it mandatory.
The collective brain refactor
Swarm mode revealed that the knowledge base architecture didn’t support parallel queries efficiently. Multiple agents hitting a single Pinecone index with different namespaces caused rate limit issues.
This kicked off the collective brain refactor — from a single knowledge-base-shared Pinecone index to domain-partitioned indices. Each domain (Flow, A0X platform, general crypto) has its own index. Agents query the relevant domain index, not the global one. The RAG migration story is in the next post.
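The domain partitioning described above amounts to resolving a classified domain to its own index before querying. A minimal sketch, with hypothetical index names and no vector-database calls; only the three domains are taken from the text:

```python
# Hypothetical index names; only the three domains come from the post.
DOMAIN_INDEX = {
    "flow": "kb-flow",
    "a0x_platform": "kb-a0x-platform",
    "general_crypto": "kb-general-crypto",
}


def index_for(domain: str) -> str:
    """Resolve a domain to its dedicated index; fail loudly on unknown domains."""
    try:
        return DOMAIN_INDEX[domain]
    except KeyError:
        raise ValueError(f"no index configured for domain {domain!r}") from None
```

Failing on an unknown domain, rather than falling back to a global index, keeps parallel agents from quietly piling back onto one shared index.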
The takeaway on multi-agent
Multi-agent systems are not inherently better. They’re better when the problem genuinely decomposes into specialized subtasks that benefit from isolation — different prompting strategies, parallelism, or context window management.
The question isn’t “should we use agents?” It’s “what’s the minimum number of agents that makes this task meaningfully better?”