Why we don't build chatbots: the two agent paradigms we rejected

Most agentic AI systems are built in one of two shapes, and both break at scale. This is the story of what we chose instead.

When we say “we don’t build chatbots,” we don’t mean conversation is worthless. We mean conversation is not an architecture. A chat window is just a front door; what happens behind that door decides whether your system stands or collapses in the real world.

While building agentic systems, we kept running into two common paradigms that looked appealing at first and that we rejected in practice. The reasons are worth spelling out, because they are exactly what separates an impressive demo from a system you can rely on.

Paradigm one: agents that chat with each other

The first paradigm is tempting: build several agents, put them in a chat room, and let them talk until they reach an answer. One plays the planner, one the critic, one the executor, and they pass the work back and forth in free-form conversation.

It looks beautiful in demos. At scale, though, it has a structural problem. When coordination emerges from a free-form conversation, no one is really accountable. Errors accumulate down the chain: a small misreading at the start of the thread has grown into a real mistake by the end. And because the boundaries of responsibility are blurry, when something breaks, figuring out where it broke is nearly impossible. A system whose failures can’t be traced is not a system you can trust.

Agent-to-agent chatter doesn’t eliminate the cost of coordination; it just makes it invisible. And an invisible cost is the most dangerous kind.

Paradigm two: one agent and a pile of tools

The second paradigm overcorrects in the opposite direction: build one very capable agent and give it every tool it might ever need. One model, dozens of tools, and the hope that the agent will figure out when to use which.

This is simpler, but it has its own fragility. The more tools you give a single agent, the harder each choice of tool becomes and the larger its error surface grows. An agent expected to be support representative, financial analyst, and writer all at once is not truly expert at any of them. And because everything is compressed into one mind, you can’t evaluate, improve, or constrain any one part in isolation. When everything is one thing, nothing is containable.

It is the jack-of-all-trades student whose score on any single exam tells you little, not because they aren’t bright, but because nothing they do has a boundary.

The third path: structure, not conversation

What we arrived at is neither free-form conversation nor a do-everything mind. It is a structure: focused roles, with clear boundaries, that coordinate through defined, traceable events rather than by chatting.

In this view, each agent owns one domain of responsibility, and only that. Coordination emerges not from a lucky conversation but from a clear flow — like a good organization where everyone knows their role and work moves from one station to the next with a defined hand-off. When something breaks, you know exactly where to look. When you want to improve something, you know which role to coach. And when you want to constrain an agent, its boundary is defined up front.
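To make the idea of roles coordinating through defined hand-offs concrete, here is a minimal Python sketch, assuming a simple in-process pipeline. The names `Event`, `Role`, and `Pipeline`, the event kinds, and the three roles (`triage`, `billing_analyst`, `writer`) are hypothetical illustrations, not our production system. Each role claims exactly one event kind, every hand-off is appended to a trace, and a failure is recorded against the role that raised it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical event schema: every hand-off is a named, recorded event.
@dataclass(frozen=True)
class Event:
    kind: str        # e.g. "request.classified"
    payload: dict
    emitted_by: str  # the role (or outside caller) that produced the event

# A handler returns the next (kind, payload), or None to end the flow.
Handler = Callable[[dict], Optional[Tuple[str, dict]]]

@dataclass
class Role:
    name: str
    consumes: str    # the single event kind this role is responsible for
    handle: Handler

class Pipeline:
    """Routes each event to the one role that owns it and records every hand-off."""

    def __init__(self, roles: List[Role]) -> None:
        self.routes: Dict[str, Role] = {}
        for role in roles:
            # Boundaries are declared up front: two roles may not own the same event.
            if role.consumes in self.routes:
                raise ValueError(f"two roles claim {role.consumes!r}")
            self.routes[role.consumes] = role
        self.trace: List[Event] = []

    def run(self, first: Event, max_steps: int = 10) -> Event:
        self.trace = [first]
        event = first
        for _ in range(max_steps):
            role = self.routes.get(event.kind)
            if role is None:
                return event  # no role owns this kind: the flow is complete
            try:
                result = role.handle(event.payload)
            except Exception as exc:
                # The failure is attributed to the exact role that raised it.
                event = Event("role.failed", {"role": role.name, "error": str(exc)}, role.name)
                self.trace.append(event)
                return event
            if result is None:
                return event
            kind, payload = result
            event = Event(kind, payload, role.name)
            self.trace.append(event)
        raise RuntimeError("step budget exhausted")

# Hypothetical roles, each owning one narrow domain.
def triage(p: dict) -> Tuple[str, dict]:
    category = "billing" if "invoice" in p["text"].lower() else "general"
    return "request.classified", {**p, "category": category}

def billing_analyst(p: dict) -> Tuple[str, dict]:
    # This role refuses work outside its domain instead of guessing.
    if p["category"] != "billing":
        raise ValueError("out of scope: billing only")
    return "analysis.done", {**p, "finding": "duplicate charge"}

def writer(p: dict) -> Tuple[str, dict]:
    return "reply.drafted", {"reply": f"We found a {p['finding']} and will correct it."}

pipeline = Pipeline([
    Role("triage", "request.received", triage),
    Role("billing_analyst", "request.classified", billing_analyst),
    Role("writer", "analysis.done", writer),
])

final = pipeline.run(Event("request.received", {"text": "Invoice charged twice"}, "customer"))
for e in pipeline.trace:
    print(f"{e.emitted_by} -> {e.kind}")
```

Running the pipeline on an out-of-scope request such as "Where is my order?" ends the trace with a `role.failed` event emitted by `billing_analyst`. That is the property the structure is meant to buy: the failure has an address, so you know which role to inspect, coach, or constrain.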

The distinction is subtle but decisive. The first paradigm dissolves responsibility into conversation until it disappears. The second compresses it into a single mind until it can’t be separated. The third path keeps responsibility somewhere you can see it, measure it, and trust it.

Why this matters for you

If AI is going to do real work in your organization, not just star in an interesting demo, the chat window is not what matters. What matters is whether, under pressure, the system’s behavior can be traced, whether it can be improved part by part, and whether its boundaries hold.

We don’t build chatbots because a chatbot is a surface, and we care about the structure that stands behind the surface. The right paradigm isn’t the one that shines brightest in the demo; it’s the one that is still standing on a bad day.