When you build your first agent, it's natural to hand everything to one: a single model, a few tools, a little memory. For many jobs that's enough. But as the work grows, a question appears: should everything stay inside one agent, or should the work be split across several specialised ones? The answer depends on the complexity of the task, not on which architecture looks "more advanced."
What a single agent is
An agent, at its simplest, is a language model plus the power to make decisions, a few tools, and some memory to track the work. A single agent holds all of that in one place and does things in sequence: it understands the input, decides, calls a tool, and returns the result.
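The sequence above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the tools, the `decide` function (a rule-based stand-in for the language model's decision), and the `SingleAgent` class are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical tools: plain functions standing in for real integrations.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


def convert_currency(amount: str) -> str:
    # Assumed fixed exchange rate, purely for illustration.
    return f"{float(amount) * 0.9:.2f} EUR"


TOOLS: dict[str, Callable[[str], str]] = {
    "weather": get_weather,
    "currency": convert_currency,
}


def decide(user_input: str) -> tuple[str, str]:
    """Stand-in for the model's decision: pick a tool and its argument."""
    if user_input.startswith("weather in "):
        return "weather", user_input.removeprefix("weather in ")
    if user_input.startswith("convert "):
        return "currency", user_input.removeprefix("convert ")
    raise ValueError(f"no tool for: {user_input!r}")


@dataclass
class SingleAgent:
    # One memory, held in one place.
    memory: list[str] = field(default_factory=list)

    def handle(self, user_input: str) -> str:
        tool_name, argument = decide(user_input)         # understand and decide
        result = TOOLS[tool_name](argument)              # call a tool
        self.memory.append(f"{user_input} -> {result}")  # track the work
        return result                                    # return the result
```

Everything happens on one path inside `handle`, which is what makes this shape easy to debug: a wrong answer can only come from the decision, the tool, or the memory.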
The advantage of this approach is simplicity. One path, one memory, one place to debug. When a task is simple and needs only a few tools, a single agent is usually the best choice; adding more agents just adds overhead and complexity.
When one agent isn't enough
As the work gets more complex, a single agent becomes a bottleneck. Imagine a user who wants to plan a trip: search for flights, find a hotel, suggest activities. Each of these needs different expertise and different tools, and there's no reason they should run one after another. A single agent is forced to push them through one at a time, in one crowded context window. This is where splitting the work makes sense.
The coordinator-worker pattern
The most common way to organise several agents is a three-role pattern. A coordinator takes the request, understands it, and sets the strategy. It then hands the work to several specialised workers, each handling one part; because they're independent, they can run at the same time. At the end, a final aggregation step gathers the results and turns them into one coherent answer.
In the trip example, the coordinator hands three workers their jobs: one for flights, one for the hotel, one for activities. All three search in parallel and return their results. The benefits are clear: each worker is more expert because it knows only one job; independent tasks run in parallel and the whole thing is faster; and adding a new capability means adding a new worker rather than rewriting one large agent.
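The trip example can be sketched with Python's standard `asyncio` module. The three worker functions and their return strings are hypothetical, and `asyncio.sleep` stands in for a slow search; the point is only the shape: fan out, run concurrently, gather.

```python
import asyncio


# Hypothetical workers: each knows one job and simulates a slow lookup.
async def flight_worker(destination: str) -> str:
    await asyncio.sleep(0.1)
    return f"flight to {destination}"


async def hotel_worker(destination: str) -> str:
    await asyncio.sleep(0.1)
    return f"hotel in {destination}"


async def activity_worker(destination: str) -> str:
    await asyncio.sleep(0.1)
    return f"walking tour of {destination}"


# Adding a capability means adding an entry here, not rewriting an agent.
WORKERS = {
    "flights": flight_worker,
    "hotel": hotel_worker,
    "activities": activity_worker,
}


async def coordinator(destination: str) -> dict[str, str]:
    # Fan out: start every worker at once and wait for all of them.
    results = await asyncio.gather(*(w(destination) for w in WORKERS.values()))
    # gather returns results in the order the workers were passed in.
    return dict(zip(WORKERS, results))


def plan_trip(destination: str) -> str:
    # Aggregate the separate results into one answer.
    parts = asyncio.run(coordinator(destination))
    return "; ".join(f"{role}: {result}" for role, result in parts.items())
```

Calling `plan_trip("Lisbon")` waits on all three simulated searches concurrently, so the total wait should be close to one search rather than the sum of three.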
Why parallel workers are safe
Running several workers at once sounds risky. Couldn't they trip over each other? They don't, and the reason is a design rule worth stating plainly: each worker is stateless. It holds nothing of its own between calls; everything it needs is handed to it as input, and when it's done it returns a result and keeps no memory of having run. Two workers doing the same kind of job share no hidden state, so there is nothing for them to corrupt by running together. Give a worker the same input twice and, provided its model settings and tools are deterministic, you get the same output twice, whichever machine it lands on.
This is what lets splitting the work scale. Because a worker carries no state, you can run ten copies of it as safely as one, add more when the load grows, and reproduce any result by replaying its input. The state that does need to persist (what the user wants, how the overall job is going) lives in one shared place the coordinator manages, not scattered inside the workers. Keep the workers stateless and the hardest parts of running them in parallel mostly disappear.
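One way to picture the rule is a worker written as a pure function, with the only mutable state held by the coordinator. The sketch below assumes a hypothetical `HotelQuery` input and a toy price calculation; a real worker would call a model or a search service, and would only be replayable to the extent those calls are deterministic.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class HotelQuery:
    # Everything the worker needs arrives in its input (hypothetical fields).
    city: str
    nights: int
    nightly_rate: int


def hotel_worker(query: HotelQuery) -> dict:
    """Stateless: reads only its argument, writes only its return value."""
    return {"city": query.city, "total": query.nights * query.nightly_rate}


class Coordinator:
    """The one place where state that must persist is kept."""

    def __init__(self) -> None:
        self.results: list[dict] = []

    def run(self, queries: list[HotelQuery], copies: int = 10) -> list[dict]:
        # Up to `copies` worker calls run at once; none shares state.
        with ThreadPoolExecutor(max_workers=copies) as pool:
            batch = list(pool.map(hotel_worker, queries))  # keeps input order
        self.results.extend(batch)  # only the coordinator mutates state
        return batch
```

Because `hotel_worker` touches nothing outside its argument, running it across ten threads gives the same results as calling it in a plain loop, and any single result can be reproduced by passing the same `HotelQuery` again.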
Which one to choose
The choice between the two is an engineering decision, not a matter of taste. Pick a single agent when:
- The task is simple and its path is short.
- The tools are few and don't interfere with each other.
- Parallelism brings no benefit, because the steps depend on one another.
And split the work across several agents when:
- Several distinct specialties are needed that don't blend together (searching flights has nothing to do with searching hotels).
- Parts of the work are independent and running them in parallel saves time.
- You want the system to grow by adding new roles, without touching the whole architecture each time.
The cost of many agents
Splitting the work isn't free. The moment several agents are involved, you have to manage coordination: who takes which task, how the results come together, and what happens if a worker fails. This new complexity is only justified when the task is large enough. A good rule: start with the simplest thing that works, and only split the work when a single agent has genuinely reached its limit.
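To make that coordination cost concrete, here is a sketch of the bookkeeping a coordinator takes on once workers can fail or hang. The workers, the `ASSIGNMENTS` table, and the report format are hypothetical; the timeout and error collection use the standard `asyncio.wait_for` and `asyncio.gather(..., return_exceptions=True)`.

```python
import asyncio


async def flight_worker(destination: str) -> str:
    return f"flight to {destination}"


async def hotel_worker(destination: str) -> str:
    raise RuntimeError("hotel service unavailable")  # simulated failure


async def activity_worker(destination: str) -> str:
    await asyncio.sleep(5)  # simulated hang, longer than the timeout
    return f"tour of {destination}"


# Who takes which task (hypothetical assignment table).
ASSIGNMENTS = {
    "flights": flight_worker,
    "hotel": hotel_worker,
    "activities": activity_worker,
}


async def coordinate(destination: str, timeout: float = 0.2) -> dict[str, dict]:
    # Give each worker a deadline so one hang cannot stall the whole job.
    calls = [
        asyncio.wait_for(worker(destination), timeout)
        for worker in ASSIGNMENTS.values()
    ]
    # return_exceptions=True hands failures back as values instead of raising.
    outcomes = await asyncio.gather(*calls, return_exceptions=True)

    # How the results come together, including the ones that failed.
    report: dict[str, dict] = {}
    for role, outcome in zip(ASSIGNMENTS, outcomes):
        if isinstance(outcome, Exception):
            report[role] = {"ok": False, "error": type(outcome).__name__}
        else:
            report[role] = {"ok": True, "result": outcome}
    return report
```

None of this code exists in the single-agent version. What the coordinator should do with a failed entry (retry, fall back, or answer with a partial plan) is a further decision this sketch leaves open.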
Splitting too early is just as much a mistake as splitting too late. The skill is in spotting the right moment, and that moment is set by the complexity of the task, not by the appeal of the architecture.