Smaller, faster, cheaper: the case against one big model for everything

It's tempting to hand every job to the most capable model. But most jobs need only one focused capability, and the right smaller model delivers it faster and at lower cost.

One big model for everything

The obvious move is to pick the most capable model and give it every task. In practice, most jobs don’t need the full power of a frontier model; they need one defined capability at a defined level. Use a big model for a simple job and you pay for power that never gets used.

Every job, a focused need

Rather than relying on a single overall ranking, weigh each role on its own terms. A classifier that detects user intent doesn’t need deep reasoning; it needs speed and stability. A summariser needs disciplined, consistently structured output. Each role calls for only a subset of capabilities, each at a level set by how sensitive the role is and what a failure would cost. Once the need is pinned down this way, a smaller, more focused model often turns out to do the same job just as well.
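One way to make this concrete is to write each role down as the short list of capabilities it depends on, each with a minimum level, and to check a model against that list alone. The sketch below assumes a made-up 0 to 10 capability scale; `RoleNeed`, `meets_need`, the capability names and every score are hypothetical, not drawn from any real benchmark.

```python
from dataclasses import dataclass, field


# Hypothetical 0-10 capability scale; names and numbers are illustrative only.
@dataclass
class RoleNeed:
    name: str
    # Only the capabilities this role actually depends on, each with a minimum level.
    minimums: dict[str, int] = field(default_factory=dict)


def meets_need(model_scores: dict[str, int], need: RoleNeed) -> bool:
    """True if the model reaches every minimum the role lists.

    Capabilities the role does not list are ignored, however well the model scores on them.
    """
    return all(model_scores.get(cap, 0) >= level for cap, level in need.minimums.items())


intent_classifier = RoleNeed("intent_classifier", {"speed": 8, "stability": 7})
summariser = RoleNeed("summariser", {"format_discipline": 8, "structure": 7})

# A hypothetical small model: fast and stable, weak at reasoning, unscored on formatting.
small_model = {"speed": 9, "stability": 8, "reasoning": 3}

print(meets_need(small_model, intent_classifier))  # True
print(meets_need(small_model, summariser))         # False
```

The small model's weak reasoning score never counts against it for intent classification, because that role does not list reasoning at all.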

The hidden cost of size

This isn’t merely a matter of technical elegance; the real issue is speed and cost. Bigger models are slower and more expensive to run, and in processes that sit directly in the user’s request path, that extra latency and cost become highly visible at scale. A simple rule helps: among the models that genuinely meet the job’s need, choose the smallest. Reaching for a bigger model is justified only when the smaller one truly falls short on a decisive capability.
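The rule can be sketched as a filter followed by a minimum. This sketch assumes "smallest" means fewest parameters; ranking by latency or cost per call would work the same way. `Candidate`, `pick_smallest`, the catalogue entries and all figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    params_b: float         # size in billions of parameters (hypothetical figures)
    scores: dict[str, int]  # capability -> score on an assumed 0-10 scale


def pick_smallest(candidates: list[Candidate], minimums: dict[str, int]) -> Optional[Candidate]:
    """Among candidates that meet every minimum, return the smallest; None if none qualify."""
    qualifying = [
        c for c in candidates
        if all(c.scores.get(cap, 0) >= level for cap, level in minimums.items())
    ]
    # Size ranks only models that already qualify, so a larger model is chosen
    # only when no smaller one clears every minimum.
    return min(qualifying, key=lambda c: c.params_b, default=None)


catalogue = [
    Candidate("large", 400, {"speed": 4, "stability": 9, "reasoning": 10}),
    Candidate("medium", 30, {"speed": 7, "stability": 8, "reasoning": 7}),
    Candidate("small", 3, {"speed": 9, "stability": 8, "reasoning": 3}),
]

print(pick_smallest(catalogue, {"speed": 8, "stability": 7}).name)  # small
print(pick_smallest(catalogue, {"reasoning": 9}).name)              # large
print(pick_smallest(catalogue, {"reasoning": 11}))                  # None
```

Returning `None` when nothing qualifies, rather than silently falling back to the largest model, keeps the decision to escalate explicit.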

When no single model is enough

Sometimes no small model can cover every dimension of a job on its own. In that situation, the right answer is not to reach for a giant model but to pair a primary model with a focused verifier that makes up the specific shortfall, applied only to the portion of the data flow that needs it. The underlying principle is the same everywhere: fill the gaps by adding focused intelligence, not by scaling up the whole system.
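A minimal sketch of the primary-plus-verifier pattern, assuming the primary model reports a confidence score and that low confidence marks the portion of the flow needing a second look; real gating could just as well key on input type or business risk. `route`, `toy_primary`, `toy_verifier` and `low_confidence` are hypothetical stand-ins for real model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    label: str
    confidence: float
    verified: bool = False


def route(
    items: list[str],
    primary: Callable[[str], Result],
    verifier: Callable[[str, Result], Result],
    needs_check: Callable[[str, Result], bool],
) -> list[Result]:
    """Run the primary model on every item; call the verifier only where needs_check is True."""
    results = []
    for item in items:
        result = primary(item)
        if needs_check(item, result):
            # Only this slice of the flow pays for the extra model call.
            result = verifier(item, result)
        results.append(result)
    return results


# Hypothetical stand-ins for real model calls.
def toy_primary(text: str) -> Result:
    # Cheap keyword classifier: confident on short messages, less so on long ones.
    label = "refund" if "refund" in text else "other"
    return Result(label, 0.95 if len(text) < 40 else 0.6)


def toy_verifier(text: str, draft: Result) -> Result:
    # Covers the specific shortfall: recognises paraphrased refund requests.
    label = "refund" if ("refund" in text or "money back" in text) else draft.label
    return Result(label, 0.99, verified=True)


def low_confidence(text: str, draft: Result) -> bool:
    return draft.confidence < 0.8


messages = [
    "I want a refund",
    "Where is my parcel?",
    "My order arrived damaged and I would like my money back",
]
results = route(messages, toy_primary, toy_verifier, low_confidence)
print([(r.label, r.verified) for r in results])
# [('refund', False), ('other', False), ('refund', True)]
```

Only the third message reaches the verifier; the other two pass straight through on the primary model's answer, so the added cost is confined to the slice that needed it.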