When no single model is enough: the primary and verifier pattern

Sometimes a task has two critical demands that no single model satisfies at once. The answer isn't to accept a weak model; it's to combine two.

Take a task with two critical demands: say, it has to produce an output and also guarantee that the output contains no errors. No single model meets both at the required level. The tempting shortcut is to take the best available model and settle for a result that falls short. There’s a better way: instead of accepting one model that is too weak, combine two in a pipeline.

The basic idea

The pattern is simple. A primary model produces the main output, and a verifier model checks it, catching errors the primary can’t prevent on its own. Each model does the part it is strong at. This division of labour yields a combination that, in practice, meets both critical demands, even when no single model can.
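As a minimal sketch of this division of labour, assume each model is wrapped as a plain Python callable. The names Primary, Verifier, Verdict and run_pair are hypothetical and chosen only for illustration; they do not come from any library.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interfaces, not taken from any library:
# a primary maps a task to an output; a verifier maps (task, output) to accept/reject.
Primary = Callable[[str], str]
Verifier = Callable[[str, str], bool]


@dataclass
class Verdict:
    output: Optional[str]  # None when the verifier rejected the candidate
    accepted: bool


def run_pair(task: str, primary: Primary, verifier: Verifier) -> Verdict:
    """Let the primary produce, and release the output only if the verifier accepts it."""
    candidate = primary(task)
    if verifier(task, candidate):
        return Verdict(output=candidate, accepted=True)
    # What happens on rejection (retry, escalate, drop) is a separate design choice.
    return Verdict(output=None, accepted=False)
```

The point of the shape is that nothing reaches the caller without the verifier's approval; the primary never certifies its own work.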

The key rule: the verifier must genuinely be capable

One rule here is decisive; ignore it and the whole pattern breaks. The verifier must perform at the critical level on the very axis it checks. A verifier that is itself weak at spotting errors only rubber-stamps the output, and its approval is worthless.

And a second, less obvious rule: the primary must not be so weak that the verifier can’t tell error from correctness. A great verifier can’t make something good out of a thoroughly broken output; it can only catch subtle errors in an almost-good one. In other words, the primary should be only one band below the demand, not two. Combining a fairly good primary with a very strong verifier yields a critical-level result; combining a weak primary with the same verifier does not.
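A back-of-the-envelope model makes the second rule concrete. It is only a sketch under simplifying assumptions that are not in the text above: the verifier misses a fixed fraction of errors regardless of how bad the primary is, and it never rejects a correct output. The function names and the numbers in the usage note are illustrative.

```python
def residual_error_rate(primary_error: float, verifier_miss: float) -> float:
    """Share of accepted outputs that are still wrong.

    primary_error: fraction of primary outputs that contain an error.
    verifier_miss: fraction of erroneous outputs the verifier fails to reject.
    Assumes the verifier never rejects a correct output.
    """
    passed_wrong = primary_error * verifier_miss
    passed_right = 1.0 - primary_error
    accepted = passed_wrong + passed_right
    return passed_wrong / accepted if accepted > 0 else 0.0


def rejection_rate(primary_error: float, verifier_miss: float) -> float:
    """Share of all outputs the verifier rejects under the same assumptions."""
    return primary_error * (1.0 - verifier_miss)
```

With a verifier that misses 2% of errors, a primary that errs on 5% of tasks leaves roughly 0.1% of accepted outputs wrong. The same verifier behind a primary that errs on 40% of tasks leaves roughly 1.3% wrong and rejects about 39% of everything it sees. If a thoroughly broken output also makes errors harder to spot, as the rule above suggests, the fixed miss rate is optimistic and the real gap would be wider.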

Three shapes of pipeline

This pattern has three common implementations with different economics:

Sequential: the primary produces, then the verifier approves or rejects. All traffic passes through both, so cost and latency equal the sum of the two models. The simplest and most reliable form.
Tiered fallback: the verifier runs only on a flagged subset (e.g. suspicious cases). Cost is much lower, because the verifier sees only a small fraction of traffic, but errors in unflagged cases pass unchecked.
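Parallel arbitration: both models run independently and a deterministic rule arbitrates between them. Both see all traffic, so cost is at least as high as in the sequential form, but because they run side by side, latency is set by the slower model rather than by the sum. Worth it where the independence of two opinions matters.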
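The two less obvious shapes, tiered fallback and parallel arbitration, can be sketched as follows. Models are again assumed to be plain callables. The flagging rule is_suspicious and the arbitration rule agree_or_escalate are hypothetical placeholders; a real system would supply its own. Threads stand in for whatever concurrency the deployment actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

Model = Callable[[str], str]
Verifier = Callable[[str, str], bool]
Flag = Callable[[str, str], bool]
Arbiter = Callable[[str, str], Optional[str]]


def tiered_fallback(task: str, primary: Model, verifier: Verifier,
                    is_suspicious: Flag) -> Optional[str]:
    """Run the verifier only on outputs the flagging rule marks as suspicious."""
    output = primary(task)
    if not is_suspicious(task, output):
        return output  # unflagged outputs go out unchecked
    return output if verifier(task, output) else None


def parallel_arbitration(task: str, model_a: Model, model_b: Model,
                         arbitrate: Arbiter) -> Optional[str]:
    """Run both models independently, then let a deterministic rule decide."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(model_a, task)
        future_b = pool.submit(model_b, task)
        answer_a, answer_b = future_a.result(), future_b.result()
    return arbitrate(answer_a, answer_b)


def agree_or_escalate(answer_a: str, answer_b: str) -> Optional[str]:
    """One possible deterministic rule: accept only when both answers match."""
    return answer_a if answer_a.strip() == answer_b.strip() else None
```

In both functions a return value of None means the output was not released and should be escalated or retried; how to do that is left open here.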

The choice among the three is an engineering decision that depends on the sensitivity of the task and the cost and latency budget.
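To compare the budgets, a rough per-request model can help. This is a sketch under stated assumptions, not a measurement: each model has a fixed cost and latency per call, flagging and arbitration are treated as free, and rejected outputs are not retried. ModelProfile and the three functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ModelProfile:
    cost: float     # cost per call, in any consistent unit
    latency: float  # seconds per call


def sequential(primary: ModelProfile, verifier: ModelProfile) -> Tuple[float, float]:
    """Every request passes through both models, one after the other."""
    return primary.cost + verifier.cost, primary.latency + verifier.latency


def tiered(primary: ModelProfile, verifier: ModelProfile,
           flagged_fraction: float) -> Tuple[float, float]:
    """Expected cost and latency when only a fraction of requests reach the verifier."""
    return (primary.cost + flagged_fraction * verifier.cost,
            primary.latency + flagged_fraction * verifier.latency)


def parallel(model_a: ModelProfile, model_b: ModelProfile) -> Tuple[float, float]:
    """Both models run on every request at the same time."""
    return model_a.cost + model_b.cost, max(model_a.latency, model_b.latency)
```

Plugging in measured per-call figures for the two candidate models gives a first estimate of which shape fits the budget; the sensitivity of the task still has to be weighed separately.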

When not to reach for this pattern

This pattern isn’t free; two models mean more cost, latency, and complexity. So it’s only justified when no single model can meet the demand. If one model handles the task, pick it: that is simpler, cheaper, and has fewer failure points. Reserve the combination for cases where the set of single-model candidates is genuinely empty.
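The selection logic can be written down as a small decision function. This is a sketch under assumptions: capabilities are scored as integer bands per axis (higher is better), which is a hypothetical scale, and the pairing rule encodes one reading of the rules above, namely that the primary meets the production demand and is at most one band short on the checked axis, while the verifier meets the demand on the checked axis. All names are illustrative.

```python
from typing import Dict, Optional, Tuple

Scores = Dict[str, int]  # axis name -> capability band (hypothetical integer scale)


def meets(scores: Scores, demands: Scores) -> bool:
    """True when a model reaches the required band on every demanded axis."""
    return all(scores.get(axis, 0) >= need for axis, need in demands.items())


def choose_setup(models: Dict[str, Scores], demands: Scores,
                 produce_axis: str, check_axis: str
                 ) -> Tuple[str, Optional[str], Optional[str]]:
    # 1. Prefer any single model that meets every demand on its own.
    singles = sorted(name for name, s in models.items() if meets(s, demands))
    if singles:
        return ("single", singles[0], None)
    # 2. Otherwise look for a primary that is at most one band short on the
    #    checked axis, plus a verifier at the demanded band on that axis.
    primaries = sorted(
        name for name, s in models.items()
        if s.get(produce_axis, 0) >= demands[produce_axis]
        and s.get(check_axis, 0) >= demands[check_axis] - 1
    )
    verifiers = sorted(
        name for name, s in models.items()
        if s.get(check_axis, 0) >= demands[check_axis]
    )
    for p in primaries:
        for v in verifiers:
            if p != v:
                return ("pair", p, v)
    # 3. No viable single model and no viable pair.
    return ("none", None, None)
```

The order of the checks carries the argument: the pair is considered only after the list of single-model candidates has come up empty.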

An honest caveat

One last point matters: combining a primary and a verifier doesn’t give an absolute guarantee; it only sharply reduces the probability of error. You still have to measure in practice whether the pair actually reaches the target quality. The pattern is a good tool, but it doesn’t replace real evaluation.
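A minimal sketch of such a measurement, assuming a labelled set of (task, reference) pairs and a correctness check you trust, might look like this. The function evaluate_pair and its parameters are hypothetical; the two reported numbers are one reasonable choice of metrics, not the only one.

```python
from typing import Callable, Dict, Iterable, Tuple


def evaluate_pair(cases: Iterable[Tuple[str, str]],
                  primary: Callable[[str], str],
                  verifier: Callable[[str, str], bool],
                  is_correct: Callable[[str, str], bool]) -> Dict[str, float]:
    """Run the pair over labelled cases and report what actually gets through."""
    total = accepted = wrong_accepted = 0
    for task, reference in cases:
        total += 1
        output = primary(task)
        if verifier(task, output):
            accepted += 1
            if not is_correct(output, reference):
                wrong_accepted += 1  # an error the verifier let through
    return {
        # Share of requests that produce a released output.
        "acceptance_rate": accepted / total if total else 0.0,
        # Share of released outputs that are still wrong.
        "residual_error_rate": wrong_accepted / accepted if accepted else 0.0,
    }
```

The residual error rate is the figure to compare against the target quality; the acceptance rate shows how much traffic the pair fails to serve while getting there.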