Sometimes you hit a task with two critical demands (say, it has to both produce something and guarantee that the output is error-free) and no single model satisfies both at the top level. The easy temptation is to take the best available model and settle for a result that is not quite good enough. There's a better way: instead of relying on one model that falls short, combine two in a pipeline.
The basic idea
The pattern is simple. A primary model produces the main output, and a verifier model checks it, catching errors the primary can't prevent on its own. Each model is responsible only for what it's strong at. In practice, this division of labour lets the combination meet both critical demands, even when no single model can handle both.
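Below is a minimal sketch of this division of labour. It assumes each model is wrapped as a plain Python callable. The names `run_pipeline`, `Verdict` and `PipelineResult`, and the toy stand-in models, are hypothetical and are not any particular library's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """The verifier's judgement on one output (hypothetical structure)."""
    ok: bool
    reason: str = ""


@dataclass
class PipelineResult:
    output: Optional[str]  # None when the verifier rejected the primary's output
    verdict: Verdict


def run_pipeline(
    task: str,
    primary: Callable[[str], str],
    verifier: Callable[[str, str], Verdict],
) -> PipelineResult:
    # The primary is responsible for producing the output.
    draft = primary(task)
    # The verifier is responsible only for catching errors in that output.
    verdict = verifier(task, draft)
    return PipelineResult(output=draft if verdict.ok else None, verdict=verdict)


# Stand-in models: the "primary" upper-cases the task text,
# and the "verifier" rejects any output that contains a digit.
def toy_primary(task: str) -> str:
    return task.upper()


def toy_verifier(task: str, output: str) -> Verdict:
    bad = any(ch.isdigit() for ch in output)
    return Verdict(ok=not bad, reason="contains a digit" if bad else "")


print(run_pipeline("hello", toy_primary, toy_verifier).output)    # HELLO
print(run_pipeline("hello 2", toy_primary, toy_verifier).output)  # None
```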
The key rule: the verifier must genuinely be capable
There's a decisive rule here that, if ignored, breaks the whole pattern: the verifier must be capable at the critical level on the very axis it checks. If the verifier is itself weak at spotting errors, its approval is worthless; it simply rubber-stamps whatever it is shown.
And a second, less obvious rule: the primary must not be so weak that the verifier can't tell error from correctness. A great verifier can't turn a thoroughly broken output into a good one; it can only catch the subtle errors in an almost-good one. In other words, the primary should be at most one band below the demand, not two. A fairly good primary paired with a very strong verifier yields a critical-level result; a weak primary paired with the same verifier does not.
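The two rules can be written as a simple admissibility check. This sketch assumes quality is graded in integer bands on each axis, with higher meaning better. The band scale and the name `pair_is_viable` are hypothetical.

```python
def pair_is_viable(primary_band: int, verifier_band: int, demand_band: int) -> bool:
    """Return True if a primary/verifier pair can plausibly reach the demand.

    Encodes the two rules from the text (bands are a hypothetical integer scale):
    1. the verifier must itself be at the demanded level on the axis it checks;
    2. the primary may fall short of the demand by at most one band.
    """
    verifier_capable = verifier_band >= demand_band
    primary_close_enough = primary_band >= demand_band - 1
    return verifier_capable and primary_close_enough


# Demand at band 5:
print(pair_is_viable(primary_band=4, verifier_band=5, demand_band=5))  # True: one band below
print(pair_is_viable(primary_band=3, verifier_band=5, demand_band=5))  # False: two bands below
print(pair_is_viable(primary_band=4, verifier_band=4, demand_band=5))  # False: verifier too weak
```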
Three shapes of pipeline
This pattern has three common implementations with different economics:
- Sequential: the primary produces, then the verifier approves or rejects. All traffic passes through both, so cost and latency are the sums of the two models' costs and latencies. The simplest and most reliable form.
- Tiered fallback: the verifier runs only on a flagged subset (e.g. suspicious cases). Cost and average latency are much lower, because the verifier sees only a small fraction of traffic; the trade-off is that errors in unflagged cases pass through unchecked.
- Parallel arbitration: both models run independently on every request, and a deterministic rule arbitrates between their outputs. It has the highest cost, and its latency is that of the slower model; it is for cases where having two independent opinions matters.
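The economics of the three shapes can be compared with a back-of-the-envelope model. This sketch makes three assumptions: each model has a fixed per-request cost and latency, the tiered form flags a known fraction of traffic, and the deterministic arbitration rule costs nothing. `ModelCost` and the function names are illustrative.

In this simple model the parallel form costs the same as the sequential one. It ends up the most expensive in practice when the second model has to produce a full independent answer rather than just check one. You can reflect that by giving it a larger `cost`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ModelCost:
    cost: float        # cost per request (any unit)
    latency_ms: float  # latency per request


def sequential(p: ModelCost, v: ModelCost) -> Tuple[float, float]:
    # Every request passes through both models, one after the other.
    return p.cost + v.cost, p.latency_ms + v.latency_ms


def tiered(p: ModelCost, v: ModelCost, flagged_fraction: float) -> Tuple[float, float]:
    # The verifier runs only on the flagged fraction, so these are averages:
    # an individual flagged request still waits for both models.
    return (p.cost + flagged_fraction * v.cost,
            p.latency_ms + flagged_fraction * v.latency_ms)


def parallel(p: ModelCost, v: ModelCost) -> Tuple[float, float]:
    # Both models run on every request at the same time; with a free
    # arbitration rule, latency is set by the slower of the two.
    return p.cost + v.cost, max(p.latency_ms, v.latency_ms)


primary = ModelCost(cost=1.0, latency_ms=400.0)
verifier = ModelCost(cost=4.0, latency_ms=800.0)
print(sequential(primary, verifier))     # (5.0, 1200.0)
print(tiered(primary, verifier, 0.25))   # (2.0, 600.0)
print(parallel(primary, verifier))       # (5.0, 800.0)
```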
The choice among the three is an engineering decision that depends on the sensitivity of the task and the cost and latency budget.
When not to reach for this pattern
This pattern isn't free: two models mean more cost, more latency, and more complexity. It's justified only when no single model can meet the demand. If one model handles the task, pick it; that is simpler, cheaper, and has fewer failure points. Reserve the combination for cases where the set of single-model candidates is genuinely empty.
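Before building a pipeline, it's worth checking mechanically whether any single model already qualifies. This sketch assumes each candidate has been graded in integer bands on both critical axes. The `Candidate` fields and the model names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    produce_band: int  # quality on the "produce the output" axis
    verify_band: int   # quality on the "no errors" axis
    cost: float


def pick_single_model(candidates: List[Candidate], demand_band: int) -> Optional[Candidate]:
    """Return the cheapest model that meets the demand on both axes, or None.

    None means the set of single-model candidates is empty: the only case
    where a primary/verifier pipeline is worth its extra cost.
    """
    qualified = [c for c in candidates
                 if c.produce_band >= demand_band and c.verify_band >= demand_band]
    return min(qualified, key=lambda c: c.cost) if qualified else None


models = [
    Candidate("model-a", produce_band=5, verify_band=3, cost=1.0),
    Candidate("model-b", produce_band=3, verify_band=5, cost=2.0),
]
print(pick_single_model(models, demand_band=5))  # None: reach for a pipeline
```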
An honest caveat
One last point matters: combining a primary and a verifier doesn't give an absolute guarantee; it only sharply reduces the probability of error. You still have to measure in practice whether the pair really reaches the target quality. The pattern is a good tool, but it doesn't replace real evaluation.
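A rough intuition for "sharply reduces": suppose the primary errs with probability p and the verifier independently misses a given error with probability q. Then the residual error rate is about p·q. That independence is an assumption, and the two models' failures can be correlated, so the number that matters is the measured one.

Below is a minimal sketch of that measurement. It assumes a labelled evaluation set of (task, expected output) pairs and a hypothetical `pipeline` callable that returns the accepted output, or None on rejection.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class EvalReport:
    accepted: int
    rejected: int
    accepted_but_wrong: int

    @property
    def residual_error_rate(self) -> float:
        # Share of outputs that passed the verifier yet were still wrong.
        # A pipeline that rejects everything scores 0.0, so watch `rejected` too.
        return self.accepted_but_wrong / self.accepted if self.accepted else 0.0


def evaluate_pipeline(
    pipeline: Callable[[str], Optional[str]],
    labelled: Iterable[Tuple[str, str]],
) -> EvalReport:
    """Run the pipeline over (task, expected_output) pairs and count outcomes."""
    accepted = rejected = wrong = 0
    for task, expected in labelled:
        output = pipeline(task)
        if output is None:
            rejected += 1
        else:
            accepted += 1
            wrong += int(output != expected)
    return EvalReport(accepted, rejected, wrong)


# Toy check: a stand-in pipeline that rejects one task and gets one wrong.
answers = {"2+2": "4", "3+3": "7", "1+1": None}
report = evaluate_pipeline(answers.get, [("2+2", "4"), ("3+3", "6"), ("1+1", "2")])
print(report)                      # EvalReport(accepted=2, rejected=1, accepted_but_wrong=1)
print(report.residual_error_rate)  # 0.5
```

Compare `residual_error_rate` against your target on an evaluation set large enough to mean something. When the target error rate is very low, a small set may contain no errors at all, even if the pair falls short.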