Common failure modes in LLM systems — and how to catch them

A language model fails in specific ways, not random ones. If you know these modes, you can catch them before your users do.

When a language model fails in a real product, it usually isn’t in a random way; it fails in one of a few specific modes. Knowing these modes is like having a checklist: you know where to look and how to test for it. Let’s go through the most important ones.

Comprehension: the model doesn’t understand the input

The first mode is a failure of comprehension: the model misses what’s stated in the input, or loses the thread in long, multi-step contexts. The sign is a reply that answers something that was never asked. The way to catch it is tests that probe understanding of difficult and long inputs.

Reasoning: brittle logic

The second mode is a failure of reasoning: the model settles for surface answers, or slips on multi-step inference and novel logical chains. This mode can be exposed by benchmarking the model on new problems — not examples it likely saw in training.

Structured generation: the format breaks

The third mode is breaking on structured output: the model produces invalid JSON, invents a field, or returns a value outside the allowed list. This one is easy to catch — a structure validator downstream catches any broken output and logs it.

Fidelity: hallucination

Perhaps the most important mode is a failure of fidelity — what we call hallucination. The model ignores the given context and makes something up, or cites a source that doesn’t exist, or backs down from a clear truth under pressure. This mode is dangerous because its output looks right. The way to catch it is measuring fidelity on data where the correct answer is known, and checking how well the model stays anchored to the context.

Calibration: misplaced confidence

The fifth mode is a failure of calibration, with two faces. First, overconfidence: the model is sure about everything and never says “I don’t know.” Second, sycophancy: under user pressure, the model abandons its own correct reasoning and agrees with the user’s wrong answer. Both can be measured with tests that deliberately invite the model to err and see whether it holds firm or yields.

Instruction-following: rules get ignored

The sixth mode is a failure of instruction-following: the model violates multi-part rules, drops a constraint, or ignores negations. An important sub-case of this mode is prompt injection: the model follows instructions hidden inside the user’s input instead of the system’s instructions. This can be exposed with instruction-following tests and adversarial inputs.

Initiative: it answers only what was asked

The last mode is a lack of initiative: the model answers only what was explicitly asked and never flags a gap it can see. This is fine for closed tasks but brittle for open ones. The test is giving open-ended tasks and seeing whether the model goes beyond the minimum.

From a list of failures to a list of tests

The key point is that these modes aren’t random, so you can build a test for each one. Instead of waiting for a user to discover a failure, turn these very modes into a checklist and measure every candidate model against it. This is the difference between a system that knows its failures and one that’s surprised by each new one.