Common fine-tuning pitfalls and how to debug them

Most failed fine-tunes trace back to a few recurring patterns. If you know the signs, debugging becomes a simple checklist instead of guesswork.

Fine-tuning feels like magic when it works and like pure confusion when it doesn’t. The good news is that each of the recurring failure patterns has a recognisable sign. Let’s go through the most important pitfalls and how to spot them.

The loss won’t come down

If the loss stays roughly flat during training, there are a few suspects: the learning rate is too low, LoRA isn’t applied to the right layers, or the α/r ratio is too small (the adapter update is scaled by α/r, so a small ratio mutes whatever the adapter learns). The first thing to do is check the number of trainable parameters; if that number is unexpectedly small, the configuration is probably wrong. If the count looks right, raise the learning rate in small steps and watch whether the loss starts to move.
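As a rough sanity check, the expected number of trainable parameters can be worked out by hand and compared with what the training framework reports. The sketch below assumes standard LoRA, where each adapted weight of shape d_out x d_in gains two small matrices of shapes r x d_in and d_out x r, and where the update is scaled by α/r. The layer shapes and the choice of adapted projections are hypothetical, picked only for illustration.

```python
def lora_trainable_params(shapes, rank):
    """Count LoRA parameters for a list of (d_out, d_in) weight shapes.

    Each adapted weight gains A (rank x d_in) and B (d_out x rank).
    """
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)


def lora_scale(alpha, rank):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


# Hypothetical model: 32 blocks, two adapted 4096 x 4096 projections per block.
shapes = [(4096, 4096)] * (32 * 2)

trainable = lora_trainable_params(shapes, rank=8)
print(trainable)                      # 4194304
print(lora_scale(alpha=16, rank=8))   # 2.0
```

If the framework reports a count far below the hand-computed figure, the adapters are probably attached to fewer layers than intended.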

Overfitting

If the training loss comes down but the validation loss climbs, the model is memorising the data, not learning the pattern. The fix: reduce the number of epochs, raise the dropout, lower the rank, or use early stopping. Overfitting is especially common on small datasets.
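Early stopping is simple enough to sketch in a few lines. The version below is one common formulation, not the only one: it stops once the validation loss has failed to improve for `patience` consecutive evaluations. The parameter names and default values are assumptions chosen for illustration.

```python
def early_stop_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    Training stops after `patience` consecutive epochs without an
    improvement of more than `min_delta` over the best loss so far.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Made-up validation curve: improves until epoch 2, then climbs.
val = [1.20, 0.95, 0.90, 0.93, 0.97, 1.05]
print(early_stop_epoch(val))  # 4
```

In practice the checkpoint to keep is the one from the best epoch (epoch 2 here), not the one at which training stopped.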

Gibberish output

If the model produces garbled, meaningless output after training, the cause is usually one of these: the learning rate is too high, α is too large, there are too many epochs, or the data is corrupted. Start by lowering the learning rate, since an overly high rate is the most common cause of broken output.
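A learning rate that is too high often shows up in the logged loss before it shows up in the output, as a sudden spike or a NaN. The helper below is a hypothetical sketch for scanning a list of logged losses; the spike threshold of three times the lowest loss seen so far is an arbitrary assumption, not an established rule.

```python
import math


def first_divergence(losses, spike_factor=3.0):
    """Return the index of the first suspicious step, or None.

    A step is suspicious if its loss is NaN or infinite, or if it exceeds
    `spike_factor` times the lowest loss seen at earlier steps.
    """
    lowest = float("inf")
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
        if loss > spike_factor * lowest:
            return step
        lowest = min(lowest, loss)
    return None


# Made-up log: steady progress, then a spike followed by NaN.
log = [2.1, 1.8, 1.6, 5.2, float("nan")]
print(first_divergence(log))  # 3
```

If the first suspicious step comes early in training, that points to the learning rate or α rather than to the number of epochs.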

Running out of memory

If you run out of memory during training, try these in order: enable gradient checkpointing (often the biggest saving in activation memory), drop the batch size to one and make up for it with gradient accumulation, load the base model in four-bit precision (QLoRA), and finally shorten the sequence length. Usually a combination of the first two is enough.
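Gradient accumulation works because, for a loss averaged over the batch, the average of the micro-batch gradients equals the full-batch gradient. The toy example below checks this with a one-parameter model y = w * x and mean squared error; the model and the data are made up purely to show the arithmetic.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error with respect to w for y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 9.0)]
w = 0.5

# One step with a batch of four.
full = grad_mse(w, data)

# Four micro-batches of size one, each scaled by 1 / accumulation_steps.
accumulation_steps = len(data)
accumulated = 0.0
for sample in data:
    accumulated += grad_mse(w, [sample]) / accumulation_steps

print(full)                             # -25.0
print(abs(full - accumulated) < 1e-12)  # True
```

The effective batch size is the micro-batch size times the number of accumulation steps, so a batch size of one with sixteen accumulation steps gives the same update as a batch of sixteen while holding only one sample's activations in memory at a time.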

Slow training

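If training takes too long, you have a few options: use Flash Attention on compatible hardware, train in mixed precision rather than fp32 (prefer bf16 over fp16 where the hardware supports it, since it is about as fast and avoids loss-scaling problems), and increase the batch size if memory allows. Often the bottleneck is suboptimal settings, not the hardware itself.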

Data quality

Sometimes everything looks right but the result is poor. In that case, suspect the data. Review a few of the training samples with your own eyes: is the format correct? Are the examples high-quality and consistent? A small clean set almost always works better than a large messy one.
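Reading samples by eye can be complemented by a small automated audit that catches the mechanical problems first. The sketch below assumes each sample is a dictionary with `prompt` and `response` fields; those field names are hypothetical and should be swapped for whatever the dataset actually uses.

```python
def audit_samples(samples, required=("prompt", "response")):
    """Return a dict mapping each issue name to the affected sample indices."""
    issues = {"missing_field": [], "empty_field": [], "duplicate": []}
    seen = set()
    for i, sample in enumerate(samples):
        if any(key not in sample for key in required):
            issues["missing_field"].append(i)
            continue
        values = tuple(str(sample[key]).strip() for key in required)
        if any(not value for value in values):
            issues["empty_field"].append(i)
        if values in seen:
            issues["duplicate"].append(i)
        seen.add(values)
    return issues


# Made-up samples: one duplicate, one blank response, one missing field.
samples = [
    {"prompt": "Translate: hello", "response": "bonjour"},
    {"prompt": "Translate: hello", "response": "bonjour"},
    {"prompt": "Summarise this", "response": "   "},
    {"prompt": "No answer here"},
]
print(audit_samples(samples))
# {'missing_field': [3], 'empty_field': [2], 'duplicate': [1]}
```

An audit like this only finds structural problems. Whether the examples are consistent and high-quality still needs a human read.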

The evaluation gap

The last pitfall is not seeing the problem. Don’t look only at the training loss; always measure on a separate validation set and compare the output qualitatively with the base model. Without that comparison, you might think you’ve improved the model when you haven’t or, worse, lowered its quality without knowing.
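A side-by-side comparison with the base model can start very simply. The sketch below uses exact match as the metric, which is an assumption that only suits tasks with short, unambiguous answers, and the two "models" are stand-in lookup functions rather than real ones. It also lists regressions: prompts the base model got right and the tuned model got wrong.

```python
def exact_match_rate(generate, examples):
    """Fraction of examples where the generated text equals the reference."""
    hits = sum(generate(p).strip() == ref.strip() for p, ref in examples)
    return hits / len(examples)


def regressions(base, tuned, examples):
    """Prompts answered correctly by the base model but not by the tuned one."""
    return [
        p for p, ref in examples
        if base(p).strip() == ref.strip() and tuned(p).strip() != ref.strip()
    ]


# Made-up validation set and canned answers standing in for real models.
validation = [("2+2", "4"), ("3+3", "6"), ("capital of France", "Paris"), ("5*5", "25")]
base_answers = {"2+2": "4", "3+3": "7", "capital of France": "Paris", "5*5": "10"}
tuned_answers = {"2+2": "4", "3+3": "6", "capital of France": "Lyon", "5*5": "25"}


def base_model(prompt):
    return base_answers[prompt]


def tuned_model(prompt):
    return tuned_answers[prompt]


print(exact_match_rate(base_model, validation))         # 0.5
print(exact_match_rate(tuned_model, validation))        # 0.75
print(regressions(base_model, tuned_model, validation)) # ['capital of France']
```

The aggregate score went up here, yet the regression list shows one ability that got worse, which is exactly the kind of silent quality loss an aggregate number hides.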

From pitfall to checklist

The key point is that these pitfalls have clear signs, so debugging needn’t be guesswork. When something breaks, read the sign (loss won’t come down? overfitting? gibberish output?) and go to that specific suspect. With these few checks, most early problems can be resolved.
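The checklist idea can be written down literally as a lookup from symptom to ordered checks. In the sketch below the checks are taken from the sections above, while the function that reads a symptom off the loss curves uses arbitrary 5% thresholds that are assumptions for illustration. Gibberish output cannot be detected from the curves and has to be spotted by reading the model's output.

```python
CHECKLIST = {
    "flat_loss": [
        "count trainable parameters",
        "check which layers LoRA is applied to",
        "check the alpha/r ratio",
        "raise the learning rate",
    ],
    "overfitting": [
        "reduce the number of epochs",
        "raise the dropout",
        "lower the rank",
        "use early stopping",
    ],
    "gibberish_output": [
        "lower the learning rate",
        "reduce alpha",
        "reduce the number of epochs",
        "inspect the data for corruption",
    ],
}


def symptom_from_curves(train, val, flat_tol=0.05, overfit_tol=0.05):
    """Guess a symptom from loss histories; thresholds are illustrative."""
    relative_drop = (train[0] - train[-1]) / train[0]
    if relative_drop < flat_tol:
        return "flat_loss"
    if val[-1] > min(val) * (1 + overfit_tol):
        return "overfitting"
    return None


# Made-up curves: training loss falls while validation loss turns upward.
symptom = symptom_from_curves([2.0, 1.0, 0.4], [1.8, 1.2, 1.5])
print(symptom)                # overfitting
print(CHECKLIST[symptom][0])  # reduce the number of epochs
```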