Fine-tuning feels like magic when it works; like confusion when it doesn't. But the good news is that most failures trace back to a few recurring patterns, and each one has a clear sign. Let's go through the most important pitfalls and how to spot them.
The loss won't come down
If the loss stays roughly flat during training, there are a few suspects: the learning rate is too low, LoRA isn't applied to the right layers, or the α/r ratio is too small. The first thing to do is check the number of trainable parameters; if that number is unexpectedly small, the configuration is probably wrong. Then raise the learning rate a little.
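As a rough sanity check, the expected number of trainable parameters can be worked out by hand. A LoRA adapter of rank r on a weight matrix of shape d_out x d_in adds r x (d_in + d_out) parameters, and the adapter update is scaled by α/r. The sketch below only illustrates that arithmetic; the layer names and shapes are hypothetical, and the helper functions are not part of any library.

```python
def lora_param_count(layer_shapes, rank):
    """Trainable parameters added by LoRA adapters.

    layer_shapes: dict of name -> (d_out, d_in) for each adapted weight matrix.
    Each adapter holds A (rank x d_in) and B (d_out x rank).
    """
    return sum(rank * (d_in + d_out) for d_out, d_in in layer_shapes.values())


def lora_scaling(alpha, rank):
    """Factor applied to the adapter update B @ A in standard LoRA."""
    return alpha / rank


# Hypothetical example: two attention projections in one 4096-wide block.
shapes = {"q_proj": (4096, 4096), "v_proj": (4096, 4096)}
per_block = lora_param_count(shapes, rank=8)
print(per_block)                       # 131072
print(lora_scaling(alpha=16, rank=8))  # 2.0
```

If the count your training framework reports is far below this estimate multiplied by the number of adapted blocks, the adapters are probably not attached where you expect.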
Overfitting
If the training loss comes down but the validation loss climbs, the model is memorising the data, not learning the pattern. The fix: reduce the number of epochs, raise the dropout, lower the rank, or use early stopping. Overfitting is especially common on small datasets.
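Early stopping is simple enough to sketch in a few lines. The version below is a minimal illustration, not a particular library's implementation: it takes a list of per-epoch validation losses and reports where training would stop. The names patience and min_delta are assumptions chosen for this example.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops at the first epoch where validation loss has failed to
    improve on the best value by more than min_delta for `patience` epochs
    in a row. If that never happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Validation loss bottoms out at epoch 2, then climbs.
val = [1.20, 0.95, 0.90, 0.93, 0.97, 1.05]
print(early_stopping_epoch(val, patience=2))  # (4, 2)
```

In practice you would keep the checkpoint from best_epoch rather than the one from the epoch where training stopped.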
Gibberish output
If the model produces garbled, meaningless output after training, the cause is usually one of these: the learning rate is too high, α is too large, there are too many epochs, or the data is corrupted. The first thing to do is lower the learning rate; this is the most common cause of broken output.
Running out of memory
If you run out of memory during training, try these in order: enable gradient checkpointing (the biggest saving), drop the batch size to one and make up for it with gradient accumulation, use four-bit mode (QLoRA), and finally shorten the sequence length. Usually a combination of the first two is enough.
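To see why a batch size of one plus gradient accumulation can stand in for a larger batch, here is a small pure-Python sketch. It uses a toy one-parameter model with a mean-squared-error loss (an assumption made for illustration) and shows that gradients accumulated over equal-sized micro-batches match the full-batch gradient, provided each micro-batch gradient is divided by the number of accumulation steps.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over equal-sized micro-batches.

    Each micro-batch gradient is divided by the number of accumulation
    steps, so the running sum matches the full-batch gradient.
    """
    assert len(xs) % micro_batch_size == 0, "micro-batches must be equal-sized"
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
full = grad_mse(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, micro_batch_size=1)
print(abs(full - accum) < 1e-9)  # True
```

The memory saving comes from holding only one micro-batch of activations at a time; the price is more forward and backward passes per optimizer step.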
Slow training
If training takes too long, you have a few options: use Flash Attention on compatible hardware, use bf16 instead of fp16 where the hardware supports it (bf16 needs no loss scaling), and increase the batch size if memory allows. Often the bottleneck is suboptimal settings, not the hardware itself.
Data quality
Sometimes everything looks right but the result is poor. In that case, suspect the data. Review a few of the training samples with your own eyes: is the format correct? Are the examples high-quality and consistent? A small clean set almost always works better than a large messy one.
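A quick automated pass can catch the mechanical problems before you start reading samples yourself. The sketch below assumes the training set is stored as JSON Lines with string fields named prompt and response; both the storage format and the field names are assumptions, so substitute your own schema.

```python
import json


def check_samples(jsonl_text, required=("prompt", "response")):
    """Return a list of (line_number, problem) for a JSONL training set.

    The field names in `required` are assumptions; use your own schema.
    """
    problems = []
    seen = set()
    for n, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((n, "invalid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((n, "not an object"))
            continue
        for key in required:
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append((n, f"missing or empty '{key}'"))
        # Exact duplicates are detected on the normalized record.
        fingerprint = json.dumps(record, sort_keys=True)
        if fingerprint in seen:
            problems.append((n, "duplicate sample"))
        seen.add(fingerprint)
    return problems


data = "\n".join([
    '{"prompt": "2+2?", "response": "4"}',
    '{"prompt": "Capital of France?", "response": ""}',
    '{"prompt": "2+2?", "response": "4"}',
    '{"prompt": "broken"',
])
for problem in check_samples(data):
    print(problem)
```

A check like this only finds format errors and exact duplicates. Whether the examples are high-quality and consistent still has to be judged by reading them.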
The evaluation gap
The last pitfall is not seeing the problem. Don't look only at the training loss; always measure on a separate validation set and compare the output qualitatively with the base model. Without that comparison, you might believe you've improved the model when you haven't, or worse, lower its quality without knowing.
From pitfall to checklist
The key point is that these pitfalls have clear signs, so debugging needn't be guesswork. When something breaks, read the sign (loss won't come down? overfitting? gibberish output?) and go to that specific suspect. With these few checks, almost all early problems get resolved.
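The loss-based signs can be turned into a small triage helper. This is a heuristic sketch only: the thresholds flat_tol and val_tol are assumptions picked for illustration, losses are assumed to be positive, and gibberish output cannot be detected from loss curves at all, so it still needs a qualitative look at the generations.

```python
# Suspects for each sign, taken from the sections above.
SUSPECTS = {
    "flat": "learning rate too low, LoRA on the wrong layers, alpha/r too small",
    "overfitting": "too many epochs, too little dropout, rank too high",
    "ok": "no loss-based warning; still compare outputs with the base model",
}


def diagnose(train_losses, val_losses, flat_tol=0.02, val_tol=0.02):
    """Map per-epoch loss curves to one of the signs: flat, overfitting, ok.

    flat_tol: minimum relative drop in training loss that counts as learning.
    val_tol: relative rise above the best validation loss that counts as a climb.
    """
    first, last = train_losses[0], train_losses[-1]
    if (first - last) / first < flat_tol:
        return "flat"
    if val_losses[-1] > min(val_losses) * (1 + val_tol):
        return "overfitting"
    return "ok"


sign = diagnose([2.0, 1.0, 0.5], [1.8, 1.2, 1.5])
print(sign, "->", SUSPECTS[sign])
```

Run it after every training job so that the first question, which sign am I looking at, is answered before you start changing settings.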