Fine-tune, RAG, or prompt: which one, and what each costs

There are three ways to adapt a model to your needs, and the wrong choice can get expensive. The difference is in what problem each one actually solves.

When you need a language model to do something specific to your use case, you have three routes: prompt engineering, retrieval-augmented generation (RAG), or fine-tuning. The three are often confused, but each solves a different problem, and the wrong choice can be both expensive and ineffective. Let’s clear up the difference.

Prompt: the cheapest starting point

Prompt engineering means guiding the model with a good instruction alone, without changing the model itself. This is the cheapest and fastest route: no training is needed, and you can revise a prompt in minutes. For many tasks, a clear instruction is enough. A good rule is to always start here; only when the prompt falls short do you reach for the heavier routes.
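To make "a clear instruction" concrete, here is a minimal sketch of how a prompt might be assembled from an instruction, a few optional worked examples, and the input. The function name build_prompt and the "Input:/Output:" layout are illustrative assumptions, not a required format; the point is only that everything here is plain text you can edit in minutes.

```python
from typing import Optional


def build_prompt(
    instruction: str,
    user_input: str,
    examples: Optional[list[tuple[str, str]]] = None,
) -> str:
    """Join an instruction, optional worked examples, and the input into one prompt.

    The "Input:/Output:" layout is an illustrative convention, not a standard.
    """
    parts = [instruction.strip()]
    # Each worked example shows the model the input/output shape we expect.
    for example_input, example_output in examples or []:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The real input goes last, with the output left open for the model to fill in.
    parts.append(f"Input: {user_input}\nOutput:")
    return "\n\n".join(parts)


prompt = build_prompt(
    "Classify the sentiment of each input as positive or negative.",
    "Great service",
    examples=[("Bad food", "negative")],
)
print(prompt)
```

Changing the model's behaviour here means editing a string, which is why this route is the natural first thing to try.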

RAG: when the model needs new knowledge

If the problem is that the model doesn’t have access to some information — your specific documents, say, or data that appeared after the model was trained — the answer is usually RAG: retrieving relevant information and placing it in the model’s context, right at answer time. The big advantage is that knowledge can be updated any time, without retraining the model. RAG is the answer to “the model doesn’t know something,” not “the model doesn’t know how to do something.”
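The retrieve-then-place-in-context idea can be sketched in a few lines. This is a toy under stated assumptions: the scoring is plain word overlap standing in for whatever search a real system would use (typically embedding similarity), and the names retrieve, build_rag_prompt, and DOCUMENTS are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents sharing the most words with the query.

    Word overlap is a stand-in for a real retriever; documents with no
    overlap at all are dropped.
    """
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_rag_prompt(query: str, documents: list[str], top_k: int = 2) -> str:
    """Place the retrieved passages in the prompt, ahead of the question."""
    passages = retrieve(query, documents, top_k)
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical knowledge base; in practice this would be your own documents.
DOCUMENTS = [
    "Refund requests are accepted within 30 days of purchase.",
    "Support hours are Monday to Friday.",
    "Standard shipping takes 3 to 5 business days.",
]

QUERY = "Within how many days of purchase can I get a refund?"
print(build_rag_prompt(QUERY, DOCUMENTS))
```

Updating the model's knowledge in this setup means editing DOCUMENTS; the model itself is never touched, which is the advantage described above.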

Fine-tuning: when the model has to change its behaviour

Fine-tuning addresses a different case: when the problem isn’t knowledge but behaviour. If you want the model to adopt a particular style, to consistently follow a special format, or to internalise something hard to convey with an instruction, fine-tuning is the right route. But its cost is higher: it needs training data, a training run, and ongoing maintenance. Fine-tuning is the answer to “the model doesn’t do something the way I want,” not “the model doesn’t know something.”
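The "needs data" part is easiest to picture as a file of input/output pairs that demonstrate the behaviour you want. The sketch below serialises such pairs as JSON Lines, one example per line. The field names "prompt" and "completion" and the helper to_jsonl are assumptions for illustration; the schema a given training tool expects will differ, so check its documentation.

```python
import json


def to_jsonl(examples: list[dict[str, str]]) -> str:
    """Serialise training examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(example, ensure_ascii=False) for example in examples)


# Hypothetical examples teaching a consistent house style for status updates.
# The field names are illustrative, not any specific tool's schema.
EXAMPLES = [
    {
        "prompt": "Summarise: the deploy failed because the config file was missing.",
        "completion": "STATUS: failed | CAUSE: missing config file",
    },
    {
        "prompt": "Summarise: the nightly backup finished without errors.",
        "completion": "STATUS: ok | CAUSE: none",
    },
]

print(to_jsonl(EXAMPLES))
```

Two examples are only enough to show the shape. Collecting and maintaining enough of them to shift the model's behaviour is where much of the cost of this route comes from.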

A simple rule for choosing

The choice can be reduced to three simple questions. Does a better instruction solve the problem? → prompt. Does the model need information it doesn’t have? → RAG. Does the model have to change its behaviour or style durably? → fine-tuning. And often, the right answer is a combination of these: a fine-tuned model that accesses new knowledge through RAG and is guided by a good prompt.
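The questions above can be written down as a small decision function. This is only a sketch of the rule as stated here: the name choose_routes and its parameters are hypothetical, and falling back to "prompt" when no question applies is an assumption, in line with the advice to start with the prompt.

```python
def choose_routes(
    instruction_is_enough: bool,
    needs_missing_information: bool,
    needs_durable_behaviour_change: bool,
) -> list[str]:
    """Map the three questions to the route or combination of routes they point to."""
    # If a better instruction solves the problem, stop at the cheapest route.
    if instruction_is_enough:
        return ["prompt"]
    routes = []
    if needs_missing_information:
        routes.append("RAG")
    if needs_durable_behaviour_change:
        routes.append("fine-tuning")
    # Assumption: if neither heavier route applies, keep iterating on the prompt.
    return routes or ["prompt"]


print(choose_routes(True, False, False))   # ['prompt']
print(choose_routes(False, True, False))   # ['RAG']
print(choose_routes(False, True, True))    # ['RAG', 'fine-tuning']
```

The last call shows the combined case: a problem that involves both missing knowledge and a durable change in behaviour points to both routes at once.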

The cost of each

In terms of cost, the order is clear: prompt is cheapest, RAG is in the middle (the cost of retrieval infrastructure), and fine-tuning is the most expensive (the cost of data, training, and maintenance). The common and costly mistake is to reach for fine-tuning to solve a problem that a prompt or RAG would have solved. Before paying the heavy cost, make sure you genuinely need it.
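The cost ordering can be used directly as a tie-breaker: among the routes that would solve the problem, pick the one that ranks lowest. In the sketch below the ranks 1 to 3 encode only the relative order given above, not real prices, and the names COST_RANK and cheapest_sufficient are hypothetical.

```python
# Relative cost order only: these are ranks, not prices.
COST_RANK = {"prompt": 1, "RAG": 2, "fine-tuning": 3}


def cheapest_sufficient(routes_that_solve_it: list[str]) -> str:
    """Return the lowest-cost route among those that would solve the problem."""
    if not routes_that_solve_it:
        raise ValueError("no candidate routes given")
    return min(routes_that_solve_it, key=lambda route: COST_RANK[route])


# If both RAG and fine-tuning would work, the ordering says to try RAG first.
print(cheapest_sufficient(["fine-tuning", "RAG"]))  # RAG
```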

Putting it together

These three routes aren’t rivals; they’re tools for different problems. Prompt guides behaviour, RAG brings knowledge, and fine-tuning internalises behaviour. The key to a good decision is recognising which problem you’re actually facing — and then choosing the cheapest tool that solves it.