A simple next-word predictor
At the heart of every large language model sits a deceptively simple job: given the text so far, guess what comes next. The model reads the text token by token, predicts one token, appends it to the end, and starts over. That's it. You can think of it as a kind of "autocomplete on steroids."
But that simple job has a surprising consequence. To correctly complete the sentence "If we heat water to 100 degrees...", you have to actually know that water boils. Because the model has seen millions of sentences like this across every domain, learning to predict the next word has forced it to absorb a broad knowledge of the world.
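The predict, append, repeat loop described above can be sketched with a toy stand-in for the model. This is only an illustration under loud assumptions: the "model" here is a simple word-pair counter rather than a neural network, words stand in for tokens, and the names `train_bigram`, `predict_next`, and `generate` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram(corpus: str) -> Dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts: Dict[str, Counter], context: List[str]) -> Optional[str]:
    """Return the most frequent follower of the last word, or None if unseen."""
    followers = counts.get(context[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts: Dict[str, Counter], prompt: str, max_new_tokens: int = 5) -> str:
    """Predict one word, append it to the text, and start over."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        next_token = predict_next(counts, tokens)
        if next_token is None:  # the toy model has nothing to suggest
            break
        tokens.append(next_token)
    return " ".join(tokens)


# Toy training text (an assumption for illustration only).
model = train_bigram("if we heat water to 100 degrees it boils")
print(generate(model, "heat", max_new_tokens=4))  # heat water to 100 degrees
```

A real language model replaces the word-pair counter with a neural network that scores every possible next token given the whole context, but the outer loop has the same shape.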
How it is trained
Training happens in three stages. First, pre-training: the model reads an enormous volume of text (books, web pages, encyclopedias, and code) and, without anyone teaching it directly, picks up language, general knowledge, and patterns of reasoning. Second, supervised fine-tuning: with thousands of high-quality question-and-answer pairs, the model learns to answer rather than aimlessly continue the text. Third, learning from human feedback: people rank several answers so the model learns what counts as a "good answer" for a human.
Interestingly, at large scale, abilities emerge that were never taught directly, such as learning from a few examples or reasoning step by step. This phenomenon is called emergence.
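To make the third training stage a little more concrete, here is a minimal sketch of one way a human ranking could be turned into training signal: each ranking is expanded into (preferred, rejected) pairs. The text above does not say how rankings are actually consumed, so the pairwise format and the function name `ranking_to_pairs` are assumptions for illustration.

```python
from itertools import combinations
from typing import List, Tuple


def ranking_to_pairs(ranked_answers: List[str]) -> List[Tuple[str, str]]:
    """Expand a best-first ranking into (preferred, rejected) pairs.

    combinations() keeps the input order, so the first element of each
    pair is always the answer the human ranked higher.
    """
    return list(combinations(ranked_answers, 2))


# Hypothetical ranking from one annotator, best answer first.
ranking = ["clear and correct", "correct but vague", "confidently wrong"]
pairs = ranking_to_pairs(ranking)
for preferred, rejected in pairs:
    print(f"prefer {preferred!r} over {rejected!r}")
```

Three ranked answers yield three pairs, each of which tells the model which of two answers a human found better.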
Six things a language model alone doesn't do well
For all its ability, a language model on its own has six important limits. One, hallucination: it sometimes states, with confidence, something plausible but false, like a student who, instead of saying "I don't know," makes a firm guess. Two, a knowledge boundary: the model's knowledge runs up to a fixed date, like an encyclopedia printed up to a given year. Three, an inability to act: the model only produces text; it sends no email and saves no file, like a brain with no hands. Four, weakness at precise arithmetic: it guesses a number more than it computes one. Five, no long-term memory: every conversation starts from zero. Six, sensitivity to how you ask: a small change in the wording can change the answer entirely.
From model to agent
This is where the idea of an agent comes in. An agent places the language model inside something larger. You can sum it up with a simple formula: an agent equals a language model, plus tools, memory, and planning. The model alone is like a consultant who advises but does nothing; an agent is like a project manager who takes a goal, decides the steps, carries them out, and corrects course when something fails.
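The formula above (a language model, plus tools, memory, and planning) can be sketched as a small loop. Everything here is a hypothetical stand-in: `stub_planner` plays the role of the language model with a hard-coded plan, `add_numbers` is the only tool, and "correcting course" is reduced to recording a failed step and moving on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (tool name, argument)


def add_numbers(argument: str) -> str:
    """Tool: add whitespace-separated integers exactly, instead of guessing."""
    return str(sum(int(part) for part in argument.split()))


def stub_planner(goal: str, memory: List[str]) -> List[Step]:
    """Stand-in for the language model: returns a fixed plan (assumption)."""
    return [("add", "1200 345"), ("send_email", "total is ready")]


@dataclass
class Agent:
    planner: Callable[[str, List[str]], List[Step]]
    tools: Dict[str, Callable[[str], str]]
    memory: List[str] = field(default_factory=list)

    def run(self, goal: str) -> List[str]:
        """Plan the steps, carry each one out, and log results in memory."""
        self.memory.append(f"goal: {goal}")
        for tool_name, argument in self.planner(goal, self.memory):
            tool = self.tools.get(tool_name)
            if tool is None:
                # A failed step is recorded rather than crashing the run.
                self.memory.append(f"failed: no tool named {tool_name}")
                continue
            try:
                result = tool(argument)
            except ValueError as error:
                self.memory.append(f"failed: {tool_name} raised {error}")
                continue
            self.memory.append(f"{tool_name}({argument!r}) -> {result}")
        return self.memory


agent = Agent(planner=stub_planner, tools={"add": add_numbers})
log = agent.run("total the invoices")
for entry in log:
    print(entry)
```

In this sketch the arithmetic is delegated to a tool and the missing `send_email` tool shows up in memory as a failure, which a real planner could read on its next pass to choose a different step.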
But not everything needs a full agent. A rule of thumb helps: a large share of tasks can be solved with just a good prompt and a little information retrieval; another portion needs a chain of tools; and only a small portion truly needs a full agent with complex planning. Add complexity only when the task calls for it.