From LLMs to agents: the complete journey

A language model, at heart, just guesses the next word. Here's how that simple guess becomes an agent once you add tools, memory, and planning — and where you actually need one.

A simple next-word predictor

At the heart of every large language model sits a deceptively simple job: given the text so far, guess what comes next. The model reads the text token by token, predicts one token, appends it to the end, and starts over. That’s it. You can think of it as a kind of “autocomplete on steroids.”
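The predict, append, repeat loop can be sketched with a toy stand-in for the model. This is only an illustration under loud assumptions: `train_bigram` and `generate` are hypothetical names, "tokens" are whole words, and the "model" is a table of which word followed which in a tiny made-up corpus. A real language model uses a neural network instead, but the outer loop has the same shape.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the text."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, prompt, max_new_tokens=5):
    """Repeat the loop: predict one token, append it, start over."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:  # nothing ever followed this word, so stop
            break
        next_token = followers.most_common(1)[0][0]  # greedy: most frequent follower
        tokens.append(next_token)
    return " ".join(tokens)

# Toy corpus (an assumption for illustration, not real training data).
corpus = "water boils at 100 degrees and ice melts when warm"
model = train_bigram(corpus)
print(generate(model, "water", max_new_tokens=4))
```

The toy only looks at the last word, whereas a real model conditions on all the text so far; that difference is what lets the real thing complete sentences it has never seen.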

But that simple job has a surprising consequence. To correctly complete the sentence “If we heat water to 100 degrees…”, you have to actually know that water boils. Because the model has seen millions of sentences like this across every domain, learning to predict the next word has forced it to absorb a broad knowledge of the world.

How it is trained

Training happens in three stages. First, pre-training: the model reads an enormous volume of text (books, web pages, encyclopedias, and code) and, without anyone teaching it directly, picks up language, general knowledge, and patterns of reasoning. Second, supervised fine-tuning: with thousands of high-quality question-and-answer pairs, the model learns to answer rather than aimlessly continue the text. Third, reinforcement learning from human feedback (RLHF): people rank several answers to the same prompt, and the model is tuned toward the answers people prefer, so it learns what counts as a “good answer” for a human.
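One way to see why pre-training needs no teacher: the text supplies its own answers. Every position in a sentence yields a training example whose "label" is simply the word that comes next. The sketch below shows that idea only; `make_training_pairs` is a hypothetical helper, and splitting on spaces stands in for a real tokenizer.

```python
def make_training_pairs(tokens):
    """Turn a token sequence into (context, next-token) examples."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

tokens = "water boils at 100 degrees".split()
for context, target in make_training_pairs(tokens):
    # The model sees the context and is trained to predict the target.
    print(" ".join(context), "->", target)
```

A five-word sentence gives four examples, which is part of why raw text at scale goes so far: no one has to write the answers by hand.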

Interestingly, at large scale, abilities appear that were never taught directly, such as learning a new task from a few examples given in the prompt, or reasoning step by step. This phenomenon is called emergence.

Six things a language model alone doesn’t do well

For all its ability, a language model on its own has six important limits. One, hallucination: it sometimes states, with confidence, something plausible but false, like a student who makes a firm guess instead of saying “I don’t know.” Two, a knowledge cutoff: the model’s knowledge stops at a fixed date, like an encyclopedia printed in a given year. Three, an inability to act: the model only produces text; it sends no email and saves no file. It is a brain with no hands. Four, weakness at precise arithmetic: it tends to guess a number rather than compute it. Five, no long-term memory: every conversation starts from zero. Six, sensitivity to how you ask: a small change in the wording can change the answer entirely.
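The arithmetic limit is the easiest one to picture being fixed from outside the model: instead of letting it guess a number, hand the expression to ordinary code that computes it. Below is a minimal sketch of such a calculator tool. The name `calculate` is hypothetical, and the choice to support only `+`, `-`, `*`, and `/` is an assumption made to keep the example short.

```python
import ast
import operator

# Operators the tool is willing to evaluate (a deliberately small set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculate(expression):
    """Compute an arithmetic expression exactly, without using eval()."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.operand))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)

print(calculate("123456789 * 987654321"))
```

Walking the parsed expression rather than calling `eval()` means text produced by a model can only trigger arithmetic, not arbitrary code.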

From model to agent

This is where the idea of an agent comes in. An agent places the language model inside something larger. You can sum it up with a simple formula: an agent equals a language model, plus tools, memory, and planning. The model alone is like a consultant who advises but does nothing; an agent is like a project manager who takes a goal, decides the steps, carries them out, and corrects course when something fails.
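The formula can be made concrete with a small loop. Everything here is a sketch under stated assumptions: `scripted_model` is a hard-coded stand-in for a real language model call, the two tools and the `FACTS` table are invented for illustration, and "memory" is just a list of past steps kept for the length of one run. The point is the shape: the model decides a step, a tool carries it out, the result is remembered, and the loop repeats until the model declares a final answer.

```python
FACTS = {"boiling point of water in celsius": 100}  # toy knowledge source (assumption)

def lookup(query):
    return FACTS[query.lower()]  # raises KeyError for unknown queries

def to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

TOOLS = {"lookup": lookup, "to_fahrenheit": to_fahrenheit}

def scripted_model(goal, memory):
    """Hypothetical stand-in for an LLM: picks the next step from what is in memory."""
    results = [step["result"] for step in memory if step["ok"]]
    if not results:
        return {"tool": "lookup", "input": "boiling point of water in celsius"}
    if len(results) == 1:
        return {"tool": "to_fahrenheit", "input": results[0]}
    return {"final": f"Water boils at {results[1]} degrees Fahrenheit."}

def run_agent(goal, model, tools, max_steps=5):
    """Model + tools + memory + planning, reduced to one loop."""
    memory = []
    for _ in range(max_steps):
        action = model(goal, memory)          # planning: decide the next step
        if "final" in action:
            return action["final"], memory
        try:
            result, ok = tools[action["tool"]](action["input"]), True
        except Exception as error:            # a failed step is recorded, not fatal
            result, ok = str(error), False
        memory.append({"action": action, "result": result, "ok": ok})
    return None, memory                       # step budget exhausted

answer, steps = run_agent("What is water's boiling point in Fahrenheit?", scripted_model, TOOLS)
print(answer)
```

Because failed steps are written to memory instead of stopping the run, the model sees the failure on its next turn and can pick a different step, which is the "corrects course" part of the project-manager analogy. The `max_steps` cap keeps a confused model from looping forever.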

But not everything needs a full agent. A rule of thumb helps: a large share of tasks can be solved with just a good prompt and a little information retrieval; another portion needs a chain of tools; and only a small portion truly needs a full agent with complex planning. Add complexity only when the task calls for it.