How agents remember: memory and knowledge representation

A language model has no memory of its own and begins each conversation from scratch. What gives an agent memory is the layers built around the model.

A simple but important fact: a language model remembers nothing on its own. Each time you ask it something, it sees only the text you give it in that moment; yesterday’s conversation doesn’t exist for it. So when an agent appears to “remember,” that memory doesn’t come from the model itself — it comes from the layers built around it. Let’s get to know those layers.

Three kinds of memory

Agents usually need three kinds of memory, each with a different role.

Working memory is the shortest: the context of the very task in hand. It answers the question “what am I doing right now?” Its capacity is small and it holds only a few items at once.

Short-term memory is the history of the current conversation — what was said a few messages ago. This is what lets an agent understand what “multiply that by three” refers to. But this history isn’t endless; as the conversation grows, usually only its recent part is kept.

Long-term memory is persistent: preferences, learned facts, and things that should survive between sessions. It answers “what did the user prefer last month?”

The problem of remembering across long conversations

Because the model doesn’t hold the history itself, you have to give it back on every call. But the history keeps getting longer and the context window is limited. There are two common ways to handle this. First, the sliding window: keep only the last few messages and drop the old ones — simple and cheap, but you lose the distant context. Second, summarisation: compress the old messages into a summary and keep that summary alongside the recent messages — more expensive, but it preserves the important context.

Knowledge representation: what long-term memory looks like

Long-term memory isn’t just a pile of text; knowledge has to be stored in a form you can retrieve. There are a few classical methods.

Semantic networks hold knowledge as nodes (concepts) and labelled edges (relations): a “penguin” is a “bird,” and a “bird” has “wings.” This structure allows inheritance; a penguin inherits a bird’s traits, except where it overrides them itself (a penguin doesn’t fly).

Frames are templates with slots: the “bird” frame has slots like {wings: yes, fly: yes}, and the “penguin” frame inherits from it but changes the fly slot to “no.”

First-order logic holds knowledge as facts and rules and allows inference: if “Socrates is human” and “every human is mortal,” then “Socrates is mortal.” It’s precise, but it doesn’t cope well with uncertainty.

Semantic search and the vector store

But for an agent’s memory, you often need something softer. Word-by-word search is brittle: if knowledge is stored under “machine learning,” a search for “AI” won’t find it. The solution is semantic search. Each piece of text is turned into a vector that represents its meaning, and these vectors are stored in a vector store. At query time, the query itself is turned into a vector and the nearest vectors are retrieved — that is, the nearest meanings, not the nearest words. This is the very mechanism that lets an agent find similar past conversations, even when they were phrased in different words.

Knowledge graphs: when relationships matter

Another method is the knowledge graph, which holds facts as triples: (subject, relation, object) — like (Milad Tower, located in, Tehran). The advantage of this structure is chain inference: from “the tower is in Tehran” and “Tehran is in Iran,” you can reach a new conclusion. Combining a knowledge graph with a language model gives the best of both worlds: precise facts from the graph, and natural language from the model.

Putting it together

An agent’s memory isn’t one thing; it’s a set of layers, each doing a different job: working memory for the immediate task, short-term for the current conversation, and long-term for lasting knowledge. And because the model itself holds nothing, designing these layers well is what turns an agent from a forgetful responder into a colleague that remembers.