A simple but important fact: a language model remembers nothing on its own. Each time you ask it something, it sees only the text you give it in that moment; yesterdayβs conversation doesnβt exist for it. So when an agent appears to βremember,β that memory doesnβt come from the model itself β it comes from the layers built around it. Letβs get to know those layers.
Three kinds of memory
Agents usually need three kinds of memory, each with a different role.
Working memory is the shortest: the context of the very task in hand. It answers the question βwhat am I doing right now?β Its capacity is small and it holds only a few items at once.
Short-term memory is the history of the current conversation β what was said a few messages ago. This is what lets an agent understand what βmultiply that by threeβ refers to. But this history isnβt endless; as the conversation grows, usually only its recent part is kept.
Long-term memory is persistent: preferences, learned facts, and things that should survive between sessions. It answers βwhat did the user prefer last month?β
The problem of remembering across long conversations
Because the model doesnβt hold the history itself, you have to give it back on every call. But the history keeps getting longer and the context window is limited. There are two common ways to handle this. First, the sliding window: keep only the last few messages and drop the old ones β simple and cheap, but you lose the distant context. Second, summarisation: compress the old messages into a summary and keep that summary alongside the recent messages β more expensive, but it preserves the important context.
Knowledge representation: what long-term memory looks like
Long-term memory isnβt just a pile of text; knowledge has to be stored in a form you can retrieve. There are a few classical methods.
Semantic networks hold knowledge as nodes (concepts) and labelled edges (relations): a βpenguinβ is a βbird,β and a βbirdβ has βwings.β This structure allows inheritance; a penguin inherits a birdβs traits, except where it overrides them itself (a penguin doesnβt fly).
Frames are templates with slots: the βbirdβ frame has slots like {wings: yes, fly: yes}, and the βpenguinβ frame inherits from it but changes the fly slot to βno.β
First-order logic holds knowledge as facts and rules and allows inference: if βSocrates is humanβ and βevery human is mortal,β then βSocrates is mortal.β Itβs precise, but it doesnβt cope well with uncertainty.
Semantic search and the vector store
But for an agentβs memory, you often need something softer. Word-by-word search is brittle: if knowledge is stored under βmachine learning,β a search for βAIβ wonβt find it. The solution is semantic search. Each piece of text is turned into a vector that represents its meaning, and these vectors are stored in a vector store. At query time, the query itself is turned into a vector and the nearest vectors are retrieved β that is, the nearest meanings, not the nearest words. This is the very mechanism that lets an agent find similar past conversations, even when they were phrased in different words.
Knowledge graphs: when relationships matter
Another method is the knowledge graph, which holds facts as triples: (subject, relation, object) β like (Milad Tower, located in, Tehran). The advantage of this structure is chain inference: from βthe tower is in Tehranβ and βTehran is in Iran,β you can reach a new conclusion. Combining a knowledge graph with a language model gives the best of both worlds: precise facts from the graph, and natural language from the model.
Putting it together
An agentβs memory isnβt one thing; itβs a set of layers, each doing a different job: working memory for the immediate task, short-term for the current conversation, and long-term for lasting knowledge. And because the model itself holds nothing, designing these layers well is what turns an agent from a forgetful responder into a colleague that remembers.