AI Agent Memory: How Agents Remember Things Across Sessions
Agents forget everything when a conversation ends. The context window resets, the slate clears, and the next session starts from zero. This is the default behavior of every LLM-based agent, and it’s a real constraint for anything that needs continuity over time.
The solution is an explicit memory layer, attached to the agent and persisted outside the model. There are four distinct types of agent memory, and each one solves a different problem.
The Four Types of Agent Memory
In-Context Memory
This is everything currently loaded into the model’s context window: the conversation history, any documents you’ve passed in, previous tool results. It’s what the agent “knows” right now.
The limit is the context window size. Modern models have large windows (some exceeding 200,000 tokens), but that still runs out. Long conversations, verbose tool outputs, and large documents eat through context fast. When the window fills, older content gets dropped or summarized. The agent gets forgetful.
In-context memory is the fastest and most reliable form of memory. The model attends to everything in it. Use it for the current task. Don’t rely on it for anything that needs to survive across sessions.
Episodic Memory
Episodic memory stores logs of past sessions. When a user starts a new conversation, the agent retrieves relevant episodes from previous interactions and loads a compressed version into context.
Think of it as a personal journal the agent keeps. “Last Tuesday, the user asked me to draft a proposal for client X. They preferred bullet points and a formal tone. The client’s budget was around $50k.” That context shapes how the agent behaves today, even though that conversation ended days ago.
Libraries like Mem0, Zep, and Letta provide episodic memory layers you can attach to agents. Cloudflare launched a managed Agent Memory service in 2026. All of these follow roughly the same model: capture key facts from each session, store them, retrieve the relevant ones at the start of the next session.
The cost is retrieval latency and context budget. You’re spending tokens to load memory, and retrieval is never perfect. The agent may miss a relevant episode or load an irrelevant one.
Semantic Memory
Semantic memory is a knowledge base: facts, domain knowledge, product information, user preferences. It’s less about “what happened last time” and more about “what is true.”
If you’re building a customer service agent, semantic memory might hold your product catalog, pricing rules, and refund policies. The agent retrieves relevant facts when it needs them rather than having everything loaded at once.
Vector databases (Pinecone, Weaviate, pgvector) are the common backend for semantic memory. Retrieval is based on similarity, which works well for natural-language queries but can miss things when phrasing doesn’t match.
Semantic memory scales well. You can have millions of facts without slowing the agent down, as long as your retrieval is good.
Procedural Memory
Procedural memory stores how to do things: workflows, instructions, skill definitions. It’s less “remember this fact” and more “remember how to run this process.”
An agent with procedural memory can recall that when a user asks for a competitive analysis, it should run three searches, pull the top five results from each, and format the output as a table. That workflow is stored and retrieved on demand, rather than re-specified every time.
This is the most underused memory type. Most teams focus on episodic and semantic, but procedural memory is what makes agents consistent about how they work, not just what they know.
Tradeoffs at a Glance
No memory type is free. All of them trade context budget for continuity. Episodic memory requires writes at session end and reads at session start. Semantic memory requires a separate retrieval step. Procedural memory requires well-maintained skill definitions that don’t go stale.
For most agents, a combination works best: in-context for the current task, episodic for user preferences and past interactions, semantic for domain knowledge, procedural for complex workflows.
What Memory Unlocks for Agents
An agent with no memory is a calculator. It takes an input, produces an output, and forgets everything.
An agent with memory can manage a project across weeks. It knows the user prefers short responses. It knows the client’s name and last meeting date. It knows the preferred output format. It doesn’t ask the same clarifying questions on every session.
The concrete gain is fewer back-and-forth exchanges. A user shouldn’t have to re-explain their preferences on every interaction. A customer service agent shouldn’t lose track of a multi-day support ticket. A research agent should know what it already looked into two sessions ago.
External Tool Calls as On-Demand Retrieval
There’s a complementary form of memory that doesn’t involve storage at all: external tool calls.
When an agent runs a web search, fetches a stock quote, or looks up a news article, it’s retrieving information on demand rather than from stored memory. This is a form of retrieval-based memory, with a key advantage: the information is always fresh. Stored semantic memory goes stale. A search result from this morning is current.
External tools don’t replace stored memory. They complement it. Stored memory handles continuity (user preferences, past interactions, domain knowledge). External tools handle timeliness (current events, live data, real-time queries). An agent with both can answer questions that require knowing both the context and the current state of the world.
AgentPatch gives agents access to external tools across search, maps, finance, email, and more. One connection, one bill, no per-service auth. Combine that with a memory layer like Mem0 or Zep, and you get agents that are both current and continuous.
Wrapping Up
The four memory types each fill a different gap. In-context handles the current task. Episodic preserves past interactions. Semantic stores domain knowledge. Procedural captures how-to workflows. External tool calls provide on-demand retrieval of fresh data.
Most production agents need more than one. Start with episodic memory if you’re building anything with returning users. Add semantic memory when your agent needs domain knowledge that won’t fit in context. Use external tools for anything time-sensitive.
Visit agentpatch.ai to connect your agent to external tools without managing per-service auth.