Building the Assistant 5 — How to Remember

The Scarecrow was wrong. He needed a memory to go along with his brain.

AI researchers are paying a lot of attention these days to memory and how to manage it.

Let’s say you want a system that helps you keep track of ideas you’ve had and decisions you’ve made while working on a project. More than anything, you’d like to avoid the sort of problem LLMs run into when they have to clear older material out of the context window and then start saying things that contradict something that came up earlier in the chat session.

The problem in this case is that the LLM is forgetting things that it knew just a couple of paragraphs before.

But the memory capabilities you need for, say, a personal assistant who actually manages your calendar, vets your emails, and pays for your airline tickets, are bigger than just managing not to forget things from the very session you’re in (though getting that part right isn’t so bad either).

What your personal assistant AI really needs is a way to remember things from one session to the next. And if you’re talking about how to feed horses, it might well be useful if the assistant can somehow remember that a few weeks ago you had a long conversation about how to feed donkeys.

It’s the context

An LLM, though, can only “remember” things that are in the model’s context window. So the trick a memory system has to pull off is figuring out what’s relevant to the prompt the user just entered and then loading those memories into context before the LLM processes the prompt.
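To make that loop concrete, here is a minimal Python sketch. It is not how ChatGPT, Claude, or any other product does it: build_context and word_overlap are hypothetical names, the word-overlap score is a deliberately crude stand-in for a real relevance test, and the character budget stands in for the context window’s token limit.

```python
from typing import Callable, List


def build_context(prompt: str, memories: List[str],
                  score: Callable[[str, str], float],
                  budget_chars: int = 2000) -> str:
    """Place the memories most relevant to `prompt` ahead of it.

    `score` is a hypothetical relevance function (higher means more relevant);
    `budget_chars` is a crude stand-in for the context window's token limit.
    """
    ranked = sorted(memories, key=lambda m: score(prompt, m), reverse=True)
    chosen: List[str] = []
    used = 0
    for memory in ranked:
        if score(prompt, memory) <= 0:
            break  # list is sorted, so every remaining memory scores no better
        if used + len(memory) > budget_chars:
            continue  # too big for what's left of the budget; try smaller ones
        chosen.append(memory)
        used += len(memory)
    if not chosen:
        return f"User: {prompt}"
    notes = "\n".join(f"- {m}" for m in chosen)
    return f"Relevant memories:\n{notes}\n\nUser: {prompt}"


def word_overlap(prompt: str, memory: str) -> float:
    """Toy relevance score: how many words the two texts share."""
    return len(set(prompt.lower().split()) & set(memory.lower().split()))


context = build_context("what should horses eat",
                        ["donkeys eat hay and straw", "the newsletter ships on friday"],
                        word_overlap)
print(context)
```

Everything interesting hides inside score; the rest of this issue is really about what to put there.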

This can be devilishly hard to get right, and what we’ve got access to in the cloud models right now is pretty superficial, partly out of necessity: doing anything fancier means running into shortcomings left and right.

ChatGPT remembers some things about you, but only some things. If you want to find out what ChatGPT remembers about you, you can literally ask it “What do you remember about me?” In my case, that currently produces about 600 words describing things about me. Most of the details are accurate, but not all. It misremembers the name of this newsletter, for instance.

We might stop for a moment, all the same, to appreciate how remarkably well an AI chat app can create the sense that it “knows” you with a mere 600 words and the ability to dynamically look back through a few older chat sessions. If you’re a ChatGPT user, you may have noticed its “memory” kicking in when it compares what you’re talking about now to a project you worked on in ChatGPT in the past. It creates an almost uncanny effect.

Claude is also developing these sorts of capabilities within Cowork. John Conneely’s “How I Finally Sorted My Claude Code Memory” is already a bit dated (it’s from March!), but it gives you the general picture and some interesting directions you might want to head in.

Just don’t screw it up

The perceived effect of the LLM having memory is entirely undermined, of course, if the LLM gets the memory wrong, which has happened to me a number of times with ChatGPT. If you had a conversation where you decided one thing, then later had another conversation where you decided things another way, there’s a decent chance that ChatGPT will, in some subsequent chat, bring back the first decision as if it were the final conclusion you’d reached. Oof.

At the heart of any approach to memory is the fundamental problem of knowing which memories (stored chunks of words, the stuff of bullet points) are relevant to whatever prompt has just been given. A crude way of approaching this would be to just grab the most recent handful of chat transcripts. This can actually work well in certain instances because it lets the LLM “pick up where you left off last time.” But it isn’t especially workable overall, because it inherently fails to pull up older chats, and in most cases the whole point of memory is to pull up old bits you’d likely have forgotten.
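Here is a sketch of the crude recency approach in Python, with made-up chats and a hypothetical most_recent helper. It shows the failure mode just described: the old donkey conversation never makes the cut, however relevant it is.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Transcript:
    started: datetime
    text: str


def most_recent(transcripts: List[Transcript], n: int = 3) -> List[Transcript]:
    """The crude approach: hand back the last `n` chats, relevant or not."""
    return sorted(transcripts, key=lambda t: t.started, reverse=True)[:n]


# Made-up chat history; the oldest chat is the one a horse question needs.
chats = [
    Transcript(datetime(2025, 3, 1), "Long talk about feeding donkeys"),
    Transcript(datetime(2025, 3, 20), "Drafting the newsletter"),
    Transcript(datetime(2025, 3, 21), "Booking a flight"),
    Transcript(datetime(2025, 3, 22), "Calendar cleanup"),
]
recent = [t.text for t in most_recent(chats)]
print(recent)  # the donkey conversation is left out
```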

Synthesis is king

The other problem with pulling recent chat transcripts is that it’s by no means the most efficient way to place memories in the current context window. Chats tend to be wordy and often meander, so it’s expensive to throw them into context whole. A better approach is to treat memories as condensed, synthesized records of what happened in a chat. Bullet points.

Fortunately, LLMs are rather good at producing summaries (paradoxically, since they so often rattle on and on), so your memory system can grab finished chats and summarize them, making the summary the actual memory, rather than the chat transcript. OpenClaw handles this compaction of memory by placing information into a memory.md markdown file and by making notes about the daily interactions in files named by the convention memory/YYYY-MM-DD.md. When you interact with OpenClaw, it automatically loads today’s and yesterday’s daily notes in addition to memory.md.
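Here is a rough Python sketch of that file convention, covering both sides: condensing a finished chat into the day’s note, and loading memory.md plus yesterday’s and today’s notes at the start of a session. It is not OpenClaw’s code. The single workspace directory, the helper names, and the summarize callable (a stand-in for whatever LLM call does the condensing) are all assumptions.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path
from typing import Callable


def append_daily_note(root: Path, transcript: str,
                      summarize: Callable[[str], str], day: date) -> Path:
    """Condense a finished chat and append the summary to memory/YYYY-MM-DD.md."""
    note = root / "memory" / f"{day.isoformat()}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    with note.open("a", encoding="utf-8") as f:
        f.write(summarize(transcript).rstrip() + "\n")
    return note


def load_session_memory(root: Path, today: date) -> str:
    """Load memory.md plus yesterday's and today's daily notes, skipping missing files."""
    paths = [root / "memory.md"] + [
        root / "memory" / f"{day.isoformat()}.md"
        for day in (today - timedelta(days=1), today)
    ]
    sections = []
    for path in paths:
        if path.exists():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {path.relative_to(root).as_posix()}\n{body}")
    return "\n\n".join(sections)


# Demo in a throwaway directory; summarize is faked with a lambda.
with tempfile.TemporaryDirectory() as workspace:
    root = Path(workspace)
    (root / "memory.md").write_text("- Prefers short emails\n", encoding="utf-8")
    today = date(2025, 6, 10)
    append_daily_note(root, "(long chat about feed)",
                      lambda t: "- Decided: hay, not alfalfa", today)
    append_daily_note(root, "(older chat)",
                      lambda t: "- Three-day-old note", today - timedelta(days=3))
    context = load_session_memory(root, today)

print(context)
```

Note what the loader cannot do: the three-day-old note exists on disk but never reaches the model unless something goes looking for it.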

This is considerably better than nothing, but it isn’t memory in the fuller sense, because it doesn’t go looking for related memories from the past (though it does sometimes search an index it maintains for quick recall of the daily memory files). For memory that finds related older material, we need to solve the rather thorny problem of knowing what, exactly, “related” means.

Vectors and cosines

The standard approach these days is vector embeddings, in which text memories are crunched through an algorithm that represents the “meaning” of the text as a series of numbers. If you have two memories that say the same thing in somewhat different ways, you should in theory expect them to get very similar numerical representations.

This just kicks the problem down the road a bit, though, because we still need to solve the problem of what algorithm will capture meaning in such a mechanical way. For the purposes of this newsletter, it suffices to say that some magic handwaving occurs. What you wind up with is an “embedding”: a vector, that is, a list of real numbers. It isn’t entirely unlike the weights an LLM stores to reflect its training, but an embedding vector holds far fewer numbers than a model does.

Here’s the point: the embedding is a list of numbers that marks a point in a multidimensional space. I think of each embedding, roughly, as a point in “thought space.” If two thoughts are similar (even if they are expressed in completely different words), they will wind up in the same basic area of that space.

Since we’re just looking at a point in a mathematical space that may well have a couple thousand dimensions (don’t think too hard about this, I’d advise), we’re faced with the question of how we know whether two points are close to each other. And if close, then how close?

There are several ways to approach this, but the one you’ll hear talked about most is “cosine similarity.” The idea is actually pretty elegant and simple. If you have two vectors, the angle between them (however many dimensions they travel through) gets smaller the more closely they point in the same direction, so a small angle means similar meaning. Please don’t ask me to show you how it’s calculated.
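For readers who do want to see it anyway, the calculation is short: cos θ = (a · b) / (‖a‖ ‖b‖), the dot product of the two vectors divided by the product of their lengths. The Python sketch below applies it to made-up three-dimensional vectors. Real embeddings come from an embedding model and have hundreds or thousands of dimensions, and the memory names here are purely illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|): 1.0 means same direction, 0.0 means at right angles."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so call it unrelated
    return dot / (norm_a * norm_b)


def nearest(query: Sequence[float], memories: Dict[str, Sequence[float]],
            k: int = 2) -> List[Tuple[str, float]]:
    """Return the k memories whose vectors point most nearly the same way as the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in memories.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-D "embeddings"; a real model would produce these from the text.
memories = {
    "feeding donkeys": [0.9, 0.1, 0.0],
    "newsletter schedule": [0.0, 0.2, 0.9],
    "horse tack": [0.6, 0.5, 0.1],
}
top = nearest([0.8, 0.2, 0.1], memories)  # query vector for "feeding horses"
print(top)
```

Because only the angle matters, doubling every number in a memory’s vector leaves its similarity score unchanged: length doesn’t count, only direction.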

If you’re getting a little mumbo-jumbo’d out, I don’t blame you. For myself, what I started thinking about was how well it would work to ignore most of this and come up with an extremely, even bone-headedly simple way to manage finding related memories. So I’ve launched an experiment.

The bone-headed approach

For this particular experiment, I’m using WordPress, which you may know as the somewhat dowdy open-source platform for blogging and publishing. In some respects, yes, it feels like it still has the sculpted carpet installed in the seventies, but in plenty of other ways it’s actually very modern and forward-looking. It is, for instance, in the process of adding a memory mechanism with some similarities to the memory.md file that OpenClaw uses. I won’t clutter things up with further examples, but trust me, there are some.

What’s nice about WordPress, when creating an AI memory scheme, is that you get a well-ordered SQL database, a simple way of sorting text data into types, user management if you want it, built-in search, a full REST API, an editing environment, plugins to handle all sorts of tasks (including automation of third-party services), and the ability to screw around with just about any aspect of how the platform works.
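As one example of that REST API, here is a Python sketch that builds a query for the newest posts carrying particular tags. It assumes memories are stored as ordinary posts with standard WordPress tags (the experiment described here may well use something else), and the site URL and tag IDs are placeholders. The tags parameter of the /wp/v2/posts endpoint takes tag IDs, not tag names.

```python
from typing import Iterable
from urllib.parse import urlencode


def tagged_posts_url(site: str, tag_ids: Iterable[int], per_page: int = 5) -> str:
    """Build a WordPress REST API query for the newest posts carrying the given tag IDs."""
    params = {
        "tags": ",".join(str(tag_id) for tag_id in tag_ids),  # term IDs, not names
        "per_page": per_page,
        "orderby": "date",
        "order": "desc",
    }
    return f"{site.rstrip('/')}/wp-json/wp/v2/posts?{urlencode(params)}"


url = tagged_posts_url("https://example.com/", [12, 34])
print(url)
# Fetching it (not run here) could use the standard library:
#   with urllib.request.urlopen(url) as resp:
#       posts = json.load(resp)
```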

Taxonomy shows relatedness

My experiment boils down to using yet another thing you get for free, namely taxonomy support. If you tag the things you want remembered, you can use an LLM to associate whatever the current prompt is with relevant categories, and you can then narrow your search to only memories tagged with those relevant categories. Add a few additional simple heuristics, and you get surprisingly good results in terms of finding background the LLM might need to know.
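A minimal Python sketch of the narrowing-and-ranking step follows. The two heuristics here (more shared tags ranks higher, and the newer memory wins a tie) are hypothetical choices made for illustration, not the ones used in the experiment; the Memory class and the sample data are made up as well.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass
class Memory:
    text: str
    tags: Set[str]
    created: date


def rank_by_tags(memories: List[Memory], wanted: Set[str], limit: int = 5) -> List[Memory]:
    """Narrow to memories sharing a tag with `wanted`, then rank them.

    Hypothetical heuristics: more shared tags ranks higher; among equals,
    the newer memory wins.
    """
    hits = [m for m in memories if m.tags & wanted]
    hits.sort(key=lambda m: (len(m.tags & wanted), m.created), reverse=True)
    return hits[:limit]


store = [
    Memory("Donkeys: mostly straw, limited grass", {"donkeys", "feeding"}, date(2025, 1, 5)),
    Memory("Horse feed: switched to timothy hay", {"horses", "feeding"}, date(2025, 2, 1)),
    Memory("Horse feed: trying alfalfa", {"horses", "feeding"}, date(2025, 1, 10)),
    Memory("Newsletter ships Fridays", {"newsletter"}, date(2025, 2, 3)),
]
# Tags an LLM might suggest for "What should I feed the horses?"
results = [m.text for m in rank_by_tags(store, {"horses", "feeding"})]
print(results)
```

The recency tie-break is one cheap way to keep an earlier decision from outranking the one that replaced it, though it only helps when both memories carry the same tags.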

How do the memories reach the LLM? The simplest way (and it should be no surprise that this is what I’m using for this experiment) is to add them to the top of the prompt before sending it off. In my WordPress experiment, there’s a chat window that roughly duplicates what you’d see in ChatGPT or Claude. The window takes in the first prompt, makes a behind-the-scenes call asking the LLM what the relevant tags are, then searches through the tagged memories to create a ranking. The top five (an arbitrary choice on my part) are then prepended to the prompt, and the whole bundle is sent off to the LLM.
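That flow can be sketched in Python with a fake model standing in for the real one. The names pick_tags and prepend_memories, the wording of TAG_PROMPT, and the lowercase-tag convention are all hypothetical; the sketch assumes only that the LLM can be called as a function from prompt text to reply text. It also discards any tag the model invents that isn’t in the existing tag vocabulary.

```python
from typing import Callable, Dict, Set

TAG_PROMPT = (
    "Which of these tags apply to the message below? "
    "Reply with a comma-separated list and nothing else.\n"
    "Tags: {tags}\nMessage: {prompt}"
)


def pick_tags(prompt: str, vocabulary: Set[str], ask_llm: Callable[[str], str]) -> Set[str]:
    """The behind-the-scenes call: ask for tags, keep only ones that actually exist."""
    answer = ask_llm(TAG_PROMPT.format(tags=", ".join(sorted(vocabulary)), prompt=prompt))
    return {t.strip().lower() for t in answer.split(",")} & vocabulary


def prepend_memories(prompt: str, memories: Dict[str, Set[str]],
                     ask_llm: Callable[[str], str], top_n: int = 5) -> str:
    """Bundle the top `top_n` tagged memories with the prompt before the real LLM call."""
    vocabulary = set().union(*memories.values())
    wanted = pick_tags(prompt, vocabulary, ask_llm)
    matches = [text for text, tags in memories.items() if tags & wanted]
    matches.sort(key=lambda text: len(memories[text] & wanted), reverse=True)
    top = matches[:top_n]
    if not top:
        return prompt
    background = "\n".join(f"- {text}" for text in top)
    return f"Background from earlier conversations:\n{background}\n\n{prompt}"


def fake_llm(_: str) -> str:
    return "Horses, feeding, weather"  # stand-in for a real model's reply


bundle = prepend_memories(
    "What should I feed the horses this winter?",
    {"Donkeys do well on straw": {"donkeys", "feeding"},
     "The newsletter goes out Fridays": {"newsletter"},
     "Horse shoes due in March": {"horses"}},
    fake_llm,
)
print(bundle)
```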

As a side note, if you want to play along with the WordPress experiment, I’d be happy to set up a small number of readers (I’m imagining no more than six) with a test WordPress install featuring my add-ons, so you can have at building your very own Second Brain. Drop me a note.

I’d be remiss if I didn’t mention that there are lots of ways — some built with considerably more sophisticated tools — to go at this second brain concept. Platforms for storage include Notion and Obsidian. Obsidian has the considerable advantage of storing your entries as simple markdown files, meaning you can use them without Obsidian and, additionally, markdown files are readily consumed by LLMs. Obsidian has a broad ecosystem of third-party plugins, some of which engage AI in one way or another. This is another area I want to spend some time exploring, but so far I haven’t seen anything that actively and directly manages context on the fly (surely someone’s doing this, though). Obsidian is the tool that, more than any other, prompts people to talk about having or wanting a “second brain.”
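Since an Obsidian vault is just a folder of markdown files, a few lines of Python can read it without Obsidian running at all. This sketch collects notes carrying a given inline tag (like #feeding). The vault contents are made up, and the sketch ignores tags declared in a note’s frontmatter, which Obsidian also supports.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict

# Inline tags like #feeding or #projects/assistant. The look-behind requires
# whitespace (or start of text) before '#', which skips URL fragments such as
# page#section and "# Heading" lines. Frontmatter tags are not handled.
TAG_RE = re.compile(r"(?<!\S)#([\w/-]+)")


def notes_with_tag(vault: Path, tag: str) -> Dict[str, str]:
    """Return {relative path: text} for every markdown note carrying `tag` inline."""
    found = {}
    for path in sorted(vault.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        if tag in TAG_RE.findall(text):
            found[path.relative_to(vault).as_posix()] = text
    return found


# Demo with a throwaway two-note vault.
with tempfile.TemporaryDirectory() as d:
    vault = Path(d)
    (vault / "Donkeys.md").write_text("# Feeding\nStraw, mostly. #feeding #donkeys\n",
                                      encoding="utf-8")
    (vault / "Links.md").write_text("See https://example.com/page#feeding\n",
                                    encoding="utf-8")
    hits = sorted(notes_with_tag(vault, "feeding"))

print(hits)
```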

Finally, may I just say that I hate the term “second brain.” I really feel like someone needs to come up with something that doesn’t sound so kitschy.

Just for fun, here’s the memory note Claude created for this issue of the newsletter. Not bad.


AI Memory Systems — How They Work & Experiments
Source: Newsletter issue on AI memory

Core problem: LLMs can only “remember” what’s in their context window. Useful memory requires loading relevant past information into context before processing a new prompt — and getting relevance right is the hard part.

Approaches surveyed:

  • Raw chat transcripts — simple but inefficient; misses older material and wastes context space
  • Summarized/synthesized notes (e.g., OpenClaw’s memory.md + daily YYYY-MM-DD.md files) — much more efficient, but doesn’t surface semantically related older memories
  • Vector embeddings + cosine similarity — the standard approach; converts text to numerical representations in “thought space” so similar ideas cluster together, enabling semantic retrieval
  • Taxonomy/tagging (author’s experiment) — uses WordPress tags + an LLM to identify relevant categories for a prompt, retrieves top-ranked matching memories, and prepends them to the prompt. Surprisingly effective and far simpler than vector approaches.

Key insight: Memory fails if it retrieves the wrong thing — e.g., surfacing an old decision that was later reversed. Accuracy matters as much as recall.

Tools mentioned: ChatGPT memory, Claude/Cowork memory, OpenClaw, Obsidian, Notion, WordPress

Author’s take: Skeptical of “second brain” branding; interested in taxonomy as a pragmatic alternative to full vector search.