
> If you appeared in a puff of smoke before the authors of that paper, just after publication — a few months before half of them cleaved from OpenAI to form Anthropic — and carried with you a laptop linked through time to the big models of 2026, what would their appraisal be? There’s no doubt in my mind they would say: Wow, we really did it! This is obviously AGI!

I really don't think this would be the reaction. I'd say they would (or should) look at the systems we have now and see a very clear path between where they were then and where we are now, with all the positives _and negatives_. We still get hallucinations. We still get misalignment; if anything, as capabilities have improved, so has the potential for damage when things go wrong. It's pretty clear to me that late 2025 models are just better versions of what we had in 2021.

That's not to say they're not more useful or more valuable; they absolutely are! But that's all down to product integrations, speed, and turning up the dial on inference compute. They're still fundamentally the same thing.

The next big step forward, the thing that LLMs are obviously missing, is memory. The fact that we're messing around with context windows, attention across the context space, chat lookup, fact-saving features, etc., is a patch over the fact that LLMs can't remember anything in the way that humans (or pretty much any animal) can. It's clear that we need a paradigm shift on memory to unlock the next level of performance.
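To make the "patch" concrete, here's a minimal sketch of what a fact-saving feature amounts to under the hood. This is a toy illustration, not any vendor's actual API: the class and method names (`FactStore`, `remember`, `build_prompt`) are made up. The point is that the model learns nothing; we just stash strings externally and squeeze them back into the context window on every call.

```python
class FactStore:
    """Naive external 'memory': a plain list of saved facts."""

    def __init__(self):
        self.facts = []

    def remember(self, fact: str):
        # "Remembering" is just appending a string to a list.
        self.facts.append(fact)

    def build_prompt(self, user_query: str) -> str:
        # The model itself is stateless; its "memory" is whatever
        # we prepend to the prompt each time.
        memory_block = "\n".join(f"- {f}" for f in self.facts)
        return f"Known facts:\n{memory_block}\n\nUser: {user_query}"


store = FactStore()
store.remember("The user's cat is named Miso.")
prompt = store.build_prompt("What should I feed my cat?")
print(prompt)
```

Every "remembered" fact permanently costs context tokens on every subsequent call, which is exactly the scaling problem the comment is pointing at.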

We do have LLM memory: it's the training data from which the model was initially trained. To add to or change that memory, we would need to retrain the model, completely or partially, and that isn't realistic any time soon. Every other attempt at LLM memory is just an obscure hack of splitting the context window into parts and feeding input from different files. Literally nothing changes if you feed half of the query from one file and half from another called "memory.txt", versus feeding the whole query from a single file twice as big.
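The equivalence claimed here can be shown in a few lines. This is a sketch with illustrative file contents, not real tooling; from the model's side, both variants produce the same character (and hence token) stream:

```python
# Variant A: "memory" and query come from two separate files
# that get stitched into the context window.
memory_txt = "User prefers metric units.\n"
query_txt = "How tall is Mont Blanc?\n"
prompt_from_two_files = memory_txt + query_txt

# Variant B: one file twice as big containing the same text.
prompt_from_one_file = "User prefers metric units.\nHow tall is Mont Blanc?\n"

# The model cannot tell the difference.
print(prompt_from_two_files == prompt_from_one_file)  # True
```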

> It's clear that we need a paradigm shift on memory to unlock the next level of performance.

I think this points to the next phase of LLMs, or to a different neural network architecture that improves on them, alongside continual learning.

Adding memory capabilities would benefit local "reasoning" models more than online ones, since you would be saving tokens to do more tasks rather than generating more tokens to use more "skills" or tools. (Unless you pay Anthropic or OpenAI more for memory capabilities.)

It's also why you see LLMs unable to play certain games or to do hundreds of visual tasks quickly without adding lots of harnesses and tools, or without giving them a pre-defined map to help them understand the visual setting.

As I said before [0], the easiest way to understand the memory limitations of LLMs is Claude Plays Pokemon, where the model struggles with basic tasks that a 5-year-old can learn continuously.

[0] https://news.ycombinator.com/item?id=43291895


Continual learning is definitely part of it. Perhaps part of it (or something else) is learning much faster from many fewer examples.

With beads, or shoving it in git, or .md files, it's not clear that we do.

These are all very much in the same category of hacks that I mentioned.

A cat doesn't know its way around a house when it's born, but it also doesn't have to flick through markdown files to find its way around. A child can touch a hot stove once and be neurotic about touching hot things for the rest of their life, without having to read flash cards each morning or think for a few minutes about "what do I know about stoves" every time they're in the kitchen.


Call them a "hack" all you want; they seem to work. What's particularly interesting is how Claude has been trained on skills: it doesn't need to be taught how to use a skill, because that's been baked into it.

I'm not claiming they don't work in some sense, but as a user you have to be fairly deeply aware of how they work: context engineering is A Thing, you have to tell LLMs to remember stuff, and so on.

We're hacking around the fact that the models don't learn in normal use. That's in no way controversial.

A model that continuously learnt would not need the same sort of context engineering, external memory databases, etc.


You speak the truth, but looking back, what I reacted to is

> It's clear that we need a paradigm shift on memory to unlock the next level of performance.

and my take is that we might not need to get there to reach the next level of performance, given how well the latest models are able to utilize these memory-feature hacks. On top of that, Claude was specifically RLHF'd with the skills concept, so it's good with those. We disagree; let's let time tell who ends up being right.



