Show HN: Butter, a muscle memory cache for LLMs

MorganGallant · 2025-10-01T21:59:18 1759355958

I've known Erik for a while now — simply incredible founder. Doing this as a simple API proxy makes this practically effortless to integrate into existing systems, just a simple URL swap and you're good to go. Then, it's just a matter of watching the cache hit rate go up!

ketan_around · 2025-10-01T19:50:48 1759348248

Exciting to see a product like this launch! There are obviously a host of ‘memory’ solutions out there that try to integrate in fancy ways to cache knowledge / save tokens, but I think there’s a beauty in simplicity to just having a proxy over the OpenAI endpoint.

Interested to see where this goes!

edunteman · 2025-10-01T20:03:55 1759349035

An interesting alternative product to offer is injecting prompt cache tokens into requests where they could be helpful; not bypassing generations but at least low hanging fruit for cost savings

bigwheels · 2025-10-01T22:40:23 1759358423

Are you able to walk through a specific use case or example case in detail? I'm not yet totally grokking what Butter is going to do exactly.

edunteman · 2025-10-01T22:57:46 1759359466

I've got a blog on this from the launch of Muscle Mem, which should paint a better picture https://erikdunteman.com/blog/muscle-mem

Computer use agents (as an RPA alternative) is the easiest example to reach to: UIs change but not often, so the "trajectory" of click and key entry tool calls is mostly fixed over time and worth feeding to the agent as a canned trajectory. I discuss the flaws of computer use and RPA in the blog above.

A counterexample is coding agents: it's a deeply user-interractive workflow reading from a codebase that's evolving. So the set of things the model is inferencing on is always different, and trajectories are never repeated.

Hope this helps

bigwheels · 2025-10-02T00:09:49 1759363789

Still not clear - the tool calls come from the model, so what is being cached by Muscle Memory?

Also:

  After my time building computer-use agents, I’m convinced that the hybrid approach of Muscle Memory is the only viable way to offer 100% coverage on an RPA workload.

100% coverage of what?

I guess it'd be great if you could clarify the value proposition, many folks will be even less patient than myself.

Best of luck!

samraaj · 2025-10-01T20:56:10 1759352170

logged back in to HN to comment on this. looks really sick - i've been saying for a while that a surprising amount of LLM inference really comes down to repetition down a known path.

it's good to see others have seen this problem and are working to make things more efficient. I'm excited to see where this goes.

zyadelgohary1 · 2025-10-01T22:23:50 1759357430

This is awesome, Erik! Excited to see this launch. Definitely fixes some issues we had while building pure CopyCat

tsvoboda · 2025-10-01T20:02:50 1759348970

looks pretty cool! How would you integrate this into production agent stacks like langchain, autogpt, even closed loop robotics?

edunteman · 2025-10-01T20:09:29 1759349369

Thanks! For langchain you can repoint your base_url in the client. Autogpt I'm not as familiar with. Closed loop robotics using LLMs may be a stretch for now, especially since vision is a heavy component, but theoretically the patterns baked into small language models running on-device or hosted LLMs at higher level planning loops, could be emulated by a butter cache if observed in high enough volume.

raymondtana · 2025-10-01T20:18:01 1759349881

For AutoGPT, there is the option to set a llamafile endpoint, which follows the Chat Completions API. So, theoretically, you should be able to use that to point to Butter's LLM proxy.