Hi HN, Erik here. Today we launch Butter, an OpenAI-compatible API proxy that caches LLM generations and serves them deterministically on revisit.
Since April, we’ve been working on this concept of “muscle memory,” or deterministic replay, for agent systems performing automations. You may recall our first post in May, launching a python package called Muscle Mem:
https://news.ycombinator.com/item?id=43988381
Since then, the product has evolved entirely, now taking the form of an LLM Proxy. For a deep dive into this process, check out:
https://blog.butter.dev/muscle-mem-as-a-proxy
The proxy’s killer feature is being template-aware, meaning it can reuse cache entries across structurally similar requests. Inducing variable structure from context windows is no easy task, which we cover in a technical writeup here:
https://blog.butter.dev/template-aware-caching
The proxy is currently open-access and free to use so we can quickly discover and work through a slew of edge cases and template-induction errors. There’s much work to be done before it’s technically sound, but we’d love to see you take Butter for a spin and share how it went, where it breaks, if it’s helpful, if we're going down a dead end, etc.
Cheers!