There are plenty of LLMs that aren't MoE/ensemble, and there are also plenty of LLMs that are pure completion models that haven't been fine-tuned/RLHF'd to be conversational. I'd recommend reading up a bit more on how modern LLMs work; I get the feeling your intuition there could improve.
edit: I can't reply to the child comment as we've reached the thread limit, but I can say that LLMs are not trained on a tiny subset of data; they are trained on as much data as possible. An LLM becomes conversational/instruct-following through fine-tuning on reinforcement learning data. GPT-3.5 is by all accounts not an ensemble model, and Llama 2/3 is not an ensemble/MoE model, yet both will let you do in-context learning/few-shot prompting effortlessly. As I said, I think your intuition about how these LLMs work (as far as we know) needs readjustment.
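To make the few-shot point concrete, here's a minimal sketch of in-context learning with a plain base (non-instruct) completion model via Hugging Face transformers. The checkpoint name is illustrative, not a claim about any particular model; any base LM should behave similarly:

    # Few-shot prompting a base completion model: no RLHF/instruct tuning involved.
    # Assumes the transformers library; "meta-llama/Llama-2-7b-hf" is illustrative.
    from transformers import pipeline

    generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")

    prompt = (
        "Translate English to French.\n"
        "sea otter => loutre de mer\n"
        "peppermint => menthe poivree\n"
        "cheese =>"
    )
    out = generator(prompt, max_new_tokens=5, do_sample=False)
    print(out[0]["generated_text"])  # a base model typically continues the pattern: "fromage"

The pattern completion here comes from pretraining alone, which is the point: ensemble-ness and conversational fine-tuning are orthogonal to in-context learning.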
I don't see what I'm missing. I'm addressing why ChatGPT generated a response given a prompt. If another, far simpler LLM had been used, the explanation would be different.
If a highly simplified LLM can generate text that respects discrete quantitative constraints, under a variety of scenarios, then I've underestimated how highly structured the relevant training data must be.
An LLM trained on a physics textbook isn't going to be conversational; one trained on Shakespeare will generate text in Elizabethan English.
i.e., in every case, the explanation of why any given response was generated comes down to explaining the distribution of its dataset. So if a Shakespeare LLM generates "to be or otherwise to be not is alike everything ere annon", we will mostly be explaining how/why those words were used by Shakespeare.
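As a toy illustration of that claim (a deliberately simplified sketch, nothing like a real transformer): a bigram model fit on a Shakespeare snippet can only ever emit word transitions that occur in its corpus, so explaining its output is literally explaining the corpus statistics:

    # Toy bigram "LLM": every generated transition occurred in the training text,
    # so the output distribution is just a view of the data distribution.
    import random
    from collections import defaultdict

    corpus = "to be or not to be that is the question".split()

    transitions = defaultdict(list)          # word -> observed next words
    for a, b in zip(corpus, corpus[1:]):
        transitions[a].append(b)

    word, output = "to", ["to"]
    for _ in range(8):                       # sample a short continuation
        followers = transitions.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    print(" ".join(output))                  # e.g. "to be that is the question"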
And if an LLM is small and is actually discretely sensitive to quantities across a large variety of domains, my guess is that its training data has been specially prepared. This is just a guess about the nature of human communication though; it has nothing to do with LLMs. I just guess that we don't distribute "quantity tokens" in such a highly patterned way that a simple LLM model would work to find it.