People have spent enormous amounts of time and effort to find clever optimizers....

gwern · on Feb 18, 2023

> I don't really see this as any less clever than if someone had written the algorithm by hand.

And you can invent this by hand. I was talking to Shawn Presser literally days before about his experiments in cutting Adam down to low precision: he had repeatedly cut it down, eventually to 1 bit, and found it still worked on small-scale Transformers, i.e. close to this LION. (He didn't invent it exactly, but he was something like an `abs()` away: https://twitter.com/theshawwn/status/1625681629074137088 ) So that's how you could have invented this yourself: follow the logic of '1-bit Adam' https://arxiv.org/abs/2102.02888#microsoft and see how much of the moment modeling you can dispense with.
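To make "how much you can dispense with" concrete, here is a rough sketch of the update rule as given in the Lion paper. It is not code from Shawn's experiments or from the paper. The helper name `lion_step`, the plain-list representation, and the defaults `beta1=0.9`, `beta2=0.99` are illustrative assumptions. Compared with Adam there is no second-moment estimate and no division: the step is just the sign of a blend of the stored momentum and the current gradient, so every coordinate moves by exactly `lr`.

```python
from typing import List, Tuple


def sign(x: float) -> int:
    # Returns -1, 0 or +1; sign(0) == 0, like numpy.sign.
    return (x > 0) - (x < 0)


def lion_step(
    params: List[float],
    grads: List[float],
    momentum: List[float],
    lr: float = 1e-4,
    beta1: float = 0.9,
    beta2: float = 0.99,
    weight_decay: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """One Lion-style update over flat lists (hypothetical helper, not a library API).

    Returns (new_params, new_momentum); the inputs are not modified.
    """
    new_params: List[float] = []
    new_momentum: List[float] = []
    for p, g, m in zip(params, grads, momentum):
        # Blend old momentum and fresh gradient with beta1, then keep only the sign:
        # no second moment, no division, each coordinate moves by exactly lr
        # (plus decoupled weight decay).
        c = beta1 * m + (1 - beta1) * g
        new_params.append(p - lr * (sign(c) + weight_decay * p))
        # The momentum carried forward uses a separate, slower beta2.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


if __name__ == "__main__":
    # From zero momentum, each coordinate steps by lr against its gradient's sign,
    # however large or small the gradient is.
    p, m = lion_step([1.0, -1.0], [2.0, -3000.0], [0.0, 0.0], lr=0.1)
    print(p)  # [0.9, -0.9]
```

Note that if you set `beta1 == beta2`, the blended vector `c` is exactly the new momentum, so the rule collapses to plain signed momentum (Signum-style). The two-beta split is the small extra step separating that from Lion.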

noduerme · on Feb 18, 2023

Well, it's exactly less clever because once you've written an optimizer to find optimizers, you've cut yourself out of the loop and you're just a manager of things you don't understand.

Go use my public tool https://doxyjs.com and find yourself a compression algo; you probably can. Understanding why it works is the interesting part.

michaelmior · on Feb 18, 2023

> you're just a manager of things you don't understand.

It seems like a pretty big leap to assume the authors don't understand their own algorithm.

wizzard0 · on Feb 18, 2023

End-to-end (or deeper-than-usual) understanding is the reason why I never worried about losing a job.

At the same time… trying to grok everything consistently kills my attempts at anything business-like, where one has no choice but to focus and delegate.

rsfern · on Feb 18, 2023

There’s some interesting discussion about how the design of the search space could potentially be improved. Plus, they had to simplify the final optimization algorithm manually, so it’s not as if they’ve cut themselves completely out of the loop; it’s just a higher-order tool.