I didn't have the patience to keep clicking through after visiting a few pages only to find the depth lacking.
About an hour ago I used Opus 4.5 to give me a flat list with summaries. I tried to post it here as a comment but it was too long and I didn't bother to split it up. They all seem to be things I've heard of in one way or another, but nothing that really stood out for me. Don't get me wrong, they're decent concepts and it's clear others appreciate this resource more than I do.
I have no doubt your intentions were good and I am a fan of open-source myself.
Unfortunately, since this was written, it's become more widely known and commonplace for large corporations to disproportionately benefit from open source.
And that wouldn't bother people as much if it weren't for the fact that large corporations often don't give back. It's become enough of an issue that OSS maintainers have switched licenses, some have gone closed-source, and others have simply abandoned their projects.
Just last week I began rethinking my use of MIT/Apache licenses for future work.
For the longest time I was hesitant about GPLv3 and almost scared to use it in my personal projects, but it turns out my hesitations were fueled by...large corporations.
Unfortunately we have to play the game according to the rules on the field. You can decide to opt out entirely (which is fine) or you can play the game and try to win. Personally, I will play and try to win.
That means I will make things, talk about them, and accrue social and/or actual capital for me and my family. I can't stop any megacorp from training on my code, and it's futile to try. I CAN build cool things, talk about them, and get cool jobs or friends or a following or whatever. I understand not everyone is comfortable with this tradeoff.
> I can't stop any megacorp from training on my code, and it's futile to try.
I do not like this take, and I hope you'll reconsider repeating it.
This very much reads as accepting a lack of any claim to reasonable privacy and ownership, bordering on accepting what I would consider theft.
I understand. I am, however, impotent against OpenAI. If I publish code to GitHub, I understand it will be used for training. I'd rather get the win for my family than try to fight a megacorp. I'm sorry.
ik_llama is almost always faster when tuned. Untuned, though, I've found the two to be very similar in performance, with mixed results as to which comes out ahead.
But vLLM and SGLang tend to be faster than both of those.
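If you want to sanity-check that on your own box, here's a rough sketch (the endpoint URL and model name are placeholders for whatever you're running; llama.cpp, ik_llama, vLLM, and SGLang all speak the OpenAI-compatible API):

    # Rough tokens/sec check against any OpenAI-compatible local server.
    # URL and model name below are placeholders -- adjust for your setup.
    import time, requests

    URL = "http://localhost:8000/v1/completions"  # assumed local endpoint
    body = {"model": "local-model", "prompt": "Explain KV caching briefly.",
            "max_tokens": 256, "temperature": 0}

    t0 = time.time()
    r = requests.post(URL, json=body, timeout=300).json()
    dt = time.time() - t0
    tokens = r["usage"]["completion_tokens"]
    print(f"{tokens} tokens in {dt:.1f}s -> {tokens / dt:.1f} tok/s")

Run the same prompt against each server with the same quant and context settings and the comparison falls out directly.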
There is plenty that can run within 32/64/96 GB of VRAM.
IMO models like Phi-4 are underrated for many simple tasks.
Some quantized Gemma 3 variants are quite good as well.
There are larger/better models too, but those tend to really push the limits of 96 GB.
FWIW, once you start pushing into 128 GB+, the ~500 GB models really start to become attractive, because at that point you're probably wanting just a bit more out of everything.
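Back-of-the-envelope, if you want to see why (the bits/weight, KV-cache size, and overhead factor below are assumptions for illustration, not measurements):

    # Rough VRAM estimate: quantized weights + KV cache + overhead.
    # Every constant here is an assumption, not a measurement.
    def vram_gb(params_b, bits_per_weight=4.5, kv_cache_gb=2.0, overhead=1.1):
        weights_gb = params_b * bits_per_weight / 8  # billions of params -> GB
        return (weights_gb + kv_cache_gb) * overhead

    for name, params in [("Phi-4 (14B)", 14), ("Gemma 3 (27B)", 27), ("70B class", 70)]:
        print(f"{name}: ~{vram_gb(params):.0f} GB")
    # ~11 GB, ~19 GB, ~46 GB: a Q4-ish 70B still fits in 96 GB with headroom,
    # while the ~500 GB models clearly need a different tier of hardware.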
IDK, all of my personal and professional projects involve pushing the SOTA to the absolute limit. Using anything other than the latest OpenAI or Anthropic model is out of the question.
Smaller open-source models are a bit like 3D printing in the early days: fun to experiment with, but not that valuable for anything other than making toys.
Text summarization, maybe? But even then I want a model that understands the complete context and does a good job. Even for things like "generate one sentence about the action we're performing," I usually find I can just fold it into the output schema of a larger request instead of making a separate request to a smaller model.
It seems to me like the use case for local GPUs is almost entirely privacy.
If you buy a 15k AUD RTX 6000 96GB, that card will _never_ pay for itself on a gpt-oss:120b workload vs just using OpenRouter - no matter how many tokens you push through it - because the cost of residential power in Australia means you cannot generate tokens cheaper than the cloud even if the card were free.
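To make that concrete, here's the arithmetic with placeholder numbers (power draw, throughput, tariff, and cloud price are all assumptions; plug in your own):

    # Electricity cost per million generated tokens vs. a cloud price.
    # All four inputs are assumptions -- substitute your own numbers.
    power_kw = 0.45            # whole-system draw under load (assumed)
    tok_per_s = 80             # local gpt-oss:120b throughput (assumed)
    aud_per_kwh = 0.35         # residential tariff (assumed)
    cloud_aud_per_mtok = 0.50  # hypothetical cloud price per 1M tokens

    hours = 1e6 / tok_per_s / 3600
    power_cost = hours * power_kw * aud_per_kwh
    print(f"electricity: {power_cost:.2f} AUD/Mtok vs cloud: {cloud_aud_per_mtok:.2f} AUD/Mtok")
    # ~0.55 vs ~0.50: if power alone matches or exceeds the cloud price,
    # the 15k AUD card can never amortize, regardless of volume.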
> because the cost of residential power in Australia
This doesn't really matter to your overall point, which I agree with, but:
The rise of rooftop solar and home battery energy storage flips this a bit now in Australia, IMO. At least where I live, every house has a solar panel on it.
Not worth it just for local LLM usage, but an interesting change to energy economics IMO!
- You can use the GPU for training and run your own fine-tuned models
- You can have much higher generation speeds
- You can sell the GPU on the used market in ~2 years time for a significant portion of its value
- You can run other types of models, like image, audio, or video generation, that aren't available via an API or that cost significantly more
- Psychologically, you don’t feel like you have to constrain your token spending and you can, for instance, just leave an agent to run for hours or overnight without feeling bad that you just “wasted” $20
- You won’t be running the GPU at max power constantly
This is simply not true. Your heuristic is broken.
The recent Gemma 3 models, which are produced by Google (a little startup - heard of 'em?), outperform the last several OpenAI releases.
Closed does not necessarily mean better. Plus, the local ones can be fine-tuned to whatever use case you have, won't have inputs blocked by censorship functionality, and can be optimized by distilling to whatever spec you need.
Anyway, all that is extraneous detail - the important thing is to decouple "open" and "small" from "worse" in your mind. The most recent Gemma 3 model specifically is incredible, and it makes sense given that Google has access to many times more data than OpenAI for training (something like a factor of 10 at least). Which is a very straightforward idea to wrap your head around: Google was scraping the internet for decades before OpenAI even entered the scene.
So just because their Gemma model is released openly (open weights) doesn't mean it should be discounted. There's no magic voodoo happening behind the scenes at OpenAI or Anthropic; the models are essentially of the same type. But Google releases theirs to undercut the profitability of their competitors.
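And trying them costs almost nothing. A minimal sketch with Hugging Face transformers (the model id is an assumption; Gemma weights are gated, so you have to accept the license on the Hub first, and you need a recent transformers release):

    # Minimal sketch: run an open-weights Gemma locally via transformers.
    # Model id is assumed; gated weights require accepting Google's
    # license on the Hugging Face Hub and a recent transformers version.
    from transformers import pipeline

    generate = pipeline("text-generation", model="google/gemma-3-1b-it")
    out = generate("Why do open weights matter?", max_new_tokens=64)
    print(out[0]["generated_text"])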
In theory, it's only sufficient for pipeline parallelism, due to limited PCIe lanes and interconnect bandwidth.
Generally, scaling on consumer GPUs falls off somewhere between 4 and 8 cards for most setups.
Those running more GPUs than that are typically using a larger number of smaller cards for cost-effectiveness.
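For reference, in vLLM that split looks something like this (the model id is a placeholder, and you should verify the argument name against your installed vLLM version):

    # Sketch: split a model across 4 consumer GPUs with pipeline
    # parallelism in vLLM. Model id is hypothetical; verify the
    # argument against your installed vLLM version's docs.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="some-org/some-70b-model",  # placeholder model id
        pipeline_parallel_size=4,         # one pipeline stage per GPU
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)

Pipeline parallelism only moves activations between stages, which is why it tolerates slow links; tensor parallelism shuffles far more data per token and is where consumer interconnects fall over.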
> i think people that were promoted for building systems that turned out bad, should be demoted
Nope. In the same vein as "lording" over others, they become the resident experts in bullshit. Environments that allow such behavior have already ingrained rewards for it.
There's been community commentary that many of the GPT models are a tad overfitted WRT benchmarks. Benchmarks are not representative of end-user experience. That's not to say benchmarks aren't useful at all, but they're only useful as a rough indicator.
The goal is to seed the initial context with principles that improve software development when using LLMs.