

I made a similar comment on a different thread, but I think it also fits here: I think the disconnect between engineers is due to their own context. If you work on frontend applications, especially React/React Native/HTML/mobile, your experience with LLMs is completely different from that of someone working with OpenGL, io_uring, libev and other lower-level stuff. Sure, Opus 4.5 can one-shot Windows utilities and full-stack apps, but it can't implement a simple shadowing algorithm from a 2003 paper in C++ with GLFW and GLAD: https://www.cse.chalmers.se/~uffe/soft_gfxhw2003.pdf

Codex and Claude Code are terrible with C++. They also can't do Rust really well, once you get to the meat of it. Not sure why that is, but they just spit out nonsense that creates more work than it saves me. They also can't one-shot anything complete, even when I feed them the entire paper that explains what the algorithm is supposed to do.

Try to do some OpenGL or Vulkan with it, without using WebGPU or three.js. Try it with the real code that many of us have to deal with every day: SDL, Vulkan RHI, NVRHI. Very frustrating.

Try it with Boost, or CMake, or Taskflow. It loses itself constantly, hallucinates which version it is working with, and ignores you when you provide actual pointers to documentation in the repo.

I've also recently tried to get Opus 4.5 to move the Job system from Doom 3 BFG to the original codebase. Clean clone of dhewm3, pointed Opus to the BFG Job system codebase, and explained how it works. I have also fed it the Fabien Sanglard code review of the job system: https://fabiensanglard.net/doom3_bfg/threading.php

We are not sleeping on it; we are waiting for it to become actually useful. Sure, it can generate a full-stack admin control panel in JS for my PostgreSQL tables, but is that really "not normal"? That's basic.


We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside of grunt work like minor refactors across many files. It doesn't seem to understand proxying and how it works on both a protocol level and business logic level.

With some entirely novel work we're doing, it's actually a hindrance as it consistently tells us the approach isn't valid/won't work (it will) and then enters "absolutely right" loops when corrected.

I still believe those who rave about it are not writing anything I would consider "engineering". Or perhaps it's a skill issue and I'm using it wrong, but I haven't yet met someone I respect who tells me it's the future in the way those running AI-based companies tell me.


> We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside

I have a great time using Claude Code in Rust projects, so I know it's not about the language exactly.

My working model is that since LLMs are basically inference/correlation based, the more you deviate from the mainstream corpus of training data, the more confused the LLM gets. The LLM doesn't "understand" anything. But if it was trained on a lot of things kind of like the problem, it can match the patterns just fine, and it can generalize over a lot of layers, including programming languages.

Also, I've noticed that it can get confused by stupid stuff. E.g. I had two different things named kind of the same in two parts of the codebase, and it would constantly stumble over them and conflate them. Renaming one of them in the codebase immediately improved it.

So yeah, we've got another potentially powerful tool that requires understanding how it works under the hood to be useful. Kind of like git.


Recently the v8 Rust library changed from mutable handle scopes to pinned scopes. A fairly simple change that I even put in my CLAUDE.md file. But it still generates methods with HandleScopes, then says "oh, I have a different scope" and goes on a random walk refactoring completely unrelated parts of the code, all the while burning through Opus 4.5 tokens. Things work great as long as you are testing on the training set. That said, it is absolutely brilliant with React and TypeScript.

Well, it's not like I've never "burned tokens" on some lifetime issue myself. :D But yeah, if you're working in Rust on something with sharp edges, the LLM will get hurt. I just don't tend to have those in my projects.

An even more basic failure mode: I told it to copy a chunk (1k LOC) of blocking code into a new module and convert it to async. It just couldn't do a proper 1:1 logical _copy_. But when I manually `cp <src> <dst>`'d the file and then told it to convert that to async and fix issues, it did it 100% correctly. Because fundamentally it's just a non-deterministic pattern generator.


This isn't meant as a criticism, or to doubt your experience, but I've talked to a few people who had experiences like this. I helped them get Claude Code set up, analyze the codebase and document the architecture in markdown (edited as needed afterwards), create an agent for the architecture, and prompt it in an incremental way. Maybe 15-30 minutes of prep. Everyone I helped with this responded with things like "This is amazing", "Wow!", etc.

For some things you can fire up Claude and have it generate great code from scratch. But for bigger code bases and more complex architecture, you need to break it down ahead of time so it can just read about the architecture rather than analyze it every time.


Is there any good documentation out there about how to perform this wizardry? I always assumed if you did /init in a new code base, that Claude would set itself up to maximize its own understanding of the code. If there are extra steps that need to be done, why don't Claude's developers just add those extra steps to /init?

Not that I have seen, which is probably a big part of the disconnect. Mostly it's tribal knowledge. I learned through experimentation, but I've seen tips here and there. Here's my workflow (roughly):

> Create a CLAUDE.md for a c++ application that uses libraries x/y/z

[Then I edit it, adding general information about the architecture]

> Analyze the library in the xxx directory, and produce a xxx_architecture.md describing the major components and design

> /agent [let Claude make the agent, but when it asks what you want it to do, explain that you want it to specialize in subsystem xxx, and refer to xxx_architecture.md]

Then repeat until you have the major components covered. Then:

> Using the files ending in _architecture.md, analyze the entire system and update CLAUDE.md to refer to them and to use the specialized agents.

Now, when you need to do something, put it in planning mode and say something like:

> There's a bug in the xxx part of the application, where when I do yyy, it does zzz, but it should do aaa. Analyze the problem and come up with a plan to fix it, and automated tests you can perform if possible.

Then, iterate on the plan with it if you need to, or just approve it.

One of the most important things you can do when dealing with something complex is to let it come up with a test case so it can fix or implement something and then iterate until it's done. I had an image-processing problem and I gave it some sample data, then it iterated (looking at the output image) until it fixed it. It spent at least an hour, but I didn't have to touch it while it worked.
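
For what it's worth, here is a hypothetical skeleton of what the top-level CLAUDE.md can end up looking like after those steps (the subsystem and agent names are made up; edit to match your project):

    # Project overview
    C++ application using libraries x/y/z. Built with CMake; tests live in tests/.

    # Architecture docs
    - renderer_architecture.md - rendering subsystem: major components and data flow
    - network_architecture.md - protocol handling and connection lifecycle

    # Agents
    - Ask the "renderer" agent about anything in the rendering subsystem.
    - Ask the "network" agent about protocol and connection work.

    # Working rules
    - Plan before editing; keep changes scoped to one subsystem at a time.
    - Run the test suite before declaring a task done.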


I've taken time today to do this. With some of your suggestions, I am seeing an improvement in its ability to do some of the grunt work I mentioned. It just saved me an hour refactoring a large protocol implementation into a few files and extracting some common utilities. I can recognise and appreciate how useful that is for me and for most other devs.

At the same time, I think there are limitations to these tools and that I won't ever be able to achieve what I see others claiming about 95% of code being AI-written or leaving the AI to iterate for an hour. There are just too many weird little pitfalls in our work that the AI cannot seem to avoid.

It's understandable; I've fallen victim to a few of them too, but I have the benefit of being able to continuously learn/develop/extrapolate in a way that the LLM cannot. And with how little documentation exists for some of these things (MASQUE proxying, for example), any time the LLM encounters this code it throws a fit and is unable to contribute meaningfully.

So thanks for your suggestions; they have made Claude better, and clearly I was dragging my feet a little. At the very least, it's freed up some more of my time to work on the complex things Claude can't do.


To be perfectly honest, I've never used a single /command besides /init. That probably means I'm using 1% of the software's capabilities. In frankness, the whole menu of /-commands is intimidating and I don't know where to start.

/commands are like macros or mayyybe aliases. You just put in the commands you see yourself repeating often, like "commit the unstaged files in distinct commits, use xxx style for the commit messages..." - then you can iterate on it if you see any gaps or confusion, even give example commands to use in the different steps.
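
As a concrete sketch (assuming the usual project-local location for custom commands, and a made-up command name), a /split-commits command is just a markdown file at .claude/commands/split-commits.md whose body is the prompt you would otherwise keep retyping:

    Look at the unstaged changes with `git status` and `git diff`.
    Group related changes and commit them as separate, distinct commits,
    one logical change per commit.
    Write the messages in the project's usual style (see `git log` for examples).
    Leave out files that look unrelated to the change I asked about.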

Skills, on the other hand, are commands ON STEROIDS. They can be packaged with actual scripts and executables; the PEP 723 Python style + uv is super useful.

I have one skill, for example, that uses Python + Tree-sitter to check the unit test quality of a Go project. It does some AST magic to check the code for repetition and for stupid things like sleeps and relative timestamps. A /command _can_ do it, but it's not as efficient; the scripts for the skill are specifically designed for LLM use and output the result in a hyper-compact form a human could never be arsed to read.


You don't need to do much, the /agent command is the most useful, and it walks you through it. The main thing though is to give the agent something to work with before you create it. That's why I go through the steps of letting Claude analyze different components and document the design/architecture.

The major benefit of agents is that they keep the context clean for the main job. So the agent might have a huge context from working through some specific code, but the main process can do something to the effect of "Hey UI library agent, where do I need to put code to change the color of widget xyz", then the agent does all the thinking and can reply with "that's in file 123.js, line 200". The cleaner you keep the main context, the better it works.


Never thought of Agents in that way to be honest. I think I need to try that style =)

> In frankness, the whole menu of /-commands is intimidating and I don't know where to start.

claude-code has a built in plugin that it can use to fetch its own docs! You don't have to ever touch anything yourself, it can add the features to itself, by itself.


This is some great advice. What I would add is to avoid the internal plan mode and just build your own. The built-in one creates md files outside the project, gives the files random names, and it's hard to reference them in the future.

It's also hard to steer the plan mode or have it remember some behavior that you want to enforce. It's much better to create a custom command with custom instructions that acts as the plan mode.

My system works like this:

The /implement command acts as an orchestrator and plan mode, and it is instructed to launch a predefined set of agents based on the problem and have them utilize specific skills. Every time the /implement command is initiated, it has to create a markdown file inside my own project, and each subagent is also instructed to update that file when it finishes working.

This way, the orchestrator can spot that an agent misbehaved, and the reviewer agent can see what the developer agent tried to do and why it was wrong.


> if you did /init in a new code base, that Claude would set itself up to maximize its own understanding of the code.

This is definitely not the case, and the reason Anthropic doesn't make Claude do this is that its quality degrades massively as you use up its context. So the solution is to let users manage the context themselves, in order to minimize the amount that is "wasted" on prep work. Context windows have been increasing quite a bit, so I suspect that by 2030 this will no longer be an issue for any but the largest codebases, but for now you need to be strategic.


Are you still talking about Opus 4.5? I've been working in Rust, Kotlin and C++ and it's been doing well. Incredible at C++, given the number of mistakes it doesn't make.

> I still believe those who rave about it are not writing anything I would consider "engineering".

Correct. In fact, this is the entire reason for the disconnect, where it seems like half the people here think LLMs are the best thing ever and the other half are confused about where the value is in these slop generators.

The key difference is that (despite everyone calling themselves an SWE nowadays) there's a difference between a "programmer" and an "engineer". Looking at OP, exactly zero of his screenshotted apps are what I would consider "engineering". Literally everything in there has been done over and over to death. Engineering is.. novel, for lack of a better word.

See also: https://www.seangoedecke.com/pure-and-impure-engineering/


> Engineering is.. novel, for lack of a better word.

Tell that to the guys drawing up the world's 10 millionth cable suspension bridge



I don't think it's that helpful to try to gatekeep the "engineering" term or try to separate it into "pure" and "impure" buckets, implying that one is lesser than the other. It should be enough to just say that AI assisted development is much better at non-novel tasks than it is at novel tasks. Which makes sense: LLMs are trained on existing work, and can't do anything novel because if it was trained on a task, that task is by definition not novel.

Respectfully, it's absolutely important to "gatekeep" a title that has an established definition and certain expectations attached to the title.

OP says, "BUT YOU DON’T KNOW HOW THE CODE WORKS.. No I don’t. I have a vague idea, but you are right - I do not know how the applications are actually assembled." This is not what I would call an engineer. Or a programmer. "Prompter", at best.

And yes, this is absolutely "lesser than", just like a middleman who subcontracts his work to Fiverr (and has no understanding of the actual work) is "lesser than" an actual developer.


That's not the point being made to you. The point is that most people in the "software engineering" space are applying known tools and techniques to problems that are not groundbreaking. Very few are doing theoretical computer science, algorithm design, or whatever you think it is that should be called "engineering."

So the TL;DR here is: if you're in the business of recreating wheels, then you're in luck! We've automated wheel recreation to the point where the recreated wheels run acceptably true.

Most physical engineers are just applying known techniques all the time too. Most products or bridges or whatever are not solving some heretofore-unsolved problem.

It's how you use the tool that matters. Some people get bitter and try to compare it to top engineers' work on novel things as a strawman so they can go "Hah! Look how it failed!", as they swing a hammer to demonstrate it cannot chop down a tree. Because the tool is so novel and its use is a lot more abstract than that of an axe, it is taking a while for some to see its potential, especially if they are remembering models from even six months ago.

Engineering is just problem solving, nobody judges structural engineers for designing structures with another Simpson Strong Tie/No.2 Pine 2x4 combo because that is just another easy (and therefore cheap) way to rapidly get to the desired state. If your client/company want to pay for art, that's great! Most just want the thing done fast and robustly.


I think it's also that the potential is far from being realized yet we're constantly bombarded by braindead marketers trying to convince us that it's the best thing ever already. This is tiring especially when the leadership (not held back by any technical knowledge) believes them.

I'm sure AI will get there, I also think it's not very good yet.


Coding agents as of Jan 2026 are great at what 95% of software engineers do. For the remaining 5% who do really novel stuff, the agents will get there in a few years.

When he said 'just look at what I've been able to build', I was expecting anything but an 'image converter'.

I've had Opus 4.5 hand rolling CUDA kernels and writing a custom event loop on io_uring lately and both were done really well. Need to set up the right feedback loops so it can test its work thoroughly but then it flies.

Yeah I've handed it a naive scalar implementation and said "Make this use SIMD for Mac Silicon / NEON" and it just spits out a working implementation that's 3-6x faster and passes the tests, which are binary exact specifications.

It can do this at the level of a function, and that's -useful-, but like the parent reply to top-level comment, and despite investing the time, using skills & subagents, etc., I haven't gotten it to do well with C++ or Rust projects of sufficient complexity. I'm not going to say they won't some day, but, it's not today.

Anecdotally, we use Opus 4.5 constantly on Zed's code base, which is almost a million lines of Rust code and has over 150K active users, and we use it for basically every task you can think of - new features, bug fixes, refactors, prototypes, you name it. The code base is a complex native GUI with no Web tech anywhere in it.

I'm not talking about "write this function" but rather like implementing the whole feature by writing only English to the agent, over the course of numerous back-and-forth interactions and exhausting multiple 200K-token context windows.

For me personally, definitely at least 99% of all the Rust code I've committed at work since Opus 4.5 came out has been from an agent running that model. I'm reading lots of Rust code (that Opus generated) but I'm essentially no longer writing any of it. If dot-autocomplete (and LLM autocomplete) disappeared from IDE existence, I would not notice.


Whoa, that's a very interesting claim. I was shying away from writing Rust since I am not a Rust developer, but hearing about your experience, it looks like Claude has gotten very good at writing Rust.

Honestly, I think the more you can give Claude a type system and effective tests, the more effective it can be. Rust is quite high up on the test-strictness front (though I think more could be done...), so it's a great candidate. I also like its performance on Haskell and Go; both get you pretty great code out of the box.

Have you ever worried that by programming in this way, you are methodically giving Anthropic all the information it needs to copy your product? If there is any real value in what you are doing, what is to stop Anthropic or OpenAI or whomever from essentially one-shotting Zed? What happens when the model providers 10x their costs and also use the information you've so enthusiastically given them to clone your product and use the money that you paid them to squash you?

Zed's entire code base is already open source, so Anthropic has a much more straightforward way to see our code:

https://github.com/zed-industries/zed


That's what things like AWS Bedrock are for.

Are you worried about Microsoft stealing your codebase from GitHub?


Isn’t it widely assumed Microsoft used private repos for LLM training?

And even with a narrower definition of stealing, Microsoft’s ability to share your code with US government agencies is a common and very legitimate worry in plenty of threat model scenarios.


I just uninstalled Zed today when I realized the reason I couldn't delete a file on Windows was that it was open in Zed. So I wouldn't speak too highly of the LLM's ability to write code. I have never seen another editor on Windows make the mistake of opening files without enabling all 3 share modes.

Just based on timing, I am almost 100% sure whatever code is responsible was handwritten before anyone working on Windows was using LLMs...but anyway, thank you for the bug report - I'll pass it along!

The article is arguing that it will basically replace devs. Do you think it can replace you basically one-shotting features/bugs in Zed?

And also - doesn’t that make Zed (and other editors) pointless?


Trying to one-shot large codebases is an exercise in futility. You need to let Claude figure out and document the architecture first, then set up agents for each major part of the project. Doing this keeps the context clean for the main agent, since it doesn't have to go read the code each time. So one agent can fill its entire context understanding part of the code, and then the main agent asks it how to do something and gets a shorter response.

It takes more work than one-shot, but not a lot, and it pays dividends.


Is there a guide for doing that successfully somewhere? I would love to play with this on a large codebase. I would also love to not reinvent the wheel on getting Claude working effectively on a large code base. I don’t even know where to start with, e.g., setting up agents for each part.

> Do you think it can replace you basically one-shotting features/bugs in Zed?

Nobody is one-shotting anything nontrivial in Zed's code base, with Opus 4.5 or any other model.

What about a future model? Literally nobody knows. Forecasts about AI capabilities have had horrendously low accuracy in both directions - e.g. most people underestimated what LLMs would be capable of today, and almost everyone who thought AI would at least be where it is today...instead overestimated and predicted we'd have AGI or even superintelligence by now. I see zero signs of that forecasting accuracy improving. In aggregate, we are atrocious at it.

The only safe bet is that hardware will be faster and cheaper (because the most reliable trend in the history of computing has been that hardware gets faster and cheaper), which will naturally affect the software running on it.

> And also - doesn’t that make Zed (and other editors) pointless?

It means there's now demand for supporting use cases that didn't exist until recently, which comes with the territory of building a product for technologists! :)


Thanx. More of a "faster keyboard" so far then?

And yeah - if I had a crystal ball, I would be on my private island instead of hanging on HN :)


Definitely more than a faster keyboard (e.g. I also ask the model to track down the source of a bug, or questions about the state of the code base after others have changed it, bounce architectural ideas off the model, research, etc.) but also definitely not a replacement for thinking or programming expertise.

I don't know if you've tried GPT-5.2, but I find Codex much better for Rust, mostly due to the underlying model. You have to do planning and provide context, but 80%+ of the time it's a one-shot for small-to-medium-size features in an existing codebase that's fairly complex. I honestly have to say that it's a better programmer than I am; it's just not anywhere near as good a software developer for all of the higher- and lower-level concerns that are the other 50% of the job.

If you have any opensource examples of your codebase, prompt, and/or output, I would happily learn from it / give advice. I think we're all still figuring it out.

Also this SIMD translation wasn't just a single function - it was multiple functions across a whole region of the codebase dealing with video and frame capture, so pretty substantial.


"I honestly have to say that it's a better programmer than I am, it's just not anywhere near as good a software developer for all of the higher and lower level concerns that are the other 50% of the job."

That's a good way to say it, I totally identify.


Is that a context issue? I wonder if LSP would help there. Though Claude Code should grep the codebase for all necessary context and LSP should in theory only save time, I think there would be a real improvement to outcomes as well.

The bigger a project gets the more context you generally need to understand any particular part. And by default Claude Code doesn't inject context, you need to use 3rd party integrations for that.


I'm a fairly senior frontend dev using React, and even I see Sonnet 4.5 struggle with basic things. Today it wrote my Zod validation incorrectly, mixing up versions, then just decided it wasn't working and attempted to replace the entire thing with a different library.

There’s little reason to use sonnet anymore. Haiku for summaries, opus for anything else. Sonnet isn’t a good model by today’s standards.

I have been chastened in the opposite direction by others. I've also subjectively really disliked Opus's speed and I've seen Opus do really silly things too. But I'll try out using it as a daily driver and see if I like it more.

Why do we all of a sudden hold these agents to some unrealistically high bar? Engineers write bugs all the time and write incorrect validations. But we iterate. We read the stack trace in Sentry, realise what the hell we were thinking when we wrote that, and fix things. If you're going to benefit from these agents, you need to be a bit more patient and point them correctly at your codebase.

My rule of thumb is that if you can clearly describe exactly what you want to another engineer, then you can instruct the agent to do it too.


> Engineers write bugs all the time

Why do we hold calculators to such high bars? Humans make calculation mistakes all the time.

Why do we hold banking software to such high bars? People forget where they put their change all the time.

Etc etc.


I don't hold calculators to high bars. They think 0.1 + 0.2 = 0.30000000000000004:

https://qntm.org/notpointthree


Some of them. The good ones don't.

my unrealistic bar lies somewhere above "pick a new library" bug resolution

I built an open-source "game engine" entirely in Lua many years ago, relying on many third-party libraries that I would bind to with FFI.

I thought I'd revive it, but this time with Vulkan and no third-party dependencies (except for Vulkan)

Sonnet 4.5, Opus and Gemini 3.5 Flash have helped me write image decoders for DDS, PNG, JPG and EXR, a Wayland window implementation, a macOS window implementation, etc.

I find that Gemini 3.5 flash is really good at understanding 3d in general while sonnet might be lacking a little.

All these SOTA models seem to understand my bespoke Lua framework and the right level of abstraction. For example, at the low level you have the generated Vulkan bindings, then objects around Vulkan types, and finally a high-level pipeline builder and whatnot, which does not mention Vulkan anywhere.

However with a larger C# codebase at work, they really struggle. My theory is that there are too many files and abstractions so that they cannot understand where to begin looking.


I'll second this. I'm making a fairly basic iOS/Swift app with an accompanying React-based site. I was able to vibe-code the React site (it isn't pretty, but it works and the code is fairly decent). But I've struggled to get the Swift code to be reliable.

Which makes sense. I'm sure there's lots of training data for React/HTML/CSS/etc. but much less with Swift, especially the newer versions.


I had surprising success vibe-coding a Swift iOS app a while back. Just for fun, since I have a Bluetooth OBD2 dongle and an electric truck, I told Claude to make me an app that could connect to the truck using the dongle and read me the VIN, odometer, and state of charge. This was the middle of 2025, so before Opus 4.5. It took Claude a few attempts and some feedback on what was failing, but it did eventually make a working app after a couple of hours.

Now, was the code quality any good? Beats me, I am not a swift developer. I did it partly as an experiment to see what Claude was currently capable of and partly because I wanted to test the feasibility of setting up a simple passive data logger for my truck.

I'm tempted to take another swing with Opus 4.5 for the science.


I hate "vibe code" as a verb. May I suggest "prompt" instead? "I was able to prompt the React site…."

You aren't prompting the React site, you're prompting the LLM.

> It also can't do rust really well

I have not had this experience at all. It often doesn't get it right on the first pass, yes, but the advantage with Rust vibecoding is that if you give it a rule to "Always run cargo check before you think you're finished", then it will go back and fix whatever it missed on the first pass. What I find particularly valuable is that the compiler forces it to handle all cases, like match arms or errors. I find that it often misses edge cases when writing TypeScript, and I believe that the relative leniency of the TypeScript compiler is why.
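
As a minimal sketch of that safety net (Event and handle are hypothetical names): if a new variant is ever added to the enum and this match isn't updated, cargo check fails, and the "run cargo check before you're finished" rule forces the agent to go back and handle it.

    // The compiler, not the reviewer, enforces that every case is handled.
    enum Event {
        Connected,
        Disconnected { code: u16 },
        Message(String),
    }

    fn handle(event: Event) -> Result<(), String> {
        // Adding a new Event variant without updating this match is a
        // compile error, so `cargo check` catches it before any human does.
        match event {
            Event::Connected => Ok(()),
            Event::Disconnected { code } => Err(format!("closed with code {code}")),
            Event::Message(text) => {
                println!("got: {text}");
                Ok(())
            }
        }
    }

    fn main() {
        // Results are #[must_use], so silently dropping the error path at
        // the call site at least produces a warning.
        handle(Event::Message("hello".into())).unwrap();
    }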

In a similar vein, it is quite good at writing macros (or at least, quite good given how difficult this otherwise is). You often have to cajole it into not hardcoding features into the macro, but since macros resolve at compile time they're quite well-suited for an LLM workflow as most potential bugs will be apparent before the user needs to test. I also think that the biggest hurdle of writing macros to humans is the cryptic compiler errors, but I can imagine that since LLMs have a lot of information about compilers and syntax parsing in their training corpus, they have an easier time with this than the median programmer. I'm sure an actual compiler engineer would be far better than the LLM, but I am not that guy (nor can I afford one) so I'm quite happy to use LLMs here.

For context, I am purely a webdev. I can't speak for how well LLMs fare at anything other than writing SQL, hooking up to REST APIs, React frontend, and macros. With the exception of macros, these are all problems that have been solved a million times thus are more boilerplate than novelty, so I think it is entirely plausible that they're very poor for different domains of programming despite my experiences with them.


i've also been using opus 4.5 with lots of heavy rust development. i don't "vibe code", but lead it with a relatively firm hand- and it produces pretty good results in surprisingly complicated tasks.

for example, one of our public repos works with rust compiler artifacts and cache restoration (https://github.com/attunehq/hurry); if you look at the history you can see it do some pretty surprisingly complex (and well made, for an LLM) changes. its code isn't necessarily what i would always write, or the best way to solve the problem, but it's usually perfectly serviceable if you give it enough context and guidance.


Have you experimented with all of these things on the latest models (e.g. Opus 4.5) since Nov 2025? They are significantly better at coding than earlier models.

Yes, December 2025 and January 2026.

I've had pretty good luck with LLM agents coding C. In this case, a C compiler that supports a subset of C and targets a customizable microcoded state machine/processor. Then I had Gemini code up a simulator/debugger for the target machine in C++, and it did it in short order and quite successfully: it lets you single-step through the microcode and examine inputs (and set inputs), outputs and current state. It did that in an afternoon, and the resulting C++ code looks pretty decent.

That's remarkably similar to something I've just started on: I want to create a self-compiling C compiler targeting (and running on) an 8-bit micro via a custom VM. This is basically a retro-computing hobby project.

I've worked with Gemini Fast on the web to help design the VM ISA; the next steps will be to have some AI (maybe Gemini CLI, currently free) write an assembler, disassembler and interpreter for the ISA, and then the recursive-descent compiler (written in C) too.

I already had Gemini 3.0 Fast write me a precedence climbing expression parser as a more efficient drop-in replacement for a recursive descent one, although I had it do that in C++ as a proof-of-concept since I don't know yet what C libraries I want to build and use (arena allocator, etc). This involved a lot of copy-paste between Gemini output and an online C++ dev environment (OnlineGDB), but that was not too bad, although Gemini CLI would have avoided that. Too bad that Gemini web only has "code interpreter" support for Python, not C and/or C++.

Using Gemini to help define the ISA was an interesting process. It had useful input in a "pair-design" process, working on various parts of the ISA, but then failed to bring all the ideas together into a single ISA document, repeatedly missing parts of what had been previously discussed until I gave up and did that manually. The default persona of Gemini seems not very well suited to this type of workflow, where you want to direct what to do next, since it seems they've RL'd the heck out of it to want to suggest the next step and ask questions rather than do what is asked and wait for further instruction. I eventually had to keep asking it to "please answer then stop", and interestingly the quality of the "conversation" seemed to fall apart after that (perhaps because Gemini was now predicting/generating a more adversarial conversation than a collaborative one?).

I'm wondering/hoping that Gemini CLI might be better at working on documentation than Gemini web, since then the doc can be an actual file it is editing, and it can use its edit tool for that, as opposed to hoping that Gemini web can assemble chunks of context (various parts of the ISA discussion) into a single document.


Just as a self-follow-up here (I hate to do it!): after chatting to Gemini some more about document-creation alternatives, it does seem that Gemini CLI is by far the best way to go, since it works in a similar fashion to Claude Code, making targeted edits (string replacements) to files rather than regenerating them from scratch (unless it has misinterpreted something you said as a request to do that, which would be obvious when it showed you the suggested diff).

Another alternative (not recommended due to potential for "drift") is to use Gemini's Canvas capability where it is working on a document rather than a specification being spread out over Chat, but this document is fully regenerated for every update (unlike Claude's artifacts), so there is potential for it to summarize or drop sections of the document ("drift") rather than just making requested changes. Canvas also doesn't have Artifact's versioning to allow you to go back to undo unwanted drifts/changes.


Yeah, the online Gemini app is not good for long lived conversations that build up a body of decisions. The context window gets too large and things drop.

What I’ve learned is that once you reach that point you’ve got to break that problem down into smaller pieces that the AI can work productively with.

If you're about to start with Gemini CLI, I recommend you look up https://github.com/github/spec-kit. It's a project out of Microsoft/GitHub that encodes a rigorous spec-then-implement multi-pass workflow. It gets the AI to produce specs, double-check the specs for holes and ambiguity, plan out the implementation, translate that into small tasks, then check them off as it goes. I don't use spec-kit all the time, but it taught me what explicit multi-pass prompting can do when the context is held in files on disk, often markdown that I can go in and change as needed. I think it basically comes down to enforcing enough structure in the form of codified processes, self-checks and/or tests for your code.

Pro tip, tell spec-kit to do TDD in your constitution and the tests will keep it on the rails as you progress. I suspect “vibe coding” can get a bad rap due to lack of testing. With AI coding I think test coverage gets more important.


Thanks for the spec-kit recommendation - I'll give it a try!

I've found it to be pretty hit-or-miss with C++ in general, but it's really, REALLY bad at 3D graphics code. I've tried to use it to port an OpenGL project to SDL3_GPU, and it really struggled. It would confidently insist that the code it wrote worked, when all you had to do was run it and look at the output to see a blank screen.

I hope I’m not committing a faux pas by saying this—and please feel free to tell me that I’m wrong—but I imagine a human who has been blind since birth would also struggle to build 3D graphics code.

The Claude models are technically multi-modal, but IME the vision side of the equation is really lacking. As a result, Claude is quite good at reasoning about logic, and it can build e.g. simpler web pages where the underlying html structure is enough to work with, but it’s much worse at tasks that inherently require seeing.


Yea, for obvious reasons, it seems to be best at code that transforms data: text/binary input to text/binary output, and where the logic can be tracked and verified at runtime with sufficient (text) logging. In other words, it's much better closed-loop than open-loop. I tried to help it by prompting it to please take a screen capture of its output to verify functionality, but it seems LLMs aren't quite ready for that yet.

They work much better off a test that must pass. That they can “see”. Without it they are just making up some other acceptance criteria.

> It also can't do Rust really well, once you get to the meat of it. Not sure why that is

Because types are proofs and require global correctness, you can't just iterate, fix things locally, and wait until it breaks somewhere else that you also have to fix locally.


I have not tried C++, but Codex did a good job with low-level C code, shaders as well as porting 32 bit to 64 bit assembly drawing routines. I have also tried it with retro-computing programming with relative success.

> Mobile

From what I've seen, CC has trouble with the latest Swift too, partially because it is the latest and partially because the language is so convoluted nowadays.

But it's übercharged™ for C#


Both Codex and Claude Code are terrible with C++. Not sure why that is, but they just spit out nonsense that creates more work than it saves me.

Have you tried to do any OpenGL or Vulkan work with it? Very frustrating.

React and HTML, though, pretty awesome.


On the other hand, I've been using Claude Code for the past several months at work in several C++ projects. It's been fine at understanding C++. It just generates a lot of boilerplate, doesn't follow DRY, and gets persnickety with tests.

I've started adding this to all of my new conversations and it seems to help:

    You are a principal software engineer. I report to you. Do not modify files. Do not write prose. Only provide observations and suggestions so that I can learn from you.

My question to the LLM then follows in the next paragraph. Forgoing most of the LLM's code-writing capabilities in favor of getting observations and ideas seems to be a much better choice for productivity. It can still lead me down rabbit holes or in wrong directions, but at least I don't have to deal with 10 pages of prose in its output or 50 pages of ineffectual code.

Yeah, it's a decent rubber duck.

As soon as it starts trying to write actual code or generate a bunch of files it's less than helpful very quickly.

Perhaps I haven't tried enough, but I'm entirely unsold on this for anything lower level.


Gemini and ChatGPT have not done well at writing or analyzing OpenGL-like rendering code for me either. For many algorithms, they're not good at explaining them either. And for some of the classical algorithms, like cascaded shadow mapping, even the articles written by people and the example source code that I found are wrong or incomplete.

Learning "the old ways" is certainly valuable, because the AIs and the resources available are bad at these old ways.


Which models?

It's possible Opus 4.5 and GPT-5.2 are significantly less terrible with C++ than previous models. Those only came out within the past 2 months.

They also have significantly more recent knowledge cut-off dates.


I'll be specific:

I've recently been working with Opus 4.5 and GPT-5.2. Both have been unable to migrate a project from ARB assembly shaders to OpenGL 3.3 and GLSL. And I don't mean migrating the shaders themselves, just changing all the boring glue code that tells the application to use GLSL shaders and manage those instead of feeding it the ARB shaders directly.

They have also failed spectacularly at implementing this paper: https://www.cse.chalmers.se/~uffe/soft_gfxhw2003.pdf

No matter how I sliced it, I could not get a simple cube to have the shadows as described in the paper.

I've also recently tried to get Opus 4.5 to move the Job system from Doom 3 BFG to the original codebase. Clean clone of dhewm3, pointed Opus to the BFG Job system codebase, and explained how it works. I have also fed it the Fabien Sanglard code review of the job system: https://fabiensanglard.net/doom3_bfg/threading.php

As well as the official notes that explain the engine differences: https://fabiensanglard.net/doom3_documentation/DOOM-3-BFG-Te...

I did that because, well, I had ported this job system before and knew it was something pretty "pluggable" and could be implemented by an LLM. Both have failed. I'm yet to find a model that does this.


It's funny, I've also been trying to use AI to implement (simpler) shadow mapping code and it has failed. I eventually formed a very solid understanding of the problem domain myself and achieved my goals with hand written code.

I might try to implement this paper, great find! I love this 2000-2010 stuff



Perfect. Now I can continue to be confused! Beginner mindset!

Thanks, that's very specific! Sounds like that's out of reach of the current generation of models.

Will be interesting to see if models in six months time can handle this, since they clearly can't do it today.


In what scenarios are they terrible? I hope not every scenario. I've found Codex adequate for refactoring and unit tests. I've not used it in anger to write any significant new code.

I suppose part of the problem is that training a model on publicly available C++ isn't going to be great because syntactically broken code gets posted to the web all the time, along with suboptimal solutions. I recall a talk saying that functional languages are better for agents because the code published publicly is formally correct.


I use ChatGPT with C++, but in a very limited manner. So far it has been an overall win. I watch the code very closely, of course, and usually end up doing a few iterations (mostly optimizing for speed, reliability, concurrency).

Also to generate boilerplate / repetitive code.

Overall I consider it a win.


I use Claude to generate C++23, and it usually performs well. It takes a bit of nudging to avoid repeating itself, to reuse existing functionality, to not alter huge portions without running tests, etc. But generally it is helpful and knows what to do.

I had the same experience. The C++ doesn't even compile, or I have to tell it all the time to "use C++23 features". I tried to learn OpenGL with it. This worked out a bit, since I had to spot the errors :D

Same here. C++ changes fast and can be written in many styles so not a ton of training data I assume.

> Rust needs a mechanism to recover gracefully from OOM errors.

Linus also brought this up: https://lkml.org/lkml/2021/4/14/1099


None of Rust's language features allocate; not arrays, not closures, not iterators. Everything is stack-allocated by default. Rust code doesn't malloc unless you use a library that calls malloc; in other words, exactly like C. Rust fully supports turning off the parts of the standard library that perform allocation, and this is precisely what embedded code and the Linux kernel does. So Rust already gives you full control.

well, as an example, Vec::push doesn't have a way to report allocation failure. it will just panic, which is not acceptable in the kernel.

Sure, which is a perfectly acceptable default considering that most code is not in a position to observe allocation failures (because of OS-level overcommit, which is nearly always a good thing), and furthermore most code is not in a position to do anything other than crash in an OOM scenario. If you still want to have control over this without going full no_std, Rust has Vec::try_reserve to grow a Vec while checking for allocation errors, as well as several nightly functions coming down the pipe in the same vein (e.g. Vec::try_with_capacity, Vec::push_within_capacity).
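
For illustration, a minimal sketch of the fallible path using the stable Vec::try_reserve API (append_record is a made-up helper; the nightly methods mentioned above aren't used here):

    use std::collections::TryReserveError;

    // Hypothetical helper: reserve first, so the subsequent append cannot
    // hit the abort-on-OOM path.
    fn append_record(buf: &mut Vec<u8>, record: &[u8]) -> Result<(), TryReserveError> {
        buf.try_reserve(record.len())?; // reports failure instead of aborting
        buf.extend_from_slice(record);  // capacity is already there, no allocation
        Ok(())
    }

    fn main() {
        let mut buf = Vec::new();
        match append_record(&mut buf, b"hello") {
            Ok(()) => println!("stored {} bytes", buf.len()),
            Err(e) => eprintln!("allocation failed, degrade gracefully: {e}"),
        }
    }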

Speaking as a long-time C++ programmer, I really don't get this mindset.

First off allocation failure (typically indicated by bad_alloc exception in C++ code, or nullptr in C style code) does not mean that the system (or even the process) as a whole is out of memory.

It just means that this particular allocator could not satisfy the allocation request. The allocator could have a "ulimit"-style limit that is completely independent of the actual process/system limitations.

Secondarily what reason is there to make an allocation failure any different than any other resource allocation failure?

A normal structure for a program is to catch these exceptions at a higher level in the stack close to some logical entry point, such as thread entry point, UI action handler etc. where they can be understood and possibly shown to the user or logged or whatever. It shouldn't really matter if the failure is about failing to allocate socket or failing to allocate memory.

You could make the case that if the system is out of memory, the exception propagation itself is going to fail. Maybe... but IMHO, on the code path taken when the stack is unwound due to an exception, you should only release resources, not allocate more, anyway.


In Rust you could use multiple allocators at the same time, with allocation failure handled by the allocator, converting a panic into some useful behavior. This logic is observable in WASM, where there are OOMs all the time that are handled transparently to the application code.

So I assume there are no real blockers of the kind people in this thread assume; this is just not conventional behavior yet, only ad hoc, so we need to wait until well-defined, stable OOM handlers appear.
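
As a rough illustration of the own-limits point (a toy sketch, not production code; the names and the 2 MiB budget are made up), a global allocator can refuse requests past its own budget while the OS is nowhere near out of memory:

    use std::alloc::{GlobalAlloc, Layout, System};
    use std::sync::atomic::{AtomicUsize, Ordering};

    // Toy allocator: delegates to the system allocator but enforces its own
    // fixed budget. A failure here is an allocator limit being hit, not the
    // system running out of memory.
    struct CappedAlloc {
        used: AtomicUsize,
        limit: usize,
    }

    unsafe impl GlobalAlloc for CappedAlloc {
        unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
            let prev = self.used.fetch_add(layout.size(), Ordering::SeqCst);
            if prev + layout.size() > self.limit {
                // Over budget: undo the bookkeeping and report failure.
                self.used.fetch_sub(layout.size(), Ordering::SeqCst);
                return std::ptr::null_mut();
            }
            unsafe { System.alloc(layout) }
        }

        unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
            self.used.fetch_sub(layout.size(), Ordering::SeqCst);
            unsafe { System.dealloc(ptr, layout) }
        }
    }

    #[global_allocator]
    static ALLOC: CappedAlloc = CappedAlloc {
        used: AtomicUsize::new(0),
        limit: 2 * 1024 * 1024, // hypothetical 2 MiB budget
    };

    fn main() {
        let mut v: Vec<u8> = Vec::new();
        // Fallible APIs surface the allocator's refusal as an error instead
        // of an abort.
        match v.try_reserve(4 * 1024 * 1024) {
            Ok(()) => println!("reserved inside the budget"),
            Err(e) => eprintln!("capped allocator refused: {e}"),
        }
    }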


> does not mean that the system is out of memory.

> The allocator could have a "ulimit"-style limit that is completely independent of the actual process/system limitations.

Are we playing word games here? If a process has a set amount of memory and it's out of it, then that process is OOM; if a VM is out of memory, it's OOM. Yes, OOM is typically used for OS-level OOM, and Linus is talking about Rust in the kernel, so that's what OOM would mean here.

>Secondarily what reason is there to make an allocation failure any different than any other resource allocation failure.

Of course there is; would you treat being out of bread the same as being out of oxygen? Again, this can be explained by the context being kernel development and not application development.


"Are we playing word games here? If a process has a set amount of memory, and it's out of it, then that process is OOM, if a VM is out of memory, it's OOM. Yes, OOM is typically used for OS OOM, and Linus is talking about rust in the kernel, so that's what OOM would mean."

As I just explained an allocator can have its own limits.

A process can have multiple allocators. There's no direct logical step that says that because some allocator failed some allocation, the process itself can never allocate anything more.

"Of course there is, would you treat being out of bread similar to being out of oxygen? Again this can be explained by the context being kernel development and not application development."

The parent comment is talking about overcommitment and OOM as if these are situations that are completely out of the program's control. They aren't.


> Are we playing word games here?

No. A single process can have several allocators, switch between them, or use temporary low limits to enforce some kind of safety. None of that has any relation to your system running out of memory.

You won't see any of that in a desktop or a server. In fact, I haven't seen people even discuss that in decades. But it exists, and there are real reasons to use it.


I am not well-versed in this area but have a question: when the OS sends a SIGKILL to a process because it has run out of memory, how can the program catch that before it is killed and deal with it "gracefully"? Does C provide any mechanism to deal with such a scenario?

There are several levels here.

In your C++ (or C) program you have one (or more) allocators. These are just pieces of code that juggle blocks of memory into smaller chunks for the program to use. Typically the allocators get their memory from the OS in pages using some OS system call such as sbrk or mmap.

For the sake of argument, let's say I write an allocator that has a limit of 2 MiB, while my system has 64 GiB of RAM. The allocator can then fail some request when its internal 2 MiB has been exhausted. In the C world it'd return a nullptr. In the C++ world it would normally throw bad_alloc.

If this happens does this mean the process is out of memory? Or the system is out of memory? No, it doesn't.

That being said, where things get murky is that there are allocators that, in the absence of limits, will just map more and more pages from the OS. The OS can "overcommit", which is to say it gives out more pages than can actually fit into the available physical memory (after taking into account what the OS itself uses, etc.). And when the overall system memory demand grows too high, it will just kill some arbitrary process. On Linux this is the infamous OOM killer, which uses a "badness" score (tunable via oom_score_adj) to determine what to kill.

And yes, for the OOM killer there's very little you can do.

But an allocation failure (nullptr or bad_alloc) does not mean OOM condition is happening in the system.


None of that matters: what is your application going to do if it tries to allocate 3mb of data from your 2mb allocator?

This is the far more meaningful part of the original comment:

> and furthermore most code is not in a position to do anything other than crash in an OOM scenario

Given that (unlike a language such as Zig) Rust doesn’t use a variety of different allocator types within a given system, choosing to reliably panic with a reasonable message and stack/trace is a very reasonable mindset to have.


Since we're talking about SQLite, by far the most memory it allocates is for the page cache.

If some allocation fails, the error bubbles up until a safe place, where some pages can be dropped from the cache, and the operation that failed can be tried again.

All this requires is that bubbling up this specific error condition doesn't allocate. Which SQLite purportedly tests.

I'll note that this is not entirely dissimilar to a system where an allocation that can't be immediately satisfied triggers a full garbage collection cycle before an OOM is raised (and where some data might be held through soft/weak pointers and dropped under pressure), just implemented in library code.


Sure, and this is completely sensible to do in a library.

But that’s not the point: what can most applications do when SQLite tells them that it encountered a memory error and couldn’t complete the transaction?

Abort and report an error to the user. In a CLI this would be a panic/abort, and in a service that would usually be implemented as a panic handler (which also catches other errors) that attempts to return an error response.

In this context, who cares if it’s an OOM error or another fatal exception? The outcome is the same.

Of course that’s not universal, but it covers 99% of use cases.
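
A minimal sketch of that pattern, using only std (handle_request and serve are hypothetical stand-ins for real routing logic; note that Rust's default OOM behavior is an abort, which catch_unwind cannot intercept, so this catches panics and other unwinding errors rather than true allocator aborts):

    use std::panic::{self, AssertUnwindSafe};

    // Hypothetical request handler standing in for real routing logic.
    fn handle_request(path: &str) -> String {
        if path == "/boom" {
            panic!("something went irrecoverably wrong");
        }
        format!("200 OK: {path}")
    }

    fn serve(path: &str) -> String {
        // Catch whatever bubbled up at the request boundary and keep the
        // service alive; the caller just sees an error response.
        match panic::catch_unwind(AssertUnwindSafe(|| handle_request(path))) {
            Ok(response) => response,
            Err(_) => "500 Internal Server Error".to_string(),
        }
    }

    fn main() {
        println!("{}", serve("/ok"));
        println!("{}", serve("/boom"));
    }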


The topic is whether Rust should be used to re-implement SQLite.

If SQLite fails to allocate memory for a string or blob, it bubbles up the error, frees some data, and maybe tries again.

Your app may be "hopeless" if the error bubbles up all the way to it, that's your choice, but SQLite may have already handled the error internally, retried, and given your answer without you noticing.

Or it may at least have rolled back your transaction cleanly, instead of immediately crashing at the point of the failed allocation. And although crashing should not corrupt your database, a clean rollback is much faster to recover from, even if your app then decides to crash.

Your app, e.g. an HTTP server, might decide to drop the request, maybe close that SQLite connection, and stay alive to handle other ongoing and new requests.

SQLite wants to be programmed in a language where a failed allocation doesn't crash, and unlike most other code, SQLite is actually tested for how it behaves when malloc fails.


In C++ it will throw an exception which you can catch, and then gracefully report that the operation exceeded limits and/or perform some fallback.

Historically, a lot of C code fails to handle memory allocation failure properly, because checking malloc etc. for a null result is too much work, and C code tends to skip that a lot.

Bjarne Stroustrup added exceptions to C++ in part so that you could write programs that easily recover when memory allocation fails - that was the original motivation for exceptions.

In this one way, Rust is a step backwards towards C. I hope that Rust comes up with a better story around this, because in some applications it does matter.


I may be getting SIGKILL and SIGABORT mixed up, but one of them is not sent to the process, rather it's sent to the OS.

If it were any other way then processes could ignore signals and just make themselves permanent, like Maduro or Putin.


> most code is not in a position to do anything other than crash in an OOM scenario.

That's intentional; IOW, the "most code" that is unable to handle OOM conditions is written that way.

You can write code that handles OOM conditions gracefully, but that way of writing code is the default only in C. In every other language you need to go off the beaten path to gracefully handle OOM conditions.


Handling OOM gracefully - i.e. doing anything other than immediately crashing and/or invoking undefined behaviour - is absolutely not the default in C.

It's possible. But very very few projects do.


I know one C++ library that caches data but never evicts. Instead, the library author expects you to restart your app every 24 hours.

> I know one C++ library that caches data but never evicts. Instead, the library author expects you to restart your app every 24 hours.

It may not be as simple as "that's our policy". I worked at one place (embedded C++ code, 2018) that simply reset the device every 24h because they never managed to track down all the leaks.

Finding memory leaks in C++ is a non-trivial and time-consuming task. It gets easier if your project doesn't use exceptions, but it's still very difficult.


Use Valgrind? Or are we talking about projects that have become far too big for their own good? Because leaks aren't hard at all to find with the right tools and a bit of profiling... Now, crossing thread boundaries and weird dynamic programming tricks, maybe, but that's a very different case and not really a reflection on C++ itself; it would likely trip up a GC language as well.

> Use Valgrind?

Was not available for that specific device, but even with Valgrind and similar tools, you are still going to run into weird destructor issues with inheritance.

There are many possible combinations of virtual, non-virtual, base-class, derived-class, constructors and destructors; some of them will indeed cause a memory leak, and are allowed to by the standard.


> even with Valgrind and similar tools, you are still going to run into weird destructor issues with inheritance.

I love these folklore comments. Post an example.


In my experience that is usually the result of years and years of accumulation of shit code. The result is thousands of leaks. That makes detection of incremental leaks much more difficult. If you start with clean code and use ASan or Valgrind, then leak detection is not difficult.

> Handling OOM gracefully - i.e. doing anything other than immediately crashing and/or invoking undefined behaviour - is absolutely not the default in C.

What are you talking about? Every allocation must be checked at the point of allocation, which is "the default"

If you write non-idiomatically, then sure, in other languages you can jump through a couple of hoops and check every allocation, but that's not the default.

The default in C is to return an error when allocation fails.

The default in C++, Rust, etc is to throw an exception. The idiomatic way in C++, etc is to not handle that exception.


> Every allocation must be checked at the point of allocation, which is "the default"

C doesn't force you to check the allocation at all. The default behavior is to simply invoke undefined behavior the first time you use the returned allocation if it failed.

In practice I've found most people write their own wrappers around malloc that at least crash - for example: https://docs.gtk.org/glib/memory.html
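
A rough sketch of that wrapper pattern (the name xmalloc is just convention, not glib's API; g_malloc behaves similarly in that it terminates the program when allocation fails):

    #include <stdio.h>
    #include <stdlib.h>

    /* Conventional "die on failure" wrapper, for illustration. */
    void *xmalloc(size_t size) {
        void *p = malloc(size);
        if (p == NULL && size != 0) {
            fprintf(stderr, "fatal: out of memory (%zu bytes)\n", size);
            abort();
        }
        return p;
    }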

PS. The current default in Rust is to print something and then abort the program, not panic (i.e. not throw an exception). Though the standard library reserves the right to change that to a panic in the future.


> C doesn't force you to check the allocation at all.

No one ever claimed it did; I said, and still say, that in C, at any rate, the default is to check the returned value from memory allocations.

And, that is true.

The default in other languages is not to recover.


> > C doesn't force you to check the allocation at all.

> No one ever claimed it did;

You specifically said

> Every allocation must be checked at the point of allocation

...

> the default is to check the returned value from memory allocations.

Default has a meaning, and it's what happens if you don't explicitly choose to do something else.

In libc - this is to invoke undefined behavior if the user uses the allocation.

In glib - the library that underpins half the Linux desktop - this is to crash. This is an approach I've seen elsewhere as well, to the point where I'm comfortable calling it "default" in the sense that people change their default behavior to it.

Nowhere that I've ever seen, in C, is it to make the user handle the error. I assume there are projects with sanitizers that do do that; I haven't worked on them, and they certainly don't make up the majority.


> Default has a meaning, and it's what happens if you don't explicitly choose to do something else.

It also has the meaning of doing the common thing: https://www.merriam-webster.com/dictionary/default

> : a selection made usually automatically or without active consideration

See that "without active consideration" there? The default usage of malloc includes, whether you want to acknowledge it or not, checking the returned value.

C doesn't have anything done automatically, so I am wondering why you would choose to think that by "default" one would mean that something automatically gets done.


I'm not saying "automatic", I'm including "sanitizer returns an error" as default - that's not what happens in C (or at least any C project I've worked on). You have to actively remember and choose to check the error code. Of course things do happen automatically all the time in C, like bumping the stack pointer (another case of unhandled OOM) and decrementing it after the fact. And pushing return addresses - and returning at the end of functions. And so on.

"Ubiquitous" is a different word than default, checking the return code of malloc isn't even that. As an example - I've been having some issues with pipewire recently (unrelated) and happen to know it uses an unwrapped malloc. And it doesn't reliably check the return code. For example: https://github.com/PipeWire/pipewire/blob/6ed964546586e809f7...

And again, this isn't cherry picked, this is just "the last popular open source C code base I've looked at". This is the common case in C. Either you wrap malloc to crash, or you just accept undefined behavior if malloc fails. It is the rare project that doesn't do one of those two.


> I'm not saying "automatic", I'm including "sanitizer returns an error" as default - that's not what happens in C (or at least any C project I've worked on). You have to actively remember and choose to check the error code.

Right. But this is what you initially responded to:

> You can write code that handles OOM conditions gracefully, but that way of writing code is the default only in C.

How did you get from "That way" to thinking I claimed that C, by default, handles allocation failures?

> As an example - I've been having some issues with pipewire recently (unrelated) and happen to know it uses an unwrapped malloc.

Correct. That does not mean that the default way of writing allocation in C is anything other than what I said.

Do programmers make mistakes? Sure. But that's not what was asked - what was asked is how do you handle memory errors gracefully, and I pointed out that, in idiomatic C, handling memory errors gracefully is the default way of handling memory errors.

That is not the case for other languages.


> How did you get from "That way" to thinking I claimed that C, by default, handles allocation failures?

I think you might want to reread the line you quoted directly above this.

That way of writing code, i.e. "write[ing] code that handles OOM conditions gracefully" "is the default [...] in C".

This is what I am saying is not the case. The default in C is undefined behavior (libc) or crashing (a significant fraction of projects allocator wrappers). Not "handling OOM gracefully" - i.e. handling OOM errors.


> I think you might want to reread the line you quoted directly above this,

I am reading exactly what I said:

> You can write code that handles OOM conditions gracefully, but that way of writing code is the default only in C.

How is it possible to read that as anything other than "That Way Of Writing Code Is The Default Way In C"?

Are you saying that checking the result of malloc (and others) is not the default way of allocating memory?


> Are you saying that checking the result of malloc (and others) is not the default way of allocating memory?

In C - yes. I've said that repeatedly now...


>> Are you saying that checking the result of malloc (and others) is not the default way of allocating memory?

> In C - yes. I've said that repeatedly now...

Well, that's just not true. The instances of unchecked allocations are both few and far between, *and* treated as bugs when reported :-/

Maybe you should program in a language for a little bit before forming an opinion on it :-/


I have programmed in C plenty. Your assertion that unchecked allocations are few and far between is simply entirely incorrect. That they are treated as bugs when reported is incorrect in most C software.

For good reason. Most C software is not designed to run in a situation where malloc might fail.

I, unlike you, have provided evidence of this by pointing to major pieces of the linux desktop that do not do so.


From the parent comment:

> because of OS-level overcommit, which is nearly always a good thing

It doesn't matter what language you are writing in, because your OS can tell you that the allocation succeeded, but when you come to use it, only then do you find out that the memory isn't there.


Of course it matters, because you (the system admin) can tell your OS not to do that. Which is only helpful if your app knows how to handle the case. Most don't, so overcommit, in general, makes sense.
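
For reference, the knob on Linux is vm.overcommit_memory; `sysctl vm.overcommit_memory=2` turns overcommit off ("never overcommit"). A minimal C sketch of doing the same thing, assuming root and with error handling kept minimal:

    #include <stdio.h>

    int main(void) {
        /* Write "2" (never overcommit) to the procfs knob. */
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "w");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        fputs("2\n", f);
        fclose(f);
        return 0;
    }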

You can't really, on Linux. There's no way to do sparse allocations then, because when you turn off overcommit, MAP_NORESERVE still reserves memory...

It's a place where Windows legitimately is better than Linux.


> You can't really on linux. There's no way to do sparse allocations then because when you turn off overcommit MAP_NORESERVE still reserves memory...

Sure, but ... what does that have to do with this thread? Using `mmap` is not the same as using `malloc` and friends.

If you turn off overcommit, malloc will return NULL on failure to allocate. If you specifically request mmap to ignore overcommit, and it does, why are you surprised?


> If you specifically request mmap to ignore overcommit, and it does, why are you surprised?

You misunderstand - you specifically request that mmap ignore overcommit, and it doesn't honor that request.

What it has to do with this thread is it makes turning off overcommit on linux an exceptionally unpalatable option because it makes a lot of correct software incorrect in an unfixable manner.


Zig puts OOM handling much more front and center than C. In C, you can handle OOM but it's easy to ignore NULL checks on mallocs &co because they almost never happen.

In Zig you must handle it. Even if handling means "don't care, panic", you have to spell that out.


It's also really ergonomic with `errdefer` and `try`.

> That's intentional; IOW the "most code" that is unable to handle OOM conditions are written that way.

No, this is wishful thinking. While plenty of programs out there are in the business of maintaining caches that could be optimistically evicted in order to proceed in low-memory situations, the vast majority of programs are not caching anything. If they're out of memory, they just can't proceed.


I used to think this way many years ago, then I saw my own code in production hit OOM errors and manage to recover, and even log what was happening so I could read about it later.

After those experiences I agree with the sibling comment that calls your position "bullshit". I think people come to your conclusion when they haven't experienced a system that can handle it, so they're biased to think it's impossible to do. Since being able to handle it is not the default in so many languages and one very prominent OS, fewer people understand it is possible.


I think this is bullshit. If you are running out of memory, you can, for example, choose to stop accepting more work ("backpressure"). I am always advocating for Rust, but this is one thing I really disagree on. I think Zig gets this right.

Now you need to ensure that your entire error path does not allocate or you have to deal with allocation errors in the error path as well.

Trying to apply backpressure from memory allocation failures, which can appear anywhere and are completely disconnected from their source, rather than capping the current in-memory set, seems like an incredibly hard path to make work reliably.
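
A sketch of that capping approach, for illustration (the fixed capacity and the names are mine): reject work up front when a bounded queue is full, instead of waiting for some malloc deep in the call stack to fail.

    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_IN_FLIGHT 1024      /* the cap on the in-memory set */

    struct work_queue {
        void  *items[MAX_IN_FLIGHT];
        size_t count;
    };

    /* Returns false when full: that's the backpressure point, where
       the caller can reply "try again later" instead of queueing more. */
    bool work_queue_push(struct work_queue *q, void *work) {
        if (q->count == MAX_IN_FLIGHT)
            return false;
        q->items[q->count++] = work;
        return true;
    }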


In Zig's case, the entire stdlib never allocates on failure, and most libraries follow the same pattern. The philosophy of Zig is allocation/creation can fail, but freeing/destroying must never fail. It's caused me to be really thoughtful with how I design my data structures, and often made me use better ways of representing metadata.

How do you log or tell the world about the state of the program without allocating?

Well if you've hit OOM, you're kinda screwed anyways. But, if you allocate a ring buffer at the beginning, you can always do a best attempt write.

Why screwed? It could just be that there is more load than your application can handle. Why should it necessarily crash because of that?

Maybe screwed was too strong of a term. In the scenario above, they wanted to log on resource cleanup, but that makes resource cleanup potentially fallible. The Zig philosophy is that cleanup must never fail, so having cleanup be fallible goes against that.

I was suggesting (though in retrospect not clearly) that logging should use a ring buffer instead of allocation, in order to make logging on cleanup a guaranteed best effort operation. You're right that you can recover from OOM, but logging OOM with an allocator is pretty self-defeating.
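
Roughly, something like this is what I have in mind - a statically allocated ring buffer, so the logging path itself never touches the allocator (sizes and names are illustrative; some libc printf paths can allocate internally, so a careful implementation might format even more conservatively):

    #include <stdarg.h>
    #include <stdio.h>

    #define LOG_RING_SIZE 65536

    static char   log_ring[LOG_RING_SIZE];
    static size_t log_head;

    void log_no_alloc(const char *fmt, ...) {
        char line[256];                      /* stack only, no heap */
        va_list ap;
        va_start(ap, fmt);
        int n = vsnprintf(line, sizeof line, fmt, ap);
        va_end(ap);
        if (n < 0)
            return;
        if ((size_t)n >= sizeof line)
            n = (int)(sizeof line - 1);
        for (int i = 0; i < n; i++) {
            log_ring[log_head] = line[i];
            log_head = (log_head + 1) % LOG_RING_SIZE;
        }
    }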


You need to apply backpressure before you hit memory limits, not after.

If you’re OOM your application is in a pretty unrecoverable state. Theoretically possible, practically not.


If you allocate a relatively big chunk of memory for each unit of work, and at some point your allocation fails, you can just drop that unit of work. What is not practical?

I think in that case overcommit will happily say the allocation worked. Unless you also zero the entire chunk of memory and then get OOM killed on the write.

I suppose you can try to reliably target "seriously wild allocation fails" without leaving too much memory on the table.

    0: Heuristic overcommit handling. Obvious overcommits of
       address space are refused. Used for a typical system. It
       ensures a seriously wild allocation fails while allowing
       overcommit to reduce swap usage. root is allowed to
       allocate slightly more memory in this mode. This is the
       default.
https://www.kernel.org/doc/Documentation/vm/overcommit-accou...

Running in an environment without overcommit would allow you to handle it gracefully, though that brings its own zoo of nasty footguns.

See this recent discussion on what can happen when turning off overcommit:

https://news.ycombinator.com/item?id=46300411


> See this recent discussion on what can happen when turning off overcommit:

What are you referring to specifically? Overcommit is only (presumably) useful if you are using Linux as a desktop OS.


Good grief, all kinds of ways. Practically all the same countless possible paths as those that require allocating.

You don't have to allocate to print to stdout if that's what you're asking.

And then whatever thing that is collecting and forwarding (if applicable) the logs needs to be entirely allocation-free?

It just needs to have whatever memory it needs statically allocated.

> Now you need to ensure that your entire error path does not allocate or you have to deal with allocation errors in the error path as well

And that is usually not too difficult in C (in my experience), where allocation is explicit.

In C++, on the other hand, this quickly gets hairy IMO.


That's why you don't use std in this case. You limit yourself to things defined in core.

That seems like an enormous question. Is anyone working on it?

There's experimental/nightly support for things like `push_within_capacity()`, which is a more manual way of trying to handle that situation (user-space code would have to handle the return value and then increase the capacity manually if required).

And of course the kernel - which doesn't even use Rust's Vec but has its own entire allocator library because it is the kernel - likewise provides

https://rust.docs.kernel.org/next/kernel/alloc/kvec/struct.V...

Vec::push_within_capacity is a nice API to confront the reality of running out of memory. "Clever" ideas that don't actually work are obviously ineffective once we see this API. We need to do something with this T, we can't just say "Somebody else should ensure I have room to store it" because it's too late now. Here's your T back, there was no more space.


Yeah, it's not ideal that the standard library currently has blind spots around allocation and I hope those get resolved eventually, but if you're rolling everything from scratch in C anyway then it's not much of a stretch to likewise roll your own Rust primitives in places where the included batteries don't work for you.

Yep, the “split” between core and std is brilliant. It enables so many usecases: one example I ran into recently is compiling Rust (core) to eBPF.

you understand that stack allocation can OOM too?

Can C gracefully recover from running out of stack space?

Depends on your definition of graceful, but the C standard doesn't preclude handling it, and there are POSIX interfaces such as sigaltstack / sigsetjmp that fit; indeed, some code, such as language runtimes, uses them to react to stack exhaustion (having first set up guard pages, etc.).
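
Roughly, the sigaltstack part looks like this (a sketch, not production code; actually resuming after the handler, rather than just reporting and exiting, needs guard pages plus sigsetjmp/siglongjmp and is runtime-specific):

    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    static char altstack[64 * 1024];    /* arbitrary size for the sketch */

    static void on_segv(int sig) {
        (void)sig;
        static const char msg[] = "fault (possibly stack overflow)\n";
        write(STDERR_FILENO, msg, sizeof msg - 1);  /* async-signal-safe */
        _exit(1);
    }

    int main(void) {
        /* Give SIGSEGV its own stack, so the handler can run even
           when the normal stack is exhausted. */
        stack_t ss;
        memset(&ss, 0, sizeof ss);
        ss.ss_sp = altstack;
        ss.ss_size = sizeof altstack;
        sigaltstack(&ss, NULL);

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_segv;
        sa.sa_flags = SA_ONSTACK;       /* run handler on the alt stack */
        sigaction(SIGSEGV, &sa, NULL);

        /* ...recursing too deeply now would still run the handler... */
        return 0;
    }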

Or in short: no, C is no better than Rust at gracefully recovering from stack overflow, either in theory or in practice.

Great to know!

Off topic: This CSS improves the usability of that page:

    ul.threadlist li:hover > a {
        color: red;
    }

    ul.threadlist li.origin > a {
        display: block;
        background: rgb(205, 216, 216);
        font-weight: normal;
        padding: 1px 6px;
        margin: 1px -6px;
    }

Should be:

    ul.threadlist li.origin a {
      display: block;
      background: rgb(205, 216, 216);
      font-weight: normal;
      padding: 1px 6px;
      margin: 1px -6px;
    }

Good point.

He's not wrong, but needs to clean his own house a bit.

Thank you for Zynthian, this is awesome. Any blogs or RSS feeds you track to get in the loop of similar projects?

PS - don't discount Monome; it has a very different flavor than Zynthian and is in many ways more of a musical instrument. It's definitely worth investigating if you're a synth nerd.

There are plenty, but CDM is my favourite:

https://cdm.link/

See also, matrixsynth:

https://www.matrixsynth.com/

They both often have the same content, but CDM has a nicer vibe that I feel more comfortable suggesting.


Maybe this is worth something to people involved with using SWF files, but the Doom 3 BFG codebase has an entire SWF parser/player included, which they used for the game UI: https://github.com/klaussilveira/chocolate-doom3-bfg/tree/ma...

You could call it Lightweight Scaleform. This same codebase was used in RAGE.


Custom Flash players were actually relatively common in game development during the mid to late 2000s, as Flash provided a ready-to-go authoring solution for UI and 2D animation that artists were already familiar with. Autodesk's Scaleform was probably the most popular implementation but a number of AAA developers had their own in-house libraries similar to Doom 3's; some of them, such as Konami's "AFP" [1], are still in use to this day (the latest game to use it, Sound Voltex Nabla, was released last month).

[1]: https://github.com/DragonMinded/bemaniutils/blob/trunk/beman...


Interesting - why did they choose to rebuild the menu system of Doom 3 OG?

> declarations for global variables

That's huge. I wish LuaJIT adopted this, or at least added a compile time flag to enable it.


Yeah, I wish someone would pick up LuaJIT development. From what I've heard, it in practice isn't developed anymore and is still stuck at Lua 5.1.


Not true. It's getting a constant stream of bugfixes. It's also not "stuck" on Lua 5.1, but is deliberately not following Lua's path, except for some backports. There's also a recent post about how a LuaJIT 3 might work.


Where is that post?


https://www.freelists.org/post/luajit/Question-about-LuaJIT-...

Warning: Ridiculous cookie consent banner, needs dozens of clicks to opt out.


This cookie consent banner is handled in 0 clicks thanks to the Consent-O-Matic Firefox extension.


OK, then I got some wrong info. If it's stuck there deliberately, then it's worse. Maybe someone should fork it and bring it up to date with recent Lua versions. Why is this split needed?


My understanding is that there was a language fork after 5.1. One thing was a complete reworking of how math works. It used to be just floating point for everything but the new idea was to make it like Python 3. So most operations are float/integer with some weird exceptions.

As with any language fork there will be some who stay and others who switch to the new thing. Often a fork will drive people away from a particular language as in my case.


Lua's nature as a primarily embedded language means backwards compatibility is not guaranteed between any version. If 5.2 was a language fork then so was 5.3, 5.4, 5.5, etc. (5.2 did have some more significant changes though)

For that reason LuaJIT staying at ~5.1 actually works in its favor. Rather than trying to follow the moving target of the newest version, it gives a robust focal point for the Lua ecosystem, while modern versions can be developed and continue to serve their purpose in embedded systems and whatnot where appropriate.


I still don't see a reason not to update LuaJIT. Changes in Lua aren't just version number bumps; they should be improving something, and that something would then be missing from LuaJIT.


Isn't it a bit naive to declare that, just because Lua created a new minor version, it should be somehow better? The author of LuaJIT has often written his arguments, including why he disagrees with the changes to the language, why they should have been implemented as libraries instead, that in his view LuaJIT is still more performant and more widely used than PUC Lua, and more.

As for forking, you can try, but I would warn you that one does not simply fork LuaJIT. Required is deep expertise in tracing JIT compilers, in assembly and in many different computer architectures. Nobody was really up to the task when Mike Pall announced that he was searching for a maintainer, before his eventual return.


LuaJIT does have some backported features from newer versions. But Mike Pall -- the mad genius behind LuaJIT -- has made it clear he doesn't agree with many of the changes made in newer versions, hence why it's staying where it's at.


the beauty of open source is there's nothing stopping you! this might be your calling. best of luck


A language fork is unfortunate. The Python situation isn't much of a fork, really. Python 2 is basically EOL.


There’s no “basically”. Stick a fork in it; it’s done: https://www.python.org/doc/sunset-python-2/


It might not be supported by the consortium, but python2 still lives, slowly, in one place or another:

> The RHEL 8 AppStream Lifecycle Page puts the end date of RHEL 8's Python 2.7 package at June 2024.

https://access.redhat.com/solutions/4455511

At this point in RHEL it is only "deprecated", not "obsolete".


In RHEL I would never touch the system Python at all; I'd install whatever version I needed in a venv and configure any software I installed to use it. I learned the hard way to never mess with the system Python.


Which is better than this mess of a situation with Lua.


I strenuously disagree. Not every language needs to chase trends and pile on unnecessary complexity because developers want the latest shiny language toys to play with. It's good to have a simple, stable language that works and that you can depend on to remain sane for the foreseeable future.

C is a language like that, but I fear the feature creep is coming (auto? AUTO??). JS is a lost cause.


Languages are products as well, either they evolve to cater to new audiences, or they slowly die as the userbase shrinks away with the passing of each developer generation.


The language is different. The changes to environments in particular are a non-starter. Sandboxing is incredibly clunky in 5.2+, and we lost a lot of metaprogramming power and dynamic behavior.


> May be someone should fork it and bring it up to date with recent Lua versions. Why is this split needed?

Good news, you're someone. If you care, you're welcome to go for it.


If you need WASM, I think Candle is your current best bet: https://github.com/huggingface/candle


We jumped ship too. Forgejo has been amazing.


NVIDIA's NVRHI has been my favorite abstraction layer over the complexity that modern APIs bring.

In particular, this fork: https://github.com/RobertBeckebans/nvrhi which adds some niceties and quality of life improvements.

