
> It's easy to pick on undefined behavior in C when you focus on the more gratuitous undefined behaviors such as signed overflow or oversized shifts.

I wouldn't even mind the signed integer overflow thing that much if there were a reasonable way in standard C to check whether a signed operation would overflow.

It's not impossible to do correctly in a compiler-independent way, but it's ridiculously hard. And slow.
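For illustration, a minimal sketch of the kind of portable pre-check involved, for addition only (the helper name is mine; multiplication and division need their own, uglier variants):

    #include <limits.h>
    #include <stdbool.h>

    /* Returns true if a + b would overflow int. Uses only comparisons that
       cannot themselves overflow, so it's valid standard C, but every call
       site pays for the extra branches. */
    static bool add_would_overflow(int a, int b)
    {
        if (b > 0 && a > INT_MAX - b) return true;  /* would exceed INT_MAX */
        if (b < 0 && a < INT_MIN - b) return true;  /* would fall below INT_MIN */
        return false;
    }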




Another thing is that the wording around some of the UB issues is just plain bad. The most extreme is probably the rules around strict aliasing. That there was, for quite a while, uncertainty about whether the rules allow type punning by reading a union member when the last write was to another member is a good example of not taking reality into account. Yes, memcpy exists - but it is even less type safe!


The union punning trick is UB in C89 and well-defined in C99 and later, although it was erroneously listed in the (non-normative) Annex listing UBs in C99 (removed by C11).
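Concretely, the trick in question is something like this (a sketch; assumes float and uint32_t have the same size on the target):

    #include <stdint.h>

    /* UB in C89, well-defined in C99 and later: the bytes of the member
       last written (f) are reinterpreted as the type of the member that
       is read (u). */
    static uint32_t float_bits(float f)
    {
        union { float f; uint32_t u; } pun;
        pun.f = f;
        return pun.u;
    }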

Strict aliasing is another category of UB that I'd consider gratuitous.


> The union punning trick is UB in C89 and well-defined in C99 and later, although it was erroneously listed in the (non-normative) Annex listing UBs in C99 (removed by C11).

Right, that's my point. If the standard folks can't understand their standard, how are mere mortals supposed to?

> Strict aliasing is another category of UB that I'd consider gratuitous.

I'm a bit split on it. It can yield substantial speedups. But it's also impractical in a lot of cases. And it's often not strong enough anyway, requiring explicit restrict annotations for the compiler to understand that two pointers don't alias. Turns out two pointers of the same (or compatible) type aren't rare in performance-critical sections...

Realistically it should have been opt-in.
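To make the same-type case concrete: strict aliasing says nothing about whether two float pointers overlap, so a loop like this (just a sketch) still needs the explicit annotations before the compiler can vectorize it without a runtime overlap check:

    /* x and y have the same type, so type-based alias analysis can't
       separate them; only the restrict qualifiers promise that the stores
       through y never feed back into the loads through x. */
    void saxpy(float *restrict y, const float *restrict x, float a, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }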


> Strict aliasing is another category of UB that I'd consider gratuitous.

Without it you cannot vectorize (or even internally reorder) many loops that are currently vectorizable, because the compiler can't otherwise statically prove that the arguments won't alias.
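A sketch of the different-type case: under strict aliasing the stores through dst (a float *) cannot modify *n (an int *), so *n can be hoisted out of the loop and the loop vectorized; without the rule the compiler has to assume every store might change the trip count.

    /* With strict aliasing: *n is loop-invariant and the loop vectorizes.
       Without it: dst[i] might alias *n, so *n must be reloaded on every
       iteration and the loop stays scalar. */
    void fill(float *dst, const int *n, float k)
    {
        for (int i = 0; i < *n; i++)
            dst[i] = k;
    }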


That is completely false. Look up "restrict" and there are many other contexts. BTW, "prove" and "assume" are different things. This is an old argument, which in a just world would have been settled by Dennis Ritchie's comments.


I'm quite familiar with restrict, having added it to many codebases. It's a moderately fragile construct which in no way replaces the normal C memory model. (It's also significantly under-exploited by most compilers, presumably because very little code is restrict-annotated.)

Prove and assume are indeed different things, but what the compiler is able to do is _prove_ the validity of the transform within the context of the C abstract machine.

Plenty of sibling comments in this thread show concrete examples of significant optimizations lost with alias analysis (and/or non-wrapping signed integers) disabled.

If the compiler is not allowed to assume that the source code is valid, then very few optimizations are safe at all. Clearly that isn't what you want, so I suppose you would argue that being able to vectorize something like 90% of all currently vectorizable loops in existing code isn't worth the cost, and that compilers should instead have to assume that anything may alias anything unless the programmer has manually restrict-annotated every variable that is written to in a loop?

I expect in that world you'd see lots of code run needlessly slower, and lots of other code needlessly get slathered with restrict annotations as restrict turns into a reflex, because leaving it out so regularly results in poor performance... resulting in incorrect annotations and the miscompilation you were hoping to avoid: arguably the worst of all worlds.



