To do UB "optimizations", the compiler first needs to figure out that there is an UB it can "optimize" anyway. At this point instead of "optimizing" it could, and in my humble opinion absolutely should, blow up the compilation by generating an UB error, so people can fix their stuff.
What about backwards compatibility with regard to a new compiler version deciding to issue errors on UB? You don't have any guarantees about what happens with UB right now, so if you upgrade to a new compiler version that generates errors instead of "optimizations", everything would still be as before: no guarantees. And it's frankly a lot better to blow up the compilation with errors than to have the compiler accept the UB code and roll the dice on how the final binary will behave later. You can either fix the code to make it compile again, or, as a stopgap measure, fall back to an older "known good" compiler version that you previously used.
I fail to see any reason whatsoever why compilers are still doing all kinds of stupid stuff with UB instead of doing the right thing and issuing errors when they encounter UB.
I also fail to see why the C language designers still insist on keeping so much of the legacy shit around.
> To do UB "optimizations", the compiler first needs to figure out that there is UB it can "optimize" anyway.
The compiler assumes UB will never happen and makes transformations that are valid provided there is no UB. This doesn't require any explicit detection of UB, and in some cases whether UB occurs is simply undecidable at compile time (as in, no compiler could detect it without producing incorrect results).
Without these assumptions the resulting compiled code would be much slower, though different optimizations have different danger-versus-speed profiles, and there is certainly a case to be made that some of them should be eschewed because they're a poor trade-off.
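To make the "assume no UB" point concrete, here's a minimal sketch (the function is made up purely for illustration, not taken from any real codebase):

/* A loop the compiler would like to vectorize / rewrite with a 64-bit
   induction variable on a 64-bit target: */
void add_one(float *a, const float *b, int n)
{
    for (int i = 0; i <= n; i++)    /* note: <=, not < */
        a[i] = b[i] + 1.0f;
}
/* If signed overflow were defined to wrap, n == INT_MAX would make this loop
   infinite (i wraps to INT_MIN and stays <= n), so the compiler would have to
   preserve that behavior. Because signed overflow is UB, it may instead assume
   the loop terminates after exactly n + 1 iterations (for n >= 0) and
   transform it accordingly. */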
There are many cases where current compilers will warn you when you've done something that is UB. They probably don't warn for every detectable case, and it would be reasonable to ask them to warn about more of them.
I think your irritation is just based on a misunderstanding of the situation.
Compiler authors are C(++) programmers too; they also don't like footguns. They're not trying to screw anyone over. They don't waste their time adding optimizations that don't make real performance improvements just to trip up invalid code.
Yes, some UB is not decidable at compile time, but a lot of it could easily be specced to have defined behavior at runtime, such as signed integer overflow.
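As an existence proof that this is implementable today, GCC and Clang already expose checked arithmetic as an extension; a minimal sketch (the helper name is just for illustration):

#include <stdbool.h>
#include <stdio.h>

/* __builtin_add_overflow is a GCC/Clang extension: it stores the (possibly
   wrapped) result in *res and returns true if the addition overflowed.
   Something along these lines could be standardized instead of leaving
   signed overflow undefined. */
static bool checked_add(int a, int b, int *res)
{
    return __builtin_add_overflow(a, b, res);
}

int main(void)
{
    int r;
    if (checked_add(2147483647, 1, &r))
        puts("overflow detected");
    else
        printf("sum = %d\n", r);
    return 0;
}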
The main reason to not spec these things is because people would be arguing "this makes compiled code on my esoteric 9-bit ones'-complement chip slower" or "there was this chip in the 70s that did things differently" or "but a short int on Cray was 64-bit". Great, so now the spec has avoidable unnecessary undefined behavior all over the place, and the code other people wrote still does not run correctly on your 9-bit chip. Brought to you by the same people who decided "NULL is not necessarily (void*)0", and who define those integer types everybody uses (instead of stdint) with an "at least this big".
Yes, a lot of that is legacy stuff and was added to accommodate and model things that already existed (the wrong way to go about it, IMO, but hindsight is 20/20), but that's my argument: fix this stuff once and for all in an upcoming spec iteration.
>Without these assumptions the resulting compiled code would be much slower
In some cases, this is true (for different levels of "much slower"), but the trade-off here is still "running code that works, but a little slower" vs "running code that does not work and will launch a nuclear strike at Switzerland by accident, but really fast".
In a lot of cases, it will not be slower, or at least not much slower.
>I think your irritation is just based on a misunderstanding of the situation.
Frankly, not really. I started writing my first C (and C++) in the early 90s, and I think I do understand the situation pretty well by now. But I should have been more precise in my initial ranting comment, I'll give you that.
Note that (void *)0 is always NULL, as mandated by the standard.
But, to address the content of your comment: defined behavior at runtime is not necessarily good behavior at runtime. Defining signed integer overflow to wrap, for example, is probably a bad idea, because this is rarely the intent of the code. Having all such operations trap might be a good idea, but now you're going to get the same "stop breaking my working programs" people angry at you.
Yes, thankfully at least with NULL they didn't fall into the legacy trap and mess up the standard with the non-zero NULL representations that some earlier machines had been kind of using.
>Defining signed integer overflow to wrap, for example, is probably a bad idea
I wouldn't call it great behavior, but it's at least what most people expect to happen, most people will be able to understand what's going on, and it's fast on most systems that matter. However, it's still undefined behavior. Just codifying overflow as wrapping would therefore be an improvement in my opinion, at least over what we have today.
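(For what it's worth, GCC and Clang already offer that as an opt-in dialect via -fwrapv; a trivial sketch of the difference:)

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int x = INT_MAX;
    /* Compiled with -fwrapv (GCC/Clang), signed overflow is defined to wrap,
       so this prints INT_MIN. Without -fwrapv, the overflow in x + 1 is
       undefined behavior, and the compiler is allowed to assume it never
       happens. */
    printf("%d\n", x + 1);
    return 0;
}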
> These days if you want to catch UB, compile with -fsanitize=undefined. The program will then trap if UB is actually detected at runtime.
So, let me get this straight: someone wants to make sure pointer p is not null (in the wrong way) and writes something like the examples in the posts above, i.e. if (!p) ... and, if that doesn't trigger, calls use(*p), but the compiler decides p can never be null because that would be UB and hence removes the check.
The C coder dumps the code, gets upset because the check was removed, and gets the hint to catch UB by adding -fsanitize ..., which "catches UB" in the above scenario so that the program will "trap if UB is detected".
I think we just came full circle there.
Sure, the -fsanitize flag will catch ALL the detectable bugs and so on, but I still found it a bit funny.
UBSan will report, and can be made to abort, an invalid program when it detects UB at runtime. It doesn't let you handle it. So you shouldn't remove the erroneous check, but fix it so it is no longer erroneous, and UBSan will help you identify these errors.
Also, UBSan adds significant overhead, so unfortunately it is not really appropriate for production builds (hence my wish for a less powerful "UBSan-lite" with lower overhead).
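(Rough sketch of the workflow, with a throwaway test program:)

/* overflow.c -- deliberately triggers UB at runtime */
#include <limits.h>

int main(void)
{
    int x = INT_MAX;
    return x + 1;   /* signed integer overflow: UB */
}

/* Build and run with UBSan enabled (GCC or Clang):
 *
 *   cc -g -fsanitize=undefined overflow.c -o overflow && ./overflow
 *
 * UBSan reports the overflow at runtime; add -fno-sanitize-recover=undefined
 * if you want it to abort instead of printing a diagnostic and continuing. */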
I think you are misunderstanding the situation. Given code like:
if (!p) {
    use(*p);
}
(given no previous knowledge about p) no compiler will remove the "if (!p)" part.
What people are complaining about is the opposite case:
use(*p);
/* The compiler reasons that if p == NULL, the program would have crashed by now,
   so if we got here, p != NULL must hold. */
if (!p) { // the compiler can remove this branch
    report_error();
}