lichtung

Why signed (x*2)/2 folds to x but unsigned doesn't

· #compilers

Take a signed int x. Is (x*2)/2 - x always zero?

In C, yes, in every defined execution. Signed overflow is undefined behaviour, so the compiler may assume x*2 doesn't wrap; and if it doesn't wrap, dividing by two recovers x exactly. So for

int sgn(int x){ return (x*2)/2 - x; }

Clang folds the whole thing away. At -O2 (clang 21):

define i32 @sgn(i32 %0) {
  ret i32 0
}
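To check the "every defined execution" claim directly, here is a minimal sketch that evaluates the expression only where x*2 cannot overflow, i.e. for INT_MIN/2 ≤ x ≤ INT_MAX/2, and confirms the result is 0 there. sgn is the C function behind @sgn (int, same expression). The helper sgn_zero_on_defined_domain and its stride parameter are hypothetical, and the strided walk samples the domain rather than covering all of it. Built at -O0 the multiply and divide really execute; at -O2 clang is free to fold them away, as in the IR above.

```c
#include <assert.h>
#include <limits.h>

/* The signed function behind @sgn. The precondition is exactly the range
   where x*2 fits in an int, i.e. where the behaviour is defined. */
int sgn(int x) {
    assert(x >= INT_MIN / 2 && x <= INT_MAX / 2);
    return (x * 2) / 2 - x;
}

/* Hypothetical helper: returns 1 if sgn(x) == 0 at both endpoints of the
   defined domain and at every stride-th point in between, else 0. */
int sgn_zero_on_defined_domain(int stride) {
    assert(stride > 0);
    const int lo = INT_MIN / 2;  /* lo*2 == INT_MIN     */
    const int hi = INT_MAX / 2;  /* hi*2 == INT_MAX - 1 */
    if (sgn(lo) != 0 || sgn(hi) != 0)
        return 0;
    for (long long v = lo; v <= hi; v += stride)
        if (sgn((int)v) != 0)
            return 0;
    return 1;
}
```

With a stride of 65537 the walk visits roughly 32 thousand points. In builds without NDEBUG, calling sgn outside the defined range trips the assert rather than silently invoking undefined behaviour.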

Now make it unsigned:

unsigned uns(unsigned x){ return (x*2)/2 - x; }

This one does not fold to zero:

define i32 @uns(i32 %0) {
  %2 = and i32 %0, -2147483648   ; x & 0x80000000
  ret i32 %2
}

That's the top bit of x, and it's the right answer, not a missed optimization. Unsigned arithmetic is defined to wrap modulo 2³², so x*2 genuinely drops the top bit and (x*2)/2 is x & 0x7fffffff. Since x = (x & 0x7fffffff) + (x & 0x80000000), subtracting x leaves -(x & 0x80000000), which modulo 2³² is exactly x & 0x80000000: negating 0 gives 0, and negating 2³¹ gives 2³² - 2³¹ = 2³¹. The compiler can't fold to zero because the value isn't zero.
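The arithmetic above can be checked step by step. This is a small sketch assuming a 32-bit unsigned (as on the target that produced the i32 IR); uns is the post's function, and uns_by_steps is a hypothetical helper that asserts each intermediate claim from the paragraph.

```c
#include <assert.h>
#include <limits.h>

_Static_assert(UINT_MAX == 0xffffffffu, "sketch assumes a 32-bit unsigned");

/* The post's function: unsigned arithmetic wraps modulo 2^32 by definition. */
unsigned uns(unsigned x) { return (x * 2) / 2 - x; }

/* Hypothetical helper: the same computation, one reasoning step per line. */
unsigned uns_by_steps(unsigned x) {
    unsigned doubled = x * 2;         /* modulo 2^32: bit 31 of x is shifted out */
    unsigned halved  = doubled / 2;   /* so halving recovers only the low 31 bits */
    assert(halved == (x & 0x7fffffffu));
    unsigned diff = halved - x;       /* -(x & 0x80000000) modulo 2^32 */
    assert(diff == 0u - (x & 0x80000000u));
    assert(diff == (x & 0x80000000u)); /* negating 0 or 2^31 mod 2^32 changes nothing */
    return diff;
}
```

For example, uns(0x80000001) is 0x80000000: doubling gives 2, halving gives 1, and 1 - 0x80000001 wraps to 0x80000000. Any x with the top bit clear gives 0.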

So the same source expression compiles to a constant for signed and to real work for unsigned. The whole difference is undefined behaviour: the signed version gets to assume something the unsigned version may not.

Where does the assumption live? In a one-word flag on the multiply instruction in the IR. Ask clang for the unoptimized IR (clang -S -emit-llvm -O0 -Xclang -disable-llvm-passes) and look at the multiply:

; signed
%4 = mul nsw i32 %3, 2
; unsigned
%4 = mul i32 %3, 2

nsw means "no signed wrap". The Clang frontend stamps it on signed arithmetic precisely because signed overflow is UB: it is the compiler writing down "I'm allowed to assume this doesn't wrap." In LLVM terms, a mul nsw that does overflow produces poison, so whatever the optimizer assumes about that case cannot be wrong. The unsigned multiply carries no such flag (neither nsw nor its unsigned counterpart, nuw), because unsigned wrap is defined and must be preserved. Later, the optimizer reads nsw and proves (x*2)/2 == x; on the unsigned path, without the flag, it can prove no such thing and correctly leaves the masking in.

(long behaves like int here: (x*2)/2 - x folds to 0, for the same reason.)
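To make the nsw precondition concrete, the sketch below uses the GCC/Clang builtin __builtin_mul_overflow to ask whether x*2 would wrap as a 32-bit signed multiply, plus a hypothetical helper wrapping_sgn that computes, without any UB, what (x*2)/2 - x would evaluate to if the multiply were a plain wrapping mul with no nsw. Wherever the multiply does not wrap the result is 0, which is all the fold relies on. Wherever it does wrap, 2x is off by ±2³², so the result is ±2³¹; those are exactly the inputs nsw lets the optimizer ignore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x*2 overflows a 32-bit signed int: the inputs `mul nsw`
   lets the optimizer ignore. Uses the GCC/Clang overflow builtin. */
bool mul2_wraps(int32_t x) {
    int32_t product;
    return __builtin_mul_overflow(x, 2, &product);
}

/* Hypothetical helper: the value (x*2)/2 - x would have if the multiply
   were a plain wrapping i32 `mul` (no nsw). Uses unsigned and 64-bit
   arithmetic, so nothing here is undefined. */
int64_t wrapping_sgn(int32_t x) {
    uint32_t bits = (uint32_t)x * 2u;           /* wraps modulo 2^32, defined */
    int64_t prod = bits >= 0x80000000u          /* reinterpret as a signed i32 */
                       ? (int64_t)bits - INT64_C(0x100000000)
                       : (int64_t)bits;
    assert(prod % 2 == 0);                      /* 2x and 2^32 are both even */
    return prod / 2 - x;                        /* truncating division, like sdiv */
}
```

For instance, mul2_wraps(0x40000000) is true and wrapping_sgn(0x40000000) is -2³¹, while for any x in [INT_MIN/2, INT_MAX/2] the multiply does not wrap and wrapping_sgn returns 0.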

It's a small thing, but it's undefined behaviour doing exactly what it's for. "Signed overflow is UB" is usually quoted as a hazard — a way to get bitten. Here it's the other face of the same coin: nsw is that UB cashed out as an optimization the unsigned code simply can't have. Same expression, two different programs, and the only difference is what the standard let the compiler assume.