Let's have a look on the following C program block:
if (x < UINT_MAX-1)
puts("foo");
else
puts("bar")
Piping clangs's output to llvm-dis gives us the following LLVM assembly representation:
%1 = load i32* %x, align 4
%2 = icmp ult i32 %1, -2
br i1 %2, label %bb, label %bb1
bb:
%3 = call i32 @puts(i8* getelementptr ([4 x i8]* @.str, i32 0, i32 0)) nounwind
br label %bb2
bb1:
%4 = call i32 @puts(i8* getelementptr ([4 x i8]* @.str1, i32 0, i32 0)) nounwind
br label %bb2
bb2:
Now, in the assembly code, it can be read that the %1 register is compared with -2, and the operands are to be interpreted as unsigned 32-bit integers. On the bit level, that is of course correct. However, a human needs to make a conversion in her head to understand on what concrete values of %1 the first branch is taken.
In the two's complement, the first bit represents the sign. For negative numbers, the sign bit is set, for positive, it is unset. The number zero is represented as a value where all bits are unset. The smallest negative value of an n-bit integer is represented as one followed by n-1 zeros. Semantically it can be thought as if the rightmost n-1 bits represent a positive value that is summed with the smallest negative value available (-2^(n-1)), and the result is something between -2^(n-1)..-1.
Assuming 4-bit integers, -2 is represented as 1110 (-2^3 + 6). When 1110 is interpreted as an unsigned integer, that is 14 (2^3 + 2^2 + 2^1). However, in our example, we have 32-bit integers (the i32 type in LLVM), where the min value for int is -2147483648. In the 32-bit world, -2 is represented by a number where there are 31 ones followed by a zero. Interpreted as unsigned, that is 2^31 + 2^30 ... 2^2 + 2^1 = 4294967294.
So we came to the conclusion, for values of %1 < 4294967294, the first branch is taken. Otherwise the second branch is taken. That was not obvious directly from the assembly code. We had to consider the integer width to deduce the concrete unsigned value.
It should be noted that one can decide whether to write integer literals as the llvm tools do or not. Since the bit representation is identical for signed i32 -2 and unsigned i32 4294967294, it does not matter which way it is written. I would still stick to the signed interpretation for consistency's sake. Otherwise it becomes a hopeless mess.
No comments:
Post a Comment