Take:
#include <vector>

void f1(std::vector<int> &a, int *c)
{
    if (a.size() != 1)
        return;
    for (auto b : a)
    {
        *c += 1;
        c++;
    }
}

void f0(int *a, int *b, int *c)
{
    auto t = b - a;
    if (t == 1)
    {
        do {
            a++;
            *c += 1;
            c++;
        } while (a != b);
    }
}
In both functions the loop iterates exactly once, so it should be fully unrolled and disappear. Instead, on aarch64 (the target does not really matter, and the result is the same with both libstdc++ and libc++), LLVM produces the following for f1 at -O3:
    ldp     x9, x8, [x0]
    sub     x10, x8, x9
    subs    x10, x10, #4
    ccmp    x9, x8, #4, eq
    b.eq    .LBB1_7
    cmp     x10, #28
    b.hs    .LBB1_3
    mov     x10, x1
    b       .LBB1_6
For f0 the code generation is even more interesting:
    sub     x8, x1, x0
    subs    x8, x8, #4
    b.ne    .LBB0_7
    cmp     x8, #28
    b.hs    .LBB0_3
    mov     x8, x2
    b       .LBB0_6
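When the branch b.ne .LBB0_7 is not taken, the byte distance b - a must have been exactly 4, so x8 is 0 after the subs. The following cmp x8, #28 therefore has a known result, but it is not optimized out.
The label .LBB0_3 leads to a vectorized version of the loop, which can never be reached.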
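For reference, here is a minimal sketch of what both functions reduce to once the single-iteration loop is folded away. The names f1_expected and f0_expected are hypothetical and only illustrate the expected simplification; they are not taken from LLVM output. The original reproducers are repeated so the snippet is self-contained.

```cpp
#include <vector>

// Original reproducers from this report.
void f1(std::vector<int> &a, int *c)
{
    if (a.size() != 1)
        return;
    for (auto b : a)
    {
        *c += 1;
        c++;
    }
}

void f0(int *a, int *b, int *c)
{
    auto t = b - a;
    if (t == 1)
    {
        do {
            a++;
            *c += 1;
            c++;
        } while (a != b);
    }
}

// Hypothetical hand-simplified equivalents: with exactly one element,
// the loop body runs once, so only the first int behind c is incremented.
void f1_expected(std::vector<int> &a, int *c)
{
    if (a.size() == 1)
        *c += 1;
}

void f0_expected(int *a, int *b, int *c)
{
    if (b - a == 1)
        *c += 1;
}
```

Ideally the code generated for f1 and f0 at -O3 would match these straight-line versions: one comparison, one load/add/store, and no loop or vectorized path.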