[LV] Add test for vectorisation of SAXPY unrolled by 5 (NFC). #153039
@llvm/pr-subscribers-llvm-transforms

Author: Ricardo Jesus (rj-jesus)

Changes: This patch adds an example of where vectorising a loop based on SAXPY manually unrolled by five is not profitable, as discussed in #148808.

Full diff: https://github.com/llvm/llvm-project/pull/153039.diff (1 file affected)
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/saxpy-5.ll b/llvm/test/Transforms/LoopVectorize/AArch64/saxpy-5.ll
new file mode 100644
index 0000000000000..7ed6c16ead74f
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/saxpy-5.ll
@@ -0,0 +1,249 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -S -passes=loop-vectorize | FileCheck %s --check-prefix=CHECK
+; RUN: opt < %s -S -passes=loop-vectorize -mattr=+sve | FileCheck %s --check-prefix=CHECK-SVE
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+target triple = "aarch64-unknown-linux-gnu"
+
+; This test contains an example of where vectorising a loop based on SAXPY
+; manually unrolled by five is not profitable:
+;
+; void saxpy(long n, float a, float *x, float *y) {
+; for (int i = 0; i < n; i += 5) {
+; y[i] += a * x[i];
+; y[i + 1] += a * x[i + 1];
+; y[i + 2] += a * x[i + 2];
+; y[i + 3] += a * x[i + 3];
+; y[i + 4] += a * x[i + 4];
+; }
+; }
+;
+; Note: Even though the loop is not vectorised with scalable vectors, the issue
+; currently only manifests itself with +sve due to an interaction with
+; `prefersVectorizedAddressing'.
+
+define void @saxpy(i64 %n, float %a, ptr readonly %x, ptr noalias %y) {
+; CHECK-LABEL: define void @saxpy(
+; CHECK-SAME: i64 [[N:%.*]], float [[A:%.*]], ptr readonly [[X:%.*]], ptr noalias [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i64 [[N]], 0
+; CHECK-NEXT: br i1 [[TMP0]], label %[[LOOP_PREHEADER:.*]], label %[[EXIT:.*]]
+; CHECK: [[LOOP_PREHEADER]]:
+; CHECK-NEXT: br label %[[LOOP:.*]]
+; CHECK: [[LOOP]]:
+; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ [[TMP36:%.*]], %[[LOOP]] ], [ 0, %[[LOOP_PREHEADER]] ]
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[TMP1]]
+; CHECK-NEXT: [[TMP3:%.*]] = load float, ptr [[TMP2]], align 4
+; CHECK-NEXT: [[TMP4:%.*]] = fmul fast float [[TMP3]], [[A]]
+; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[TMP1]]
+; CHECK-NEXT: [[TMP6:%.*]] = load float, ptr [[TMP5]], align 4
+; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float [[TMP6]], [[TMP4]]
+; CHECK-NEXT: store float [[TMP7]], ptr [[TMP5]], align 4
+; CHECK-NEXT: [[TMP8:%.*]] = add nuw nsw i64 [[TMP1]], 1
+; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[TMP8]]
+; CHECK-NEXT: [[TMP10:%.*]] = load float, ptr [[TMP9]], align 4
+; CHECK-NEXT: [[TMP11:%.*]] = fmul fast float [[TMP10]], [[A]]
+; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[TMP8]]
+; CHECK-NEXT: [[TMP13:%.*]] = load float, ptr [[TMP12]], align 4
+; CHECK-NEXT: [[TMP14:%.*]] = fadd fast float [[TMP13]], [[TMP11]]
+; CHECK-NEXT: store float [[TMP14]], ptr [[TMP12]], align 4
+; CHECK-NEXT: [[TMP15:%.*]] = add nuw nsw i64 [[TMP1]], 2
+; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[TMP15]]
+; CHECK-NEXT: [[TMP17:%.*]] = load float, ptr [[TMP16]], align 4
+; CHECK-NEXT: [[TMP18:%.*]] = fmul fast float [[TMP17]], [[A]]
+; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[TMP15]]
+; CHECK-NEXT: [[TMP20:%.*]] = load float, ptr [[TMP19]], align 4
+; CHECK-NEXT: [[TMP21:%.*]] = fadd fast float [[TMP20]], [[TMP18]]
+; CHECK-NEXT: store float [[TMP21]], ptr [[TMP19]], align 4
+; CHECK-NEXT: [[TMP22:%.*]] = add nuw nsw i64 [[TMP1]], 3
+; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[TMP22]]
+; CHECK-NEXT: [[TMP24:%.*]] = load float, ptr [[TMP23]], align 4
+; CHECK-NEXT: [[TMP25:%.*]] = fmul fast float [[TMP24]], [[A]]
+; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[TMP22]]
+; CHECK-NEXT: [[TMP27:%.*]] = load float, ptr [[TMP26]], align 4
+; CHECK-NEXT: [[TMP28:%.*]] = fadd fast float [[TMP27]], [[TMP25]]
+; CHECK-NEXT: store float [[TMP28]], ptr [[TMP26]], align 4
+; CHECK-NEXT: [[TMP29:%.*]] = add nuw nsw i64 [[TMP1]], 4
+; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[TMP29]]
+; CHECK-NEXT: [[TMP31:%.*]] = load float, ptr [[TMP30]], align 4
+; CHECK-NEXT: [[TMP32:%.*]] = fmul fast float [[TMP31]], [[A]]
+; CHECK-NEXT: [[TMP33:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[TMP29]]
+; CHECK-NEXT: [[TMP34:%.*]] = load float, ptr [[TMP33]], align 4
+; CHECK-NEXT: [[TMP35:%.*]] = fadd fast float [[TMP34]], [[TMP32]]
+; CHECK-NEXT: store float [[TMP35]], ptr [[TMP33]], align 4
+; CHECK-NEXT: [[TMP36]] = add nuw nsw i64 [[TMP1]], 5
+; CHECK-NEXT: [[TMP37:%.*]] = icmp sgt i64 [[N]], [[TMP36]]
+; CHECK-NEXT: br i1 [[TMP37]], label %[[LOOP]], label %[[EXIT_LOOPEXIT:.*]]
+; CHECK: [[EXIT_LOOPEXIT]]:
+; CHECK-NEXT: br label %[[EXIT]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+; CHECK-SVE-LABEL: define void @saxpy(
+; CHECK-SVE-SAME: i64 [[N:%.*]], float [[A:%.*]], ptr readonly [[X:%.*]], ptr noalias [[Y:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-SVE-NEXT: [[ENTRY:.*:]]
+; CHECK-SVE-NEXT: [[TMP0:%.*]] = icmp sgt i64 [[N]], 0
+; CHECK-SVE-NEXT: br i1 [[TMP0]], label %[[LOOP_PREHEADER:.*]], label %[[EXIT:.*]]
+; CHECK-SVE: [[LOOP_PREHEADER]]:
+; CHECK-SVE-NEXT: [[TMP1:%.*]] = add i64 [[N]], -1
+; CHECK-SVE-NEXT: [[TMP2:%.*]] = udiv i64 [[TMP1]], 5
+; CHECK-SVE-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
+; CHECK-SVE-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 2
+; CHECK-SVE-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK-SVE: [[VECTOR_PH]]:
+; CHECK-SVE-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2
+; CHECK-SVE-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
+; CHECK-SVE-NEXT: [[TMP4:%.*]] = mul i64 [[N_VEC]], 5
+; CHECK-SVE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x float> poison, float [[A]], i64 0
+; CHECK-SVE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x float> [[BROADCAST_SPLATINSERT]], <2 x float> poison, <2 x i32> zeroinitializer
+; CHECK-SVE-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK-SVE: [[VECTOR_BODY]]:
+; CHECK-SVE-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-SVE-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], 5
+; CHECK-SVE-NEXT: [[TMP5:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[OFFSET_IDX]]
+; CHECK-SVE-NEXT: [[WIDE_VEC:%.*]] = load <10 x float>, ptr [[TMP5]], align 4
+; CHECK-SVE-NEXT: [[STRIDED_VEC:%.*]] = shufflevector <10 x float> [[WIDE_VEC]], <10 x float> poison, <2 x i32> <i32 0, i32 5>
+; CHECK-SVE-NEXT: [[STRIDED_VEC1:%.*]] = shufflevector <10 x float> [[WIDE_VEC]], <10 x float> poison, <2 x i32> <i32 1, i32 6>
+; CHECK-SVE-NEXT: [[STRIDED_VEC2:%.*]] = shufflevector <10 x float> [[WIDE_VEC]], <10 x float> poison, <2 x i32> <i32 2, i32 7>
+; CHECK-SVE-NEXT: [[STRIDED_VEC3:%.*]] = shufflevector <10 x float> [[WIDE_VEC]], <10 x float> poison, <2 x i32> <i32 3, i32 8>
+; CHECK-SVE-NEXT: [[STRIDED_VEC4:%.*]] = shufflevector <10 x float> [[WIDE_VEC]], <10 x float> poison, <2 x i32> <i32 4, i32 9>
+; CHECK-SVE-NEXT: [[TMP6:%.*]] = fmul fast <2 x float> [[STRIDED_VEC]], [[BROADCAST_SPLAT]]
+; CHECK-SVE-NEXT: [[TMP7:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[OFFSET_IDX]]
+; CHECK-SVE-NEXT: [[WIDE_VEC5:%.*]] = load <10 x float>, ptr [[TMP7]], align 4
+; CHECK-SVE-NEXT: [[STRIDED_VEC6:%.*]] = shufflevector <10 x float> [[WIDE_VEC5]], <10 x float> poison, <2 x i32> <i32 0, i32 5>
+; CHECK-SVE-NEXT: [[STRIDED_VEC7:%.*]] = shufflevector <10 x float> [[WIDE_VEC5]], <10 x float> poison, <2 x i32> <i32 1, i32 6>
+; CHECK-SVE-NEXT: [[STRIDED_VEC8:%.*]] = shufflevector <10 x float> [[WIDE_VEC5]], <10 x float> poison, <2 x i32> <i32 2, i32 7>
+; CHECK-SVE-NEXT: [[STRIDED_VEC9:%.*]] = shufflevector <10 x float> [[WIDE_VEC5]], <10 x float> poison, <2 x i32> <i32 3, i32 8>
+; CHECK-SVE-NEXT: [[STRIDED_VEC10:%.*]] = shufflevector <10 x float> [[WIDE_VEC5]], <10 x float> poison, <2 x i32> <i32 4, i32 9>
+; CHECK-SVE-NEXT: [[TMP8:%.*]] = fadd fast <2 x float> [[STRIDED_VEC6]], [[TMP6]]
+; CHECK-SVE-NEXT: [[TMP9:%.*]] = fmul fast <2 x float> [[STRIDED_VEC1]], [[BROADCAST_SPLAT]]
+; CHECK-SVE-NEXT: [[TMP10:%.*]] = fadd fast <2 x float> [[STRIDED_VEC7]], [[TMP9]]
+; CHECK-SVE-NEXT: [[TMP11:%.*]] = fmul fast <2 x float> [[STRIDED_VEC2]], [[BROADCAST_SPLAT]]
+; CHECK-SVE-NEXT: [[TMP12:%.*]] = fadd fast <2 x float> [[STRIDED_VEC8]], [[TMP11]]
+; CHECK-SVE-NEXT: [[TMP13:%.*]] = fmul fast <2 x float> [[STRIDED_VEC3]], [[BROADCAST_SPLAT]]
+; CHECK-SVE-NEXT: [[TMP14:%.*]] = fadd fast <2 x float> [[STRIDED_VEC9]], [[TMP13]]
+; CHECK-SVE-NEXT: [[TMP15:%.*]] = fmul fast <2 x float> [[STRIDED_VEC4]], [[BROADCAST_SPLAT]]
+; CHECK-SVE-NEXT: [[TMP16:%.*]] = fadd fast <2 x float> [[STRIDED_VEC10]], [[TMP15]]
+; CHECK-SVE-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP10]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-SVE-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> [[TMP14]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-SVE-NEXT: [[TMP19:%.*]] = shufflevector <4 x float> [[TMP17]], <4 x float> [[TMP18]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-SVE-NEXT: [[TMP20:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-SVE-NEXT: [[TMP21:%.*]] = shufflevector <8 x float> [[TMP19]], <8 x float> [[TMP20]], <10 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9>
+; CHECK-SVE-NEXT: [[INTERLEAVED_VEC:%.*]] = shufflevector <10 x float> [[TMP21]], <10 x float> poison, <10 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 1, i32 3, i32 5, i32 7, i32 9>
+; CHECK-SVE-NEXT: store <10 x float> [[INTERLEAVED_VEC]], ptr [[TMP7]], align 4
+; CHECK-SVE-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
+; CHECK-SVE-NEXT: [[TMP22:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-SVE-NEXT: br i1 [[TMP22]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-SVE: [[MIDDLE_BLOCK]]:
+; CHECK-SVE-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
+; CHECK-SVE-NEXT: br i1 [[CMP_N]], label %[[EXIT_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-SVE: [[SCALAR_PH]]:
+; CHECK-SVE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[TMP4]], %[[MIDDLE_BLOCK]] ], [ 0, %[[LOOP_PREHEADER]] ]
+; CHECK-SVE-NEXT: br label %[[LOOP:.*]]
+; CHECK-SVE: [[LOOP]]:
+; CHECK-SVE-NEXT: [[TMP23:%.*]] = phi i64 [ [[TMP58:%.*]], %[[LOOP]] ], [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ]
+; CHECK-SVE-NEXT: [[TMP24:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[TMP23]]
+; CHECK-SVE-NEXT: [[TMP25:%.*]] = load float, ptr [[TMP24]], align 4
+; CHECK-SVE-NEXT: [[TMP26:%.*]] = fmul fast float [[TMP25]], [[A]]
+; CHECK-SVE-NEXT: [[TMP27:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[TMP23]]
+; CHECK-SVE-NEXT: [[TMP28:%.*]] = load float, ptr [[TMP27]], align 4
+; CHECK-SVE-NEXT: [[TMP29:%.*]] = fadd fast float [[TMP28]], [[TMP26]]
+; CHECK-SVE-NEXT: store float [[TMP29]], ptr [[TMP27]], align 4
+; CHECK-SVE-NEXT: [[TMP30:%.*]] = add nuw nsw i64 [[TMP23]], 1
+; CHECK-SVE-NEXT: [[TMP31:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[TMP30]]
+; CHECK-SVE-NEXT: [[TMP32:%.*]] = load float, ptr [[TMP31]], align 4
+; CHECK-SVE-NEXT: [[TMP33:%.*]] = fmul fast float [[TMP32]], [[A]]
+; CHECK-SVE-NEXT: [[TMP34:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[TMP30]]
+; CHECK-SVE-NEXT: [[TMP35:%.*]] = load float, ptr [[TMP34]], align 4
+; CHECK-SVE-NEXT: [[TMP36:%.*]] = fadd fast float [[TMP35]], [[TMP33]]
+; CHECK-SVE-NEXT: store float [[TMP36]], ptr [[TMP34]], align 4
+; CHECK-SVE-NEXT: [[TMP37:%.*]] = add nuw nsw i64 [[TMP23]], 2
+; CHECK-SVE-NEXT: [[TMP38:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[TMP37]]
+; CHECK-SVE-NEXT: [[TMP39:%.*]] = load float, ptr [[TMP38]], align 4
+; CHECK-SVE-NEXT: [[TMP40:%.*]] = fmul fast float [[TMP39]], [[A]]
+; CHECK-SVE-NEXT: [[TMP41:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[TMP37]]
+; CHECK-SVE-NEXT: [[TMP42:%.*]] = load float, ptr [[TMP41]], align 4
+; CHECK-SVE-NEXT: [[TMP43:%.*]] = fadd fast float [[TMP42]], [[TMP40]]
+; CHECK-SVE-NEXT: store float [[TMP43]], ptr [[TMP41]], align 4
+; CHECK-SVE-NEXT: [[TMP44:%.*]] = add nuw nsw i64 [[TMP23]], 3
+; CHECK-SVE-NEXT: [[TMP45:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[TMP44]]
+; CHECK-SVE-NEXT: [[TMP46:%.*]] = load float, ptr [[TMP45]], align 4
+; CHECK-SVE-NEXT: [[TMP47:%.*]] = fmul fast float [[TMP46]], [[A]]
+; CHECK-SVE-NEXT: [[TMP48:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[TMP44]]
+; CHECK-SVE-NEXT: [[TMP49:%.*]] = load float, ptr [[TMP48]], align 4
+; CHECK-SVE-NEXT: [[TMP50:%.*]] = fadd fast float [[TMP49]], [[TMP47]]
+; CHECK-SVE-NEXT: store float [[TMP50]], ptr [[TMP48]], align 4
+; CHECK-SVE-NEXT: [[TMP51:%.*]] = add nuw nsw i64 [[TMP23]], 4
+; CHECK-SVE-NEXT: [[TMP52:%.*]] = getelementptr inbounds nuw float, ptr [[X]], i64 [[TMP51]]
+; CHECK-SVE-NEXT: [[TMP53:%.*]] = load float, ptr [[TMP52]], align 4
+; CHECK-SVE-NEXT: [[TMP54:%.*]] = fmul fast float [[TMP53]], [[A]]
+; CHECK-SVE-NEXT: [[TMP55:%.*]] = getelementptr inbounds nuw float, ptr [[Y]], i64 [[TMP51]]
+; CHECK-SVE-NEXT: [[TMP56:%.*]] = load float, ptr [[TMP55]], align 4
+; CHECK-SVE-NEXT: [[TMP57:%.*]] = fadd fast float [[TMP56]], [[TMP54]]
+; CHECK-SVE-NEXT: store float [[TMP57]], ptr [[TMP55]], align 4
+; CHECK-SVE-NEXT: [[TMP58]] = add nuw nsw i64 [[TMP23]], 5
+; CHECK-SVE-NEXT: [[TMP59:%.*]] = icmp sgt i64 [[N]], [[TMP58]]
+; CHECK-SVE-NEXT: br i1 [[TMP59]], label %[[LOOP]], label %[[EXIT_LOOPEXIT]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-SVE: [[EXIT_LOOPEXIT]]:
+; CHECK-SVE-NEXT: br label %[[EXIT]]
+; CHECK-SVE: [[EXIT]]:
+; CHECK-SVE-NEXT: ret void
+;
+entry:
+ %0 = icmp sgt i64 %n, 0
+ br i1 %0, label %loop, label %exit
+
+loop:
+ %1 = phi i64 [ %36, %loop ], [ 0, %entry ]
+ %2 = getelementptr inbounds nuw float, ptr %x, i64 %1
+ %3 = load float, ptr %2, align 4
+ %4 = fmul fast float %3, %a
+ %5 = getelementptr inbounds nuw float, ptr %y, i64 %1
+ %6 = load float, ptr %5, align 4
+ %7 = fadd fast float %6, %4
+ store float %7, ptr %5, align 4
+ %8 = add nuw nsw i64 %1, 1
+ %9 = getelementptr inbounds nuw float, ptr %x, i64 %8
+ %10 = load float, ptr %9, align 4
+ %11 = fmul fast float %10, %a
+ %12 = getelementptr inbounds nuw float, ptr %y, i64 %8
+ %13 = load float, ptr %12, align 4
+ %14 = fadd fast float %13, %11
+ store float %14, ptr %12, align 4
+ %15 = add nuw nsw i64 %1, 2
+ %16 = getelementptr inbounds nuw float, ptr %x, i64 %15
+ %17 = load float, ptr %16, align 4
+ %18 = fmul fast float %17, %a
+ %19 = getelementptr inbounds nuw float, ptr %y, i64 %15
+ %20 = load float, ptr %19, align 4
+ %21 = fadd fast float %20, %18
+ store float %21, ptr %19, align 4
+ %22 = add nuw nsw i64 %1, 3
+ %23 = getelementptr inbounds nuw float, ptr %x, i64 %22
+ %24 = load float, ptr %23, align 4
+ %25 = fmul fast float %24, %a
+ %26 = getelementptr inbounds nuw float, ptr %y, i64 %22
+ %27 = load float, ptr %26, align 4
+ %28 = fadd fast float %27, %25
+ store float %28, ptr %26, align 4
+ %29 = add nuw nsw i64 %1, 4
+ %30 = getelementptr inbounds nuw float, ptr %x, i64 %29
+ %31 = load float, ptr %30, align 4
+ %32 = fmul fast float %31, %a
+ %33 = getelementptr inbounds nuw float, ptr %y, i64 %29
+ %34 = load float, ptr %33, align 4
+ %35 = fadd fast float %34, %32
+ store float %35, ptr %33, align 4
+ %36 = add nuw nsw i64 %1, 5
+ %37 = icmp sgt i64 %n, %36
+ br i1 %37, label %loop, label %exit
+
+exit:
+ ret void
+}
+;.
+; CHECK-SVE: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK-SVE: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-SVE: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK-SVE: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.
It would be good if you could give some human readable names for the variables, e.g. %xgep1, %ygep1, %xgep2, etc. Thanks!
Hi @david-arm, as it turns out, #149047 seems to have improved the code generated for loops like this quite a bit (https://godbolt.org/z/198WK4fsK), so I think I'll close #148808.
Do you still see value in adding this test (in which case I'll happily name the variables properly), or should I close it too? Thanks very much!
Oh, that's good to hear. #149047 should ideally be happening inside the vectorizer (the deinterleaving), so that all the costs can be more correct as it vectorizes. (And it could learn different tricks.) The tests are probably worth having if we don't have them elsewhere.
I've just updated the tests to give the variables better names as suggested by @david-arm, although I wonder if the test should be made a codegen test so that we also test the interaction with #149047.
To do that, you could add it to PhaseOrdering
Maybe add this variant to llvm/test/Transforms/PhaseOrdering/AArch64/interleave_vec.ll? It might be worth making sure there are specific tests for the vectorizer too, but they might already exist or could be added later if needed.
Thanks for the suggestions, I've now moved the test to Transforms/PhaseOrdering/AArch64/interleave_vec.ll. :)
This test contains a vectorisation example of a loop based on SAXPY manually unrolled by five, as discussed in llvm#148808.
(force-pushed from c0d697d to 1cf49b9)
LGTM
Thanks for the update, LGTM