-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[LAA] Fix WAW dependency analysis with negative distances #155266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Previously, LAA would incorrectly classify Write-After-Write dependencies with negative distances as safe Forward dependencies, allowing inappropriate vectorization of loops with bidirectional WAW dependencies. The issue occurred in loops like: for(int i = 0; i < n; ++i) { A[(i+1)*4] = 10; // First store A[i] = 100; // Second store } The dependence distance from first store to second store is negative: {-16,+,-12}. However, this represents a bidirectional WAW dependency that DependenceAnalysis would report as 'output [<>]!', indicating dependency in both directions. This patch fixes LAA to properly detect WAW dependencies with negative distances as unsafe, preventing incorrect vectorization. The fix adds a check specifically for Write-After-Write dependencies before the general negative distance handling, ensuring they are classified as Unknown (unsafe) rather than Forward (safe). A proper fix, not implemented here because it would be a major rework of LAA, is to implement bidirectional dependence checking that computes distances in both directions and detects inconsistent direction vectors.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: Sebastian Pop (sebpop) ChangesPreviously, LAA would incorrectly classify Write-After-Write dependencies with negative distances as safe Forward dependencies, allowing inappropriate vectorization of loops with bidirectional WAW dependencies. The issue occurred in loops like: The dependence distance from first store to second store is negative: {-16,+,-12}. However, this represents a bidirectional WAW dependency that DependenceAnalysis would report as 'output [<>]!', indicating dependency in both directions. This patch fixes LAA to properly detect WAW dependencies with negative distances as unsafe, preventing incorrect vectorization. The fix adds a check specifically for Write-After-Write dependencies before the general negative distance handling, ensuring they are classified as Unknown (unsafe) rather than Forward (safe). A proper fix, not implemented here because it would be a major rework of LAA, is to implement bidirectional dependence checking that computes distances in both directions and detects inconsistent direction vectors. Patch is 28.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155266.diff 7 Files Affected:
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index bceddd0325276..e522d1392f7f9 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -2196,6 +2196,21 @@ MemoryDepChecker::isDependent(const MemAccessInfo &A, unsigned AIdx,
return Dependence::Unknown;
}
+ // For WAW (Write-After-Write) dependencies, negative distances in one
+ // direction can still represent unsafe dependencies. Since we only check
+ // dependencies in program order (AIdx < BIdx), a negative distance means
+ // the later write accesses memory locations before the earlier write.
+ // However, in a vectorized loop, both writes could execute simultaneously,
+ // potentially causing incorrect behavior. Therefore, WAW with negative
+ // distances should be treated as unsafe.
+ bool IsWriteAfterWrite = (AIsWrite && BIsWrite);
+ if (IsWriteAfterWrite) {
+ LLVM_DEBUG(
+ dbgs() << "LAA: WAW dependence with negative distance is unsafe\n");
+ return CheckCompletelyBeforeOrAfter() ? Dependence::NoDep
+ : Dependence::Unknown;
+ }
+
bool IsTrueDataDependence = (AIsWrite && !BIsWrite);
// Check if the first access writes to a location that is read in a later
// iteration, where the distance between them is not a multiple of a vector
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/depend_diff_types.ll b/llvm/test/Analysis/LoopAccessAnalysis/depend_diff_types.ll
index 023a8c056968f..00a2ce7337d0d 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/depend_diff_types.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/depend_diff_types.ll
@@ -1,4 +1,4 @@
-; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 4
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
; RUN: opt -S -disable-output -passes='print<access-info>' < %s 2>&1 | FileCheck %s
@@ -449,9 +449,10 @@ exit:
define void @different_type_sizes_forward(ptr %dst) {
; CHECK-LABEL: 'different_type_sizes_forward'
; CHECK-NEXT: loop:
-; CHECK-NEXT: Memory dependences are safe
+; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
+; CHECK-NEXT: Unknown data dependence.
; CHECK-NEXT: Dependences:
-; CHECK-NEXT: Forward:
+; CHECK-NEXT: Unknown:
; CHECK-NEXT: store i32 0, ptr %gep.10.iv, align 4 ->
; CHECK-NEXT: store i16 1, ptr %gep.iv, align 2
; CHECK-EMPTY:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll b/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll
index adfd19923e921..8b00ff0ad2a59 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-carried.ll
@@ -70,7 +70,7 @@ define void @forward_different_access_sizes(ptr readnone %end, ptr %start) {
; CHECK-NEXT: store i32 0, ptr %gep.2, align 4 ->
; CHECK-NEXT: %l = load i24, ptr %gep.1, align 1
; CHECK-EMPTY:
-; CHECK-NEXT: Forward:
+; CHECK-NEXT: Unknown:
; CHECK-NEXT: store i32 0, ptr %gep.2, align 4 ->
; CHECK-NEXT: store i24 %l, ptr %ptr.iv, align 1
; CHECK-EMPTY:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-independent.ll b/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-independent.ll
index 7fc9958dba552..218166526d7c0 100644
--- a/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-independent.ll
+++ b/llvm/test/Analysis/LoopAccessAnalysis/forward-loop-independent.ll
@@ -35,7 +35,7 @@ define void @f(ptr noalias %A, ptr noalias %B, ptr noalias %C, i64 %N) {
; CHECK-NEXT: store i32 %b_p2, ptr %Aidx_next, align 4 ->
; CHECK-NEXT: %a = load i32, ptr %Aidx, align 4
; CHECK-EMPTY:
-; CHECK-NEXT: Forward:
+; CHECK-NEXT: Unknown:
; CHECK-NEXT: store i32 %b_p2, ptr %Aidx_next, align 4 ->
; CHECK-NEXT: store i32 %b_p1, ptr %Aidx, align 4
; CHECK-EMPTY:
diff --git a/llvm/test/Analysis/LoopAccessAnalysis/waw-negative-dependence.ll b/llvm/test/Analysis/LoopAccessAnalysis/waw-negative-dependence.ll
new file mode 100644
index 0000000000000..be49af7d75460
--- /dev/null
+++ b/llvm/test/Analysis/LoopAccessAnalysis/waw-negative-dependence.ll
@@ -0,0 +1,109 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes='print<access-info>' -disable-output %s 2>&1 | FileCheck %s
+
+; Test that LAA correctly identifies Write-After-Write dependencies with negative
+; distances as unsafe. Previously, LAA would incorrectly classify negative distance
+; WAW dependencies as safe Forward dependencies, allowing inappropriate vectorization.
+;
+; This corresponds to the loop:
+; for(int i = 0; i < n; ++i) {
+; A[(i+1)*4] = 10; // First store: A[4, 8, 12, 16, ...]
+; A[i] = 100; // Second store: A[0, 1, 2, 3, 4, ...]
+; }
+;
+; The dependence distance from first store to second store is negative:
+; A[i] - A[(i+1)*4] = {0,+,4} - {16,+,16} = {-16,+,-12}
+; However, the dependence from second store to first store in the next iteration
+; would be positive: A[(i+1)*4] - A[i] = {16,+,16} - {0,+,4} = {16,+,12}
+;
+; This bidirectional dependence pattern (negative in one direction, positive in the
+; other) creates a Write-After-Write dependency that is unsafe for vectorization.
+; DependenceAnalysis would report this as "output [<>]!" indicating the complex
+; dependence direction. LAA must detect this as unsafe even when only checking
+; the negative distance direction.
+
+define void @test_waw_negative_dependence(i64 %n, ptr nocapture %A) {
+; CHECK-LABEL: 'test_waw_negative_dependence'
+; CHECK-NEXT: loop:
+; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
+; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Dependences:
+; CHECK-NEXT: Unknown:
+; CHECK-NEXT: store i32 10, ptr %arrayidx1, align 4 ->
+; CHECK-NEXT: store i32 100, ptr %arrayidx2, align 4
+; CHECK-EMPTY:
+; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Grouped accesses:
+; CHECK-EMPTY:
+; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
+; CHECK-NEXT: SCEV assumptions:
+; CHECK-EMPTY:
+; CHECK-NEXT: Expressions re-written:
+;
+entry:
+ %cmp8 = icmp sgt i64 %n, 0
+ br i1 %cmp8, label %loop, label %exit
+
+loop:
+ %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %loop ]
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+
+ ; First store: A[(i+1)*4] = 10
+ %0 = shl nsw i64 %indvars.iv.next, 2 ; (i+1)*4
+ %arrayidx1 = getelementptr inbounds i32, ptr %A, i64 %0
+ store i32 10, ptr %arrayidx1, align 4
+
+ ; Second store: A[i] = 100
+ %arrayidx2 = getelementptr inbounds i32, ptr %A, i64 %indvars.iv
+ store i32 100, ptr %arrayidx2, align 4
+
+ %exitcond.not = icmp eq i64 %indvars.iv.next, %n
+ br i1 %exitcond.not, label %exit, label %loop
+
+exit:
+ ret void
+}
+
+; Test a similar case but with different stride to ensure the fix is general.
+define void @test_waw_negative_dependence_different_stride(i64 %n, ptr nocapture %A) {
+; CHECK-LABEL: 'test_waw_negative_dependence_different_stride'
+; CHECK-NEXT: loop:
+; CHECK-NEXT: Report: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
+; CHECK-NEXT: Unknown data dependence.
+; CHECK-NEXT: Dependences:
+; CHECK-NEXT: Unknown:
+; CHECK-NEXT: store i32 10, ptr %arrayidx1, align 4 ->
+; CHECK-NEXT: store i32 100, ptr %arrayidx2, align 4
+; CHECK-EMPTY:
+; CHECK-NEXT: Run-time memory checks:
+; CHECK-NEXT: Grouped accesses:
+; CHECK-EMPTY:
+; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
+; CHECK-NEXT: SCEV assumptions:
+; CHECK-EMPTY:
+; CHECK-NEXT: Expressions re-written:
+;
+entry:
+ %cmp8 = icmp sgt i64 %n, 0
+ br i1 %cmp8, label %loop, label %exit
+
+loop:
+ %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %loop ]
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+
+ ; First store: A[(i+2)*2] = 10
+ %0 = add nsw i64 %indvars.iv, 2 ; i+2
+ %1 = shl nsw i64 %0, 1 ; (i+2)*2
+ %arrayidx1 = getelementptr inbounds i32, ptr %A, i64 %1
+ store i32 10, ptr %arrayidx1, align 4
+
+ ; Second store: A[i] = 100
+ %arrayidx2 = getelementptr inbounds i32, ptr %A, i64 %indvars.iv
+ store i32 100, ptr %arrayidx2, align 4
+
+ %exitcond.not = icmp eq i64 %indvars.iv.next, %n
+ br i1 %exitcond.not, label %exit, label %loop
+
+exit:
+ ret void
+}
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
index fd0bc0b6c20ef..e68744b8a6f37 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
@@ -1158,53 +1158,22 @@ for.end:
define void @PR27626_5(ptr %a, i32 %x, i32 %y, i32 %z, i64 %n) #1 {
; CHECK-LABEL: @PR27626_5(
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[SMAX:%.*]] = call i64 @llvm.smax.i64(i64 [[N:%.*]], i64 5)
-; CHECK-NEXT: [[TMP0:%.*]] = add nsw i64 [[SMAX]], -4
-; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[TMP0]], 1
-; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1
-; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i64 [[TMP3]], 2
-; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp samesign ult i64 [[TMP2]], [[TMP4]]
-; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
-; CHECK: vector.ph:
-; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP8:%.*]] = shl nuw nsw i64 [[TMP7]], 2
-; CHECK-NEXT: [[DOTNOT:%.*]] = sub nsw i64 0, [[TMP8]]
-; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[TMP2]], [[DOTNOT]]
-; CHECK-NEXT: [[TMP11:%.*]] = shl nuw i64 [[N_VEC]], 1
-; CHECK-NEXT: [[IND_END:%.*]] = or disjoint i64 [[TMP11]], 3
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[X:%.*]], i64 0
-; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[Y:%.*]], i64 0
-; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[Z:%.*]], i64 0
-; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP10:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
-; CHECK-NEXT: [[TMP21:%.*]] = shl <vscale x 4 x i64> [[TMP10]], splat (i64 1)
-; CHECK-NEXT: [[INDUCTION:%.*]] = add <vscale x 4 x i64> [[TMP21]], splat (i64 3)
-; CHECK-NEXT: [[TMP12:%.*]] = shl nuw nsw i64 [[TMP7]], 3
-; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP12]], i64 0
-; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
-; CHECK: vector.body:
-; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
-; CHECK-NEXT: [[TMP13:%.*]] = add <vscale x 4 x i64> [[VEC_IND]], splat (i64 -1)
-; CHECK-NEXT: [[TMP14:%.*]] = add <vscale x 4 x i64> [[VEC_IND]], splat (i64 -3)
-; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], <vscale x 4 x i64> [[VEC_IND]]
-; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, ptr [[A]], <vscale x 4 x i64> [[TMP13]]
-; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[A]], <vscale x 4 x i64> [[TMP14]]
-; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT]], <vscale x 4 x ptr> [[TMP16]], i32 4, <vscale x 4 x i1> splat (i1 true))
-; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT2]], <vscale x 4 x ptr> [[TMP17]], i32 4, <vscale x 4 x i1> splat (i1 true))
-; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i32.nxv4p0(<vscale x 4 x i32> [[BROADCAST_SPLAT4]], <vscale x 4 x ptr> [[TMP15]], i32 4, <vscale x 4 x i1> splat (i1 true))
-; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]
-; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
-; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP32:![0-9]+]]
-; CHECK: middle.block:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP2]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
-; CHECK: scalar.ph:
+; CHECK: for.body:
+; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[I_NEXT:%.*]], [[VECTOR_BODY]] ], [ 3, [[ENTRY:%.*]] ]
+; CHECK-NEXT: [[A_I:%.*]] = getelementptr inbounds nuw i32, ptr [[A:%.*]], i64 [[I]]
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i32, ptr [[A]], i64 [[I]]
+; CHECK-NEXT: [[A_I_MINUS_1:%.*]] = getelementptr i8, ptr [[TMP0]], i64 -4
+; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i32, ptr [[A]], i64 [[I]]
+; CHECK-NEXT: [[A_I_MINUS_3:%.*]] = getelementptr i8, ptr [[TMP1]], i64 -12
+; CHECK-NEXT: store i32 [[X:%.*]], ptr [[A_I_MINUS_1]], align 4
+; CHECK-NEXT: store i32 [[Y:%.*]], ptr [[A_I_MINUS_3]], align 4
+; CHECK-NEXT: store i32 [[Z:%.*]], ptr [[A_I]], align 4
+; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 2
+; CHECK-NEXT: [[COND:%.*]] = icmp slt i64 [[I_NEXT]], [[N:%.*]]
+; CHECK-NEXT: br i1 [[COND]], label [[VECTOR_BODY]], label [[FOR_END:%.*]]
+; CHECK: for.end:
+; CHECK-NEXT: ret void
;
entry:
br label %for.body
@@ -1281,21 +1250,21 @@ define void @PR34743(ptr %a, ptr %b, i64 %n) #1 {
; CHECK-NEXT: [[TMP18:%.*]] = add nuw nsw <vscale x 4 x i64> [[VEC_IND]], splat (i64 1)
; CHECK-NEXT: [[TMP19:%.*]] = add nuw nsw <vscale x 4 x i64> [[VEC_IND]], splat (i64 2)
; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i16, ptr [[A]], <vscale x 4 x i64> [[TMP18]]
-; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP20]], i32 4, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i16> poison), !alias.scope [[META34:![0-9]+]]
+; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP20]], i32 4, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i16> poison), !alias.scope [[META32:![0-9]+]]
; CHECK-NEXT: [[TMP21:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER]] to <vscale x 4 x i32>
; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i16, ptr [[A]], <vscale x 4 x i64> [[TMP19]]
-; CHECK-NEXT: [[WIDE_MASKED_GATHER4]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP22]], i32 4, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i16> poison), !alias.scope [[META34]]
+; CHECK-NEXT: [[WIDE_MASKED_GATHER4]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP22]], i32 4, <vscale x 4 x i1> splat (i1 true), <vscale x 4 x i16> poison), !alias.scope [[META32]]
; CHECK-NEXT: [[TMP23:%.*]] = call <vscale x 4 x i16> @llvm.vector.splice.nxv4i16(<vscale x 4 x i16> [[VECTOR_RECUR]], <vscale x 4 x i16> [[WIDE_MASKED_GATHER4]], i32 -1)
; CHECK-NEXT: [[TMP24:%.*]] = sext <vscale x 4 x i16> [[TMP23]] to <vscale x 4 x i32>
; CHECK-NEXT: [[TMP25:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER4]] to <vscale x 4 x i32>
; CHECK-NEXT: [[TMP26:%.*]] = mul nsw <vscale x 4 x i32> [[TMP24]], [[TMP21]]
; CHECK-NEXT: [[TMP27:%.*]] = mul nsw <vscale x 4 x i32> [[TMP26]], [[TMP25]]
; CHECK-NEXT: [[TMP28:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDEX]]
-; CHECK-NEXT: store <vscale x 4 x i32> [[TMP27]], ptr [[TMP28]], align 4, !alias.scope [[META37:![0-9]+]], !noalias [[META34]]
+; CHECK-NEXT: store <vscale x 4 x i32> [[TMP27]], ptr [[TMP28]], align 4, !alias.scope [[META35:![0-9]+]], !noalias [[META32]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP10]]
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
; CHECK-NEXT: [[TMP29:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP29]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP39:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP29]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP37:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[TMP30:%.*]] = call i32 @llvm.vscale.i32()
; CHECK-NEXT: [[TMP31:%.*]] = shl nuw nsw i32 [[TMP30]], 2
@@ -1388,7 +1357,7 @@ define void @interleave_deinterleave_factor3(ptr writeonly noalias %dst, ptr rea
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP3]]
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP39:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
@@ -1475,7 +1444,7 @@ define void @interleave_deinterleave(ptr writeonly noalias %dst, ptr readonly %a
; CHECK-NEXT: store <vscale x 16 x i32> [[INTERLEAVED_VEC13]], ptr [[TMP21]], align 4
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP3]]
; CHECK-NEXT: [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT: br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP43:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
@@ -1584,7 +1553,7 @@ define void @interleave_deinterleave_reverse(ptr noalias nocapture readonly %A,
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP1]]
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i32> [[VEC_IND]], [[DOTSPLAT]]
; CHECK-NEXT: [[TMP27:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
-; CHECK-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP45:![0-9]+]]
+; CHECK-NEXT: br i1 [[TMP27]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP43:![0-9]+]]
; CHECK: middle.block:
; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
diff --git a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
index add58758788f9..aab21965d6877 100644
--- a/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
+++ b/llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll
@@ -1348,82 +1348,20 @@ for.end:
define void @PR27626_5(ptr %a, i32 %x, i32 %y, i32 %z, i64 %n) {
; CHECK-LABEL: @PR27626_5(...
[truncated]
|
Previously, LAA would incorrectly classify Write-After-Write dependencies with negative distances as safe Forward dependencies, allowing inappropriate vectorization of loops with bidirectional WAW dependencies.
The issue occurred in loops like:
for(int i = 0; i < n; ++i) {
A[(i+1)*4] = 10; // First store
A[i] = 100; // Second store
}
The dependence distance from first store to second store is negative: {-16,+,-12}. However, this represents a bidirectional WAW dependency that DependenceAnalysis would report as 'output [<>]!', indicating dependency in both directions.
This patch fixes LAA to properly detect WAW dependencies with negative distances as unsafe, preventing incorrect vectorization.
The fix adds a check specifically for Write-After-Write dependencies before the general negative distance handling, ensuring they are classified as Unknown (unsafe) rather than Forward (safe).
A proper fix, not implemented here because it would be a major rework of LAA, is to implement bidirectional dependence checking that computes distances in both directions and detects inconsistent direction vectors.