[IndVarSimplify] Fix Masking Issue by Adding nsw/nuw Flags to Trunc Instruction #150179
TruncInst->setHasNoSignedWrap();
TruncInst->setHasNoUnsignedWrap();
I think NUW and NSW imply NW, not the other way around, so I don't think we can apply NUW/NSW on trunc here
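To make the direction of that implication concrete, here is a small model rather than LLVM code. It walks an 8-bit recurrence {start,+,1} and records three separate events. The names WrapSummary and walkRecurrence are hypothetical, and "self-wrap" is taken to mean that the value reaches its start again, which is a simplified reading of SCEV's NW flag.

```cpp
#include <cstdint>

// Which kinds of wrap were seen while walking the recurrence.
struct WrapSummary {
  bool unsignedWrap = false; // some step went 255 -> 0
  bool signedWrap = false;   // some step went 127 -> -128
  bool selfWrap = false;     // the value came back to its start
};

// Walks the 8-bit recurrence {start,+,1} for `steps` increments.
WrapSummary walkRecurrence(uint8_t start, unsigned steps) {
  WrapSummary s;
  uint8_t v = start;
  for (unsigned i = 0; i < steps; ++i) {
    if (v == 0xFF) s.unsignedWrap = true; // next increment crosses 255 -> 0
    if (v == 0x7F) s.signedWrap = true;   // next increment crosses 127 -> -128
    v = static_cast<uint8_t>(v + 1);
    if (v == start) s.selfWrap = true;    // recurrence reached its start again
  }
  return s;
}
```

With start 100 and 200 steps, the walk crosses both the signed and the unsigned boundary yet never returns to 100, so in this model the recurrence has no self-wrap while being neither NUW nor NSW.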
Hi, I understand your point. However, TruncIV only has the nw flag, and TruncAR->hasNoUnsignedWrap() = TruncAR->hasNoSignedWrap() = false, so the masking issue remains unresolved.
CmpIndVar = Builder.CreateTrunc(
CmpIndVar, ExitCnt->getType(), "lftr.wideiv",
TruncAR->hasNoUnsignedWrap(), TruncAR->hasNoSignedWrap());
Given that TruncIV is marked as nw and truncation does not change the sign, can we assume that the nw flag of TruncIV retains the sign of the original IV? Specifically, if the original IV is signed, does TruncIV's nw imply that it behaves like nsw?
I think in general truncation can change the sign, e.g. the i16 value 0x8000 (negative) truncated to i8 is 0x00 (non-negative).
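A minimal sketch of that sign flip, working on raw bit patterns so that no implementation-defined conversions are involved. The helper names are made up for illustration.

```cpp
#include <cstdint>

// Keeps only the low 8 bits, like `trunc i16 %x to i8`.
uint8_t truncToI8Bits(uint16_t bits) { return static_cast<uint8_t>(bits & 0xFF); }

// Sign of a value when its bits are read as a two's-complement integer.
bool isNegativeI16(uint16_t bits) { return (bits & 0x8000u) != 0; }
bool isNegativeI8(uint8_t bits) { return (bits & 0x80u) != 0; }

// True when the truncated value has a different sign than the input.
bool truncFlipsSign(uint16_t bits) {
  return isNegativeI16(bits) != isNegativeI8(truncToI8Bits(bits));
}
```

The flip goes both ways: 0x8000 is negative as i16 but truncates to 0x00, while 0x0080 is positive as i16 but truncates to the negative i8 value 0x80.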
I tried out just adding the nuw/nsw wrap flags if the condition predicate is unsigned or signed, but it looks like that's unsound:
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -1050,8 +1050,10 @@ linearFunctionTestReplace(Loop *L, BasicBlock *ExitingBB,
bool Discard;
L->makeLoopInvariant(ExitCnt, Discard);
} else
- CmpIndVar = Builder.CreateTrunc(CmpIndVar, ExitCnt->getType(),
- "lftr.wideiv");
+ CmpIndVar =
+ Builder.CreateTrunc(CmpIndVar, ExitCnt->getType(), "lftr.wideiv",
+ cast<ICmpInst>(BI->getCondition())->isUnsigned(),
+ cast<ICmpInst>(BI->getCondition())->isSigned());
}
diff --git a/llvm/test/Transforms/IndVarSimplify/lftr.ll b/llvm/test/Transforms/IndVarSimplify/lftr.ll
index 5ee62ba357ab..3825a49563a6 100644
--- a/llvm/test/Transforms/IndVarSimplify/lftr.ll
+++ b/llvm/test/Transforms/IndVarSimplify/lftr.ll
@@ -415,7 +415,7 @@ define void @wide_trip_count_test1(ptr %autoc,
; CHECK-NEXT: [[ADD3:%.*]] = fadd float [[TEMP2]], [[MUL]]
; CHECK-NEXT: store float [[ADD3]], ptr [[ARRAYIDX2]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
-; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
+; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc nuw i64 [[INDVARS_IV_NEXT]] to i32
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i32 [[LFTR_WIDEIV]], [[SUB]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT:%.*]]
; CHECK: for.end.loopexit:
Passing the change to wide_trip_count_test1 through alive2 shows that this eventually triggers poison where it didn't before.
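For readers less familiar with the flags, the sketch below models when a flagged i64-to-i32 trunc is poison, following the LangRef wording: nuw requires every dropped bit to be zero, and nsw requires every dropped bit to equal the top bit of the result. Poison is represented as an empty optional. The function names and the sample values are illustrative and are not taken from the test.

```cpp
#include <cstdint>
#include <optional>

// Models `trunc nuw i64 %x to i32`: poison if any dropped bit is set.
std::optional<uint32_t> truncNUW(uint64_t x) {
  if ((x >> 32) != 0) return std::nullopt;
  return static_cast<uint32_t>(x);
}

// Models `trunc nsw i64 %x to i32`: poison unless all dropped bits
// equal the top bit of the truncated result.
std::optional<uint32_t> truncNSW(uint64_t x) {
  uint32_t low = static_cast<uint32_t>(x);
  uint64_t expectedHigh = (low & 0x80000000u) ? 0xFFFFFFFFull : 0ull;
  if ((x >> 32) != expectedHigh) return std::nullopt;
  return low;
}

// First IV value in [start, start + steps] whose nuw truncation is poison.
std::optional<uint64_t> firstPoisonNUW(uint64_t start, uint64_t steps) {
  for (uint64_t i = 0; i <= steps; ++i)
    if (!truncNUW(start + i)) return start + i;
  return std::nullopt;
}
```

This also shows how poison can appear only "eventually": an IV that starts at 4294967294 truncates cleanly for two values and becomes poison under nuw on the third.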
I wonder if it's possible to infer the NUW/NSW flags on TruncatedIV where possible? I think part of the NUW/NSW information is being lost when the checks are pulled to outside the loop, e.g. in your C example from the original issue:
; Function Attrs: nounwind vscale_range(2,1024)
define dso_local void @func(ptr noundef captures(none) %result, i32 noundef signext %start) local_unnamed_addr #0 {
entry:
%cmp3 = icmp slt i32 %start, 100
br i1 %cmp3, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
%0 = sext i32 %start to i64
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret void
for.body: ; preds = %for.body.preheader, %for.body
%indvars.iv = phi i64 [ %0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
%i.04 = phi i32 [ %inc, %for.body ], [ %start, %for.body.preheader ]
%idxprom = sext i32 %i.04 to i64
%arrayidx = getelementptr inbounds i32, ptr %result, i64 %indvars.iv
%1 = load i32, ptr %arrayidx, align 4, !tbaa !6
%add = add nsw i32 %1, 1
store i32 %add, ptr %arrayidx, align 4, !tbaa !6
%indvars.iv.next = add nsw i64 %indvars.iv, 1
%inc = add nsw i32 %i.04, 1
%cmp = icmp slt i64 %indvars.iv, 99
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit, !llvm.loop !10
}
I don't think SCEV sees the %cmp3 = icmp slt i32 %start, 100 condition and so it doesn't realise that %indvars.iv.next can't signed wrap in i32?
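The range argument behind that question can be written out as a small check. This is only a model of the loop in the IR above, assuming the loop body runs solely when the entry guard start < 100 holds and that it exits once the IV reaches 99. The helper names are hypothetical.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

struct Range { int64_t lo; int64_t hi; };

bool fitsInI32(int64_t v) {
  return v >= std::numeric_limits<int32_t>::min() &&
         v <= std::numeric_limits<int32_t>::max();
}

// Values %indvars.iv.next can take, or nullopt when the guard skips the loop.
std::optional<Range> ivNextRange(int32_t start) {
  if (!(start < 100)) return std::nullopt;            // %cmp3 is false
  return Range{static_cast<int64_t>(start) + 1, 100}; // last iteration has iv == 99
}

// True when every value of %indvars.iv.next is representable in i32.
bool ivNextFitsInI32(int32_t start) {
  std::optional<Range> r = ivNextRange(start);
  return !r || (fitsInI32(r->lo) && fitsInI32(r->hi));
}
```

Under the guard, the range is [start + 1, 100] for every i32 start, so it always fits. Without the guard, start + 1 could be 2^31, which does not.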
We found that the design of linearFunctionTestReplace() can help determine whether truncation will change the sign:
- The induction variable (IV) must be a LoopCounter, so its step is guaranteed to be 1.
- The ICmpInst::Predicate can only be eq or ne, which means that ExitCnt must be the final value of IV.
Therefore, when the initial value Start of IV does not exceed ExitCntSize, the range [start, end) of IV will not cause signed or unsigned wrap.
For instance, in llvm/test/Transforms/IndVarSimplify/lftr.ll, the initial value start = 68719476736 = 2^36 is within the range of i32, the step is 1, and the final value %sub is also i32. Thus, the IV remains within the range of i32, and temp3 = trunc nuw i64 %indvars.iv.next to i32 is valid.
So I propose adding a check on the type width of the initial IV value. If it matches the target type of the truncation, annotate the trunc instruction with the nsw or nuw flag. Looking forward to your reply :)
if (const SCEVAddRecExpr *TruncAR =
        dyn_cast<SCEVAddRecExpr>(TruncatedIV)) {
Because it's an induction variable, is TruncatedIV always guaranteed to be a SCEVAddRecExpr? Does it work if we do a cast instead?
- if (const SCEVAddRecExpr *TruncAR =
-         dyn_cast<SCEVAddRecExpr>(TruncatedIV)) {
+ auto *TruncAR = cast<SCEVAddRecExpr>(TruncatedIV);
For safety, I prefer using dyn_cast (even though the genLoopLimit function in this file directly uses cast<SCEVAddRecExpr>(SE->getSCEV(IndVar))).
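The trade-off between the two casts can be sketched with standard C++. LLVM uses its own RTTI scheme, so dynamic_cast only stands in for it here, and the types Expr, AddRecExpr and ConstantExpr are made up for the example. The behaviour modelled is that dyn_cast<> yields a null pointer on a type mismatch, while cast<> asserts in assertion-enabled builds.

```cpp
#include <cassert>

struct Expr {
  virtual ~Expr() = default;
};
struct AddRecExpr : Expr {
  bool noSelfWrap = true;
};
struct ConstantExpr : Expr {};

// Analogue of dyn_cast<>: returns nullptr when `e` is not an AddRecExpr,
// so the caller has to handle the mismatch explicitly.
const AddRecExpr *asAddRecOrNull(const Expr *e) {
  return dynamic_cast<const AddRecExpr *>(e);
}

// Analogue of cast<>: the caller promises the type, and a mismatch trips
// an assertion instead of being handled.
const AddRecExpr *asAddRecChecked(const Expr *e) {
  const AddRecExpr *ar = dynamic_cast<const AddRecExpr *>(e);
  assert(ar && "expected an AddRecExpr");
  return ar;
}
```

The checked form documents an invariant and fails loudly if it is ever broken. The null-returning form silently skips the flag-setting code in that case.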
@llvm/pr-subscribers-llvm-transforms
Author: bernadate (buggfg)
Changes
As I mentioned in the discussion, this patch specifically addresses the masking issue found in common loop structures.
This patch resolves the masking issue by adding the nsw/nuw flags to the trunc instruction, allowing the InstCombinePass to subsequently remove that Trunc instruction.
With this patch, the following common loop successfully undergoes 8 iterations of loop unrolling, resulting in a remarkable 2x performance improvement (without vectorization):
void func(int result[], int start) {
  for (int i = start; i < 100; i++)
    result[i] += 1;
}
Additionally, we have validated the functional correctness and effectiveness of this patch through testing on the SPEC CPU2006 and SPEC CPU2017 benchmarks.
Thank you for considering this change!
Full diff: https://github.com/llvm/llvm-project/pull/150179.diff
5 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index 334c911191cb8..04a1f4831b8d8 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -1049,9 +1049,28 @@ linearFunctionTestReplace(Loop *L, BasicBlock *ExitingBB,
if (Extended) {
bool Discard;
L->makeLoopInvariant(ExitCnt, Discard);
- } else
+ } else{
CmpIndVar = Builder.CreateTrunc(CmpIndVar, ExitCnt->getType(),
"lftr.wideiv");
+
+ // Set the correct wrap flag to avoid the masking issue.
+ Instruction *TruncInst = dyn_cast<Instruction>(CmpIndVar);
+
+ // The TruncatedIV is incrementing.
+ if (const SCEVAddRecExpr *TruncAR =
+ dyn_cast<SCEVAddRecExpr>(TruncatedIV)) {
+ // If TruncIV does not cause self-wrap, explicitly add the nsw and nuw
+ // flags to TruncInst.
+ if (TruncAR->hasNoSelfWrap()) {
+ TruncInst->setHasNoSignedWrap();
+ TruncInst->setHasNoUnsignedWrap();
+ } else if (TruncAR->hasNoSignedWrap()) {
+ TruncInst->setHasNoSignedWrap();
+ } else if (TruncAR->hasNoUnsignedWrap()) {
+ TruncInst->setHasNoUnsignedWrap();
+ }
+ }
+ }
}
LLVM_DEBUG(dbgs() << "INDVARS: Rewriting loop exit condition to:\n"
<< " LHS:" << *CmpIndVar << '\n'
diff --git a/llvm/test/Transforms/IndVarSimplify/X86/eliminate-trunc.ll b/llvm/test/Transforms/IndVarSimplify/X86/eliminate-trunc.ll
index 565ac5c8743d4..7e7a3f192f998 100644
--- a/llvm/test/Transforms/IndVarSimplify/X86/eliminate-trunc.ll
+++ b/llvm/test/Transforms/IndVarSimplify/X86/eliminate-trunc.ll
@@ -227,7 +227,7 @@ define void @test_01_unsigned(i32 %n) {
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
-; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[IV_NEXT]] to i32
+; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc nuw nsw i64 [[IV_NEXT]] to i32
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i32 [[LFTR_WIDEIV]], [[TMP0]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOP]], label [[EXIT:%.*]]
; CHECK: exit:
@@ -255,7 +255,7 @@ define void @test_02_unsigned(i32 %n) {
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 4294967294, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
-; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[IV_NEXT]] to i32
+; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc nuw nsw i64 [[IV_NEXT]] to i32
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i32 [[LFTR_WIDEIV]], [[TMP0]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOP]], label [[EXIT:%.*]]
; CHECK: exit:
@@ -304,7 +304,7 @@ define void @test_04_unsigned(i32 %n) {
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
-; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[IV_NEXT]] to i32
+; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc nuw nsw i64 [[IV_NEXT]] to i32
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i32 [[LFTR_WIDEIV]], [[TMP0]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOP]], label [[EXIT:%.*]]
; CHECK: exit:
@@ -332,7 +332,7 @@ define void @test_05_unsigned(i32 %n) {
; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ 1, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
-; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[IV_NEXT]] to i32
+; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc nuw nsw i64 [[IV_NEXT]] to i32
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i32 [[LFTR_WIDEIV]], [[TMP0]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[LOOP]], label [[EXIT:%.*]]
; CHECK: exit:
diff --git a/llvm/test/Transforms/IndVarSimplify/lftr-pr41998.ll b/llvm/test/Transforms/IndVarSimplify/lftr-pr41998.ll
index b7f4756b2757f..376ef1ac5ffac 100644
--- a/llvm/test/Transforms/IndVarSimplify/lftr-pr41998.ll
+++ b/llvm/test/Transforms/IndVarSimplify/lftr-pr41998.ll
@@ -13,7 +13,7 @@ define void @test_int(i32 %start, ptr %p) {
; CHECK-NEXT: [[I2:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[I2_INC:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[I2_INC]] = add nuw nsw i32 [[I2]], 1
; CHECK-NEXT: store volatile i32 [[I2_INC]], ptr [[P:%.*]], align 4
-; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i32 [[I2_INC]] to i3
+; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc nuw nsw i32 [[I2_INC]] to i3
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i3 [[LFTR_WIDEIV]], [[TMP1]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[END:%.*]], label [[LOOP]]
; CHECK: end:
diff --git a/llvm/test/Transforms/IndVarSimplify/lftr.ll b/llvm/test/Transforms/IndVarSimplify/lftr.ll
index 5ee62ba357ab6..cfa4baa2d3b11 100644
--- a/llvm/test/Transforms/IndVarSimplify/lftr.ll
+++ b/llvm/test/Transforms/IndVarSimplify/lftr.ll
@@ -415,7 +415,7 @@ define void @wide_trip_count_test1(ptr %autoc,
; CHECK-NEXT: [[ADD3:%.*]] = fadd float [[TEMP2]], [[MUL]]
; CHECK-NEXT: store float [[ADD3]], ptr [[ARRAYIDX2]], align 4
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
-; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
+; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc nuw nsw i64 [[INDVARS_IV_NEXT]] to i32
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i32 [[LFTR_WIDEIV]], [[SUB]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT:%.*]]
; CHECK: for.end.loopexit:
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll
index bbdbd95c6017a..ddd98f21c36a4 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll
@@ -33,8 +33,7 @@ define i1 @test_order_1(ptr %this, ptr noalias %other, i1 %tobool9.not, i32 %cal
; CHECK-NEXT: br i1 [[CMP44]], label [[FOR_BODY45]], label [[FOR_COND]]
; CHECK: for.inc57:
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], 1
-; CHECK-NEXT: [[TMP1:%.*]] = and i64 [[INDVARS_IV_NEXT]], 4294967295
-; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[TMP1]], 1
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV]], 0
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND41_PREHEADER_PREHEADER]], label [[FOR_COND41_PREHEADER]]
; CHECK: exit:
; CHECK-NEXT: ret i1 false
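The last hunk above replaces an and with 4294967295 followed by an icmp with a plain icmp on the wide IV. A rough model of why the mask only matters when the IV does not fit in 32 bits is sketched here. The function names are hypothetical and the code compares plain integers rather than IR values.

```cpp
#include <cstdint>

// Masked form: compare only the low 32 bits of the wide IV (and + icmp).
bool exitMasked(uint64_t ivNext, uint32_t limit) {
  return (ivNext & 0xFFFFFFFFull) == limit;
}

// Unmasked form: compare the whole 64-bit value.
bool exitUnmasked(uint64_t ivNext, uint32_t limit) {
  return ivNext == limit;
}

// True when both forms give the same answer for this IV value.
bool formsAgree(uint64_t ivNext, uint32_t limit) {
  return exitMasked(ivNext, limit) == exitUnmasked(ivNext, limit);
}
```

The two forms agree for every IV value whose high 32 bits are zero and can differ otherwise, for example at 2^32 + 1 against a limit of 1. That is why the mask can only be dropped when the truncation is known not to discard set bits.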
Co-authored-by: Luke Lau <luke_lau@icloud.com>
… Trunc Instruction" This reverts commit c6a39d8. # Conflicts: # llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
Hi @lukel97, I've updated the code. We found that if the initial value of the IV is within the range of the target type for the Trunc instruction, we can reasonably add the nsw/nuw flag to it. Could you take a look whenever you have a chance?