[VectorCombine] fix division by zero in foldSelectShuffle #156779
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms

Author: David Berard (davidberard98)

Changes

In pytorch/pytorch#161371, we see that MaxElementsInVector can be 0, causing a division by zero in `AddShuffleMaskAdjustedCost`. This will set MaxElementsInVector to 1 to avoid division by zero.

Full diff: https://github.com/llvm/llvm-project/pull/156779.diff

2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 6e46547b15b2b..3a17332274617 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -3900,7 +3900,7 @@ bool VectorCombine::foldSelectShuffle(Instruction &I, bool FromReduction) {
unsigned ElementSize = VT->getElementType()->getPrimitiveSizeInBits();
unsigned MaxVectorSize =
TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector);
- unsigned MaxElementsInVector = MaxVectorSize / ElementSize;
+ unsigned MaxElementsInVector = std::max<unsigned>(1, MaxVectorSize / ElementSize);
// When there are multiple shufflevector operations on the same input,
// especially when the vector length is larger than the register size,
// identical shuffle patterns may occur across different groups of elements.
diff --git a/llvm/test/Transforms/VectorCombine/fold-select-shuffle.ll b/llvm/test/Transforms/VectorCombine/fold-select-shuffle.ll
new file mode 100644
index 0000000000000..e898689b8f61b
--- /dev/null
+++ b/llvm/test/Transforms/VectorCombine/fold-select-shuffle.ll
@@ -0,0 +1,21 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=vector-combine -S < %s | FileCheck %s
+
+define ptx_kernel void @shuffle_ptx_i64() {
+; CHECK-LABEL: define ptx_kernel void @shuffle_ptx_i64() {
+; CHECK-NEXT: [[_LR_PH:.*:]]
+; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: [[TMP2:%.*]] = or <8 x i64> [[TMP0]], [[TMP1]]
+; CHECK-NEXT: [[TMP3:%.*]] = shl <8 x i64> [[TMP0]], [[TMP1]]
+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i64> [[TMP2]], <8 x i64> [[TMP3]], <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: ret void
+;
+.lr.ph:
+ %0 = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+ %1 = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+ %2 = or <8 x i64> %0, %1
+ %3 = shl <8 x i64> %0, %1
+ %4 = shufflevector <8 x i64> %2, <8 x i64> %3, <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+ ret void
+}
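The comment in the `VectorCombine.cpp` hunk notes that identical shuffle patterns may recur across register-sized groups of elements. As a rough, hypothetical illustration of why the per-register element count ends up as a divisor, the sketch below splits a shuffle mask into groups of `GroupSize` lanes and counts the distinct groups. It is not the actual `AddShuffleMaskAdjustedCost` logic; `countDistinctChunks` and `sameChunk` are made-up names.

```cpp
#include <cstddef>

// Hypothetical helper: true if chunks A and B (each GroupSize lanes) of Mask
// hold the same indices. Lanes past the end of Mask count as "absent".
template <std::size_t N>
constexpr bool sameChunk(const int (&Mask)[N], std::size_t A, std::size_t B,
                         std::size_t GroupSize) {
  for (std::size_t I = 0; I < GroupSize; ++I) {
    std::size_t IA = A * GroupSize + I;
    std::size_t IB = B * GroupSize + I;
    bool InA = IA < N;
    bool InB = IB < N;
    if (InA != InB)
      return false;
    if (InA && Mask[IA] != Mask[IB])
      return false;
  }
  return true;
}

// Hypothetical helper: number of distinct GroupSize-lane chunks in Mask.
// The chunk count is ceil(N / GroupSize), so GroupSize == 0 would be a
// division by zero (and is rejected in constant evaluation).
template <std::size_t N>
constexpr std::size_t countDistinctChunks(const int (&Mask)[N],
                                          std::size_t GroupSize) {
  std::size_t NumChunks = (N + GroupSize - 1) / GroupSize;
  std::size_t Distinct = 0;
  for (std::size_t C = 0; C < NumChunks; ++C) {
    bool Seen = false;
    // Compare against every earlier chunk; only the first occurrence counts.
    for (std::size_t P = 0; P < C && !Seen; ++P)
      Seen = sameChunk(Mask, P, C, GroupSize);
    if (!Seen)
      ++Distinct;
  }
  return Distinct;
}
```

With the first mask from the test, `<0, 1, 8, 9, 4, 5, 6, 7>`, and a group size of 2, all four groups differ, so the count is 4; a mask such as `<0, 1, 0, 1, 0, 1, 0, 1>` collapses to a single distinct group.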
In pytorch/pytorch#161371, we see that `MaxElementsInVector` can be 0, causing a division by zero in `AddShuffleMaskAdjustedCost`. This patch clamps `MaxElementsInVector` to at least 1 to avoid the division by zero.
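A minimal sketch of the arithmetic behind the bug and the fix. Because the crash requires `MaxVectorSize / ElementSize == 0` with 64-bit `i64` elements, the reported fixed-width register width must be below 64 bits; the value 32 used below is an assumption for illustration, and both helper names are hypothetical.

```cpp
#include <algorithm>

// Before the patch: unsigned division truncates toward zero, so an element
// wider than the reported vector register yields 0 elements per register.
constexpr unsigned maxElementsUnclamped(unsigned MaxVectorSize,
                                        unsigned ElementSize) {
  return MaxVectorSize / ElementSize;
}

// After the patch: clamp to at least one element so that later divisions
// by MaxElementsInVector stay well defined.
constexpr unsigned maxElementsClamped(unsigned MaxVectorSize,
                                      unsigned ElementSize) {
  return std::max<unsigned>(1, MaxVectorSize / ElementSize);
}

// Assumed target: 32-bit fixed-width vector registers, i64 (64-bit) elements.
static_assert(maxElementsUnclamped(32, 64) == 0, "32 / 64 truncates to 0");
static_assert(maxElementsClamped(32, 64) == 1, "the clamp keeps the divisor at 1");
```

The clamp only changes the result when the quotient is 0; for a 128-bit register and 32-bit elements both versions give 4.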
Updated from 1f7fc9d to 0548cb2.
@@ -3900,7 +3900,8 @@ bool VectorCombine::foldSelectShuffle(Instruction &I, bool FromReduction) {
unsigned ElementSize = VT->getElementType()->getPrimitiveSizeInBits();
unsigned MaxVectorSize =
TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector);
- unsigned MaxElementsInVector = MaxVectorSize / ElementSize;
+ unsigned MaxElementsInVector =
Can we early exit if `MaxElementsInVector <= 1`? The trivial case isn't profitable.
+1, an early-out seems the better approach.
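A sketch of the early exit the reviewers suggest, assuming it would replace the `std::max` clamp. The helper `shouldBailOutOfFold` is hypothetical; inside `foldSelectShuffle` the same test would presumably be written as `if (MaxElementsInVector <= 1) return false;` immediately after `MaxElementsInVector` is computed, so the fold is skipped before any division happens.

```cpp
// Hypothetical helper, not part of VectorCombine: returns true when the
// select-shuffle fold should give up instead of clamping the element count.
constexpr bool shouldBailOutOfFold(unsigned MaxVectorSize,
                                   unsigned ElementSize) {
  // Elements that fit in one fixed-width vector register; unsigned division
  // truncates, so this is 0 when a single element is wider than the register.
  unsigned MaxElementsInVector = MaxVectorSize / ElementSize;
  // At most one element per register is the trivial, unprofitable case.
  // Bailing out here also means MaxElementsInVector is at least 2 whenever
  // the fold continues, so it can never be used as a zero divisor.
  return MaxElementsInVector <= 1;
}

// Assumed 32-bit register with i64 elements: 0 elements per register.
static_assert(shouldBailOutOfFold(32, 64), "bail out when nothing fits");
```

Compared with the clamp, this avoids reasoning about cost adjustments for one-element "registers" at all, which is the reviewers' point that the trivial case isn't profitable.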
@@ -0,0 +1,21 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -passes=vector-combine -S < %s | FileCheck %s
Suggested change:
- ; RUN: opt -passes=vector-combine -S < %s | FileCheck %s
+ ; RUN: opt -passes=vector-combine -mtriple=nvptx-- -S < %s | FileCheck %s
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -passes=vector-combine -S < %s | FileCheck %s

define ptx_kernel void @shuffle_ptx_i64() {
Please move this test into the NVPTX subdir. I cannot reproduce the issue with other targets (e.g., `-mtriple=x86_64`).
llvm/test/Transforms/VectorCombine/
    NVPTX/
        fold-select-shuffle.ll
        lit.local.cfg

Content of NVPTX/lit.local.cfg:

    if not "NVPTX" in config.root.targets:
        config.unsupported = True