davidberard98
Contributor

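In pytorch/pytorch#161371, `MaxElementsInVector` can be 0, which causes a division by zero in `AddShuffleMaskAdjustedCost`. The value is computed as `MaxVectorSize / ElementSize`, so the integer division truncates to 0 whenever the element is wider than the reported register width.

This change clamps `MaxElementsInVector` to a minimum of 1 to avoid the division by zero.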
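
For illustration only, here is a minimal standalone sketch of the arithmetic, not the LLVM code itself. The helper names `maxElementsUnclamped` and `maxElementsClamped` are hypothetical, and the 32-bit register width used in the usage note is an assumed value chosen to trigger the truncation, not a figure taken from this PR.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: mirrors the original computation in foldSelectShuffle.
// Unsigned integer division truncates, so the result is 0 whenever
// ElementSize > MaxVectorSize.
unsigned maxElementsUnclamped(unsigned MaxVectorSize, unsigned ElementSize) {
  return MaxVectorSize / ElementSize;
}

// Hypothetical helper: mirrors the patched computation, which clamps the
// result to at least 1 so that later divisions by it cannot trap.
unsigned maxElementsClamped(unsigned MaxVectorSize, unsigned ElementSize) {
  return std::max<unsigned>(1, MaxVectorSize / ElementSize);
}
```

With an assumed register width of 32 bits and the 64-bit elements from the test, `maxElementsUnclamped(32, 64)` yields 0 while `maxElementsClamped(32, 64)` yields 1. For wider registers the two agree, for example both return 4 for `(128, 32)`.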

@llvmbot
Member

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: David Berard (davidberard98)

Changes

In pytorch/pytorch#161371, we see that MaxElementsInVector can be 0, causing a division by zero in AddShuffleMaskAdjustedCost.

This will set MaxElementsInVector to 1 to avoid division by zero.


Full diff: https://github.com/llvm/llvm-project/pull/156779.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VectorCombine.cpp (+1-1)
  • (added) llvm/test/Transforms/VectorCombine/fold-select-shuffle.ll (+21)
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 6e46547b15b2b..3a17332274617 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -3900,7 +3900,7 @@ bool VectorCombine::foldSelectShuffle(Instruction &I, bool FromReduction) {
   unsigned ElementSize = VT->getElementType()->getPrimitiveSizeInBits();
   unsigned MaxVectorSize =
       TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector);
-  unsigned MaxElementsInVector = MaxVectorSize / ElementSize;
+  unsigned MaxElementsInVector = std::max<unsigned>(1, MaxVectorSize / ElementSize);
   // When there are multiple shufflevector operations on the same input,
   // especially when the vector length is larger than the register size,
   // identical shuffle patterns may occur across different groups of elements.
diff --git a/llvm/test/Transforms/VectorCombine/fold-select-shuffle.ll b/llvm/test/Transforms/VectorCombine/fold-select-shuffle.ll
new file mode 100644
index 0000000000000..e898689b8f61b
--- /dev/null
+++ b/llvm/test/Transforms/VectorCombine/fold-select-shuffle.ll
@@ -0,0 +1,21 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes=vector-combine -S < %s | FileCheck %s
+
+define ptx_kernel void @shuffle_ptx_i64() {
+; CHECK-LABEL: define ptx_kernel void @shuffle_ptx_i64() {
+; CHECK-NEXT:  [[_LR_PH:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:    [[TMP2:%.*]] = or <8 x i64> [[TMP0]], [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = shl <8 x i64> [[TMP0]], [[TMP1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <8 x i64> [[TMP2]], <8 x i64> [[TMP3]], <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:    ret void
+;
+.lr.ph:
+  %0 = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+  %1 = shufflevector <8 x i64> zeroinitializer, <8 x i64> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 4, i32 5, i32 6, i32 7>
+  %2 = or <8 x i64> %0, %1
+  %3 = shl <8 x i64> %0, %1
+  %4 = shufflevector <8 x i64> %2, <8 x i64> %3, <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+  ret void
+}

@davidberard98 davidberard98 force-pushed the divide-zero-fold-select-shuffle branch from 1f7fc9d to 0548cb2 Compare September 4, 2025 00:17
@dtcxzyw dtcxzyw requested a review from RKSimon September 4, 2025 02:24
@@ -3900,7 +3900,8 @@ bool VectorCombine::foldSelectShuffle(Instruction &I, bool FromReduction) {
   unsigned ElementSize = VT->getElementType()->getPrimitiveSizeInBits();
   unsigned MaxVectorSize =
       TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector);
-  unsigned MaxElementsInVector = MaxVectorSize / ElementSize;
+  unsigned MaxElementsInVector =
Member

Can we early exit if MaxElementsInVector <= 1? The trivial case isn't profitable.

Collaborator

+1 an early-out seems the better approach.
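
As a rough sketch of what the suggested early exit could look like, the snippet below models the guard as a standalone function. This is an assumption about the shape of the check, not the code that landed: `lanesOrBail` is a hypothetical name, returning `std::nullopt` stands in for `return false` from `foldSelectShuffle`, and the extra `ElementSize == 0` guard is an added precaution that the reviewers did not ask for.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: returns the number of elements that fit in one
// register, or std::nullopt when the fold should be skipped entirely.
std::optional<unsigned> lanesOrBail(unsigned MaxVectorSize,
                                    unsigned ElementSize) {
  // Added precaution (assumption): never divide by a zero element size.
  if (ElementSize == 0)
    return std::nullopt;

  unsigned MaxElementsInVector = MaxVectorSize / ElementSize;

  // Early exit requested in review: with at most one element per register
  // the transform is not profitable, and 0 would later divide by zero.
  if (MaxElementsInVector <= 1)
    return std::nullopt;

  return MaxElementsInVector;
}
```

Compared with clamping to 1, this variant skips the cost computation altogether in the degenerate case instead of feeding it a placeholder value.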

@@ -0,0 +1,21 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -passes=vector-combine -S < %s | FileCheck %s
Member

Suggested change
-; RUN: opt -passes=vector-combine -S < %s | FileCheck %s
+; RUN: opt -passes=vector-combine -mtriple=nvptx-- -S < %s | FileCheck %s

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -passes=vector-combine -S < %s | FileCheck %s

define ptx_kernel void @shuffle_ptx_i64() {
Member

Please move this test into the NVPTX subdir. I cannot reproduce the issue with other targets (e.g., -mtriple=x86_64).

llvm/test/Transforms/VectorCombine
  NVPTX/
    fold-select-shuffle.ll
    lit.local.cfg

Content of NVPTX/lit.local.cfg:

if not "NVPTX" in config.root.targets:
    config.unsupported = True
