
Conversation

@dstutt dstutt commented Sep 1, 2025

Folding results in a global ABS64 relocation being used on an add instruction that does not support 64-bit immediates.
This can cause issues because the relocation overflows into the next instruction and corrupts it.
Checking that the shrunk instruction is valid, and not performing the fold if it is not, fixes the issue.

This fixes #153812

@dstutt dstutt requested a review from jayfoad September 1, 2025 09:17
@dstutt dstutt changed the title Shrink fold fix [AMDGPU] SIFoldOperands check that shrunk op is valid Sep 1, 2025
llvmbot commented Sep 1, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: David Stuttard (dstutt)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/156298.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+2)
  • (added) llvm/test/CodeGen/AMDGPU/fold-abs64.mir (+41)
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 3979e1e0c44aa..9bb94e6dd2ad4 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -868,6 +868,8 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
       // Make sure to get the 32-bit version of the commuted opcode.
       unsigned MaybeCommutedOpc = MI->getOpcode();
       Op32 = AMDGPU::getVOPe32(MaybeCommutedOpc);
+      if (TII->pseudoToMCOpcode(Op32) == -1)
+        return false;
     }
 
     appendFoldCandidate(FoldList, MI, CommuteOpNo, OpToFold, /*Commuted=*/true,
diff --git a/llvm/test/CodeGen/AMDGPU/fold-abs64.mir b/llvm/test/CodeGen/AMDGPU/fold-abs64.mir
new file mode 100644
index 0000000000000..d78456afc2be2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fold-abs64.mir
@@ -0,0 +1,41 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1250 -run-pass=si-fold-operands %s -o - | FileCheck %s
+
+--- |
+  @sym = external constant i32
+  define void @fn() { ret void }
+  define void @fn2() { ret void }
+...
+
+---
+name: fn
+tracksRegLiveness: true
+body: |
+  bb.0:
+    ; CHECK-LABEL: name: fn
+    ; CHECK: [[DEF:%[0-9]+]]:vreg_64 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 target-flags(amdgpu-abs64) @sym
+    ; CHECK-NEXT: [[V_ADD_CO_U32_e64_:%[0-9]+]]:vgpr_32, [[V_ADD_CO_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef [[DEF]].sub0, undef [[S_MOV_B64_]].sub0, 0, implicit $exec
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[V_ADD_CO_U32_e64_]]
+    %0:vreg_64 = IMPLICIT_DEF
+    %1:sreg_64 = S_MOV_B64 target-flags(amdgpu-abs64) @sym
+    %2:vgpr_32, %3:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef %1.sub0, undef %0.sub0, 0, implicit $exec
+    S_ENDPGM 0, implicit %2
+...
+
+---
+name: fn2
+tracksRegLiveness: true
+body: |
+  bb.0:
+    ; CHECK-LABEL: name: fn2
+    ; CHECK: [[DEF:%[0-9]+]]:vreg_64 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[V_ADD_CO_U32_e64_:%[0-9]+]]:vgpr_32, [[V_ADD_CO_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 591751049, undef [[DEF]].sub0, 0, implicit $exec
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[V_ADD_CO_U32_e64_]]
+    %0:vreg_64 = IMPLICIT_DEF
+    %1:sreg_64 = S_MOV_B64 4886718345
+    %2:vgpr_32, %3:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef %1.sub0, undef %0.sub0, 0, implicit $exec
+    S_ENDPGM 0, implicit %2
+...
+## NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+# CHECK: {{.*}}

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

@@ -868,6 +868,8 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
       // Make sure to get the 32-bit version of the commuted opcode.
       unsigned MaybeCommutedOpc = MI->getOpcode();
       Op32 = AMDGPU::getVOPe32(MaybeCommutedOpc);
+      if (TII->pseudoToMCOpcode(Op32) == -1)

Contributor

SIInstrInfo::commuteOpcode already checks this, is this being skipped somehow?

Collaborator Author

I think the difference here is that it is trying to find the VOPe32 variant, which might not exist even if the commuted opcode does (this is part of the shrink).

Contributor

I think this check should be pulled into a helper function in SIInstrInfo instead of directly using pseudoToMCOpcode. We already have hasVALU32BitEncoding, but it returns a bool instead of the new opcode. Can you generalize that to return the shrunk opcode if valid?
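
A minimal sketch of what such a helper might look like, modeled on the existing hasVALU32BitEncoding (the name getVOP32BitEncoding and its exact placement are assumptions for illustration, not part of this patch):

// Sketch only: return the shrunk e32 opcode if the target can actually
// encode it, or -1 otherwise.
int SIInstrInfo::getVOP32BitEncoding(unsigned Opcode) const {
  int Op32 = AMDGPU::getVOPe32(Opcode);
  if (Op32 == -1)
    return -1;
  return pseudoToMCOpcode(Op32) != -1 ? Op32 : -1;
}

Callers such as tryAddToFoldList could then use the returned opcode directly instead of calling pseudoToMCOpcode themselves.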

@dstutt dstutt requested a review from arsenm September 3, 2025 08:06

Contributor

@jayfoad jayfoad left a comment

A possible alternative fix is to implement this TODO, so that the operand is legal for the e64 instruction so this code won't even try to shrink it:

// TODO: Try to apply DefSubReg, for global address we can extract
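
For illustration only, a hypothetical MIR result if that TODO were implemented (this is not output from this patch, and it assumes the extracted low half of the address can be represented with an amdgpu-abs32-lo target flag): folding just the sub0 half would leave the operand legal on the e64 form, so the shrink is never attempted:

  %2:vgpr_32, %3:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 target-flags(amdgpu-abs32-lo) @sym, undef %0.sub0, 0, implicit $exec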

dstutt commented Sep 4, 2025

A possible alternative fix is to implement this TODO, so that the operand is legal for the e64 instruction so this code won't even try to shrink it:

// TODO: Try to apply DefSubReg, for global address we can extract

If that was done - would it make sense to keep the check anyway? Without it, the shrink code is by-passing checks.

jayfoad commented Sep 4, 2025

A possible alternative fix is to implement this TODO, so that the operand is legal for the e64 instruction so this code won't even try to shrink it:

// TODO: Try to apply DefSubReg, for global address we can extract

If that was done - would it make sense to keep the check anyway? Without it, the shrink code is by-passing checks.

If that was done then we could probably replace the check with an assert, on the grounds that every supported target either has an e32 encoding of V_ADD_CO_U32 or it supports VOP3 literals.
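
A rough sketch of that variant (assuming the TODO above is implemented first, so reaching this point without an encodable e32 form would indicate a bug rather than an expected case):

      // Make sure to get the 32-bit version of the commuted opcode.
      unsigned MaybeCommutedOpc = MI->getOpcode();
      Op32 = AMDGPU::getVOPe32(MaybeCommutedOpc);
      assert(TII->pseudoToMCOpcode(Op32) != -1 &&
             "every supported target has an e32 encoding or VOP3 literals");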

Successfully merging this pull request may close these issues.

SIFoldOperands generates unsupported instruction V_ADD_CO_U32_e32