
Conversation

@dstutt dstutt commented Sep 1, 2025

Folding results in a global ABS64 relocation being used on an add instruction that does not support 64-bit immediates.
This can cause issues because the relocation overflows into the next instruction and corrupts it.
Checking that the shrunk instruction is valid, and not performing the fold if it is not, fixes the issue.

This fixes #153812

@dstutt dstutt requested a review from jayfoad September 1, 2025 09:17
@dstutt dstutt changed the title Shrink fold fix [AMDGPU] SIFoldOperands check that shrunk op is valid Sep 1, 2025
llvmbot commented Sep 1, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: David Stuttard (dstutt)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/156298.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+2)
  • (added) llvm/test/CodeGen/AMDGPU/fold-abs64.mir (+41)
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 3979e1e0c44aa..9bb94e6dd2ad4 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -868,6 +868,8 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
       // Make sure to get the 32-bit version of the commuted opcode.
       unsigned MaybeCommutedOpc = MI->getOpcode();
       Op32 = AMDGPU::getVOPe32(MaybeCommutedOpc);
+      if (TII->pseudoToMCOpcode(Op32) == -1)
+        return false;
     }
 
     appendFoldCandidate(FoldList, MI, CommuteOpNo, OpToFold, /*Commuted=*/true,
diff --git a/llvm/test/CodeGen/AMDGPU/fold-abs64.mir b/llvm/test/CodeGen/AMDGPU/fold-abs64.mir
new file mode 100644
index 0000000000000..d78456afc2be2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fold-abs64.mir
@@ -0,0 +1,41 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1250 -run-pass=si-fold-operands %s -o - | FileCheck %s
+
+--- |
+  @sym = external constant i32
+  define void @fn() { ret void }
+  define void @fn2() { ret void }
+...
+
+---
+name: fn
+tracksRegLiveness: true
+body: |
+  bb.0:
+    ; CHECK-LABEL: name: fn
+    ; CHECK: [[DEF:%[0-9]+]]:vreg_64 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 target-flags(amdgpu-abs64) @sym
+    ; CHECK-NEXT: [[V_ADD_CO_U32_e64_:%[0-9]+]]:vgpr_32, [[V_ADD_CO_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef [[DEF]].sub0, undef [[S_MOV_B64_]].sub0, 0, implicit $exec
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[V_ADD_CO_U32_e64_]]
+    %0:vreg_64 = IMPLICIT_DEF
+    %1:sreg_64 = S_MOV_B64 target-flags(amdgpu-abs64) @sym
+    %2:vgpr_32, %3:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef %1.sub0, undef %0.sub0, 0, implicit $exec
+    S_ENDPGM 0, implicit %2
+...
+
+---
+name: fn2
+tracksRegLiveness: true
+body: |
+  bb.0:
+    ; CHECK-LABEL: name: fn2
+    ; CHECK: [[DEF:%[0-9]+]]:vreg_64 = IMPLICIT_DEF
+    ; CHECK-NEXT: [[V_ADD_CO_U32_e64_:%[0-9]+]]:vgpr_32, [[V_ADD_CO_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 591751049, undef [[DEF]].sub0, 0, implicit $exec
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[V_ADD_CO_U32_e64_]]
+    %0:vreg_64 = IMPLICIT_DEF
+    %1:sreg_64 = S_MOV_B64 4886718345
+    %2:vgpr_32, %3:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef %1.sub0, undef %0.sub0, 0, implicit $exec
+    S_ENDPGM 0, implicit %2
+...
+## NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+# CHECK: {{.*}}

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

@@ -868,6 +868,8 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
       // Make sure to get the 32-bit version of the commuted opcode.
       unsigned MaybeCommutedOpc = MI->getOpcode();
       Op32 = AMDGPU::getVOPe32(MaybeCommutedOpc);
+      if (TII->pseudoToMCOpcode(Op32) == -1)

Contributor

SIInstrInfo::commuteOpcode already checks this, is this being skipped somehow?

Collaborator Author

I think the difference here is that it is trying to find the VOPe32 variant, which might not exist even if the commuted opcode does (this is part of the shrink).

Contributor

I think this check should be pulled into a helper function in SIInstrInfo instead of directly using pseudoToMCOpcode. We already have hasVALU32BitEncoding, but it returns a bool instead of the new opcode. Can you generalize that to return the shrunk opcode if valid?
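
A minimal sketch of what such a helper might look like, modeled on the existing hasVALU32BitEncoding (the name getVOP32BitEncoding and its exact placement are assumptions for illustration, not part of this patch):

// Sketch only: return the shrunk e32 opcode if the target can actually
// encode it, or -1 otherwise.
int SIInstrInfo::getVOP32BitEncoding(unsigned Opcode) const {
  int Op32 = AMDGPU::getVOPe32(Opcode);
  if (Op32 == -1)
    return -1;
  return pseudoToMCOpcode(Op32) != -1 ? Op32 : -1;
}

Callers such as tryAddToFoldList could then use the returned opcode directly instead of calling pseudoToMCOpcode themselves.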

@dstutt dstutt requested a review from arsenm September 3, 2025 08:06

Contributor

@jayfoad jayfoad left a comment

A possible alternative fix is to implement this TODO, so that the operand is legal for the e64 instruction so this code won't even try to shrink it:

// TODO: Try to apply DefSubReg, for global address we can extract
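
For illustration only, a hypothetical MIR result if that TODO were implemented (this is not output from this patch, and it assumes the extracted low half of the address can be represented with an amdgpu-abs32-lo target flag): folding just the sub0 half would leave the operand legal on the e64 form, so the shrink is never attempted:

  %2:vgpr_32, %3:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 target-flags(amdgpu-abs32-lo) @sym, undef %0.sub0, 0, implicit $exec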

dstutt commented Sep 4, 2025

A possible alternative fix is to implement this TODO, so that the operand is legal for the e64 instruction so this code won't even try to shrink it:

// TODO: Try to apply DefSubReg, for global address we can extract

If that was done - would it make sense to keep the check anyway? Without it, the shrink code is by-passing checks.

jayfoad commented Sep 4, 2025

A possible alternative fix is to implement this TODO, so that the operand is legal for the e64 instruction so this code won't even try to shrink it:

// TODO: Try to apply DefSubReg, for global address we can extract

If that was done - would it make sense to keep the check anyway? Without it, the shrink code is by-passing checks.

If that was done then we could probably replace the check with an assert, on the grounds that every supported target either has an e32 encoding of V_ADD_CO_U32 or it supports VOP3 literals.
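
A rough sketch of that variant (assuming the TODO above is implemented first, so reaching this point without an encodable e32 form would indicate a bug rather than an expected case):

      // Make sure to get the 32-bit version of the commuted opcode.
      unsigned MaybeCommutedOpc = MI->getOpcode();
      Op32 = AMDGPU::getVOPe32(MaybeCommutedOpc);
      assert(TII->pseudoToMCOpcode(Op32) != -1 &&
             "every supported target has an e32 encoding or VOP3 literals");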

Successfully merging this pull request may close these issues.

SIFoldOperands generates unsupported instruction V_ADD_CO_U32_e32