[AMDGPU] SIFoldOperands check that shrunk op is valid #156298
Conversation
Folding can result in a global ABS64 relocation being used in an add instruction that does not support 64-bit immediates. The relocation then overflows into the next instruction and corrupts it. The fix is to check that the shrunk instruction is valid for the subtarget and skip the fold if it is not. This fixes llvm#153812
@llvm/pr-subscribers-backend-amdgpu
Author: David Stuttard (dstutt)
Changes
Full diff: https://github.com/llvm/llvm-project/pull/156298.diff
2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 3979e1e0c44aa..9bb94e6dd2ad4 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -868,6 +868,8 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
// Make sure to get the 32-bit version of the commuted opcode.
unsigned MaybeCommutedOpc = MI->getOpcode();
Op32 = AMDGPU::getVOPe32(MaybeCommutedOpc);
+ if (TII->pseudoToMCOpcode(Op32) == -1)
+ return false;
}
appendFoldCandidate(FoldList, MI, CommuteOpNo, OpToFold, /*Commuted=*/true,
diff --git a/llvm/test/CodeGen/AMDGPU/fold-abs64.mir b/llvm/test/CodeGen/AMDGPU/fold-abs64.mir
new file mode 100644
index 0000000000000..d78456afc2be2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/fold-abs64.mir
@@ -0,0 +1,41 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1250 -run-pass=si-fold-operands %s -o - | FileCheck %s
+
+--- |
+ @sym = external constant i32
+ define void @fn() { ret void }
+ define void @fn2() { ret void }
+...
+
+---
+name: fn
+tracksRegLiveness: true
+body: |
+ bb.0:
+ ; CHECK-LABEL: name: fn
+ ; CHECK: [[DEF:%[0-9]+]]:vreg_64 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[S_MOV_B64_:%[0-9]+]]:sreg_64 = S_MOV_B64 target-flags(amdgpu-abs64) @sym
+ ; CHECK-NEXT: [[V_ADD_CO_U32_e64_:%[0-9]+]]:vgpr_32, [[V_ADD_CO_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef [[DEF]].sub0, undef [[S_MOV_B64_]].sub0, 0, implicit $exec
+ ; CHECK-NEXT: S_ENDPGM 0, implicit [[V_ADD_CO_U32_e64_]]
+ %0:vreg_64 = IMPLICIT_DEF
+ %1:sreg_64 = S_MOV_B64 target-flags(amdgpu-abs64) @sym
+ %2:vgpr_32, %3:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef %1.sub0, undef %0.sub0, 0, implicit $exec
+ S_ENDPGM 0, implicit %2
+...
+
+---
+name: fn2
+tracksRegLiveness: true
+body: |
+ bb.0:
+ ; CHECK-LABEL: name: fn2
+ ; CHECK: [[DEF:%[0-9]+]]:vreg_64 = IMPLICIT_DEF
+ ; CHECK-NEXT: [[V_ADD_CO_U32_e64_:%[0-9]+]]:vgpr_32, [[V_ADD_CO_U32_e64_1:%[0-9]+]]:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 591751049, undef [[DEF]].sub0, 0, implicit $exec
+ ; CHECK-NEXT: S_ENDPGM 0, implicit [[V_ADD_CO_U32_e64_]]
+ %0:vreg_64 = IMPLICIT_DEF
+ %1:sreg_64 = S_MOV_B64 4886718345
+ %2:vgpr_32, %3:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 undef %1.sub0, undef %0.sub0, 0, implicit $exec
+ S_ENDPGM 0, implicit %2
+...
+## NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+# CHECK: {{.*}}
@@ -868,6 +868,8 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
      // Make sure to get the 32-bit version of the commuted opcode.
      unsigned MaybeCommutedOpc = MI->getOpcode();
      Op32 = AMDGPU::getVOPe32(MaybeCommutedOpc);
+     if (TII->pseudoToMCOpcode(Op32) == -1)
SIInstrInfo::commuteOpcode already checks this, is this being skipped somehow?
I think the difference here is that it is trying to find the VOPe32 variant, which might not exist even if the commuted opcode did (this is part of the shrink).
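For clarity, a rough illustration of the two separate lookups being discussed, using the existing SIInstrInfo/AMDGPU helpers (a sketch, not code from this patch):

  // commuteOpcode() already rejects commuted opcodes with no MC encoding,
  // but it says nothing about the shrunk 32-bit form; the e32 lookup is a
  // separate table query and can fail independently.
  int Op32 = AMDGPU::getVOPe32(MI->getOpcode()); // -1 if no e32 form exists
  bool ShrinkIsValid =
      Op32 != -1 && TII->pseudoToMCOpcode(Op32) != -1; // encoding exists on this subtarget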
I think this check should be pulled into a helper function in SIInstrInfo instead of directly using pseudoToMCOpcode. We already have hasVALU32BitEncoding, but it returns a bool instead of the new opcode. Can you generalize that to return the shrunk opcode if valid?
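A minimal sketch of what that generalization might look like, modelled on the existing hasVALU32BitEncoding implementation (the helper name getVALU32BitEncodingOpcode is hypothetical, not something SIInstrInfo provides today):

  // Return the 32-bit (e32) opcode for Opcode if it exists and has a valid MC
  // encoding on this subtarget; otherwise return -1.
  int SIInstrInfo::getVALU32BitEncodingOpcode(unsigned Opcode) const {
    int Op32 = AMDGPU::getVOPe32(Opcode);
    if (Op32 == -1)
      return -1;
    return pseudoToMCOpcode(Op32) != -1 ? Op32 : -1;
  }

Callers like tryAddToFoldList could then use the returned opcode directly instead of repeating the pseudoToMCOpcode check.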
A possible alternative fix is to implement this TODO, so that the operand is legal for the e64 instruction and this code won't even try to shrink it:
// TODO: Try to apply DefSubReg, for global address we can extract
If that was done, would it make sense to keep the check anyway? Without it, the shrink code is bypassing those checks.
If that was done then we could probably replace the check with an assert, on the grounds that every supported target either has an e32 encoding of V_ADD_CO_U32 or it supports VOP3 literals.
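For illustration, the assert-based variant would look roughly like this at the same spot in tryAddToFoldList (a sketch, assuming the TODO above were implemented first so the operand is always legal for the e64 form):

  // Make sure to get the 32-bit version of the commuted opcode.
  unsigned MaybeCommutedOpc = MI->getOpcode();
  Op32 = AMDGPU::getVOPe32(MaybeCommutedOpc);
  assert(TII->pseudoToMCOpcode(Op32) != -1 &&
         "target should have an e32 V_ADD_CO_U32 or support VOP3 literals");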