Skip to content

Conversation

broxigarchen
Copy link
Contributor

@broxigarchen broxigarchen commented Aug 12, 2025

In true16 flow, we cannot simply replace v2f16 to its Lo16 when Lo == Hi in a vop3p packed inst, since the register size is mismatched. This trigger functional errors in the downstream branch and this is caused by illegal VGPR_32 = COPY VGPR_16 created by ISel and hit the rewrite virtual reg and coalescer pass

Correctly insert reg_sequence/s_mov in true16 flow

@broxigarchen broxigarchen changed the title fix true16 vop3p mod [AMDGPU][True16][CodeGen] insert vgpr32 for 16bit data type in vop3p insts Aug 12, 2025
@broxigarchen broxigarchen changed the title [AMDGPU][True16][CodeGen] insert vgpr32 for 16bit data type in vop3p insts [AMDGPU][True16][CodeGen] insert proper register for 16bit data type in vop3p insts Aug 12, 2025
@broxigarchen
Copy link
Contributor Author

Adding a test case

@broxigarchen broxigarchen marked this pull request as ready for review August 12, 2025 14:29
@broxigarchen broxigarchen requested a review from arsenm August 12, 2025 14:29
@llvmbot
Copy link
Member

llvmbot commented Aug 12, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

In true16 flow, we cannot simply replace Lo16 of a v2f16 when Lo == Hi in a vop3p packed inst, since the register size is mismatched. This causes a functional error that ISel insert a COPY from vpgr16 to vpgr32 and the Hi16 is discarded

Correctly insert reg_sequence/s_mov in true16 flow


Full diff: https://github.com/llvm/llvm-project/pull/153143.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (+27-1)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 9d6584ad3faa0..b74c14a5565a9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -3412,8 +3412,34 @@ bool AMDGPUDAGToDAGISel::SelectVOP3PMods(SDValue In, SDValue &Src,
       // Really a scalar input. Just select from the low half of the register to
       // avoid packing.
 
-      if (VecSize == 32 || VecSize == Lo.getValueSizeInBits()) {
+      if (VecSize == Lo.getValueSizeInBits()) {
         Src = Lo;
+      } else if (VecSize == 32) {
+        if (!Subtarget->useRealTrue16Insts()) {
+          Src = Lo;
+        } else {
+          SDLoc SL(In);
+
+          if (Lo->isDivergent()) {
+            SDValue Undef =
+                SDValue(CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, SL,
+                                               Lo.getValueType()),
+                        0);
+            const SDValue Ops[] = {
+                CurDAG->getTargetConstant(AMDGPU::VGPR_32RegClassID, SL,
+                                          MVT::i32),
+                Lo, CurDAG->getTargetConstant(AMDGPU::lo16, SL, MVT::i16),
+                Undef, CurDAG->getTargetConstant(AMDGPU::hi16, SL, MVT::i16)};
+
+            Src = SDValue(CurDAG->getMachineNode(TargetOpcode::REG_SEQUENCE, SL,
+                                                 Src.getValueType(), Ops),
+                          0);
+          } else {
+            Src = SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL,
+                                                 Src.getValueType(), Lo),
+                          0);
+          }
+        }
       } else {
         assert(Lo.getValueSizeInBits() == 32 && VecSize == 64);
 

Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please add the test cases. Perhaps gfx11 runlines could be added to packed-op-sel.ll ?
That was the original test for this code I see.
And how about globalisel? Can you implement the same change there?

@broxigarchen
Copy link
Contributor Author

broxigarchen commented Aug 12, 2025

Yes please add the test cases. Perhaps gfx11 runlines could be added to packed-op-sel.ll ? That was the original test for this code I see. And how about globalisel? Can you implement the same change there?

Sure. The illegal copy should not impact the functional correctness by itself but somehow trigger the following pass to do wrong things. I am still trying to get the test for this but it's not very straightforward.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing tests

@broxigarchen broxigarchen force-pushed the main-fix-vop3p-modifier branch from d827eb2 to 855293c Compare August 13, 2025 16:28
@broxigarchen
Copy link
Contributor Author

broxigarchen commented Aug 13, 2025

Added a mir test.

It seems it requires quite a large ll test to trigger the error (this wrong behavior seems from rewrite virtual reg and coalescer pass) and it's quite difficult to locate and trim down the failing test case from the downstream branch.

But since we know this is caused by the vgpr_32 = copy vgpr_16 generated from isel, created a mir test instead. The purpose of this patch is to stop isel from generating this invalid copy.

For gisel, we might do it in a seperate patch.

@broxigarchen broxigarchen requested review from Sisyph and arsenm August 13, 2025 16:37
@broxigarchen broxigarchen force-pushed the main-fix-vop3p-modifier branch from 855293c to 7c0afc2 Compare August 13, 2025 16:42
@broxigarchen
Copy link
Contributor Author

ping!

This is blocking downstream branch so need some help to get this in asap. Thanks!

@broxigarchen broxigarchen force-pushed the main-fix-vop3p-modifier branch from 7c0afc2 to 5f65554 Compare August 14, 2025 05:18
Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
For Gisel, creating a separate patch is ok.

@broxigarchen broxigarchen merged commit ec237da into llvm:main Aug 14, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants