[AMDGPU][True16][CodeGen] insert proper register for 16bit data type in vop3p insts #153143

broxigarchen · 2025-08-12T06:14:54Z

In the true16 flow, we cannot simply replace a v2f16 with its Lo16 when Lo == Hi in a VOP3P packed instruction, because the register sizes do not match. This triggers functional errors in the downstream branch: ISel creates an illegal VGPR_32 = COPY VGPR_16, which then reaches the rewrite-virtual-register and coalescer passes and is mishandled there.

Fix this by inserting a REG_SEQUENCE or S_MOV_B32 in the true16 flow.
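
To illustrate why a 32-bit register with an undefined high half is good enough here, below is a toy C++ model, not LLVM code. It assumes that in the Lo == Hi case both lanes of the packed operand are read from the low half, as the "select from the low half of the register" comment in the patch suggests. `Reg32`, `Lanes`, `regSequenceLoUndef` and `readBothLanesFromLo` are hypothetical names made up for this sketch.

```cpp
#include <cstdint>

// Toy model of a 32-bit VGPR as two 16-bit halves (lo16 and hi16).
struct Reg32 {
  uint16_t lo16;
  uint16_t hi16;
};

// The two 16-bit lanes that a packed instruction sees for one operand.
struct Lanes {
  uint16_t lane0;
  uint16_t lane1;
};

// Models REG_SEQUENCE(Lo -> lo16, IMPLICIT_DEF -> hi16): the low half holds
// the real 16-bit value, the high half is undefined. 'hiGarbage' stands in
// for whatever bits happen to be in the undefined half.
inline Reg32 regSequenceLoUndef(uint16_t lo, uint16_t hiGarbage) {
  return Reg32{lo, hiGarbage};
}

// Models an operand read where both lanes come from the low half
// (an assumption of this sketch, see the paragraph above).
inline Lanes readBothLanesFromLo(const Reg32 &r) {
  return Lanes{r.lo16, r.lo16};
}
```

In this model the lanes are the same whatever ends up in `hi16`, which is why padding with an undefined high half is harmless, while the operand itself still has the full 32-bit width that the packed instruction expects.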

broxigarchen · 2025-08-12T06:46:24Z

Adding a test case

llvmbot · 2025-08-12T14:30:07Z

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

In the true16 flow, we cannot simply replace a v2f16 with its Lo16 when Lo == Hi in a VOP3P packed instruction, because the register sizes do not match. This causes a functional error: ISel inserts a COPY from vgpr16 to vgpr32 and the Hi16 is discarded.

Fix this by inserting a REG_SEQUENCE or S_MOV_B32 in the true16 flow.

Full diff: https://github.com/llvm/llvm-project/pull/153143.diff

1 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (+27-1)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 9d6584ad3faa0..b74c14a5565a9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -3412,8 +3412,34 @@ bool AMDGPUDAGToDAGISel::SelectVOP3PMods(SDValue In, SDValue &Src,
       // Really a scalar input. Just select from the low half of the register to
       // avoid packing.
 
-      if (VecSize == 32 || VecSize == Lo.getValueSizeInBits()) {
+      if (VecSize == Lo.getValueSizeInBits()) {
         Src = Lo;
+      } else if (VecSize == 32) {
+        if (!Subtarget->useRealTrue16Insts()) {
+          Src = Lo;
+        } else {
+          SDLoc SL(In);
+
+          if (Lo->isDivergent()) {
+            SDValue Undef =
+                SDValue(CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, SL,
+                                               Lo.getValueType()),
+                        0);
+            const SDValue Ops[] = {
+                CurDAG->getTargetConstant(AMDGPU::VGPR_32RegClassID, SL,
+                                          MVT::i32),
+                Lo, CurDAG->getTargetConstant(AMDGPU::lo16, SL, MVT::i16),
+                Undef, CurDAG->getTargetConstant(AMDGPU::hi16, SL, MVT::i16)};
+
+            Src = SDValue(CurDAG->getMachineNode(TargetOpcode::REG_SEQUENCE, SL,
+                                                 Src.getValueType(), Ops),
+                          0);
+          } else {
+            Src = SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL,
+                                                 Src.getValueType(), Lo),
+                          0);
+          }
+        }
       } else {
         assert(Lo.getValueSizeInBits() == 32 && VecSize == 64);
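
The branching in the hunk above can be summarized with a small standalone C++ sketch. It is a model for reading the diff, not the LLVM implementation: `SplatSrc` and `classifySplatSrc` are hypothetical names, the four parameters stand in for `VecSize`, `Lo.getValueSizeInBits()`, `Subtarget->useRealTrue16Insts()` and `Lo->isDivergent()`, and the body of the 64-bit branch is not modeled because the hunk cuts it off.

```cpp
#include <cassert>

// Hypothetical labels for the outcomes visible in the hunk.
enum class SplatSrc {
  UseLo,              // Src = Lo
  RegSequenceLoUndef, // REG_SEQUENCE into VGPR_32: Lo -> lo16, IMPLICIT_DEF -> hi16
  SMovB32,            // S_MOV_B32 of Lo
  Wide64Path          // the 64-bit branch; its body is not shown in the hunk
};

// Mirrors the if/else chain of the patched code for the Lo == Hi case.
inline SplatSrc classifySplatSrc(unsigned vecBits, unsigned loBits,
                                 bool realTrue16, bool loIsDivergent) {
  if (vecBits == loBits)
    return SplatSrc::UseLo;
  if (vecBits == 32) {
    // Without real true16 instructions the old behavior is kept.
    if (!realTrue16)
      return SplatSrc::UseLo;
    // With real true16, a 16-bit Lo is widened to a 32-bit operand:
    // divergent values via REG_SEQUENCE, uniform values via S_MOV_B32.
    return loIsDivergent ? SplatSrc::RegSequenceLoUndef : SplatSrc::SMovB32;
  }
  // Same precondition as the assert in the original else branch.
  assert(loBits == 32 && vecBits == 64);
  return SplatSrc::Wide64Path;
}
```

For example, `classifySplatSrc(32, 16, true, true)` yields `SplatSrc::RegSequenceLoUndef`, whereas the same call with `realTrue16 == false` yields `SplatSrc::UseLo`, matching the pre-patch behavior.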

Sisyph

Yes, please add the test cases. Perhaps gfx11 RUN lines could be added to packed-op-sel.ll?
That was the original test for this code, I see.
And how about GlobalISel? Can you implement the same change there?

broxigarchen · 2025-08-12T21:09:28Z

> Yes please add the test cases. Perhaps gfx11 runlines could be added to packed-op-sel.ll ? That was the original test for this code I see. And how about globalisel? Can you implement the same change there?

Sure. The illegal copy should not affect functional correctness by itself, but it somehow triggers later passes to do the wrong thing. I am still trying to get a test for this, but it's not very straightforward.

arsenm

Missing tests

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

broxigarchen · 2025-08-13T16:36:06Z

Added a MIR test.

It seems to require quite a large .ll test to trigger the error (the wrong behavior seems to come from the rewrite-virtual-register and coalescer passes), and it's quite difficult to locate and trim down the failing test case from the downstream branch.

But since we know this is caused by the vgpr_32 = copy vgpr_16 generated by ISel, I created a MIR test instead. The purpose of this patch is to stop ISel from generating this invalid copy.

For GISel, we might do it in a separate patch.

broxigarchen · 2025-08-13T20:07:25Z

ping!

This is blocking the downstream branch, so I need some help to get this in ASAP. Thanks!

Sisyph

LGTM!
For Gisel, creating a separate patch is ok.

broxigarchen changed the title ~~fix true16 vop3p mod~~ [AMDGPU][True16][CodeGen] insert vgpr32 for 16bit data type in vop3p insts Aug 12, 2025

broxigarchen changed the title ~~[AMDGPU][True16][CodeGen] insert vgpr32 for 16bit data type in vop3p insts~~ [AMDGPU][True16][CodeGen] insert proper register for 16bit data type in vop3p insts Aug 12, 2025

broxigarchen marked this pull request as ready for review August 12, 2025 14:29

broxigarchen requested a review from arsenm August 12, 2025 14:29

llvmbot added the backend:AMDGPU label Aug 12, 2025

broxigarchen requested review from Sisyph, kosarev and jayfoad August 12, 2025 14:29

Sisyph reviewed Aug 12, 2025


arsenm requested changes Aug 13, 2025


llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (outdated, resolved)

broxigarchen force-pushed the main-fix-vop3p-modifier branch from d827eb2 to 855293c Compare August 13, 2025 16:28

broxigarchen requested review from Sisyph and arsenm August 13, 2025 16:37

broxigarchen force-pushed the main-fix-vop3p-modifier branch from 855293c to 7c0afc2 Compare August 13, 2025 16:42

fix true16 vop3p mod

5f65554

broxigarchen force-pushed the main-fix-vop3p-modifier branch from 7c0afc2 to 5f65554 Compare August 14, 2025 05:18

Sisyph approved these changes Aug 14, 2025


broxigarchen merged commit ec237da into llvm:main Aug 14, 2025
9 checks passed

broxigarchen mentioned this pull request Aug 25, 2025

[AMDGPU][True16][Codegen] remove another build_vector pattern from true16 #149861

Merged
