-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[AMDGPU][True16][CodeGen] insert proper register for 16bit data type in vop3p insts #153143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adding a test case |
@llvm/pr-subscribers-backend-amdgpu Author: Brox Chen (broxigarchen) ChangesIn true16 flow, we cannot simply replace Lo16 of a v2f16 when Lo == Hi in a vop3p packed inst, since the register size is mismatched. This causes a functional error that ISel insert a COPY from vpgr16 to vpgr32 and the Hi16 is discarded Correctly insert reg_sequence/s_mov in true16 flow Full diff: https://github.com/llvm/llvm-project/pull/153143.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index 9d6584ad3faa0..b74c14a5565a9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -3412,8 +3412,34 @@ bool AMDGPUDAGToDAGISel::SelectVOP3PMods(SDValue In, SDValue &Src,
// Really a scalar input. Just select from the low half of the register to
// avoid packing.
- if (VecSize == 32 || VecSize == Lo.getValueSizeInBits()) {
+ if (VecSize == Lo.getValueSizeInBits()) {
Src = Lo;
+ } else if (VecSize == 32) {
+ if (!Subtarget->useRealTrue16Insts()) {
+ Src = Lo;
+ } else {
+ SDLoc SL(In);
+
+ if (Lo->isDivergent()) {
+ SDValue Undef =
+ SDValue(CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, SL,
+ Lo.getValueType()),
+ 0);
+ const SDValue Ops[] = {
+ CurDAG->getTargetConstant(AMDGPU::VGPR_32RegClassID, SL,
+ MVT::i32),
+ Lo, CurDAG->getTargetConstant(AMDGPU::lo16, SL, MVT::i16),
+ Undef, CurDAG->getTargetConstant(AMDGPU::hi16, SL, MVT::i16)};
+
+ Src = SDValue(CurDAG->getMachineNode(TargetOpcode::REG_SEQUENCE, SL,
+ Src.getValueType(), Ops),
+ 0);
+ } else {
+ Src = SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL,
+ Src.getValueType(), Lo),
+ 0);
+ }
+ }
} else {
assert(Lo.getValueSizeInBits() == 32 && VecSize == 64);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please add the test cases. Perhaps gfx11 runlines could be added to packed-op-sel.ll ?
That was the original test for this code I see.
And how about globalisel? Can you implement the same change there?
Sure. The illegal copy should not impact the functional correctness by itself but somehow trigger the following pass to do wrong things. I am still trying to get the test for this but it's not very straightforward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing tests
d827eb2
to
855293c
Compare
Added a mir test. It seems it requires quite a large ll test to trigger the error (this wrong behavior seems from rewrite virtual reg and coalescer pass) and it's quite difficult to locate and trim down the failing test case from the downstream branch. But since we know this is caused by the For gisel, we might do it in a seperate patch. |
855293c
to
7c0afc2
Compare
ping! This is blocking downstream branch so need some help to get this in asap. Thanks! |
7c0afc2
to
5f65554
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
For Gisel, creating a separate patch is ok.
In true16 flow, we cannot simply replace v2f16 to its Lo16 when Lo == Hi in a vop3p packed inst, since the register size is mismatched. This trigger functional errors in the downstream branch and this is caused by illegal
VGPR_32 = COPY VGPR_16
created by ISel and hit the rewrite virtual reg and coalescer passCorrectly insert reg_sequence/s_mov in true16 flow