-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[AArch64][SDAG] Lower f16->s16 FP_TO_INT_SAT to *v1f16 #154822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: Kajetan Puchalski (mrkajetanp) ChangesConversions from f16 to s16 performed by FP_TO_INT_SAT can be done directly within FPRs, e.g. Related to #154343. Full diff: https://github.com/llvm/llvm-project/pull/154822.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index d168cc8d1bd06..f7cb76ed64b13 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -4912,6 +4912,24 @@ SDValue AArch64TargetLowering::LowerFP_TO_INT_SAT(SDValue Op,
if (DstWidth < SatWidth)
return SDValue();
+ if (SrcVT == MVT::f16 && SatVT == MVT::i16 && DstVT == MVT::i32) {
+ auto Opcode = (Op.getOpcode() == ISD::FP_TO_SINT_SAT)
+ ? AArch64::FCVTZSv1f16
+ : AArch64::FCVTZUv1f16;
+ auto Cvt = SDValue(DAG.getMachineNode(Opcode, DL, MVT::f16, SrcVal), 0);
+ auto Sign = DAG.getTargetConstant(-1, DL, MVT::i64);
+ auto Hsub = DAG.getTargetConstant(AArch64::hsub, DL, MVT::i32);
+ auto SubregToReg =
+ SDValue(DAG.getMachineNode(TargetOpcode::SUBREG_TO_REG, DL, MVT::v8f16,
+ Sign, Cvt, Hsub),
+ 0);
+ auto Ssub = DAG.getTargetConstant(AArch64::ssub, DL, MVT::i32);
+ auto Extract = SDValue(DAG.getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL,
+ MVT::f32, SubregToReg, Ssub),
+ 0);
+ return DAG.getBitcast(MVT::i32, Extract);
+ }
+
SDValue NativeCvt =
DAG.getNode(Op.getOpcode(), DL, DstVT, SrcVal, DAG.getValueType(DstVT));
SDValue Sat;
diff --git a/llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll b/llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll
index e3aef487890f9..a5f6ac628403c 100644
--- a/llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll
+++ b/llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll
@@ -670,13 +670,8 @@ define i16 @test_signed_i16_f16(half %f) nounwind {
;
; CHECK-SD-FP16-LABEL: test_signed_i16_f16:
; CHECK-SD-FP16: // %bb.0:
-; CHECK-SD-FP16-NEXT: fcvtzs w8, h0
-; CHECK-SD-FP16-NEXT: mov w9, #32767 // =0x7fff
-; CHECK-SD-FP16-NEXT: cmp w8, w9
-; CHECK-SD-FP16-NEXT: csel w8, w8, w9, lt
-; CHECK-SD-FP16-NEXT: mov w9, #-32768 // =0xffff8000
-; CHECK-SD-FP16-NEXT: cmn w8, #8, lsl #12 // =32768
-; CHECK-SD-FP16-NEXT: csel w0, w8, w9, gt
+; CHECK-SD-FP16-NEXT: fcvtzs h0, h0
+; CHECK-SD-FP16-NEXT: fmov w0, s0
; CHECK-SD-FP16-NEXT: ret
;
; CHECK-GI-CVT-LABEL: test_signed_i16_f16:
diff --git a/llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll b/llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll
index 07e49e331415e..2613f8337a918 100644
--- a/llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll
+++ b/llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll
@@ -531,10 +531,8 @@ define i16 @test_unsigned_i16_f16(half %f) nounwind {
;
; CHECK-SD-FP16-LABEL: test_unsigned_i16_f16:
; CHECK-SD-FP16: // %bb.0:
-; CHECK-SD-FP16-NEXT: fcvtzu w8, h0
-; CHECK-SD-FP16-NEXT: mov w9, #65535 // =0xffff
-; CHECK-SD-FP16-NEXT: cmp w8, w9
-; CHECK-SD-FP16-NEXT: csel w0, w8, w9, lo
+; CHECK-SD-FP16-NEXT: fcvtzu h0, h0
+; CHECK-SD-FP16-NEXT: fmov w0, s0
; CHECK-SD-FP16-NEXT: ret
;
; CHECK-GI-CVT-LABEL: test_unsigned_i16_f16:
|
It feels like there should be a better way to do this, but SDAG was really unhappy with me trying to do anything that involved having an |
Conversions from f16 to s16 performed by FP_TO_INT_SAT can be done directly within FPRs, e.g. `fcvtzs h0, h0`. Generating this format reduces the number of instruction required for correct behaviour, as it sidesteps the issues with incorrect saturation that arise when using `fcvtzs w0, h0` for the same casts. Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
5bf55ab
to
604902c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM
Conversions from f16 to s16 performed by FP_TO_INT_SAT can be done directly within FPRs, e.g.
fcvtzs h0, h0
.Generating this format reduces the number of instruction required for correct behaviour, as it sidesteps the issues with incorrect saturation that arise when using
fcvtzs w0, h0
for the same casts.Add new AArch64ISD::FCVTZS_HALF and AArch64ISD::FCVTZU_HALF nodes to represent the necessary instruction sequence.
Related to #154343.