Skip to content

Conversation

mrkajetanp
Copy link
Contributor

@mrkajetanp mrkajetanp commented Aug 21, 2025

Conversions from f16 to s16 performed by FP_TO_INT_SAT can be done directly within FPRs, e.g. fcvtzs h0, h0.
Generating this format reduces the number of instruction required for correct behaviour, as it sidesteps the issues with incorrect saturation that arise when using fcvtzs w0, h0 for the same casts.
Add new AArch64ISD::FCVTZS_HALF and AArch64ISD::FCVTZU_HALF nodes to represent the necessary instruction sequence.

Related to #154343.

@llvmbot
Copy link
Member

llvmbot commented Aug 21, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Kajetan Puchalski (mrkajetanp)

Changes

Conversions from f16 to s16 performed by FP_TO_INT_SAT can be done directly within FPRs, e.g. fcvtzs h0, h0.
Generating this format reduces the number of instruction required for correct behaviour, as it sidesteps the issues with incorrect saturation that arise when using fcvtzs w0, h0 for the same casts.

Related to #154343.


Full diff: https://github.com/llvm/llvm-project/pull/154822.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+18)
  • (modified) llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll (+2-7)
  • (modified) llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll (+2-4)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index d168cc8d1bd06..f7cb76ed64b13 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -4912,6 +4912,24 @@ SDValue AArch64TargetLowering::LowerFP_TO_INT_SAT(SDValue Op,
   if (DstWidth < SatWidth)
     return SDValue();
 
+  if (SrcVT == MVT::f16 && SatVT == MVT::i16 && DstVT == MVT::i32) {
+    auto Opcode = (Op.getOpcode() == ISD::FP_TO_SINT_SAT)
+                      ? AArch64::FCVTZSv1f16
+                      : AArch64::FCVTZUv1f16;
+    auto Cvt = SDValue(DAG.getMachineNode(Opcode, DL, MVT::f16, SrcVal), 0);
+    auto Sign = DAG.getTargetConstant(-1, DL, MVT::i64);
+    auto Hsub = DAG.getTargetConstant(AArch64::hsub, DL, MVT::i32);
+    auto SubregToReg =
+        SDValue(DAG.getMachineNode(TargetOpcode::SUBREG_TO_REG, DL, MVT::v8f16,
+                                   Sign, Cvt, Hsub),
+                0);
+    auto Ssub = DAG.getTargetConstant(AArch64::ssub, DL, MVT::i32);
+    auto Extract = SDValue(DAG.getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL,
+                                              MVT::f32, SubregToReg, Ssub),
+                           0);
+    return DAG.getBitcast(MVT::i32, Extract);
+  }
+
   SDValue NativeCvt =
       DAG.getNode(Op.getOpcode(), DL, DstVT, SrcVal, DAG.getValueType(DstVT));
   SDValue Sat;
diff --git a/llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll b/llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll
index e3aef487890f9..a5f6ac628403c 100644
--- a/llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll
+++ b/llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll
@@ -670,13 +670,8 @@ define i16 @test_signed_i16_f16(half %f) nounwind {
 ;
 ; CHECK-SD-FP16-LABEL: test_signed_i16_f16:
 ; CHECK-SD-FP16:       // %bb.0:
-; CHECK-SD-FP16-NEXT:    fcvtzs w8, h0
-; CHECK-SD-FP16-NEXT:    mov w9, #32767 // =0x7fff
-; CHECK-SD-FP16-NEXT:    cmp w8, w9
-; CHECK-SD-FP16-NEXT:    csel w8, w8, w9, lt
-; CHECK-SD-FP16-NEXT:    mov w9, #-32768 // =0xffff8000
-; CHECK-SD-FP16-NEXT:    cmn w8, #8, lsl #12 // =32768
-; CHECK-SD-FP16-NEXT:    csel w0, w8, w9, gt
+; CHECK-SD-FP16-NEXT:    fcvtzs h0, h0
+; CHECK-SD-FP16-NEXT:    fmov w0, s0
 ; CHECK-SD-FP16-NEXT:    ret
 ;
 ; CHECK-GI-CVT-LABEL: test_signed_i16_f16:
diff --git a/llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll b/llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll
index 07e49e331415e..2613f8337a918 100644
--- a/llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll
+++ b/llvm/test/CodeGen/AArch64/fptoui-sat-scalar.ll
@@ -531,10 +531,8 @@ define i16 @test_unsigned_i16_f16(half %f) nounwind {
 ;
 ; CHECK-SD-FP16-LABEL: test_unsigned_i16_f16:
 ; CHECK-SD-FP16:       // %bb.0:
-; CHECK-SD-FP16-NEXT:    fcvtzu w8, h0
-; CHECK-SD-FP16-NEXT:    mov w9, #65535 // =0xffff
-; CHECK-SD-FP16-NEXT:    cmp w8, w9
-; CHECK-SD-FP16-NEXT:    csel w0, w8, w9, lo
+; CHECK-SD-FP16-NEXT:    fcvtzu h0, h0
+; CHECK-SD-FP16-NEXT:    fmov w0, s0
 ; CHECK-SD-FP16-NEXT:    ret
 ;
 ; CHECK-GI-CVT-LABEL: test_unsigned_i16_f16:

@mrkajetanp
Copy link
Contributor Author

It feels like there should be a better way to do this, but SDAG was really unhappy with me trying to do anything that involved having an i16 type on any operation, so this was the only way I've found (so far) of having it not crash.

@mrkajetanp mrkajetanp requested a review from davemgreen August 26, 2025 14:05
Conversions from f16 to s16 performed by FP_TO_INT_SAT can be done
directly within FPRs, e.g. `fcvtzs h0, h0`.
Generating this format reduces the number of instruction required for
correct behaviour, as it sidesteps the issues with incorrect saturation
that arise when using `fcvtzs w0, h0` for the same casts.

Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mrkajetanp mrkajetanp requested a review from davemgreen August 28, 2025 12:40
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM

@mrkajetanp mrkajetanp merged commit 6dd67f8 into llvm:main Aug 28, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants