[AArch64] Give a higher cost for more expensive SVE FCMP instructions #153816

davemgreen · 2025-08-15T14:54:32Z

This tries to add a higher cost for SVE FCM** comparison instructions that often have a lower throughput than the Neon equivalents that can be exectured on more vector pipelines.

This patch takes the slightly unorthodox approach of using the information in the scheduling model to compare the throughput of a FCMEQ_PPzZZ_S (SVE) and a FCMEQv4f32 (Neon). This isn't how things will (probably) want to work in the long run, where all the information comes more directly from the scheduling model, but that still needs to be proven out. The downsides of this approach of using the scheduling model info is if the core does not have a scheduling model but wants a different cost - then an alternative approach will be needed (but then maybe that is a good reason to create a new scheduling model).

The alternative would either be to make a subtarget feature for the affected cores (as in 17bf27d) or just always enable it, although that might need to change in the future.

llvmbot · 2025-08-15T14:55:02Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Changes

This tries to add a higher cost for SVE FCM** comparison instructions that often have a lower throughput than the Neon equivalents that can be exectured on more vector pipelines.

This patch takes the slightly unorthodox approach of using the information in the scheduling model to compare the throughput of a FCMEQ_PPzZZ_S (SVE) and a FCMEQv4f32 (Neon). This isn't how things will (probably) want to look in the long run, where all the information comes more directly from the scheduling model, but that still needs to be proven out. The downsides of this approach of using the scheduling model info is if the core does not have a scheduling model, but wants a different cost then an alternative approach will be needed (but then maybe that is a good reason to create a new scheduling model).

The alternative would either be to make a subtarget feature for the affected cores (as in 17bf27d) or just always enable it, although that might need to change in the future.

Patch is 365.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153816.diff

16 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64SchedA510.td (+1-1)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+34)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+5)
(modified) llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll (+5-5)
(modified) llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll (+197-197)
(modified) llvm/test/CodeGen/AArch64/sve-bf16-converts.ll (+2-2)
(modified) llvm/test/CodeGen/AArch64/sve-fptosi-sat.ll (+50-50)
(modified) llvm/test/CodeGen/AArch64/sve-fptoui-sat.ll (+33-33)
(modified) llvm/test/CodeGen/AArch64/sve-llrint.ll (+1030-1031)
(modified) llvm/test/CodeGen/AArch64/sve-lrint.ll (+1030-1031)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-compares.ll (+21-21)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ptest.ll (+74-74)
(modified) llvm/test/CodeGen/AArch64/sve2-bf16-converts.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll (+9-16)
(modified) llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll (+7-7)
(modified) llvm/test/tools/llvm-mca/AArch64/Cortex/A510-sve-instructions.s (+103-103)

diff --git a/llvm/lib/Target/AArch64/AArch64SchedA510.td b/llvm/lib/Target/AArch64/AArch64SchedA510.td
index b93d67f3091e7..356e3fa39c53f 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedA510.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedA510.td
@@ -1016,7 +1016,7 @@ def : InstRW<[CortexA510MCWrite<16, 13, CortexA510UnitVALU>], (instrs FADDA_VPZ_
 def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVALU>], (instrs FADDA_VPZ_D)>;
 
 // Floating point compare
-def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
+def : InstRW<[CortexA510MCWrite<4, 2, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
                                             "^FCM(EQ|GE|GT|NE)_PPzZ[0Z]_[HSD]",
                                             "^FCM(LE|LT)_PPzZ0_[HSD]",
                                             "^FCMUO_PPzZZ_[HSD]")>;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 3042251cf754d..3ec45178af68e 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4365,6 +4365,34 @@ AArch64TTIImpl::getAddressComputationCost(Type *PtrTy, ScalarEvolution *SE,
   return 1;
 }
 
+/// Check whether Opcode1 has less throughput according to the scheduling
+/// model than Opcode2.
+bool AArch64TTIImpl::hasLessThroughputFromSchedulingModel(
+    unsigned Opcode1, unsigned Opcode2) const {
+  const MCSchedModel &Sched = ST->getSchedModel();
+  const TargetInstrInfo *TII = ST->getInstrInfo();
+  if (!Sched.hasInstrSchedModel())
+    return false;
+
+  auto ResolveVariant = [&](unsigned Opcode) {
+    unsigned SCIdx = TII->get(Opcode).getSchedClass();
+    while (SCIdx && Sched.getSchedClassDesc(SCIdx)->isVariant())
+      SCIdx = ST->resolveVariantSchedClass(SCIdx, nullptr, TII,
+                                           Sched.getProcessorID());
+    assert(SCIdx);
+    const MCSchedClassDesc &SCDesc = Sched.SchedClassTable[SCIdx];
+    const MCWriteProcResEntry *B = ST->getWriteProcResBegin(&SCDesc);
+    const MCWriteProcResEntry *E = ST->getWriteProcResEnd(&SCDesc);
+    unsigned Min = Sched.IssueWidth;
+    for (const MCWriteProcResEntry *I = B; I != E; I++)
+      Min = std::min(Min, Sched.getProcResource(I->ProcResourceIdx)->NumUnits /
+                              (I->ReleaseAtCycle - I->AcquireAtCycle));
+    return Min;
+  };
+
+  return ResolveVariant(Opcode1) < ResolveVariant(Opcode2);
+}
+
 InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
     unsigned Opcode, Type *ValTy, Type *CondTy, CmpInst::Predicate VecPred,
     TTI::TargetCostKind CostKind, TTI::OperandValueInfo Op1Info,
@@ -4462,6 +4490,12 @@ InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
              (VecPred == FCmpInst::FCMP_ONE || VecPred == FCmpInst::FCMP_UEQ))
       Factor = 3; // fcmxx+fcmyy+or
 
+    if (isa<ScalableVectorType>(ValTy) &&
+        CostKind == TTI::TCK_RecipThroughput &&
+        hasLessThroughputFromSchedulingModel(AArch64::FCMEQ_PPzZZ_S,
+                                             AArch64::FCMEQv4f32))
+      Factor *= 2;
+
     return Factor * (CostKind == TTI::TCK_Latency ? 2 : LT.first);
   }
 
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index 9c96fdd427814..fe36e43c616c1 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -174,6 +174,11 @@ class AArch64TTIImpl final : public BasicTTIImplBase<AArch64TTIImpl> {
 
   bool prefersVectorizedAddressing() const override;
 
+  /// Check whether Opcode1 has less throughput according to the scheduling
+  /// model than Opcode2.
+  bool hasLessThroughputFromSchedulingModel(unsigned Opcode1,
+                                            unsigned Opcode2) const;
+
   InstructionCost
   getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
                         unsigned AddressSpace,
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
index 0789707a53b90..5029ef974681d 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
@@ -58,10 +58,10 @@ define <vscale x 32 x i1> @cmp_nxv32i1() {
 ; Check fcmp for legal FP vectors
 define void @cmp_legal_fp() #0 {
 ; CHECK-LABEL: 'cmp_legal_fp'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %1 = fcmp oge <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %2 = fcmp oge <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %3 = fcmp oge <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %1 = fcmp oge <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %2 = fcmp oge <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %3 = fcmp oge <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %1 = fcmp oge <vscale x 2 x double> undef, undef
@@ -74,7 +74,7 @@ define void @cmp_legal_fp() #0 {
 ; Check fcmp for an illegal FP vector
 define <vscale x 16 x i1> @cmp_nxv16f16() {
 ; CHECK-LABEL: 'cmp_nxv16f16'
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %res = fcmp oge <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %res = fcmp oge <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret <vscale x 16 x i1> %res
 ;
   %res = fcmp oge <vscale x 16 x half> undef, undef
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll b/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
index a14176cd6d532..c48ca20949c21 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
@@ -3,15 +3,15 @@
 
 define void @fcmp_oeq(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oeq'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oeq <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp oeq <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oeq <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp oeq <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oeq <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oeq <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oeq <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp oeq <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oeq <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp oeq <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oeq <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp oeq <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oeq <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oeq <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oeq <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp oeq <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
@@ -28,9 +28,9 @@ define void @fcmp_oeq(i32 %arg) {
 
 define void @fcmp_oeq_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oeq_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oeq <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oeq <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oeq <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oeq <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
@@ -41,15 +41,15 @@ define void @fcmp_oeq_bfloat(i32 %arg) {
 
 define void @fcmp_ogt(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_ogt'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp ogt <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp ogt <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp ogt <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp ogt <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp ogt <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp ogt <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp ogt <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp ogt <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp ogt <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp ogt <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp ogt <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp ogt <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp ogt <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp ogt <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp ogt <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp ogt <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
@@ -66,9 +66,9 @@ define void @fcmp_ogt(i32 %arg) {
 
 define void @fcmp_ogt_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_ogt_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp ogt <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp ogt <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp ogt <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp ogt <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
@@ -79,15 +79,15 @@ define void @fcmp_ogt_bfloat(i32 %arg) {
 
 define void @fcmp_oge(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oge'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oge <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp oge <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oge <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp oge <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oge <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oge <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oge <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp oge <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oge <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp oge <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oge <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp oge <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oge <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oge <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oge <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp oge <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
@@ -104,9 +104,9 @@ define void @fcmp_oge(i32 %arg) {
 
 define void @fcmp_oge_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oge_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oge <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oge <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oge <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oge <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
@@ -117,15 +117,15 @@ define void @fcmp_oge_bfloat(i32 %arg) {
 
 define void @fcmp_olt(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_olt'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp olt <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp olt <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp olt <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp olt <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp olt <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp olt <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp olt <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp olt <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp olt <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp olt <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp olt <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp olt <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp olt <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp olt <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp olt <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp olt <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
@@ -142,9 +142,9 @@ define void @fcmp_olt(i32 %arg) {
 
 define void @fcmp_olt_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_olt_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp olt <vscale x 2 x bfloat> undef, unde...
[truncated]

llvmbot · 2025-08-15T14:55:02Z

@llvm/pr-subscribers-llvm-transforms

Author: David Green (davemgreen)

Changes

This tries to add a higher cost for SVE FCM** comparison instructions that often have a lower throughput than the Neon equivalents that can be exectured on more vector pipelines.

This patch takes the slightly unorthodox approach of using the information in the scheduling model to compare the throughput of a FCMEQ_PPzZZ_S (SVE) and a FCMEQv4f32 (Neon). This isn't how things will (probably) want to look in the long run, where all the information comes more directly from the scheduling model, but that still needs to be proven out. The downsides of this approach of using the scheduling model info is if the core does not have a scheduling model, but wants a different cost then an alternative approach will be needed (but then maybe that is a good reason to create a new scheduling model).

The alternative would either be to make a subtarget feature for the affected cores (as in 17bf27d) or just always enable it, although that might need to change in the future.

Patch is 365.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153816.diff

16 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64SchedA510.td (+1-1)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+34)
(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h (+5)
(modified) llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll (+5-5)
(modified) llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll (+197-197)
(modified) llvm/test/CodeGen/AArch64/sve-bf16-converts.ll (+2-2)
(modified) llvm/test/CodeGen/AArch64/sve-fptosi-sat.ll (+50-50)
(modified) llvm/test/CodeGen/AArch64/sve-fptoui-sat.ll (+33-33)
(modified) llvm/test/CodeGen/AArch64/sve-llrint.ll (+1030-1031)
(modified) llvm/test/CodeGen/AArch64/sve-lrint.ll (+1030-1031)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-compares.ll (+21-21)
(modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ptest.ll (+74-74)
(modified) llvm/test/CodeGen/AArch64/sve2-bf16-converts.ll (+12-12)
(modified) llvm/test/Transforms/LoopVectorize/AArch64/masked-call.ll (+9-16)
(modified) llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll (+7-7)
(modified) llvm/test/tools/llvm-mca/AArch64/Cortex/A510-sve-instructions.s (+103-103)

diff --git a/llvm/lib/Target/AArch64/AArch64SchedA510.td b/llvm/lib/Target/AArch64/AArch64SchedA510.td
index b93d67f3091e7..356e3fa39c53f 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedA510.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedA510.td
@@ -1016,7 +1016,7 @@ def : InstRW<[CortexA510MCWrite<16, 13, CortexA510UnitVALU>], (instrs FADDA_VPZ_
 def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVALU>], (instrs FADDA_VPZ_D)>;
 
 // Floating point compare
-def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
+def : InstRW<[CortexA510MCWrite<4, 2, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
                                             "^FCM(EQ|GE|GT|NE)_PPzZ[0Z]_[HSD]",
                                             "^FCM(LE|LT)_PPzZ0_[HSD]",
                                             "^FCMUO_PPzZZ_[HSD]")>;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 3042251cf754d..3ec45178af68e 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4365,6 +4365,34 @@ AArch64TTIImpl::getAddressComputationCost(Type *PtrTy, ScalarEvolution *SE,
   return 1;
 }
 
+/// Check whether Opcode1 has less throughput according to the scheduling
+/// model than Opcode2.
+bool AArch64TTIImpl::hasLessThroughputFromSchedulingModel(
+    unsigned Opcode1, unsigned Opcode2) const {
+  const MCSchedModel &Sched = ST->getSchedModel();
+  const TargetInstrInfo *TII = ST->getInstrInfo();
+  if (!Sched.hasInstrSchedModel())
+    return false;
+
+  auto ResolveVariant = [&](unsigned Opcode) {
+    unsigned SCIdx = TII->get(Opcode).getSchedClass();
+    while (SCIdx && Sched.getSchedClassDesc(SCIdx)->isVariant())
+      SCIdx = ST->resolveVariantSchedClass(SCIdx, nullptr, TII,
+                                           Sched.getProcessorID());
+    assert(SCIdx);
+    const MCSchedClassDesc &SCDesc = Sched.SchedClassTable[SCIdx];
+    const MCWriteProcResEntry *B = ST->getWriteProcResBegin(&SCDesc);
+    const MCWriteProcResEntry *E = ST->getWriteProcResEnd(&SCDesc);
+    unsigned Min = Sched.IssueWidth;
+    for (const MCWriteProcResEntry *I = B; I != E; I++)
+      Min = std::min(Min, Sched.getProcResource(I->ProcResourceIdx)->NumUnits /
+                              (I->ReleaseAtCycle - I->AcquireAtCycle));
+    return Min;
+  };
+
+  return ResolveVariant(Opcode1) < ResolveVariant(Opcode2);
+}
+
 InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
     unsigned Opcode, Type *ValTy, Type *CondTy, CmpInst::Predicate VecPred,
     TTI::TargetCostKind CostKind, TTI::OperandValueInfo Op1Info,
@@ -4462,6 +4490,12 @@ InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
              (VecPred == FCmpInst::FCMP_ONE || VecPred == FCmpInst::FCMP_UEQ))
       Factor = 3; // fcmxx+fcmyy+or
 
+    if (isa<ScalableVectorType>(ValTy) &&
+        CostKind == TTI::TCK_RecipThroughput &&
+        hasLessThroughputFromSchedulingModel(AArch64::FCMEQ_PPzZZ_S,
+                                             AArch64::FCMEQv4f32))
+      Factor *= 2;
+
     return Factor * (CostKind == TTI::TCK_Latency ? 2 : LT.first);
   }
 
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index 9c96fdd427814..fe36e43c616c1 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -174,6 +174,11 @@ class AArch64TTIImpl final : public BasicTTIImplBase<AArch64TTIImpl> {
 
   bool prefersVectorizedAddressing() const override;
 
+  /// Check whether Opcode1 has less throughput according to the scheduling
+  /// model than Opcode2.
+  bool hasLessThroughputFromSchedulingModel(unsigned Opcode1,
+                                            unsigned Opcode2) const;
+
   InstructionCost
   getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
                         unsigned AddressSpace,
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
index 0789707a53b90..5029ef974681d 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
@@ -58,10 +58,10 @@ define <vscale x 32 x i1> @cmp_nxv32i1() {
 ; Check fcmp for legal FP vectors
 define void @cmp_legal_fp() #0 {
 ; CHECK-LABEL: 'cmp_legal_fp'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %1 = fcmp oge <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %2 = fcmp oge <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %3 = fcmp oge <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %1 = fcmp oge <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %2 = fcmp oge <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %3 = fcmp oge <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %1 = fcmp oge <vscale x 2 x double> undef, undef
@@ -74,7 +74,7 @@ define void @cmp_legal_fp() #0 {
 ; Check fcmp for an illegal FP vector
 define <vscale x 16 x i1> @cmp_nxv16f16() {
 ; CHECK-LABEL: 'cmp_nxv16f16'
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %res = fcmp oge <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %res = fcmp oge <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret <vscale x 16 x i1> %res
 ;
   %res = fcmp oge <vscale x 16 x half> undef, undef
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll b/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
index a14176cd6d532..c48ca20949c21 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
@@ -3,15 +3,15 @@
 
 define void @fcmp_oeq(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oeq'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oeq <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp oeq <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oeq <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp oeq <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oeq <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oeq <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oeq <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp oeq <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oeq <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp oeq <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oeq <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp oeq <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oeq <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oeq <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oeq <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp oeq <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
@@ -28,9 +28,9 @@ define void @fcmp_oeq(i32 %arg) {
 
 define void @fcmp_oeq_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oeq_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oeq <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oeq <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oeq <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oeq <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
@@ -41,15 +41,15 @@ define void @fcmp_oeq_bfloat(i32 %arg) {
 
 define void @fcmp_ogt(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_ogt'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp ogt <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp ogt <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp ogt <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp ogt <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp ogt <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp ogt <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp ogt <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp ogt <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp ogt <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp ogt <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp ogt <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp ogt <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp ogt <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp ogt <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp ogt <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp ogt <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
@@ -66,9 +66,9 @@ define void @fcmp_ogt(i32 %arg) {
 
 define void @fcmp_ogt_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_ogt_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp ogt <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp ogt <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp ogt <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp ogt <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
@@ -79,15 +79,15 @@ define void @fcmp_ogt_bfloat(i32 %arg) {
 
 define void @fcmp_oge(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oge'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oge <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp oge <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oge <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp oge <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oge <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oge <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oge <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp oge <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oge <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp oge <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oge <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp oge <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oge <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oge <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oge <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp oge <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
@@ -104,9 +104,9 @@ define void @fcmp_oge(i32 %arg) {
 
 define void @fcmp_oge_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_oge_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oge <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oge <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oge <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oge <vscale x 8 x bfloat> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
@@ -117,15 +117,15 @@ define void @fcmp_oge_bfloat(i32 %arg) {
 
 define void @fcmp_olt(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_olt'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp olt <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v8f32 = fcmp olt <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp olt <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v4f64 = fcmp olt <vscale x 4 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp olt <vscale x 2 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp olt <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp olt <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found costs of 2 for: %v16f16 = fcmp olt <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp olt <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp olt <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp olt <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp olt <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp olt <vscale x 2 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp olt <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp olt <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp olt <vscale x 16 x half> undef, undef
 ; CHECK-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
   %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
@@ -142,9 +142,9 @@ define void @fcmp_olt(i32 %arg) {
 
 define void @fcmp_olt_bfloat(i32 %arg) {
 ; CHECK-LABEL: 'fcmp_olt_bfloat'
-; CHECK-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp olt <vscale x 2 x bfloat> undef, unde...
[truncated]

c-rhodes · 2025-08-18T08:27:15Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

+  auto ResolveVariant = [&](unsigned Opcode) {
+    unsigned SCIdx = TII->get(Opcode).getSchedClass();
+    while (SCIdx && Sched.getSchedClassDesc(SCIdx)->isVariant())
+      SCIdx = ST->resolveVariantSchedClass(SCIdx, nullptr, TII,
+                                           Sched.getProcessorID());
+    assert(SCIdx);
+    const MCSchedClassDesc &SCDesc = Sched.SchedClassTable[SCIdx];
+    const MCWriteProcResEntry *B = ST->getWriteProcResBegin(&SCDesc);
+    const MCWriteProcResEntry *E = ST->getWriteProcResEnd(&SCDesc);
+    unsigned Min = Sched.IssueWidth;
+    for (const MCWriteProcResEntry *I = B; I != E; I++)
+      Min = std::min(Min, Sched.getProcResource(I->ProcResourceIdx)->NumUnits /
+                              (I->ReleaseAtCycle - I->AcquireAtCycle));
+    return Min;
+  };
+
+  return ResolveVariant(Opcode1) < ResolveVariant(Opcode2);


I'm simplifying a bit, but can you not just do: MCSchedModel::getReciprocalThroughput(Opcode1) > MCSchedModel::getReciprocalThroughput(Opcode2)?

Yep I think they are very similar. I was avoiding it mostly because it returns a double and I don't trust them to be deterministic across different platforms and compilers. I think they should be fine though. Unfortunately we can't rely on anything that uses a Variant as it uses a MI and we can't really get away with passing nullptr. The instructions we are checking will not have variants, so I've removed that from the patch.

ok, I'm probably missing something but with this change your tests pass for me:

diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp index 860ca41ccd44..e85a17c6a43b 100644 --- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp @@ -4418,21 +4418,12 @@ bool AArch64TTIImpl::hasKnownLowerThroughputFromSchedulingModel( if (!Sched.hasInstrSchedModel()) return false; - auto GetSimpleThroughput = [&](unsigned Opcode) { - unsigned SCIdx = TII->get(Opcode).getSchedClass(); - assert(!Sched.getSchedClassDesc(SCIdx)->isVariant() && - "We cannot handle variant scheduling classes without an MI"); - const MCSchedClassDesc &SCDesc = Sched.SchedClassTable[SCIdx]; - const MCWriteProcResEntry *B = ST->getWriteProcResBegin(&SCDesc); - const MCWriteProcResEntry *E = ST->getWriteProcResEnd(&SCDesc); - double Min = Sched.IssueWidth; - for (const MCWriteProcResEntry *I = B; I != E; I++) - Min = std::min(Min, Sched.getProcResource(I->ProcResourceIdx)->NumUnits / - double(I->ReleaseAtCycle - I->AcquireAtCycle)); - return Min; - }; - - return GetSimpleThroughput(Opcode1) < GetSimpleThroughput(Opcode2); + return MCSchedModel::getReciprocalThroughput( + *ST, + *(Sched.getSchedClassDesc(TII->get(Opcode1).getSchedClass()))) > + MCSchedModel::getReciprocalThroughput( + *ST, + *(Sched.getSchedClassDesc(TII->get(Opcode2).getSchedClass()))); } InstructionCost AArch64TTIImpl::getCmpSelInstrCost(

Looks like this version of the function:

double MCSchedModel::getReciprocalThroughput(const MCSubtargetInfo &STI, const MCSchedClassDesc &SCDesc) {

also deals with the case where ReleaseAtCycle == AcquireAtCycle.

It gives incorrect results for models with invalid sched desc's (for schedules without SVE info), and I don't love that it uses 1/*MinThroughput for the same reasons as above (and I didn't think that ReleaseAtCycle == AcquireAtCycle would happen, maybe I'm wrong about that). I've had a go at making it work. I'm not sure if using the sched info is a really great idea as it locks us in to not using variants, but we can try it and change it if it causes problems.

I guess it would also be possible to rewrite getReciprocalThroughput to work internally on integer fractions and return a fraction. A helper function could then be added to convert this to a double in places like CodeGen/TargetSchedule.cpp where it's used. Anyway, thanks for taking a look at this.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

github-actions · 2025-08-29T07:23:41Z

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:

git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll

The following files introduce new uses of undef:

llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

david-arm

LGTM with nits!

david-arm · 2025-08-29T07:39:43Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

+      Sched.getSchedClassDesc(TII->get(Opcode2).getSchedClass());
+  // We cannot handle variant scheduling classes without an MI. If we need to
+  // support them for any of the instructions we query the information of we
+  // might need to add a qay to resolve them without a MI or not use the


nit: I think this should be ... a way to ...

david-arm · 2025-08-29T07:43:47Z

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

+  // support them for any of the instructions we query the information of we
+  // might need to add a qay to resolve them without a MI or not use the
+  // scheduling info.
+  assert(!SCD1->isVariant() && !SCD2->isVariant() &&


There is an assert here that SCD1 and SCD2 are not variants, but the code below suggests it's fine to have variants since we'll just return false. I think it's worth being stronger here by just going one way or the other, i.e. either:

Assert but don't check and add a comment to the function saying this shouldn't be called for opcodes with variants, or

Remove the assert and bail out if they are variants.

What do you think?

davemgreen requested review from SamTebbs33, c-rhodes, Asher8118, sdesmalen-arm, sjoerdmeijer and david-arm August 15, 2025 14:54

llvmbot added backend:AArch64 llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Aug 15, 2025

davemgreen force-pushed the gh-a64-fcmpcostsched branch from fbf907f to 8ba917c Compare August 15, 2025 14:56

c-rhodes reviewed Aug 18, 2025

View reviewed changes

david-arm reviewed Aug 18, 2025

View reviewed changes

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp Show resolved Hide resolved

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp Outdated Show resolved Hide resolved

llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll Outdated Show resolved Hide resolved

sdesmalen-arm reviewed Aug 21, 2025

View reviewed changes

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp Show resolved Hide resolved

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp Show resolved Hide resolved

davemgreen force-pushed the gh-a64-fcmpcostsched branch from 8ba917c to dfd4522 Compare August 27, 2025 20:12

[AArch64] Give a higher cost for more expensive SVE FCMP instructions

1204299

davemgreen force-pushed the gh-a64-fcmpcostsched branch from dfd4522 to 1204299 Compare August 29, 2025 07:21

david-arm approved these changes Aug 29, 2025

View reviewed changes

c-rhodes approved these changes Aug 29, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AArch64] Give a higher cost for more expensive SVE FCMP instructions #153816

[AArch64] Give a higher cost for more expensive SVE FCMP instructions #153816

davemgreen commented Aug 15, 2025 •

edited

Loading

Uh oh!

llvmbot commented Aug 15, 2025 •

edited

Loading

Uh oh!

llvmbot commented Aug 15, 2025

Uh oh!

c-rhodes Aug 18, 2025

Uh oh!

davemgreen Aug 27, 2025

Uh oh!

c-rhodes Aug 28, 2025

Uh oh!

david-arm Aug 28, 2025

Uh oh!

davemgreen Aug 29, 2025

Uh oh!

david-arm Aug 29, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Aug 29, 2025

Uh oh!

david-arm left a comment

Uh oh!

david-arm Aug 29, 2025

Uh oh!

david-arm Aug 29, 2025

Uh oh!

Uh oh!

[AArch64] Give a higher cost for more expensive SVE FCMP instructions #153816

Are you sure you want to change the base?

[AArch64] Give a higher cost for more expensive SVE FCMP instructions #153816

Conversation

davemgreen commented Aug 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 15, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Aug 29, 2025

Uh oh!

david-arm left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

davemgreen commented Aug 15, 2025 •

edited

Loading

llvmbot commented Aug 15, 2025 •

edited

Loading