-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[AArch64] Give a higher cost for more expensive SVE FCMP instructions #153816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-aarch64 Author: David Green (davemgreen) ChangesThis tries to add a higher cost for SVE FCM** comparison instructions that often have a lower throughput than the Neon equivalents that can be exectured on more vector pipelines. This patch takes the slightly unorthodox approach of using the information in the scheduling model to compare the throughput of a FCMEQ_PPzZZ_S (SVE) and a FCMEQv4f32 (Neon). This isn't how things will (probably) want to look in the long run, where all the information comes more directly from the scheduling model, but that still needs to be proven out. The downsides of this approach of using the scheduling model info is if the core does not have a scheduling model, but wants a different cost then an alternative approach will be needed (but then maybe that is a good reason to create a new scheduling model). The alternative would either be to make a subtarget feature for the affected cores (as in 17bf27d) or just always enable it, although that might need to change in the future. Patch is 365.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153816.diff 16 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64SchedA510.td b/llvm/lib/Target/AArch64/AArch64SchedA510.td
index b93d67f3091e7..356e3fa39c53f 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedA510.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedA510.td
@@ -1016,7 +1016,7 @@ def : InstRW<[CortexA510MCWrite<16, 13, CortexA510UnitVALU>], (instrs FADDA_VPZ_
def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVALU>], (instrs FADDA_VPZ_D)>;
// Floating point compare
-def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
+def : InstRW<[CortexA510MCWrite<4, 2, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
"^FCM(EQ|GE|GT|NE)_PPzZ[0Z]_[HSD]",
"^FCM(LE|LT)_PPzZ0_[HSD]",
"^FCMUO_PPzZZ_[HSD]")>;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 3042251cf754d..3ec45178af68e 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4365,6 +4365,34 @@ AArch64TTIImpl::getAddressComputationCost(Type *PtrTy, ScalarEvolution *SE,
return 1;
}
+/// Check whether Opcode1 has less throughput according to the scheduling
+/// model than Opcode2.
+bool AArch64TTIImpl::hasLessThroughputFromSchedulingModel(
+ unsigned Opcode1, unsigned Opcode2) const {
+ const MCSchedModel &Sched = ST->getSchedModel();
+ const TargetInstrInfo *TII = ST->getInstrInfo();
+ if (!Sched.hasInstrSchedModel())
+ return false;
+
+ auto ResolveVariant = [&](unsigned Opcode) {
+ unsigned SCIdx = TII->get(Opcode).getSchedClass();
+ while (SCIdx && Sched.getSchedClassDesc(SCIdx)->isVariant())
+ SCIdx = ST->resolveVariantSchedClass(SCIdx, nullptr, TII,
+ Sched.getProcessorID());
+ assert(SCIdx);
+ const MCSchedClassDesc &SCDesc = Sched.SchedClassTable[SCIdx];
+ const MCWriteProcResEntry *B = ST->getWriteProcResBegin(&SCDesc);
+ const MCWriteProcResEntry *E = ST->getWriteProcResEnd(&SCDesc);
+ unsigned Min = Sched.IssueWidth;
+ for (const MCWriteProcResEntry *I = B; I != E; I++)
+ Min = std::min(Min, Sched.getProcResource(I->ProcResourceIdx)->NumUnits /
+ (I->ReleaseAtCycle - I->AcquireAtCycle));
+ return Min;
+ };
+
+ return ResolveVariant(Opcode1) < ResolveVariant(Opcode2);
+}
+
InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
unsigned Opcode, Type *ValTy, Type *CondTy, CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind, TTI::OperandValueInfo Op1Info,
@@ -4462,6 +4490,12 @@ InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
(VecPred == FCmpInst::FCMP_ONE || VecPred == FCmpInst::FCMP_UEQ))
Factor = 3; // fcmxx+fcmyy+or
+ if (isa<ScalableVectorType>(ValTy) &&
+ CostKind == TTI::TCK_RecipThroughput &&
+ hasLessThroughputFromSchedulingModel(AArch64::FCMEQ_PPzZZ_S,
+ AArch64::FCMEQv4f32))
+ Factor *= 2;
+
return Factor * (CostKind == TTI::TCK_Latency ? 2 : LT.first);
}
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index 9c96fdd427814..fe36e43c616c1 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -174,6 +174,11 @@ class AArch64TTIImpl final : public BasicTTIImplBase<AArch64TTIImpl> {
bool prefersVectorizedAddressing() const override;
+ /// Check whether Opcode1 has less throughput according to the scheduling
+ /// model than Opcode2.
+ bool hasLessThroughputFromSchedulingModel(unsigned Opcode1,
+ unsigned Opcode2) const;
+
InstructionCost
getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
index 0789707a53b90..5029ef974681d 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
@@ -58,10 +58,10 @@ define <vscale x 32 x i1> @cmp_nxv32i1() {
; Check fcmp for legal FP vectors
define void @cmp_legal_fp() #0 {
; CHECK-LABEL: 'cmp_legal_fp'
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %1 = fcmp oge <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %2 = fcmp oge <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %3 = fcmp oge <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %1 = fcmp oge <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %2 = fcmp oge <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %3 = fcmp oge <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%1 = fcmp oge <vscale x 2 x double> undef, undef
@@ -74,7 +74,7 @@ define void @cmp_legal_fp() #0 {
; Check fcmp for an illegal FP vector
define <vscale x 16 x i1> @cmp_nxv16f16() {
; CHECK-LABEL: 'cmp_nxv16f16'
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %res = fcmp oge <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %res = fcmp oge <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret <vscale x 16 x i1> %res
;
%res = fcmp oge <vscale x 16 x half> undef, undef
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll b/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
index a14176cd6d532..c48ca20949c21 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
@@ -3,15 +3,15 @@
define void @fcmp_oeq(i32 %arg) {
; CHECK-LABEL: 'fcmp_oeq'
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oeq <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v8f32 = fcmp oeq <vscale x 8 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oeq <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v4f64 = fcmp oeq <vscale x 4 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oeq <vscale x 2 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oeq <vscale x 4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oeq <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v16f16 = fcmp oeq <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oeq <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp oeq <vscale x 8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oeq <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp oeq <vscale x 4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oeq <vscale x 2 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oeq <vscale x 4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oeq <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp oeq <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
@@ -28,9 +28,9 @@ define void @fcmp_oeq(i32 %arg) {
define void @fcmp_oeq_bfloat(i32 %arg) {
; CHECK-LABEL: 'fcmp_oeq_bfloat'
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oeq <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oeq <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oeq <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oeq <vscale x 8 x bfloat> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
@@ -41,15 +41,15 @@ define void @fcmp_oeq_bfloat(i32 %arg) {
define void @fcmp_ogt(i32 %arg) {
; CHECK-LABEL: 'fcmp_ogt'
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp ogt <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v8f32 = fcmp ogt <vscale x 8 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp ogt <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v4f64 = fcmp ogt <vscale x 4 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp ogt <vscale x 2 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp ogt <vscale x 4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp ogt <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v16f16 = fcmp ogt <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp ogt <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp ogt <vscale x 8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp ogt <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp ogt <vscale x 4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp ogt <vscale x 2 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp ogt <vscale x 4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp ogt <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp ogt <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
@@ -66,9 +66,9 @@ define void @fcmp_ogt(i32 %arg) {
define void @fcmp_ogt_bfloat(i32 %arg) {
; CHECK-LABEL: 'fcmp_ogt_bfloat'
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp ogt <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp ogt <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp ogt <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp ogt <vscale x 8 x bfloat> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
@@ -79,15 +79,15 @@ define void @fcmp_ogt_bfloat(i32 %arg) {
define void @fcmp_oge(i32 %arg) {
; CHECK-LABEL: 'fcmp_oge'
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oge <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v8f32 = fcmp oge <vscale x 8 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oge <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v4f64 = fcmp oge <vscale x 4 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oge <vscale x 2 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oge <vscale x 4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oge <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v16f16 = fcmp oge <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oge <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp oge <vscale x 8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oge <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp oge <vscale x 4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oge <vscale x 2 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oge <vscale x 4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oge <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp oge <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2f32 = fcmp oge <vscale x 2 x float> undef, undef
@@ -104,9 +104,9 @@ define void @fcmp_oge(i32 %arg) {
define void @fcmp_oge_bfloat(i32 %arg) {
; CHECK-LABEL: 'fcmp_oge_bfloat'
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oge <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oge <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oge <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oge <vscale x 8 x bfloat> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
@@ -117,15 +117,15 @@ define void @fcmp_oge_bfloat(i32 %arg) {
define void @fcmp_olt(i32 %arg) {
; CHECK-LABEL: 'fcmp_olt'
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp olt <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v8f32 = fcmp olt <vscale x 8 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp olt <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v4f64 = fcmp olt <vscale x 4 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp olt <vscale x 2 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp olt <vscale x 4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp olt <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v16f16 = fcmp olt <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp olt <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp olt <vscale x 8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp olt <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp olt <vscale x 4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp olt <vscale x 2 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp olt <vscale x 4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp olt <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp olt <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2f32 = fcmp olt <vscale x 2 x float> undef, undef
@@ -142,9 +142,9 @@ define void @fcmp_olt(i32 %arg) {
define void @fcmp_olt_bfloat(i32 %arg) {
; CHECK-LABEL: 'fcmp_olt_bfloat'
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp olt <vscale x 2 x bfloat> undef, unde...
[truncated]
|
@llvm/pr-subscribers-llvm-transforms Author: David Green (davemgreen) ChangesThis tries to add a higher cost for SVE FCM** comparison instructions that often have a lower throughput than the Neon equivalents that can be exectured on more vector pipelines. This patch takes the slightly unorthodox approach of using the information in the scheduling model to compare the throughput of a FCMEQ_PPzZZ_S (SVE) and a FCMEQv4f32 (Neon). This isn't how things will (probably) want to look in the long run, where all the information comes more directly from the scheduling model, but that still needs to be proven out. The downsides of this approach of using the scheduling model info is if the core does not have a scheduling model, but wants a different cost then an alternative approach will be needed (but then maybe that is a good reason to create a new scheduling model). The alternative would either be to make a subtarget feature for the affected cores (as in 17bf27d) or just always enable it, although that might need to change in the future. Patch is 365.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153816.diff 16 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64SchedA510.td b/llvm/lib/Target/AArch64/AArch64SchedA510.td
index b93d67f3091e7..356e3fa39c53f 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedA510.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedA510.td
@@ -1016,7 +1016,7 @@ def : InstRW<[CortexA510MCWrite<16, 13, CortexA510UnitVALU>], (instrs FADDA_VPZ_
def : InstRW<[CortexA510MCWrite<8, 5, CortexA510UnitVALU>], (instrs FADDA_VPZ_D)>;
// Floating point compare
-def : InstRW<[CortexA510Write<4, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
+def : InstRW<[CortexA510MCWrite<4, 2, CortexA510UnitVALU>], (instregex "^FACG[ET]_PPzZZ_[HSD]",
"^FCM(EQ|GE|GT|NE)_PPzZ[0Z]_[HSD]",
"^FCM(LE|LT)_PPzZ0_[HSD]",
"^FCMUO_PPzZZ_[HSD]")>;
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 3042251cf754d..3ec45178af68e 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4365,6 +4365,34 @@ AArch64TTIImpl::getAddressComputationCost(Type *PtrTy, ScalarEvolution *SE,
return 1;
}
+/// Check whether Opcode1 has less throughput according to the scheduling
+/// model than Opcode2.
+bool AArch64TTIImpl::hasLessThroughputFromSchedulingModel(
+ unsigned Opcode1, unsigned Opcode2) const {
+ const MCSchedModel &Sched = ST->getSchedModel();
+ const TargetInstrInfo *TII = ST->getInstrInfo();
+ if (!Sched.hasInstrSchedModel())
+ return false;
+
+ auto ResolveVariant = [&](unsigned Opcode) {
+ unsigned SCIdx = TII->get(Opcode).getSchedClass();
+ while (SCIdx && Sched.getSchedClassDesc(SCIdx)->isVariant())
+ SCIdx = ST->resolveVariantSchedClass(SCIdx, nullptr, TII,
+ Sched.getProcessorID());
+ assert(SCIdx);
+ const MCSchedClassDesc &SCDesc = Sched.SchedClassTable[SCIdx];
+ const MCWriteProcResEntry *B = ST->getWriteProcResBegin(&SCDesc);
+ const MCWriteProcResEntry *E = ST->getWriteProcResEnd(&SCDesc);
+ unsigned Min = Sched.IssueWidth;
+ for (const MCWriteProcResEntry *I = B; I != E; I++)
+ Min = std::min(Min, Sched.getProcResource(I->ProcResourceIdx)->NumUnits /
+ (I->ReleaseAtCycle - I->AcquireAtCycle));
+ return Min;
+ };
+
+ return ResolveVariant(Opcode1) < ResolveVariant(Opcode2);
+}
+
InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
unsigned Opcode, Type *ValTy, Type *CondTy, CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind, TTI::OperandValueInfo Op1Info,
@@ -4462,6 +4490,12 @@ InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
(VecPred == FCmpInst::FCMP_ONE || VecPred == FCmpInst::FCMP_UEQ))
Factor = 3; // fcmxx+fcmyy+or
+ if (isa<ScalableVectorType>(ValTy) &&
+ CostKind == TTI::TCK_RecipThroughput &&
+ hasLessThroughputFromSchedulingModel(AArch64::FCMEQ_PPzZZ_S,
+ AArch64::FCMEQv4f32))
+ Factor *= 2;
+
return Factor * (CostKind == TTI::TCK_Latency ? 2 : LT.first);
}
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index 9c96fdd427814..fe36e43c616c1 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -174,6 +174,11 @@ class AArch64TTIImpl final : public BasicTTIImplBase<AArch64TTIImpl> {
bool prefersVectorizedAddressing() const override;
+ /// Check whether Opcode1 has less throughput according to the scheduling
+ /// model than Opcode2.
+ bool hasLessThroughputFromSchedulingModel(unsigned Opcode1,
+ unsigned Opcode2) const;
+
InstructionCost
getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
index 0789707a53b90..5029ef974681d 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll
@@ -58,10 +58,10 @@ define <vscale x 32 x i1> @cmp_nxv32i1() {
; Check fcmp for legal FP vectors
define void @cmp_legal_fp() #0 {
; CHECK-LABEL: 'cmp_legal_fp'
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %1 = fcmp oge <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %2 = fcmp oge <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %3 = fcmp oge <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %1 = fcmp oge <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %2 = fcmp oge <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %3 = fcmp oge <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %4 = fcmp oge <vscale x 8 x bfloat> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%1 = fcmp oge <vscale x 2 x double> undef, undef
@@ -74,7 +74,7 @@ define void @cmp_legal_fp() #0 {
; Check fcmp for an illegal FP vector
define <vscale x 16 x i1> @cmp_nxv16f16() {
; CHECK-LABEL: 'cmp_nxv16f16'
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %res = fcmp oge <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %res = fcmp oge <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret <vscale x 16 x i1> %res
;
%res = fcmp oge <vscale x 16 x half> undef, undef
diff --git a/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll b/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
index a14176cd6d532..c48ca20949c21 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll
@@ -3,15 +3,15 @@
define void @fcmp_oeq(i32 %arg) {
; CHECK-LABEL: 'fcmp_oeq'
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oeq <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v8f32 = fcmp oeq <vscale x 8 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oeq <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v4f64 = fcmp oeq <vscale x 4 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oeq <vscale x 2 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oeq <vscale x 4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oeq <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v16f16 = fcmp oeq <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oeq <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp oeq <vscale x 8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oeq <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp oeq <vscale x 4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oeq <vscale x 2 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oeq <vscale x 4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oeq <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp oeq <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2f32 = fcmp oeq <vscale x 2 x float> undef, undef
@@ -28,9 +28,9 @@ define void @fcmp_oeq(i32 %arg) {
define void @fcmp_oeq_bfloat(i32 %arg) {
; CHECK-LABEL: 'fcmp_oeq_bfloat'
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oeq <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oeq <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oeq <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oeq <vscale x 8 x bfloat> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2bf16 = fcmp oeq <vscale x 2 x bfloat> undef, undef
@@ -41,15 +41,15 @@ define void @fcmp_oeq_bfloat(i32 %arg) {
define void @fcmp_ogt(i32 %arg) {
; CHECK-LABEL: 'fcmp_ogt'
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp ogt <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v8f32 = fcmp ogt <vscale x 8 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp ogt <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v4f64 = fcmp ogt <vscale x 4 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp ogt <vscale x 2 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp ogt <vscale x 4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp ogt <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v16f16 = fcmp ogt <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp ogt <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp ogt <vscale x 8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp ogt <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp ogt <vscale x 4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp ogt <vscale x 2 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp ogt <vscale x 4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp ogt <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp ogt <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2f32 = fcmp ogt <vscale x 2 x float> undef, undef
@@ -66,9 +66,9 @@ define void @fcmp_ogt(i32 %arg) {
define void @fcmp_ogt_bfloat(i32 %arg) {
; CHECK-LABEL: 'fcmp_ogt_bfloat'
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp ogt <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp ogt <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp ogt <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp ogt <vscale x 8 x bfloat> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2bf16 = fcmp ogt <vscale x 2 x bfloat> undef, undef
@@ -79,15 +79,15 @@ define void @fcmp_ogt_bfloat(i32 %arg) {
define void @fcmp_oge(i32 %arg) {
; CHECK-LABEL: 'fcmp_oge'
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oge <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v8f32 = fcmp oge <vscale x 8 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oge <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v4f64 = fcmp oge <vscale x 4 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oge <vscale x 2 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oge <vscale x 4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oge <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v16f16 = fcmp oge <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp oge <vscale x 2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp oge <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp oge <vscale x 8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp oge <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp oge <vscale x 4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp oge <vscale x 2 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp oge <vscale x 4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp oge <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp oge <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2f32 = fcmp oge <vscale x 2 x float> undef, undef
@@ -104,9 +104,9 @@ define void @fcmp_oge(i32 %arg) {
define void @fcmp_oge_bfloat(i32 %arg) {
; CHECK-LABEL: 'fcmp_oge_bfloat'
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oge <vscale x 4 x bfloat> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:11 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oge <vscale x 8 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:3 Lat:4 SizeLat:3 for: %v4bf16 = fcmp oge <vscale x 4 x bfloat> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:13 CodeSize:5 Lat:5 SizeLat:5 for: %v8bf16 = fcmp oge <vscale x 8 x bfloat> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2bf16 = fcmp oge <vscale x 2 x bfloat> undef, undef
@@ -117,15 +117,15 @@ define void @fcmp_oge_bfloat(i32 %arg) {
define void @fcmp_olt(i32 %arg) {
; CHECK-LABEL: 'fcmp_olt'
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp olt <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v8f32 = fcmp olt <vscale x 8 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp olt <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v4f64 = fcmp olt <vscale x 4 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp olt <vscale x 2 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp olt <vscale x 4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp olt <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found costs of 2 for: %v16f16 = fcmp olt <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f32 = fcmp olt <vscale x 2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f32 = fcmp olt <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v8f32 = fcmp olt <vscale x 8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f64 = fcmp olt <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v4f64 = fcmp olt <vscale x 4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v2f16 = fcmp olt <vscale x 2 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v4f16 = fcmp olt <vscale x 4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:2 SizeLat:1 for: %v8f16 = fcmp olt <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found costs of RThru:4 CodeSize:2 Lat:2 SizeLat:2 for: %v16f16 = fcmp olt <vscale x 16 x half> undef, undef
; CHECK-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
%v2f32 = fcmp olt <vscale x 2 x float> undef, undef
@@ -142,9 +142,9 @@ define void @fcmp_olt(i32 %arg) {
define void @fcmp_olt_bfloat(i32 %arg) {
; CHECK-LABEL: 'fcmp_olt_bfloat'
-; CHECK-NEXT: Cost Model: Found costs of RThru:3 CodeSize:3 Lat:4 SizeLat:3 for: %v2bf16 = fcmp olt <vscale x 2 x bfloat> undef, unde...
[truncated]
|
fbf907f
to
8ba917c
Compare
auto ResolveVariant = [&](unsigned Opcode) { | ||
unsigned SCIdx = TII->get(Opcode).getSchedClass(); | ||
while (SCIdx && Sched.getSchedClassDesc(SCIdx)->isVariant()) | ||
SCIdx = ST->resolveVariantSchedClass(SCIdx, nullptr, TII, | ||
Sched.getProcessorID()); | ||
assert(SCIdx); | ||
const MCSchedClassDesc &SCDesc = Sched.SchedClassTable[SCIdx]; | ||
const MCWriteProcResEntry *B = ST->getWriteProcResBegin(&SCDesc); | ||
const MCWriteProcResEntry *E = ST->getWriteProcResEnd(&SCDesc); | ||
unsigned Min = Sched.IssueWidth; | ||
for (const MCWriteProcResEntry *I = B; I != E; I++) | ||
Min = std::min(Min, Sched.getProcResource(I->ProcResourceIdx)->NumUnits / | ||
(I->ReleaseAtCycle - I->AcquireAtCycle)); | ||
return Min; | ||
}; | ||
|
||
return ResolveVariant(Opcode1) < ResolveVariant(Opcode2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm simplifying a bit, but can you not just do: MCSchedModel::getReciprocalThroughput(Opcode1) > MCSchedModel::getReciprocalThroughput(Opcode2)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I think they are very similar. I was avoiding it mostly because it returns a double and I don't trust them to be deterministic across different platforms and compilers. I think they should be fine though. Unfortunately we can't rely on anything that uses a Variant as it uses a MI and we can't really get away with passing nullptr. The instructions we are checking will not have variants, so I've removed that from the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'm probably missing something but with this change your tests pass for me:
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 860ca41ccd44..e85a17c6a43b 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -4418,21 +4418,12 @@ bool AArch64TTIImpl::hasKnownLowerThroughputFromSchedulingModel(
if (!Sched.hasInstrSchedModel())
return false;
- auto GetSimpleThroughput = [&](unsigned Opcode) {
- unsigned SCIdx = TII->get(Opcode).getSchedClass();
- assert(!Sched.getSchedClassDesc(SCIdx)->isVariant() &&
- "We cannot handle variant scheduling classes without an MI");
- const MCSchedClassDesc &SCDesc = Sched.SchedClassTable[SCIdx];
- const MCWriteProcResEntry *B = ST->getWriteProcResBegin(&SCDesc);
- const MCWriteProcResEntry *E = ST->getWriteProcResEnd(&SCDesc);
- double Min = Sched.IssueWidth;
- for (const MCWriteProcResEntry *I = B; I != E; I++)
- Min = std::min(Min, Sched.getProcResource(I->ProcResourceIdx)->NumUnits /
- double(I->ReleaseAtCycle - I->AcquireAtCycle));
- return Min;
- };
-
- return GetSimpleThroughput(Opcode1) < GetSimpleThroughput(Opcode2);
+ return MCSchedModel::getReciprocalThroughput(
+ *ST,
+ *(Sched.getSchedClassDesc(TII->get(Opcode1).getSchedClass()))) >
+ MCSchedModel::getReciprocalThroughput(
+ *ST,
+ *(Sched.getSchedClassDesc(TII->get(Opcode2).getSchedClass())));
}
InstructionCost AArch64TTIImpl::getCmpSelInstrCost(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this version of the function:
double
MCSchedModel::getReciprocalThroughput(const MCSubtargetInfo &STI,
const MCSchedClassDesc &SCDesc) {
also deals with the case where ReleaseAtCycle == AcquireAtCycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gives incorrect results for models with invalid sched desc's (for schedules without SVE info), and I don't love that it uses 1/*MinThroughput for the same reasons as above (and I didn't think that ReleaseAtCycle == AcquireAtCycle would happen, maybe I'm wrong about that). I've had a go at making it work. I'm not sure if using the sched info is a really great idea as it locks us in to not using variants, but we can try it and change it if it causes problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it would also be possible to rewrite getReciprocalThroughput
to work internally on integer fractions and return a fraction. A helper function could then be added to convert this to a double in places like CodeGen/TargetSchedule.cpp where it's used. Anyway, thanks for taking a look at this.
8ba917c
to
dfd4522
Compare
dfd4522
to
1204299
Compare
You can test this locally with the following command:git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h llvm/test/Analysis/CostModel/AArch64/sve-cmpsel.ll llvm/test/Analysis/CostModel/AArch64/sve-fcmp.ll The following files introduce new uses of undef:
Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields In tests, avoid using For example, this is considered a bad practice: define void @fn() {
...
br i1 undef, ...
} Please use the following instead: define void @fn(i1 %cond) {
...
br i1 %cond, ...
} Please refer to the Undefined Behavior Manual for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with nits!
Sched.getSchedClassDesc(TII->get(Opcode2).getSchedClass()); | ||
// We cannot handle variant scheduling classes without an MI. If we need to | ||
// support them for any of the instructions we query the information of we | ||
// might need to add a qay to resolve them without a MI or not use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this should be ... a way to ...
// support them for any of the instructions we query the information of we | ||
// might need to add a qay to resolve them without a MI or not use the | ||
// scheduling info. | ||
assert(!SCD1->isVariant() && !SCD2->isVariant() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an assert here that SCD1 and SCD2 are not variants, but the code below suggests it's fine to have variants since we'll just return false. I think it's worth being stronger here by just going one way or the other, i.e. either:
- Assert but don't check and add a comment to the function saying this shouldn't be called for opcodes with variants, or
- Remove the assert and bail out if they are variants.
What do you think?
This tries to add a higher cost for SVE FCM** comparison instructions that often have a lower throughput than the Neon equivalents that can be exectured on more vector pipelines.
This patch takes the slightly unorthodox approach of using the information in the scheduling model to compare the throughput of a FCMEQ_PPzZZ_S (SVE) and a FCMEQv4f32 (Neon). This isn't how things will (probably) want to work in the long run, where all the information comes more directly from the scheduling model, but that still needs to be proven out. The downsides of this approach of using the scheduling model info is if the core does not have a scheduling model but wants a different cost - then an alternative approach will be needed (but then maybe that is a good reason to create a new scheduling model).
The alternative would either be to make a subtarget feature for the affected cores (as in 17bf27d) or just always enable it, although that might need to change in the future.