[AArch64] Add FeatureFuseCCSelect to a number of CPU configurations. #153188
@llvm/pr-subscribers-backend-aarch64

Author: David Green (davemgreen)

Full diff: https://github.com/llvm/llvm-project/pull/153188.diff

2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index 1bc1d98a6f65b..c3627b802fe14 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -134,6 +134,7 @@ def TuneA78 : SubtargetFeature<"a78", "ARMProcFamily", "CortexA78",
FeatureCmpBccFusion,
FeatureFuseAES,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureAddrLSLSlow14,
FeatureALULSLFast,
FeaturePostRAScheduler,
@@ -146,6 +147,7 @@ def TuneA78AE : SubtargetFeature<"a78ae", "ARMProcFamily",
FeatureCmpBccFusion,
FeatureFuseAES,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureAddrLSLSlow14,
FeatureALULSLFast,
FeaturePostRAScheduler,
@@ -158,6 +160,7 @@ def TuneA78C : SubtargetFeature<"a78c", "ARMProcFamily",
FeatureCmpBccFusion,
FeatureFuseAES,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureAddrLSLSlow14,
FeatureALULSLFast,
FeaturePostRAScheduler,
@@ -169,6 +172,7 @@ def TuneA710 : SubtargetFeature<"a710", "ARMProcFamily", "CortexA710",
FeatureCmpBccFusion,
FeatureFuseAES,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureALULSLFast,
FeaturePostRAScheduler,
FeatureEnableSelectOptimize,
@@ -181,6 +185,7 @@ def TuneA715 : SubtargetFeature<"a715", "ARMProcFamily", "CortexA715",
FeatureCmpBccFusion,
FeatureALULSLFast,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureEnableSelectOptimize,
FeaturePredictableSelectIsExpensive]>;
@@ -191,6 +196,7 @@ def TuneA720 : SubtargetFeature<"a720", "ARMProcFamily", "CortexA720",
FeatureCmpBccFusion,
FeatureALULSLFast,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureEnableSelectOptimize,
FeaturePredictableSelectIsExpensive]>;
@@ -201,6 +207,7 @@ def TuneA720AE : SubtargetFeature<"a720ae", "ARMProcFamily", "CortexA720",
FeatureCmpBccFusion,
FeatureALULSLFast,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureEnableSelectOptimize,
FeaturePredictableSelectIsExpensive]>;
@@ -212,6 +219,7 @@ def TuneA725 : SubtargetFeature<"cortex-a725", "ARMProcFamily",
FeatureCmpBccFusion,
FeatureALULSLFast,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureEnableSelectOptimize,
FeaturePredictableSelectIsExpensive]>;
@@ -262,6 +270,7 @@ def TuneX4 : SubtargetFeature<"cortex-x4", "ARMProcFamily", "CortexX4",
"Cortex-X4 ARM processors", [
FeatureALULSLFast,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureFuseAES,
FeaturePostRAScheduler,
FeatureEnableSelectOptimize,
@@ -273,6 +282,7 @@ def TuneX925 : SubtargetFeature<"cortex-x925", "ARMProcFamily",
"CortexX925", "Cortex-X925 ARM processors",[
FeatureALULSLFast,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureFuseAES,
FeaturePostRAScheduler,
FeatureEnableSelectOptimize,
@@ -536,6 +546,7 @@ def TuneNeoverseN2 : SubtargetFeature<"neoversen2", "ARMProcFamily", "NeoverseN2
"Neoverse N2 ARM processors", [
FeatureFuseAES,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureALULSLFast,
FeaturePostRAScheduler,
FeatureEnableSelectOptimize,
@@ -547,6 +558,7 @@ def TuneNeoverseN3 : SubtargetFeature<"neoversen3", "ARMProcFamily", "NeoverseN3
FeaturePostRAScheduler,
FeatureALULSLFast,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureEnableSelectOptimize,
FeaturePredictableSelectIsExpensive]>;
@@ -563,6 +575,7 @@ def TuneNeoverseV1 : SubtargetFeature<"neoversev1", "ARMProcFamily", "NeoverseV1
"Neoverse V1 ARM processors", [
FeatureFuseAES,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureAddrLSLSlow14,
FeatureALULSLFast,
FeaturePostRAScheduler,
@@ -575,6 +588,7 @@ def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2
FeatureFuseAES,
FeatureCmpBccFusion,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeatureALULSLFast,
FeaturePostRAScheduler,
FeatureEnableSelectOptimize,
@@ -588,6 +602,7 @@ def TuneNeoverseV3 : SubtargetFeature<"neoversev3", "ARMProcFamily", "NeoverseV3
FeatureFuseAES,
FeatureALULSLFast,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeaturePostRAScheduler,
FeatureEnableSelectOptimize,
FeatureAvoidLDAPUR,
@@ -598,6 +613,7 @@ def TuneNeoverseV3AE : SubtargetFeature<"neoversev3AE", "ARMProcFamily", "Neover
FeatureFuseAES,
FeatureALULSLFast,
FeatureFuseAdrpAdd,
+ FeatureFuseCCSelect,
FeaturePostRAScheduler,
FeatureEnableSelectOptimize,
FeatureAvoidLDAPUR,
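The instruction pair these tuning entries target usually comes from an ordinary conditional select in source code. The C++ function below (`select_eq` is a hypothetical name, not part of the patch) is the kind of code that, on AArch64 with optimisation enabled, typically compiles to a `cmp` that sets the condition flags and a `csel` that reads them, e.g. `cmp w0, w1` and `csel w0, w2, w3, eq`. The exact instructions and registers depend on the compiler version and flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example function (not from the patch). On AArch64 with
// optimisation enabled, a conditional select like this is typically lowered
// to a cmp that sets the NZCV flags followed by a csel that reads them.
std::int32_t select_eq(std::int32_t a, std::int32_t b,
                       std::int32_t x, std::int32_t y) {
    return a == b ? x : y;  // pick x when a == b, otherwise y
}
```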
diff --git a/llvm/test/CodeGen/AArch64/misched-fusion-csel.ll b/llvm/test/CodeGen/AArch64/misched-fusion-csel.ll
index ac0adb7f85d0d..e289a0b1d3de7 100644
--- a/llvm/test/CodeGen/AArch64/misched-fusion-csel.ll
+++ b/llvm/test/CodeGen/AArch64/misched-fusion-csel.ll
@@ -2,6 +2,18 @@
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m3 | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m4 | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=exynos-m5 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a78 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a710 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a715 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a720 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-a725 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-x4 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=cortex-x925 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-n2 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-n3 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-v1 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-v2 | FileCheck %s
+; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-v3 | FileCheck %s
target triple = "aarch64-unknown"
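The new RUN lines compile the same IR for each listed CPU and reuse the existing CHECK lines, which are not shown in this diff. Assuming those checks require the `cmp` and `csel` to be adjacent, the property being verified is roughly the one in the hypothetical helper `cselFollowsCmp` below, which scans an assembly listing and checks that every `csel` directly follows a `cmp`. Such a check cannot tell "kept together because of the new feature" apart from "already adjacent", which is the gap raised in the review comment.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of LLVM): returns the mnemonic of an
// assembly line, i.e. its first whitespace-separated token, or "" for a
// blank line.
std::string mnemonic(const std::string& line) {
    std::istringstream in(line);
    std::string tok;
    in >> tok;  // leaves tok empty if the line has no tokens
    return tok;
}

// Returns true if every "csel" in the listing is immediately preceded by a
// "cmp". Labels, directives and comments are not filtered out, so this is
// only a simplified model of what adjacent CHECK lines would enforce.
bool cselFollowsCmp(const std::string& asmText) {
    std::istringstream in(asmText);
    std::string line, prev;
    while (std::getline(in, line)) {
        std::string m = mnemonic(line);
        if (m.empty()) continue;  // skip blank lines
        if (m == "csel" && prev != "cmp") return false;
        prev = m;
    }
    return true;
}
```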
LGTM
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-n3 | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-v1 | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-v2 | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-v3 | FileCheck %s
Hi,
Sorry, I don't understand how this file tests the new changes.
I added one of the new RUN lines without your changes, ran the file with llvm-lit,
and it passed.
Shouldn't this be tested using a RUN line like this:
llvm-mca -mcpu=neoverse-v3 --iterations=1 | FileCheck %s
and then validating output such as:
Instructions:
Total uOps:
Am I missing something?
These tests are not the best. I was hoping to get away with reusing the existing test, as it is debatable whether this kind of thing is worth testing very strictly; it is better done at a higher level, by benchmarking a number of applications and making sure the tuning is an improvement. But as you point out, the existing output already places the cmp next to the csel, even without this change.
llvm-mca does not use information about fused operations, so it cannot be used to test this. Modelling fusion there might be worth adding at some point in the future, but these features currently only tell the scheduler that it should try to fuse the operations, not what happens when it does.
I'll try to update the test with debug scheduling info to make it clearer.
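To illustrate the distinction drawn in the reply: the fusion feature only changes which instruction the scheduler prefers to place next, and, as the reply notes, llvm-mca does not use that information. In LLVM this preference is expressed through a macro-fusion scheduling DAG mutation that keeps the pair together; the toy greedy list scheduler below is not that implementation, only a minimal model of its effect under simplified assumptions. `Inst` and `schedule` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction model: an opcode name plus the indices of the
// instructions it depends on (e.g. a csel depends on the cmp setting flags).
struct Inst {
    std::string op;
    std::vector<int> deps;
};

// Toy greedy list scheduler. By default it picks the lowest-index ready
// instruction (source order). With fuseCCSelect set, right after scheduling
// a cmp it prefers a ready csel that depends on that cmp, so the pair ends
// up back-to-back.
std::vector<std::string> schedule(const std::vector<Inst>& prog, bool fuseCCSelect) {
    const int n = static_cast<int>(prog.size());
    std::vector<bool> done(n, false);
    std::vector<std::string> order;
    int last = -1;  // index of the most recently scheduled instruction

    // An instruction is ready once all of its dependencies are scheduled.
    auto ready = [&](int i) {
        if (done[i]) return false;
        for (int d : prog[i].deps)
            if (!done[d]) return false;
        return true;
    };

    for (int step = 0; step < n; ++step) {
        int pick = -1;
        if (fuseCCSelect && last >= 0 && prog[last].op == "cmp") {
            for (int i = 0; i < n && pick < 0; ++i) {
                bool usesLast = false;
                for (int d : prog[i].deps) usesLast = usesLast || d == last;
                if (prog[i].op == "csel" && usesLast && ready(i)) pick = i;
            }
        }
        for (int i = 0; i < n && pick < 0; ++i)
            if (ready(i)) pick = i;
        // pick >= 0 is guaranteed when the dependency graph is acyclic.
        done[pick] = true;
        order.push_back(prog[pick].op);
        last = pick;
    }
    return order;
}
```

For the program `cmp; add; csel` (where the `csel` depends on the `cmp`), `schedule(prog, false)` returns `cmp, add, csel` while `schedule(prog, true)` returns `cmp, csel, add`. For `cmp; csel; add`, both settings keep the original order, which mirrors the review observation: when the input already has the `csel` directly after the `cmp`, an adjacency check passes with or without the feature.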
This marks CMP+CSel as fusable according to the SWOGs of
cortex-a78
cortex-a710
cortex-a715
cortex-a720
cortex-a725
cortex-x4
cortex-x925
neoverse-n2
neoverse-n3
neoverse-v1
neoverse-v2
neoverse-v3
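The CMP+CSel pair described above can be modelled as a predicate over two consecutive instructions. Since `cmp a, b` is an alias of `subs zr, a, b`, the sketch below treats a flag-setting `subs` whose result goes to the zero register, followed by a `csel` of the same width, as fusable. `MInst` and `looksLikeCmpCselPair` are hypothetical, and the restriction to unshifted operands is an assumption loosely based on LLVM's AArch64 macro-fusion code; the real check may accept or reject other forms.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified instruction record (not LLVM's MachineInstr).
struct MInst {
    std::string op;       // e.g. "subs", "csel", "add"
    std::string dst;      // destination register, e.g. "wzr", "x0"
    bool shiftedOperand;  // true for forms like "subs wzr, w0, w1, lsl #2"
};

// True for 64-bit registers, which are written with an 'x' prefix (x0, xzr).
bool is64(const std::string& reg) { return !reg.empty() && reg[0] == 'x'; }

// Sketch of a CMP+CSEL fusion check: the first instruction must be a subs
// that discards its result into the zero register matching the csel's
// width (i.e. a cmp) and must not use a shifted register operand.
bool looksLikeCmpCselPair(const MInst& first, const MInst& second) {
    if (second.op != "csel") return false;
    const std::string zr = is64(second.dst) ? "xzr" : "wzr";
    return first.op == "subs" && first.dst == zr && !first.shiftedOperand;
}
```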