[Draft][LLVM] Refine MemoryEffect handling for target-specific intrinsics #155590

Conversation
This patch introduces preliminary support for additional memory locations, such as FPMR and ZA, needed to model AArch64 architectural registers as memory dependencies. Currently, these locations are not target-specific. The goal is to enable the compiler to express read/write effects on these resources.

What this patch does:
- Adds two new memory locations, FPMR and ZA, intended to represent AArch64-specific inaccessible memory types.

Current limitations:
- The new locations are not yet target-specific in the type-safe sense: they are globally visible and hardcoded.
- There is no mechanism yet to associate a memory location with its corresponding target (e.g., AArch64 vs. RISC-V).
- No changes are made yet to bitcode serialization, parser support, or alias analysis behavior.

This patch is not functionally complete; it is a structural prototype intended to solicit feedback on the direction, and I would welcome suggestions on how to proceed.
This patch also adds the IntrInaccessibleReadWriteMem property, which allows the ModRef for a location to be set to read and write at the same time. This patch depends on how we implement PR#148650.
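As background for reviewers, the per-location encoding this patch builds on can be illustrated with a small standalone sketch. Everything below is an illustration, not the actual LLVM code: the enum values mirror the ModRef.h hunk in this diff (FPMR = 3, ZA = 4, Other shifted to 5), and the two-bit packing follows the scheme MemoryEffectsBase already uses.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the extended location enum from this patch: two
// target-specific slots (FPMR, ZA) are inserted before Other.
enum class IRMemLocation : uint8_t {
  ArgMem = 0,
  InaccessibleMem = 1,
  ErrnoMem = 2,
  AArch64_FPMR = 3,
  AArch64_ZA = 4,
  Other = 5,
};

enum class ModRefInfo : uint8_t {
  NoModRef = 0,
  Ref = 1,    // reads the location
  Mod = 2,    // writes the location
  ModRef = 3, // reads and writes
};

// Pack two bits of ModRefInfo per location into one integer, in the
// style of MemoryEffectsBase.
class MemoryEffects {
  uint32_t Data = 0;
  static uint32_t shift(IRMemLocation Loc) {
    return static_cast<uint32_t>(Loc) * 2;
  }

public:
  MemoryEffects() = default;
  MemoryEffects(IRMemLocation Loc, ModRefInfo MR) {
    Data = static_cast<uint32_t>(MR) << shift(Loc);
  }
  ModRefInfo getModRef(IRMemLocation Loc) const {
    return static_cast<ModRefInfo>((Data >> shift(Loc)) & 3);
  }
  // Union of two effect sets, e.g. an intrinsic tagged with both
  // IntrInaccessibleReadMemOnly<AArch64_FPMR> and
  // IntrInaccessibleReadWriteMem<AArch64_ZA>.
  MemoryEffects operator|(MemoryEffects Other) const {
    MemoryEffects ME;
    ME.Data = Data | Other.Data;
    return ME;
  }
};
```

With this encoding, an SME FP8 intrinsic such as fdot would carry `fpmr: read, za: readwrite`, which is exactly what the updated CHECK lines in the clang test expect.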
Patch is 51.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155590.diff 20 Files Affected:
diff --git a/clang/test/CodeGen/AArch64/attr-fp8-function.c b/clang/test/CodeGen/AArch64/attr-fp8-function.c
index 54bfd177bd809..62b910eafa4a7 100644
--- a/clang/test/CodeGen/AArch64/attr-fp8-function.c
+++ b/clang/test/CodeGen/AArch64/attr-fp8-function.c
@@ -18,20 +18,29 @@ svfloat16_t test_svcvtlt2_f16_mf8(svmfloat8_t zn, fpm_t fpm) __arm_streaming {
return svcvtlt2_f16_mf8_fpm(zn, fpm);
}
-// CHECK: declare void @llvm.aarch64.set.fpmr(i64) [[ATTR3:#.*]]
-// CHECK: declare <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvtlt2.nxv8f16(<vscale x 16 x i8>) [[ATTR4:#.*]]
+// CHECK: declare void @llvm.aarch64.set.fpmr(i64) [[ATTR2:#.*]]
+// CHECK: declare <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvtlt2.nxv8f16(<vscale x 16 x i8>) [[ATTR3:#.*]]
// SME
+// With only fpmr as inaccessible memory
svfloat32_t test_svmlalltt_lane_f32_mf8(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming {
return svmlalltt_lane_f32_mf8_fpm(zda, zn, zm, 7, fpm);
}
-// CHECK: declare <vscale x 4 x float> @llvm.aarch64.sve.fp8.fmlalltt.lane.nxv4f32(<vscale x 4 x float>, <vscale x 16 x i8>, <vscale x 16 x i8>, i32 immarg) [[ATTR4]]
+// CHECK: declare <vscale x 4 x float> @llvm.aarch64.sve.fp8.fmlalltt.lane.nxv4f32(<vscale x 4 x float>, <vscale x 16 x i8>, <vscale x 16 x i8>, i32 immarg) [[ATTR3:#.*]]
-// CHECK: declare <16 x i8> @llvm.aarch64.neon.fp8.fcvtn.v16i8.v8f16(<8 x half>, <8 x half>) [[ATTR4]]
+// With fpmr and za as inaccessible memory
+void test_svdot_lane_za32_f8_vg1x2(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, fpm_t fpmr) __arm_streaming __arm_inout("za") {
+ svdot_lane_za32_mf8_vg1x2_fpm(slice, zn, zm, 3, fpmr);
+}
+
+// CHECK: declare void @llvm.aarch64.sme.fp8.fdot.lane.za32.vg1x2(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, i32 immarg) [[ATTR5:#.*]]
+// CHECK: declare <16 x i8> @llvm.aarch64.neon.fp8.fcvtn.v16i8.v8f16(<8 x half>, <8 x half>) [[ATTR3]]
-// CHECK: attributes [[ATTR1:#.*]] = {{{.*}}}
-// CHECK: attributes [[ATTR2:#.*]] = {{{.*}}}
-// CHECK: attributes [[ATTR3]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
-// CHECK: attributes [[ATTR4]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: read) }
+// CHECK: attributes [[ATTR0:#.*]] = {{{.*}}}
+// CHECK: attributes [[ATTR1:#.*]] = {{{.*}}}
+// CHECK: attributes [[ATTR2]] = { nocallback nofree nosync nounwind willreturn memory(aarch64_fpmr: write) }
+// CHECK: attributes [[ATTR3]] = { nocallback nofree nosync nounwind willreturn memory(aarch64_fpmr: read) }
+// CHECK: attributes [[ATTR4:#.*]] = {{{.*}}}
+// CHECK: attributes [[ATTR5:#.*]] = { nocallback nofree nosync nounwind willreturn memory(aarch64_fpmr: read, aarch64_za: readwrite) }
diff --git a/llvm/include/llvm/AsmParser/LLToken.h b/llvm/include/llvm/AsmParser/LLToken.h
index c7e4bdf3ff811..c08eb99c1f5b2 100644
--- a/llvm/include/llvm/AsmParser/LLToken.h
+++ b/llvm/include/llvm/AsmParser/LLToken.h
@@ -202,6 +202,8 @@ enum Kind {
kw_readwrite,
kw_argmem,
kw_inaccessiblemem,
+ kw_aarch64_fpmr,
+ kw_aarch64_za,
kw_errnomem,
// Legacy attributes:
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index bd6f94ac1286c..33e89f88ef0d6 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -49,6 +49,18 @@ def IntrArgMemOnly : IntrinsicProperty;
// accessible by the module being compiled. This is a weaker form of IntrNoMem.
def IntrInaccessibleMemOnly : IntrinsicProperty;
+
+
+class IntrinsicMemoryLocation;
+// This should be added in the Target, but once in IntrinsicsAArch64.td
+// It complains error: "Variable not defined: 'AArch64_FPMR'"
+def AArch64_FPMR : IntrinsicMemoryLocation;
+def AArch64_ZA: IntrinsicMemoryLocation;
+// IntrInaccessible{Read|Write}MemOnly needs to set Location
+class IntrInaccessibleReadMemOnly<IntrinsicMemoryLocation idx> : IntrinsicProperty{IntrinsicMemoryLocation Loc=idx;}
+class IntrInaccessibleWriteMemOnly<IntrinsicMemoryLocation idx> : IntrinsicProperty{IntrinsicMemoryLocation Loc=idx;}
+class IntrInaccessibleReadWriteMem<IntrinsicMemoryLocation idx> : IntrinsicProperty{IntrinsicMemoryLocation Loc=idx;}
+
// IntrInaccessibleMemOrArgMemOnly -- This intrinsic only accesses memory that
// its pointer-typed arguments point to or memory that is not accessible
// by the module being compiled. This is a weaker form of IntrArgMemOnly.
diff --git a/llvm/include/llvm/IR/IntrinsicsAArch64.td b/llvm/include/llvm/IR/IntrinsicsAArch64.td
index ca6e2128812f7..3aaf52b981eb0 100644
--- a/llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ b/llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -761,7 +761,7 @@ let TargetPrefix = "aarch64" in {
class RNDR_Intrinsic
: DefaultAttrsIntrinsic<[llvm_i64_ty, llvm_i1_ty], [], [IntrNoMem, IntrHasSideEffects]>;
class FPMR_Set_Intrinsic
- : DefaultAttrsIntrinsic<[], [llvm_i64_ty], [IntrWriteMem, IntrInaccessibleMemOnly]>;
+ : DefaultAttrsIntrinsic<[], [llvm_i64_ty], [IntrInaccessibleWriteMemOnly<AArch64_FPMR>]>;
}
// FP environment registers.
@@ -999,7 +999,7 @@ def int_aarch64_st64bv0: Intrinsic<[llvm_i64_ty], !listconcat([llvm_ptr_ty], dat
// Conversions
class AdvSIMD_FP8_1VectorArg_Long_Intrinsic
- : DefaultAttrsIntrinsic<[llvm_anyvector_ty], [llvm_anyvector_ty], [IntrReadMem, IntrInaccessibleMemOnly]>;
+ : DefaultAttrsIntrinsic<[llvm_anyvector_ty], [llvm_anyvector_ty], [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
def int_aarch64_neon_fp8_cvtl1 : AdvSIMD_FP8_1VectorArg_Long_Intrinsic;
def int_aarch64_neon_fp8_cvtl2 : AdvSIMD_FP8_1VectorArg_Long_Intrinsic;
@@ -1008,13 +1008,13 @@ def int_aarch64_st64bv0: Intrinsic<[llvm_i64_ty], !listconcat([llvm_ptr_ty], dat
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_anyvector_ty,
LLVMMatchType<1>],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
def int_aarch64_neon_fp8_fcvtn2
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>,
llvm_anyvector_ty,
LLVMMatchType<1>],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
// Dot-product
class AdvSIMD_FP8_DOT_Intrinsic
@@ -1022,14 +1022,14 @@ def int_aarch64_st64bv0: Intrinsic<[llvm_i64_ty], !listconcat([llvm_ptr_ty], dat
[LLVMMatchType<0>,
llvm_anyvector_ty,
LLVMMatchType<1>],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
class AdvSIMD_FP8_DOT_LANE_Intrinsic
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>,
llvm_anyvector_ty,
llvm_v16i8_ty,
llvm_i32_ty],
- [IntrReadMem, IntrInaccessibleMemOnly, ImmArg<ArgIndex<3>>]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, ImmArg<ArgIndex<3>>]>;
def int_aarch64_neon_fp8_fdot2 : AdvSIMD_FP8_DOT_Intrinsic;
def int_aarch64_neon_fp8_fdot2_lane : AdvSIMD_FP8_DOT_LANE_Intrinsic;
@@ -1044,7 +1044,7 @@ def int_aarch64_st64bv0: Intrinsic<[llvm_i64_ty], !listconcat([llvm_ptr_ty], dat
[LLVMMatchType<0>,
llvm_v16i8_ty,
llvm_v16i8_ty],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
class AdvSIMD_FP8_FMLA_LANE_Intrinsic
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
@@ -1052,7 +1052,7 @@ def int_aarch64_st64bv0: Intrinsic<[llvm_i64_ty], !listconcat([llvm_ptr_ty], dat
llvm_v16i8_ty,
llvm_v16i8_ty,
llvm_i32_ty],
- [IntrReadMem, IntrInaccessibleMemOnly, ImmArg<ArgIndex<3>>]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, ImmArg<ArgIndex<3>>]>;
def int_aarch64_neon_fp8_fmlalb : AdvSIMD_FP8_FMLA_Intrinsic;
def int_aarch64_neon_fp8_fmlalt : AdvSIMD_FP8_FMLA_Intrinsic;
@@ -3070,6 +3070,12 @@ let TargetPrefix = "aarch64" in {
llvm_anyvector_ty,
LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleMemOnly, IntrHasSideEffects]>;
+ class SME_FP8_OuterProduct_QuarterTile_Single_Single
+ : DefaultAttrsIntrinsic<[],
+ [llvm_i32_ty,
+ llvm_anyvector_ty,
+ LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, IntrHasSideEffects]>;
+
class SME_OuterProduct_QuarterTile_Single_Multi
: DefaultAttrsIntrinsic<[],
[llvm_i32_ty,
@@ -3077,6 +3083,13 @@ let TargetPrefix = "aarch64" in {
LLVMMatchType<0>,
LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleMemOnly, IntrHasSideEffects]>;
+ class SME_FP8_OuterProduct_QuarterTile_Single_Multi
+ : DefaultAttrsIntrinsic<[],
+ [llvm_i32_ty,
+ llvm_anyvector_ty,
+ LLVMMatchType<0>,
+ LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, IntrHasSideEffects]>;
+
class SME_OuterProduct_QuarterTile_Multi_Multi
: DefaultAttrsIntrinsic<[],
[llvm_i32_ty,
@@ -3085,6 +3098,14 @@ let TargetPrefix = "aarch64" in {
LLVMMatchType<0>,
LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleMemOnly, IntrHasSideEffects]>;
+ class SME_FP8_OuterProduct_QuarterTile_Multi_Multi
+ : DefaultAttrsIntrinsic<[],
+ [llvm_i32_ty,
+ llvm_anyvector_ty,
+ LLVMMatchType<0>,
+ LLVMMatchType<0>,
+ LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, IntrHasSideEffects]>;
+
// 2-way and 4-way multi-vector signed/unsigned Quarter Tile Quarter Product A/S
foreach mode = ["s", "a"] in {
foreach za = ["", "_za64"] in {
@@ -3127,10 +3148,10 @@ let TargetPrefix = "aarch64" in {
// 16 and 32 bit multi-vector floating point 8 Quarter Tile Quarter Product
foreach za = ["za16", "za32"] in {
- def int_aarch64_sme_fp8_fmop4a_ # za # "_1x1" : SME_OuterProduct_QuarterTile_Single_Single;
- def int_aarch64_sme_fp8_fmop4a_ # za # "_1x2" : SME_OuterProduct_QuarterTile_Single_Multi;
- def int_aarch64_sme_fp8_fmop4a_ # za # "_2x1" : SME_OuterProduct_QuarterTile_Single_Multi;
- def int_aarch64_sme_fp8_fmop4a_ # za # "_2x2" : SME_OuterProduct_QuarterTile_Multi_Multi;
+ def int_aarch64_sme_fp8_fmop4a_ # za # "_1x1" : SME_FP8_OuterProduct_QuarterTile_Single_Single;
+ def int_aarch64_sme_fp8_fmop4a_ # za # "_1x2" : SME_FP8_OuterProduct_QuarterTile_Single_Multi;
+ def int_aarch64_sme_fp8_fmop4a_ # za # "_2x1" : SME_FP8_OuterProduct_QuarterTile_Single_Multi;
+ def int_aarch64_sme_fp8_fmop4a_ # za # "_2x2" : SME_FP8_OuterProduct_QuarterTile_Multi_Multi;
}
class SME_AddVectorToTile_Intrinsic
@@ -4027,7 +4048,7 @@ let TargetPrefix = "aarch64" in {
class SVE2_FP8_Cvt
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[llvm_nxv16i8_ty],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
def int_aarch64_sve_fp8_cvt1 : SVE2_FP8_Cvt;
def int_aarch64_sve_fp8_cvt2 : SVE2_FP8_Cvt;
@@ -4038,7 +4059,7 @@ let TargetPrefix = "aarch64" in {
class SVE2_FP8_Narrow_Cvt
: DefaultAttrsIntrinsic<[llvm_nxv16i8_ty],
[llvm_anyvector_ty, LLVMMatchType<0>],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
def int_aarch64_sve_fp8_cvtn : SVE2_FP8_Narrow_Cvt;
def int_aarch64_sve_fp8_cvtnb : SVE2_FP8_Narrow_Cvt;
@@ -4046,20 +4067,20 @@ let TargetPrefix = "aarch64" in {
def int_aarch64_sve_fp8_cvtnt
: DefaultAttrsIntrinsic<[llvm_nxv16i8_ty],
[llvm_nxv16i8_ty, llvm_anyvector_ty, LLVMMatchType<0>],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
// Dot product
class SVE2_FP8_FMLA_FDOT
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>,
llvm_nxv16i8_ty, llvm_nxv16i8_ty],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
class SVE2_FP8_FMLA_FDOT_Lane
: DefaultAttrsIntrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>,
llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_i32_ty],
- [IntrReadMem, IntrInaccessibleMemOnly, ImmArg<ArgIndex<3>>]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, ImmArg<ArgIndex<3>>]>;
def int_aarch64_sve_fp8_fdot : SVE2_FP8_FMLA_FDOT;
def int_aarch64_sve_fp8_fdot_lane : SVE2_FP8_FMLA_FDOT_Lane;
@@ -4086,69 +4107,69 @@ let TargetPrefix = "aarch64" in {
class SVE2_FP8_CVT_X2_Single_Intrinsic
: DefaultAttrsIntrinsic<[llvm_anyvector_ty, LLVMMatchType<0>],
[llvm_nxv16i8_ty],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
class SVE2_FP8_CVT_Single_X4_Intrinsic
: DefaultAttrsIntrinsic<[llvm_nxv16i8_ty],
[llvm_nxv4f32_ty, llvm_nxv4f32_ty, llvm_nxv4f32_ty, llvm_nxv4f32_ty],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
class SME_FP8_OuterProduct_Intrinsic
: DefaultAttrsIntrinsic<[],
[llvm_i32_ty,
llvm_nxv16i1_ty, llvm_nxv16i1_ty,
llvm_nxv16i8_ty, llvm_nxv16i8_ty],
- [ImmArg<ArgIndex<0>>, IntrInaccessibleMemOnly]>;
+ [ImmArg<ArgIndex<0>>, IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
class SME_FP8_ZA_LANE_VGx1_Intrinsic
: DefaultAttrsIntrinsic<[], [llvm_i32_ty,
llvm_nxv16i8_ty,
llvm_nxv16i8_ty,
llvm_i32_ty],
- [IntrInaccessibleMemOnly, ImmArg<ArgIndex<3>>]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, ImmArg<ArgIndex<3>>]>;
class SME_FP8_ZA_LANE_VGx2_Intrinsic
: DefaultAttrsIntrinsic<[], [llvm_i32_ty,
llvm_nxv16i8_ty, llvm_nxv16i8_ty,
llvm_nxv16i8_ty,
llvm_i32_ty],
- [IntrInaccessibleMemOnly, ImmArg<ArgIndex<4>>]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, ImmArg<ArgIndex<4>>]>;
class SME_FP8_ZA_LANE_VGx4_Intrinsic
: DefaultAttrsIntrinsic<[], [llvm_i32_ty,
llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty,
llvm_nxv16i8_ty,
llvm_i32_ty],
- [IntrInaccessibleMemOnly, ImmArg<ArgIndex<6>>]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, ImmArg<ArgIndex<6>>]>;
class SME_FP8_ZA_SINGLE_VGx1_Intrinsic
: DefaultAttrsIntrinsic<[], [llvm_i32_ty,
llvm_nxv16i8_ty,
llvm_nxv16i8_ty],
- [IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
class SME_FP8_ZA_SINGLE_VGx2_Intrinsic
: DefaultAttrsIntrinsic<[], [llvm_i32_ty,
llvm_nxv16i8_ty, llvm_nxv16i8_ty,
llvm_nxv16i8_ty],
- [IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
class SME_FP8_ZA_SINGLE_VGx4_Intrinsic
: DefaultAttrsIntrinsic<[], [llvm_i32_ty,
llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty,
llvm_nxv16i8_ty],
- [IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
class SME_FP8_ZA_MULTI_VGx2_Intrinsic
: DefaultAttrsIntrinsic<[], [llvm_i32_ty,
llvm_nxv16i8_ty, llvm_nxv16i8_ty,
llvm_nxv16i8_ty, llvm_nxv16i8_ty],
- [IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
class SME_FP8_ZA_MULTI_VGx4_Intrinsic
: DefaultAttrsIntrinsic<[], [llvm_i32_ty,
llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty,
llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty],
- [IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
//
// CVT from FP8 to half-precision/BFloat16 multi-vector
//
@@ -4167,7 +4188,7 @@ let TargetPrefix = "aarch64" in {
def int_aarch64_sve_fp8_cvt_x2
: DefaultAttrsIntrinsic<[llvm_nxv16i8_ty],
[llvm_anyvector_ty, LLVMMatchType<0>],
- [IntrReadMem, IntrInaccessibleMemOnly]>;
+ [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
def int_aarch64_sve_fp8_cvt_x4 : SVE2_FP8_CVT_Single_X4_Intrinsic;
def int_aarch64_sve_fp8_cvtn_x4 : SVE2_FP8_CVT_Single_X4_Intrinsic;
diff --git a/llvm/include/llvm/Support/ModRef.h b/llvm/include/llvm/Support/ModRef.h
index 71f3b5bcb9c2b..53d14717f486b 100644
--- a/llvm/include/llvm/Support/ModRef.h
+++ b/llvm/include/llvm/Support/ModRef.h
@@ -56,6 +56,11 @@ enum class ModRefInfo : uint8_t {
/// Debug print ModRefInfo.
LLVM_ABI raw_ostream &operator<<(raw_ostream &OS, ModRefInfo MR);
+enum class InaccessibleTargetMemLocation {
+ AARCH64_FPMR = 3,
+ AARCH64_ZA = 4,
+};
+
/// The locations at which a function might access memory.
enum class IRMemLocation {
/// Access to memory via argument pointers.
@@ -65,7 +70,7 @@ enum class IRMemLocation {
/// Errno memory.
ErrnoMem = 2,
/// Any other memory.
- Other = 3,
+ Other = 5,
/// Helpers to iterate all locations in the MemoryEffectsBase class.
First = ArgMem,
@@ -152,6 +157,46 @@ template <typename LocationEnum> class MemoryEffectsBase {
return MemoryEffectsBase(Location::Other, MR);
}
+ /// Create MemoryEffectsBase that can only read inaccessible memory.
+ static MemoryEffectsBase
+ inaccessibleReadMemOnly(Location Loc = Location::InaccessibleMem) {
+ return MemoryEffectsBase(Loc, ModRefInfo::Ref);
+ }
+
+ /// Create MemoryEffectsBase that can only write inaccessible memory.
+ static MemoryEffectsBase
+ inaccessibleWriteMemOnly(Location Loc = Location::InaccessibleMem) {
+ return MemoryEffectsBase(Loc, ModRefInfo::Mod);
+ }
+
+ /// Create MemoryEffectsBase that can read and write inaccessible memory.
+ static MemoryEffectsBase
+ inaccessibleReadWriteMem(Location Loc = Location::InaccessibleMem) {
+ return MemoryEffectsBase(Loc, ModRefInfo::ModRef);
+ }
+
+ /// Checks if only target-specific memory locations are set.
+ /// Ignores standard locations like ArgMe...
[truncated]
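The helper truncated above ("Checks if only target-specific memory locations are set. Ignores standard locations like ArgMe...") can be sketched as a standalone predicate over a per-location ModRef table. The names, indices, and the requirement of at least one target access are assumptions based on the visible part of the diff, not the patch's actual implementation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Location indices mirroring the ModRef.h hunk in this diff.
enum Loc : unsigned {
  ArgMem = 0,
  InaccessibleMem,
  ErrnoMem,
  AArch64_FPMR,
  AArch64_ZA,
  Other,
  NumLoc
};
enum MR : uint8_t { NoModRef = 0, Ref = 1, Mod = 2, ModRef = 3 };

using Effects = std::array<MR, NumLoc>;

// True if the function touches only the target-specific slots
// (AArch64_FPMR, AArch64_ZA) and no standard location.
bool onlyAccessesTargetMem(const Effects &E) {
  for (unsigned L = 0; L != NumLoc; ++L) {
    bool IsTarget = (L == AArch64_FPMR || L == AArch64_ZA);
    if (!IsTarget && E[L] != NoModRef)
      return false; // touches a standard location as well
  }
  // Require at least one target-specific access so the predicate is
  // not vacuously true for a function with no memory effects at all.
  return E[AArch64_FPMR] != NoModRef || E[AArch64_ZA] != NoModRef;
}
```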
✅ With the latest revision this PR passed the C/C++ code formatter.
…sics

Enable more precise alias and dependency analysis between calls when reasoning about their operation on the same target memory location. The key motivation is to break unnecessary dependencies between calls: a call that only reads a target memory location need not be ordered before a later call that only modifies it. If the second call does not access any other memory location, we conclude that the two calls are independent. For example:

```
call void @llvm.aarch64.set.fpmr(i64)                     ; Call0
call void @llvm.aarch64.sme.fp8.fdot.lane.za16.vg1x2(...) ; Call1
call void @llvm.aarch64.set.fpmr(i64)                     ; Call2
```

Here, the dependency should exist only between Call0 (write) and Call1 (read). Call1 and Call2 both touch the same target location, but since Call1 is a read and Call2 is a write with no other side effects, they are independent of each other.

The implementation modifies the MemoryEffects query to check target-specific memory locations (IRMemLocation) and relax Mod/Ref relations accordingly. This allows the optimizer to avoid conservatively chaining dependencies across otherwise independent target memory operations.

This patch depends on how we implement PR#148650
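The pairwise rule the commit message describes can be sketched as a small decision function over the ModRef of two calls at one target location. This is an illustration of the stated relaxation, not LLVM's actual query code, and `mustOrder` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

enum class MR : uint8_t { NoModRef = 0, Ref = 1, Mod = 2, ModRef = 3 };

inline bool reads(MR X) { return static_cast<uint8_t>(X) & 1; }
inline bool writes(MR X) { return static_cast<uint8_t>(X) & 2; }

// A is the earlier call's effect on the target location (e.g. fpmr),
// B is the later call's effect. Returns true if the pair must stay
// ordered under the relaxed rule described above.
bool mustOrder(MR A, MR B) {
  if (A == MR::NoModRef || B == MR::NoModRef)
    return false; // one of the calls never touches the location
  if (writes(A))
    return true;  // write -> later access: keep ordered (Call0 -> Call1)
  // A only reads. A later pure write overwrites the register without
  // observing it, so the pair is independent (Call1 -> Call2); a later
  // read+write still observes the value and must stay ordered.
  if (B == MR::Mod)
    return false;
  return writes(B);
}
```

Applied to the example above: Call0 (Mod) to Call1 (Ref) must stay ordered, while Call1 (Ref) to Call2 (Mod) does not.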