
Conversation

CarolineConcatto
Contributor

Enable more precise alias and dependency analysis between calls when reasoning
about their operations on the same target memory location.

The key motivation is to break unnecessary dependencies between calls when a call
that only reads a target memory location is followed by a call that only modifies
it. If the second call does not access any other memory location, we conclude
that the two calls are independent.

For example:
```
  call void @llvm.aarch64.set.fpmr(i64)                 ; Call0
  call void @llvm.aarch64.sme.fp8.fdot.lane.za16.vg1x2(...) ; Call1
  call void @llvm.aarch64.set.fpmr(i64)                 ; Call2
```

Here, the dependency should exist only between Call0 (write) and Call1 (read).
Call1 and Call2 both touch the same target location, but since Call1 only reads
it and Call2 only writes it, with no other side effects, they are independent
of each other.

The implementation modifies the MemoryEffects query by checking target-specific
memory locations (IRMemLocation) and relaxing Mod/Ref relations accordingly.
This allows the optimizer to avoid conservatively chaining dependencies across
otherwise independent target memory operations.
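
As a rough illustration of the relaxed query (a sketch, not the code in this
patch), the check could be expressed with the existing MemoryEffects/ModRefInfo
API roughly as below; the helper name and the way the target location is passed
in are assumptions for illustration only.

```
// Sketch only: returns true when an earlier call does nothing but read Loc
// and a later call does nothing but write Loc, mirroring the Call1/Call2
// case above. Uses the existing API from llvm/Support/ModRef.h; the helper
// name is hypothetical.
#include "llvm/Support/ModRef.h"
using namespace llvm;

static bool onlyReadThenWriteOfSameLoc(MemoryEffects Earlier,
                                       MemoryEffects Later,
                                       IRMemLocation Loc) {
  // The earlier call may only read Loc and must touch nothing else.
  if (Earlier.getModRef(Loc) != ModRefInfo::Ref ||
      !Earlier.getWithoutLoc(Loc).doesNotAccessMemory())
    return false;
  // The later call may only write Loc and must touch nothing else.
  return Later.getModRef(Loc) == ModRefInfo::Mod &&
         Later.getWithoutLoc(Loc).doesNotAccessMemory();
}
```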

    This patch depends on how we implement PR#148650

This patch introduces preliminary support for additional memory locations,
such as FPMR and ZA, which are needed to model AArch64 architectural registers
for memory-dependency analysis.

Currently, these locations are not yet target-specific. The goal is to enable
the compiler to express read/write effects on these resources.
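
As a minimal sketch of what expressing such effects could look like (assuming
IRMemLocation gains entries for these registers; the enumerator names below are
placeholders, not the ones introduced in this patch):

```
// Sketch only: builds the effects of a call that reads FPMR and reads/writes
// ZA using the existing MemoryEffectsBase API. IRMemLocation::AArch64FPMR and
// IRMemLocation::AArch64ZA are hypothetical placeholder enumerators.
MemoryEffects getFp8DotEffects() {
  MemoryEffects ME = MemoryEffects::none();
  ME |= MemoryEffects(IRMemLocation::AArch64FPMR, ModRefInfo::Ref);
  ME |= MemoryEffects(IRMemLocation::AArch64ZA, ModRefInfo::ModRef);
  return ME;
}
```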

What This Patch Does:
  Adds two new memory locations: FPMR and ZA, intended to represent
AArch64-specific inaccessible memory types.

Current Limitations:
  These new locations are not yet target-specific in the type-safe sense;
they are globally visible and hardcoded.
  There is no mechanism yet to associate a memory location with its
corresponding target (e.g., AArch64 vs RISCV).
  No changes are made yet to bitcode serialization, parser support, or alias
analysis behavior.

This patch is not functionally complete; it is a structural prototype to
solicit feedback on the direction, and I would welcome suggestions on how
to proceed.
This patch also adds the IntrInaccessibleReadWriteMem property so that Mod
and Ref can be set at the same time for a location.

This patch depends on how we implement PR#148650.
@llvmbot llvmbot added the clang, backend:AArch64, tablegen, llvm:support, llvm:ir, llvm:analysis, and llvm:transforms labels on Aug 27, 2025
@llvmbot
Member

llvmbot commented Aug 27, 2025

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-llvm-analysis

Author: None (CarolineConcatto)

Changes

Enable more precise alias and dependency analysis between calls when reasoning
about their operations on the same target memory location.

The key motivation is to break unnecessary dependencies between calls when a call
that only reads a target memory location is followed by a call that only modifies
it. If the second call does not access any other memory location, we conclude
that the two calls are independent.

For example:
```
  call void @llvm.aarch64.set.fpmr(i64)                 ; Call0
  call void @llvm.aarch64.sme.fp8.fdot.lane.za16.vg1x2(...) ; Call1
  call void @llvm.aarch64.set.fpmr(i64)                 ; Call2
```

Here, the dependency should exist only between Call0 (write) and Call1 (read).
Call1 and Call2 both touch the same target location, but since Call1 only reads
it and Call2 only writes it, with no other side effects, they are independent
of each other.

The implementation modifies the MemoryEffects query by checking target-specific
memory locations (IRMemLocation) and relaxing Mod/Ref relations accordingly.
This allows the optimizer to avoid conservatively chaining dependencies across
otherwise independent target memory operations.

    This patch depends on how we implement PR#148650

Patch is 51.64 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155590.diff

20 Files Affected:

  • (modified) clang/test/CodeGen/AArch64/attr-fp8-function.c (+17-8)
  • (modified) llvm/include/llvm/AsmParser/LLToken.h (+2)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+12)
  • (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+50-29)
  • (modified) llvm/include/llvm/Support/ModRef.h (+51-1)
  • (modified) llvm/include/llvm/TableGen/Record.h (+3)
  • (modified) llvm/lib/Analysis/BasicAliasAnalysis.cpp (+29)
  • (modified) llvm/lib/AsmParser/LLLexer.cpp (+2)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+19-13)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+4)
  • (modified) llvm/lib/IR/Attributes.cpp (+13)
  • (modified) llvm/lib/Support/ModRef.cpp (+9)
  • (modified) llvm/lib/TableGen/Record.cpp (+15)
  • (modified) llvm/lib/Transforms/IPO/FunctionAttrs.cpp (+3)
  • (modified) llvm/test/Assembler/memory-attribute.ll (+25)
  • (modified) llvm/test/Bitcode/attributes.ll (-1)
  • (added) llvm/test/TableGen/intrinsic-attrs-fp8.td (+110)
  • (added) llvm/test/Transforms/EarlyCSE/AArch64/fp8-target-memory.ll (+156)
  • (modified) llvm/unittests/Support/ModRefTest.cpp (+2-1)
  • (modified) llvm/utils/TableGen/Basic/CodeGenIntrinsics.cpp (+19-1)
diff --git a/clang/test/CodeGen/AArch64/attr-fp8-function.c b/clang/test/CodeGen/AArch64/attr-fp8-function.c
index 54bfd177bd809..62b910eafa4a7 100644
--- a/clang/test/CodeGen/AArch64/attr-fp8-function.c
+++ b/clang/test/CodeGen/AArch64/attr-fp8-function.c
@@ -18,20 +18,29 @@ svfloat16_t test_svcvtlt2_f16_mf8(svmfloat8_t zn, fpm_t fpm) __arm_streaming {
   return svcvtlt2_f16_mf8_fpm(zn, fpm);
 }
 
-// CHECK: declare void @llvm.aarch64.set.fpmr(i64) [[ATTR3:#.*]]
-// CHECK: declare <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvtlt2.nxv8f16(<vscale x 16 x i8>) [[ATTR4:#.*]]
+// CHECK: declare void @llvm.aarch64.set.fpmr(i64) [[ATTR2:#.*]]
+// CHECK: declare <vscale x 8 x half> @llvm.aarch64.sve.fp8.cvtlt2.nxv8f16(<vscale x 16 x i8>) [[ATTR3:#.*]]
 
 
 // SME
+// With only fprm as inaccessible memory
 svfloat32_t test_svmlalltt_lane_f32_mf8(svfloat32_t zda, svmfloat8_t zn, svmfloat8_t zm, fpm_t fpm) __arm_streaming {
   return svmlalltt_lane_f32_mf8_fpm(zda, zn, zm, 7, fpm);
 }
 
-// CHECK: declare <vscale x 4 x float> @llvm.aarch64.sve.fp8.fmlalltt.lane.nxv4f32(<vscale x 4 x float>, <vscale x 16 x i8>, <vscale x 16 x i8>, i32 immarg) [[ATTR4]]
+// CHECK: declare <vscale x 4 x float> @llvm.aarch64.sve.fp8.fmlalltt.lane.nxv4f32(<vscale x 4 x float>, <vscale x 16 x i8>, <vscale x 16 x i8>, i32 immarg) [[ATTR3:#.*]]
 
-// CHECK: declare <16 x i8> @llvm.aarch64.neon.fp8.fcvtn.v16i8.v8f16(<8 x half>, <8 x half>) [[ATTR4]]
+// With fpmr and za as incaccessible memory
+void test_svdot_lane_za32_f8_vg1x2(uint32_t slice, svmfloat8x2_t zn, svmfloat8_t zm, fpm_t fpmr)  __arm_streaming __arm_inout("za") {
+  svdot_lane_za32_mf8_vg1x2_fpm(slice, zn, zm, 3, fpmr);
+}
+
+// CHECK: declare void @llvm.aarch64.sme.fp8.fdot.lane.za32.vg1x2(i32, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, i32 immarg) [[ATTR5:#.*]]
+// CHECK: declare <16 x i8> @llvm.aarch64.neon.fp8.fcvtn.v16i8.v8f16(<8 x half>, <8 x half>) [[ATTR3]]
 
-// CHECK: attributes [[ATTR1:#.*]] = {{{.*}}} 
-// CHECK: attributes [[ATTR2:#.*]] = {{{.*}}}
-// CHECK: attributes [[ATTR3]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: write) }
-// CHECK: attributes [[ATTR4]] = { nocallback nofree nosync nounwind willreturn memory(inaccessiblemem: read) }
+// CHECK: attributes [[ATTR0:#.*]] = {{{.*}}}
+// CHECK: attributes [[ATTR1:#.*]] = {{{.*}}}
+// CHECK: attributes [[ATTR2]] = { nocallback nofree nosync nounwind willreturn memory(aarch64_fpmr: write) }
+// CHECK: attributes [[ATTR3]] = { nocallback nofree nosync nounwind willreturn memory(aarch64_fpmr: read) }
+// CHECK: attributes [[ATTR4:#.*]] = {{{.*}}}
+// CHECK: attributes [[ATTR5:#.*]] = { nocallback nofree nosync nounwind willreturn memory(aarch64_fpmr: read, aarch64_za: readwrite) }
diff --git a/llvm/include/llvm/AsmParser/LLToken.h b/llvm/include/llvm/AsmParser/LLToken.h
index c7e4bdf3ff811..c08eb99c1f5b2 100644
--- a/llvm/include/llvm/AsmParser/LLToken.h
+++ b/llvm/include/llvm/AsmParser/LLToken.h
@@ -202,6 +202,8 @@ enum Kind {
   kw_readwrite,
   kw_argmem,
   kw_inaccessiblemem,
+  kw_aarch64_fpmr,
+  kw_aarch64_za,
   kw_errnomem,
 
   // Legacy attributes:
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index bd6f94ac1286c..33e89f88ef0d6 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -49,6 +49,18 @@ def IntrArgMemOnly : IntrinsicProperty;
 // accessible by the module being compiled. This is a weaker form of IntrNoMem.
 def IntrInaccessibleMemOnly : IntrinsicProperty;
 
+
+
+class IntrinsicMemoryLocation;
+// This should be added in the Target, but once in IntrinsicsAArch64.td
+// It complains error: "Variable not defined: 'AArch64_FPMR'"
+def AArch64_FPMR : IntrinsicMemoryLocation;
+def AArch64_ZA:   IntrinsicMemoryLocation;
+// IntrInaccessible{Read|Write}MemOnly needs to set Location
+class IntrInaccessibleReadMemOnly<IntrinsicMemoryLocation idx> : IntrinsicProperty{IntrinsicMemoryLocation Loc=idx;}
+class IntrInaccessibleWriteMemOnly<IntrinsicMemoryLocation idx> : IntrinsicProperty{IntrinsicMemoryLocation Loc=idx;}
+class IntrInaccessibleReadWriteMem<IntrinsicMemoryLocation idx> : IntrinsicProperty{IntrinsicMemoryLocation Loc=idx;}
+
 // IntrInaccessibleMemOrArgMemOnly -- This intrinsic only accesses memory that
 // its pointer-typed arguments point to or memory that is not accessible
 // by the module being compiled. This is a weaker form of IntrArgMemOnly.
diff --git a/llvm/include/llvm/IR/IntrinsicsAArch64.td b/llvm/include/llvm/IR/IntrinsicsAArch64.td
index ca6e2128812f7..3aaf52b981eb0 100644
--- a/llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ b/llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -761,7 +761,7 @@ let TargetPrefix = "aarch64" in {
   class RNDR_Intrinsic
     : DefaultAttrsIntrinsic<[llvm_i64_ty, llvm_i1_ty], [], [IntrNoMem, IntrHasSideEffects]>;
   class FPMR_Set_Intrinsic
-    : DefaultAttrsIntrinsic<[], [llvm_i64_ty], [IntrWriteMem, IntrInaccessibleMemOnly]>;
+    : DefaultAttrsIntrinsic<[], [llvm_i64_ty], [IntrInaccessibleWriteMemOnly<AArch64_FPMR>]>;
 }
 
 // FP environment registers.
@@ -999,7 +999,7 @@ def int_aarch64_st64bv0: Intrinsic<[llvm_i64_ty], !listconcat([llvm_ptr_ty], dat
 
   // Conversions
   class AdvSIMD_FP8_1VectorArg_Long_Intrinsic
-    : DefaultAttrsIntrinsic<[llvm_anyvector_ty], [llvm_anyvector_ty], [IntrReadMem, IntrInaccessibleMemOnly]>;
+    : DefaultAttrsIntrinsic<[llvm_anyvector_ty], [llvm_anyvector_ty], [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
 
   def int_aarch64_neon_fp8_cvtl1   : AdvSIMD_FP8_1VectorArg_Long_Intrinsic;
   def int_aarch64_neon_fp8_cvtl2   : AdvSIMD_FP8_1VectorArg_Long_Intrinsic;
@@ -1008,13 +1008,13 @@ def int_aarch64_st64bv0: Intrinsic<[llvm_i64_ty], !listconcat([llvm_ptr_ty], dat
     : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                             [llvm_anyvector_ty,
                              LLVMMatchType<1>],
-                            [IntrReadMem, IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
   def int_aarch64_neon_fp8_fcvtn2
     : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                             [LLVMMatchType<0>,
                              llvm_anyvector_ty,
                              LLVMMatchType<1>],
-                            [IntrReadMem, IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
 
   // Dot-product
   class AdvSIMD_FP8_DOT_Intrinsic
@@ -1022,14 +1022,14 @@ def int_aarch64_st64bv0: Intrinsic<[llvm_i64_ty], !listconcat([llvm_ptr_ty], dat
                             [LLVMMatchType<0>,
                              llvm_anyvector_ty,
                              LLVMMatchType<1>],
-                             [IntrReadMem, IntrInaccessibleMemOnly]>;
+                             [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
   class AdvSIMD_FP8_DOT_LANE_Intrinsic
     : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                             [LLVMMatchType<0>,
                              llvm_anyvector_ty,
                              llvm_v16i8_ty,
                              llvm_i32_ty],
-                             [IntrReadMem, IntrInaccessibleMemOnly, ImmArg<ArgIndex<3>>]>;
+                             [IntrInaccessibleReadMemOnly<AArch64_FPMR>, ImmArg<ArgIndex<3>>]>;
 
   def int_aarch64_neon_fp8_fdot2 : AdvSIMD_FP8_DOT_Intrinsic;
   def int_aarch64_neon_fp8_fdot2_lane : AdvSIMD_FP8_DOT_LANE_Intrinsic;
@@ -1044,7 +1044,7 @@ def int_aarch64_st64bv0: Intrinsic<[llvm_i64_ty], !listconcat([llvm_ptr_ty], dat
                             [LLVMMatchType<0>,
                              llvm_v16i8_ty,
                              llvm_v16i8_ty],
-                             [IntrReadMem, IntrInaccessibleMemOnly]>;
+                             [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
 
   class AdvSIMD_FP8_FMLA_LANE_Intrinsic
     : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
@@ -1052,7 +1052,7 @@ def int_aarch64_st64bv0: Intrinsic<[llvm_i64_ty], !listconcat([llvm_ptr_ty], dat
                              llvm_v16i8_ty,
                              llvm_v16i8_ty,
                              llvm_i32_ty],
-                             [IntrReadMem, IntrInaccessibleMemOnly, ImmArg<ArgIndex<3>>]>;
+                             [IntrInaccessibleReadMemOnly<AArch64_FPMR>, ImmArg<ArgIndex<3>>]>;
 
   def int_aarch64_neon_fp8_fmlalb : AdvSIMD_FP8_FMLA_Intrinsic;
   def int_aarch64_neon_fp8_fmlalt : AdvSIMD_FP8_FMLA_Intrinsic;
@@ -3070,6 +3070,12 @@ let TargetPrefix = "aarch64" in {
           llvm_anyvector_ty,
           LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleMemOnly, IntrHasSideEffects]>;
 
+ class SME_FP8_OuterProduct_QuarterTile_Single_Single
+      : DefaultAttrsIntrinsic<[],
+          [llvm_i32_ty,
+          llvm_anyvector_ty,
+          LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, IntrHasSideEffects]>;
+
   class SME_OuterProduct_QuarterTile_Single_Multi
       : DefaultAttrsIntrinsic<[],
           [llvm_i32_ty,
@@ -3077,6 +3083,13 @@ let TargetPrefix = "aarch64" in {
           LLVMMatchType<0>,
           LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleMemOnly, IntrHasSideEffects]>;
 
+  class SME_FP8_OuterProduct_QuarterTile_Single_Multi
+      : DefaultAttrsIntrinsic<[],
+          [llvm_i32_ty,
+          llvm_anyvector_ty,
+          LLVMMatchType<0>,
+          LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, IntrHasSideEffects]>;
+
   class SME_OuterProduct_QuarterTile_Multi_Multi
       : DefaultAttrsIntrinsic<[],
           [llvm_i32_ty,
@@ -3085,6 +3098,14 @@ let TargetPrefix = "aarch64" in {
           LLVMMatchType<0>,
           LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleMemOnly, IntrHasSideEffects]>;
 
+  class SME_FP8_OuterProduct_QuarterTile_Multi_Multi
+      : DefaultAttrsIntrinsic<[],
+          [llvm_i32_ty,
+          llvm_anyvector_ty,
+          LLVMMatchType<0>,
+          LLVMMatchType<0>,
+          LLVMMatchType<0>], [ImmArg<ArgIndex<0>>, IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, IntrHasSideEffects]>;
+
   // 2-way and 4-way multi-vector signed/unsigned Quarter Tile Quarter Product A/S
   foreach mode = ["s", "a"] in {
     foreach za = ["", "_za64"] in {
@@ -3127,10 +3148,10 @@ let TargetPrefix = "aarch64" in {
 
   // 16 and 32 bit multi-vector floating point 8 Quarter Tile Quarter Product
   foreach za = ["za16", "za32"] in {
-    def int_aarch64_sme_fp8_fmop4a_ # za # "_1x1" : SME_OuterProduct_QuarterTile_Single_Single;
-    def int_aarch64_sme_fp8_fmop4a_ # za # "_1x2" : SME_OuterProduct_QuarterTile_Single_Multi;
-    def int_aarch64_sme_fp8_fmop4a_ # za # "_2x1" : SME_OuterProduct_QuarterTile_Single_Multi;
-    def int_aarch64_sme_fp8_fmop4a_ # za # "_2x2" : SME_OuterProduct_QuarterTile_Multi_Multi;
+    def int_aarch64_sme_fp8_fmop4a_ # za # "_1x1" : SME_FP8_OuterProduct_QuarterTile_Single_Single;
+    def int_aarch64_sme_fp8_fmop4a_ # za # "_1x2" : SME_FP8_OuterProduct_QuarterTile_Single_Multi;
+    def int_aarch64_sme_fp8_fmop4a_ # za # "_2x1" : SME_FP8_OuterProduct_QuarterTile_Single_Multi;
+    def int_aarch64_sme_fp8_fmop4a_ # za # "_2x2" : SME_FP8_OuterProduct_QuarterTile_Multi_Multi;
   }
 
   class SME_AddVectorToTile_Intrinsic
@@ -4027,7 +4048,7 @@ let TargetPrefix = "aarch64" in {
   class SVE2_FP8_Cvt
     : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                             [llvm_nxv16i8_ty],
-                            [IntrReadMem, IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
 
   def int_aarch64_sve_fp8_cvt1   : SVE2_FP8_Cvt;
   def int_aarch64_sve_fp8_cvt2   : SVE2_FP8_Cvt;
@@ -4038,7 +4059,7 @@ let TargetPrefix = "aarch64" in {
   class SVE2_FP8_Narrow_Cvt
     : DefaultAttrsIntrinsic<[llvm_nxv16i8_ty],
                             [llvm_anyvector_ty, LLVMMatchType<0>],
-                            [IntrReadMem, IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
   
   def int_aarch64_sve_fp8_cvtn  : SVE2_FP8_Narrow_Cvt;
   def int_aarch64_sve_fp8_cvtnb : SVE2_FP8_Narrow_Cvt;
@@ -4046,20 +4067,20 @@ let TargetPrefix = "aarch64" in {
   def int_aarch64_sve_fp8_cvtnt
     : DefaultAttrsIntrinsic<[llvm_nxv16i8_ty],
                             [llvm_nxv16i8_ty, llvm_anyvector_ty, LLVMMatchType<0>],
-                            [IntrReadMem, IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
 
   // Dot product
   class SVE2_FP8_FMLA_FDOT
     : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                             [LLVMMatchType<0>,
                              llvm_nxv16i8_ty, llvm_nxv16i8_ty],
-                            [IntrReadMem, IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
   
   class SVE2_FP8_FMLA_FDOT_Lane
     : DefaultAttrsIntrinsic<[llvm_anyvector_ty],
                             [LLVMMatchType<0>,
                              llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_i32_ty],
-                            [IntrReadMem, IntrInaccessibleMemOnly, ImmArg<ArgIndex<3>>]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>, ImmArg<ArgIndex<3>>]>;
   
   def int_aarch64_sve_fp8_fdot      : SVE2_FP8_FMLA_FDOT;
   def int_aarch64_sve_fp8_fdot_lane : SVE2_FP8_FMLA_FDOT_Lane;
@@ -4086,69 +4107,69 @@ let TargetPrefix = "aarch64" in {
   class SVE2_FP8_CVT_X2_Single_Intrinsic
     : DefaultAttrsIntrinsic<[llvm_anyvector_ty, LLVMMatchType<0>],
                             [llvm_nxv16i8_ty],
-                            [IntrReadMem, IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
 
   class SVE2_FP8_CVT_Single_X4_Intrinsic
     : DefaultAttrsIntrinsic<[llvm_nxv16i8_ty],
                             [llvm_nxv4f32_ty, llvm_nxv4f32_ty, llvm_nxv4f32_ty, llvm_nxv4f32_ty],
-                            [IntrReadMem, IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
 
   class SME_FP8_OuterProduct_Intrinsic
       : DefaultAttrsIntrinsic<[],
           [llvm_i32_ty,
           llvm_nxv16i1_ty, llvm_nxv16i1_ty,
           llvm_nxv16i8_ty, llvm_nxv16i8_ty],
-          [ImmArg<ArgIndex<0>>, IntrInaccessibleMemOnly]>;
+          [ImmArg<ArgIndex<0>>, IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
 
   class SME_FP8_ZA_LANE_VGx1_Intrinsic
    : DefaultAttrsIntrinsic<[], [llvm_i32_ty,
                                llvm_nxv16i8_ty,
                                llvm_nxv16i8_ty,
                                llvm_i32_ty],
-                          [IntrInaccessibleMemOnly, ImmArg<ArgIndex<3>>]>;
+                          [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, ImmArg<ArgIndex<3>>]>;
 
   class SME_FP8_ZA_LANE_VGx2_Intrinsic
     : DefaultAttrsIntrinsic<[], [llvm_i32_ty,
                                 llvm_nxv16i8_ty, llvm_nxv16i8_ty,
                                 llvm_nxv16i8_ty,
                                 llvm_i32_ty],
-                            [IntrInaccessibleMemOnly, ImmArg<ArgIndex<4>>]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, ImmArg<ArgIndex<4>>]>;
 
   class SME_FP8_ZA_LANE_VGx4_Intrinsic
    : DefaultAttrsIntrinsic<[], [llvm_i32_ty,
                                 llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty,
                                 llvm_nxv16i8_ty,
                                 llvm_i32_ty],
-                            [IntrInaccessibleMemOnly, ImmArg<ArgIndex<6>>]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>, ImmArg<ArgIndex<6>>]>;
   class SME_FP8_ZA_SINGLE_VGx1_Intrinsic
     : DefaultAttrsIntrinsic<[], [llvm_i32_ty,
                                 llvm_nxv16i8_ty,
                                 llvm_nxv16i8_ty],
-                            [IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
 
   class SME_FP8_ZA_SINGLE_VGx2_Intrinsic
     : DefaultAttrsIntrinsic<[], [llvm_i32_ty,
                                 llvm_nxv16i8_ty, llvm_nxv16i8_ty,
                                 llvm_nxv16i8_ty],
-                            [IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
 
   class SME_FP8_ZA_SINGLE_VGx4_Intrinsic
     : DefaultAttrsIntrinsic<[], [llvm_i32_ty,
                                 llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty,
                                 llvm_nxv16i8_ty],
-                              [IntrInaccessibleMemOnly]>;
+                              [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
 
   class SME_FP8_ZA_MULTI_VGx2_Intrinsic
     : DefaultAttrsIntrinsic<[], [llvm_i32_ty,
                                  llvm_nxv16i8_ty, llvm_nxv16i8_ty,
                                  llvm_nxv16i8_ty, llvm_nxv16i8_ty],
-                            [IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
 
   class SME_FP8_ZA_MULTI_VGx4_Intrinsic
     : DefaultAttrsIntrinsic<[], [llvm_i32_ty,
                                  llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty,
                                  llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty, llvm_nxv16i8_ty],
-                            [IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>, IntrInaccessibleReadWriteMem<AArch64_ZA>]>;
   //
   // CVT from FP8 to half-precision/BFloat16 multi-vector
   //
@@ -4167,7 +4188,7 @@ let TargetPrefix = "aarch64" in {
   def int_aarch64_sve_fp8_cvt_x2
     : DefaultAttrsIntrinsic<[llvm_nxv16i8_ty],
                             [llvm_anyvector_ty, LLVMMatchType<0>],
-                            [IntrReadMem, IntrInaccessibleMemOnly]>;
+                            [IntrInaccessibleReadMemOnly<AArch64_FPMR>]>;
 
   def int_aarch64_sve_fp8_cvt_x4  : SVE2_FP8_CVT_Single_X4_Intrinsic;
   def int_aarch64_sve_fp8_cvtn_x4 : SVE2_FP8_CVT_Single_X4_Intrinsic;
diff --git a/llvm/include/llvm/Support/ModRef.h b/llvm/include/llvm/Support/ModRef.h
index 71f3b5bcb9c2b..53d14717f486b 100644
--- a/llvm/include/llvm/Support/ModRef.h
+++ b/llvm/include/llvm/Support/ModRef.h
@@ -56,6 +56,11 @@ enum class ModRefInfo : uint8_t {
 /// Debug print ModRefInfo.
 LLVM_ABI raw_ostream &operator<<(raw_ostream &OS, ModRefInfo MR);
 
+enum class InaccessibleTargetMemLocation {
+  AARCH64_FPMR = 3,
+  AARCH64_ZA = 4,
+};
+
 /// The locations at which a function might access memory.
 enum class IRMemLocation {
   /// Access to memory via argument pointers.
@@ -65,7 +70,7 @@ enum class IRMemLocation {
   /// Errno memory.
   ErrnoMem = 2,
   /// Any other memory.
-  Other = 3,
+  Other = 5,
 
   /// Helpers to iterate all locations in the MemoryEffectsBase class.
   First = ArgMem,
@@ -152,6 +157,46 @@ template <typename LocationEnum> class MemoryEffectsBase {
     return MemoryEffectsBase(Location::Other, MR);
   }
 
+  /// Create MemoryEffectsBase that can only read inaccessible memory.
+  static MemoryEffectsBase
+  inaccessibleReadMemOnly(Location Loc = Location::InaccessibleMem) {
+    return MemoryEffectsBase(Loc, ModRefInfo::Ref);
+  }
+
+  /// Create MemoryEffectsBase that can only write inaccessible memory.
+  static MemoryEffectsBase
+  inaccessibleWriteMemOnly(Location Loc = Location::InaccessibleMem) {
+    return MemoryEffectsBase(Loc, ModRefInfo::Mod);
+  }
+
+  /// Create MemoryEffectsBase that can read write inaccessible memory.
+  static MemoryEffectsBase
+  inaccessibleReadWriteMem(Location Loc = Location::InaccessibleMem) {
+    return MemoryEffectsBase(Loc, ModRefInfo::ModRef);
+  }
+
+  /// Checks if only target-specific memory locations are set.
+  /// Ignores standard locations like ArgMe...
[truncated]

@CarolineConcatto CarolineConcatto marked this pull request as draft August 27, 2025 10:21

github-actions bot commented Aug 27, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@CarolineConcatto CarolineConcatto force-pushed the fpmr_metadata_update_fp8_intrinsics_cont branch from 66b1c28 to fb5b7ca on August 27, 2025 12:22