Skip to content

Conversation

choikwa
Copy link
Contributor

@choikwa choikwa commented Aug 20, 2025

This was motivated from looking at composable kernel benchmark where IPSCCP was observed replacing noalias ptr's and their derivatives with a global alias. Doing so would lose the noalias information and target backend was more pessimistic, emitting unneeded WAITCNT instructions.

Making it a draft as it's unclear if it's beneficially moving the needle.

…atives

This was motivated from looking at composable kernel benchmark where IPSCCP was observed replacing noalias ptr's and their derivatives with a global alias.
Doing so would lose the noalias information and target backend was more pessimistic, emitting unneeded WAITCNT instructions.
@llvmbot
Copy link
Member

llvmbot commented Aug 20, 2025

@llvm/pr-subscribers-function-specialization

@llvm/pr-subscribers-llvm-transforms

Author: choikwa (choikwa)

Changes

This was motivated from looking at composable kernel benchmark where IPSCCP was observed replacing noalias ptr's and their derivatives with a global alias. Doing so would lose the noalias information and target backend was more pessimistic, emitting unneeded WAITCNT instructions.

Making it a draft as it's unclear if it's beneficially moving the needle.


Full diff: https://github.com/llvm/llvm-project/pull/154522.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/SCCPSolver.cpp (+15)
diff --git a/llvm/lib/Transforms/Utils/SCCPSolver.cpp b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
index 84485176ad4ff..3c06452a91f36 100644
--- a/llvm/lib/Transforms/Utils/SCCPSolver.cpp
+++ b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
@@ -62,6 +62,21 @@ bool SCCPSolver::tryToReplaceWithConstant(Value *V) {
   Constant *Const = getConstantOrNull(V);
   if (!Const)
     return false;
+
+  // Don't replace noalias arg or derivatives
+  if (isa<PointerType>(V->getType())) {
+    SmallVector<const Value *, 4> Objects;
+    getUnderlyingObjects(V, Objects, nullptr);
+
+    for (const auto Obj : Objects) {
+      if (const auto *Arg = dyn_cast<Argument>(Obj)) {
+        if (isa<PointerType>(Arg->getType()) &&
+            Arg->hasNoAliasAttr())
+          return false;
+      }
+    }
+  }
+
   // Replacing `musttail` instructions with constant breaks `musttail` invariant
   // unless the call itself can be removed.
   // Calls with "clang.arc.attachedcall" implicitly use the return value and

@choikwa choikwa requested review from dobbelaj-snps and nikic August 20, 2025 12:22
Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Transforms/Utils/SCCPSolver.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Transforms/Utils/SCCPSolver.cpp b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
index 3c06452a9..ac1a615a2 100644
--- a/llvm/lib/Transforms/Utils/SCCPSolver.cpp
+++ b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
@@ -70,8 +70,7 @@ bool SCCPSolver::tryToReplaceWithConstant(Value *V) {
 
     for (const auto Obj : Objects) {
       if (const auto *Arg = dyn_cast<Argument>(Obj)) {
-        if (isa<PointerType>(Arg->getType()) &&
-            Arg->hasNoAliasAttr())
+        if (isa<PointerType>(Arg->getType()) && Arg->hasNoAliasAttr())
           return false;
       }
     }

@choikwa choikwa marked this pull request as draft August 20, 2025 12:37
@dobbelaj-snps
Copy link
Contributor

@choikwa What is the driving testcase for this change ?

@choikwa
Copy link
Contributor Author

choikwa commented Aug 21, 2025

The testcase is from composable kernel. I've attached the module in question, but the log dump is far too large to attach so I've attached cmd in lieu.
composable_kernel_example.zip

Some excerpts that demonstrate my findings:
define internal %"struct.ck_tile::buffer_view" @ZN7ck_tile16make_buffer_viewILNS_18address_space_enumE1ELNS_25amd_buffer_coherence_enumE11EKDF16blEEDaPT1_T2(ptr noalias noundef %p, i64 noundef %buffer_size) #3 {
entry:
...
Constant: ptr addrspacecast (ptr addrspace(3) @_ZZZNK7ck_tile13FmhaFwdKernelINS_34BlockFmhaPipelineQRKSVSAsyncTrloadINS_24BlockFmhaPipelineProblemIDF16bDF16bDF16bffDF16bhfDF16bfDF16bNS_13TileFmhaShapeINS_8sequenceIJLi128ELi64ELi32ELi128ELi16ELi128EEEENS4_IJLi4ELi1ELi1EEEENS4_IJLi32ELi32ELi16EEEES6_S7_Lb1EEELb0ENS_17ComposedAttentionILj0ELb1EEENS_30SimplifiedGenericAttentionMaskILb0EEELb1ENS_14TileFmhaTraitsILb0ELb0ELb0ELb0ELb0ELNS_22BlockAttentionBiasEnumE0ELb0ELb0ELb0ELb0ELin1ELb0EEEEENS_47BlockFmhaPipelineQRKSVSAsyncTrloadDefaultPolicyEEENS_17Default2DEpilogueINS_24Default2DEpilogueProblemIfDF16bLb0ELb0ELb1ELNS_21memory_operation_enumE0EEEvEEE4run_ENSO_21FmhaFwdBatchModeKargsEENKUlvE7_clEvE10smem_ptrk0 to ptr) = ptr %p

@dobbelaj-snps
Copy link
Contributor

Can you create a reduced testcase that shows the wanted effect ? (llvm-ir based). This is needed any way if you want this change to be acceptable. As you have access to a compiler with and without your change, you could use creduce and/or llvm-reduce to help you out for this.

@dobbelaj-snps
Copy link
Contributor

Thanks for the testcase. Did you happen to try this out with the Full Restrict version. My understanding is that that should keep the 'restrict' information just fine ?

@choikwa
Copy link
Contributor Author

choikwa commented Aug 22, 2025

I've tried to run llvm-reduce on CK module in the past, but even after running for 1.5day, it would fail and the reduced llir was still close to original size. Instead, I was luckily able to cook up a simpler example that demonstrates aforementioned behavior in the testcase commit.

The Diff:
{5A2295E7-0EB9-47D2-B279-407CB09903B6}

log dump:
opt.log

The problem starts with IPSCCP replacing %p in the callee body with @arr, ignoring the fact that %p is noalias ptr argument :
Constant: @arr = global [100 x i32] zeroinitializer, align 16 = ptr %p

@choikwa
Copy link
Contributor Author

choikwa commented Aug 22, 2025

Thanks for the testcase. Did you happen to try this out with the Full Restrict version. My understanding is that that should keep the 'restrict' information just fine ?

I have not. I can try with the full_restrict-update-20231215-02_ptr_provenance branch.

@choikwa
Copy link
Contributor Author

choikwa commented Aug 22, 2025

noalias-test.gz

Not sure why but the loads and store have !noalias !(p, q, r)

@choikwa
Copy link
Contributor Author

choikwa commented Aug 26, 2025

Not sure if the full restrict branch handles this case, but one thing seems certain -- if IPSCCP (in the trunk form) is allowed to run before the lowering of noalias arg attribute, it could potentially replace uses of them losing the noalias info. I'm not aware if this issue was seen and addressed already, but it seems like it needs either phase ordering change or passes like IPSCCP needs to be made aware of losing noalias info.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to create a PhaseOrdering test that shows this causes end-to-end optimization issues? Just losing noalias metadata by itself may not be a problem, as accesses on globals have better analysis capabilities than random pointers (especially with GlobalsAA in use).

@choikwa
Copy link
Contributor Author

choikwa commented Aug 26, 2025

I tried with a simple unrolled body, but most backends were able to schedule stores to the end. And it looks like the reason may be that ScopedNoAliasAAResult::alias implementation does allow for asymmetry (ie. missing scope in store) when determining NoAlias.

For my specific case, the issue was related to the existence of scope metadata on the instruction (Using LDS, a special cache-like memory). AMDGPU would pessimize the load from LDS by forcing stalls to wait for all but one of the memory traffic to finish before continuing.

So yes, some backends like AMDGPU may have quirks that may be more pessimistic when asymmetry is found, but the general case seems to be unaffected.

ret void
}

define void @callee(ptr noalias noundef %p, ptr noalias noundef %q, ptr noalias noundef %r, i32 noundef %len) #1 align 2 {
Copy link
Contributor

@brunodf-snps brunodf-snps Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function @callee in your test case needs to be "internal" before IPSCCP can transform it. (Also, the alignment on the function is redundant for the test.)

@brunodf-snps
Copy link
Contributor

Is it possible to create a PhaseOrdering test that shows this causes end-to-end optimization issues?

FWIW, I think this can cause end-to-end optimization issues. Consider:

LINKAGE void loop(int * __restrict a, int *b, int *c) {
    for (int i = 0; i < 100; ++i)
        a[i] = b[i] + c[i];
}

extern int a[];

void test(int n) {
    loop(a, a + n, a + n + n);
}

Compiling this with an empty LINKAGE value prevents IPSCCP:

$ clang -target riscv32-unknown-linux-gnu -DLINKAGE= -S -emit-llvm -O2 -o - mytest.c | opt -disable-output -passes=aa-eval -evaluate-aa-metadata -print-all-alias-modref-info
...
Function: test: 3 pointers, 1 call sites
  MayAlias:     i32* %arrayidx.i, i32* %arrayidx1.i
  MayAlias:     i32* %arrayidx.i, i32* %arrayidx2.i
  MayAlias:     i32* %arrayidx1.i, i32* %arrayidx2.i
  NoAlias:   %0 = load i32, ptr %arrayidx.i, align 4, !tbaa !12, !noalias !9 <->   store i32 %add.i, ptr %arrayidx2.i, align 4, !tbaa !12, !alias.scope !9
  NoAlias:   %1 = load i32, ptr %arrayidx1.i, align 4, !tbaa !12, !noalias !9 <->   store i32 %add.i, ptr %arrayidx2.i, align 4, !tbaa !12, !alias.scope !9
  NoModRef:  Ptr: i32* %arrayidx.i      <->  tail call void @llvm.experimental.noalias.scope.decl(metadata !15)
  NoModRef:  Ptr: i32* %arrayidx1.i     <->  tail call void @llvm.experimental.noalias.scope.decl(metadata !15)
  NoModRef:  Ptr: i32* %arrayidx2.i     <->  tail call void @llvm.experimental.noalias.scope.decl(metadata !15)
...

Whereas with LINKAGE=static we enable IPSCPP:

$ clang -target riscv32-unknown-linux-gnu -DLINKAGE=static -S -emit-llvm -O2 -o - mytest.c | opt -disable-output -passes=aa-eval -evaluate-aa-metadata -print-all-alias-modref-info
Function: test: 3 pointers, 0 call sites
  MayAlias:     i32* %arrayidx.i, i32* %arrayidx1.i
  MayAlias:     i32* %arrayidx.i, i32* %arrayidx2.i
  MayAlias:     i32* %arrayidx1.i, i32* %arrayidx2.i
  MayAlias:   %0 = load i32, ptr %arrayidx.i, align 4, !tbaa !9 <->   store i32 %add.i, ptr %arrayidx2.i, align 4, !tbaa !9
  MayAlias:   %1 = load i32, ptr %arrayidx1.i, align 4, !tbaa !9 <->   store i32 %add.i, ptr %arrayidx2.i, align 4, !tbaa !9
...

So IPSCCP turns NoAlias load/store relations into MayAlias load/store relations.

I don't know if preventing the IPSCCP transformation is the answer though. With the full restrict patches, I think the restrict usage is captured in a noalias intrinsic and it would not show this problem.

@choikwa
Copy link
Contributor Author

choikwa commented Aug 28, 2025

I think with the trunk the only theoretical issue other than AMDGPU backend would be if IPSCCP replaced what would have been asymmetric scoped-noalias relation and stopped producing NoAlias result (but not sure how to generate such test or how likely that is).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants