-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[IPSCCP] Don't replace with constant if Value is noalias ptr or derivatives #154522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…atives This was motivated from looking at composable kernel benchmark where IPSCCP was observed replacing noalias ptr's and their derivatives with a global alias. Doing so would lose the noalias information and target backend was more pessimistic, emitting unneeded WAITCNT instructions.
@llvm/pr-subscribers-function-specialization @llvm/pr-subscribers-llvm-transforms Author: choikwa (choikwa) ChangesThis was motivated from looking at composable kernel benchmark where IPSCCP was observed replacing noalias ptr's and their derivatives with a global alias. Doing so would lose the noalias information and target backend was more pessimistic, emitting unneeded WAITCNT instructions. Making it a draft as it's unclear if it's beneficially moving the needle. Full diff: https://github.com/llvm/llvm-project/pull/154522.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/SCCPSolver.cpp b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
index 84485176ad4ff..3c06452a91f36 100644
--- a/llvm/lib/Transforms/Utils/SCCPSolver.cpp
+++ b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
@@ -62,6 +62,21 @@ bool SCCPSolver::tryToReplaceWithConstant(Value *V) {
Constant *Const = getConstantOrNull(V);
if (!Const)
return false;
+
+ // Don't replace noalias arg or derivatives
+ if (isa<PointerType>(V->getType())) {
+ SmallVector<const Value *, 4> Objects;
+ getUnderlyingObjects(V, Objects, nullptr);
+
+ for (const auto Obj : Objects) {
+ if (const auto *Arg = dyn_cast<Argument>(Obj)) {
+ if (isa<PointerType>(Arg->getType()) &&
+ Arg->hasNoAliasAttr())
+ return false;
+ }
+ }
+ }
+
// Replacing `musttail` instructions with constant breaks `musttail` invariant
// unless the call itself can be removed.
// Calls with "clang.arc.attachedcall" implicitly use the return value and
|
You can test this locally with the following command:git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Transforms/Utils/SCCPSolver.cpp View the diff from clang-format here.diff --git a/llvm/lib/Transforms/Utils/SCCPSolver.cpp b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
index 3c06452a9..ac1a615a2 100644
--- a/llvm/lib/Transforms/Utils/SCCPSolver.cpp
+++ b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
@@ -70,8 +70,7 @@ bool SCCPSolver::tryToReplaceWithConstant(Value *V) {
for (const auto Obj : Objects) {
if (const auto *Arg = dyn_cast<Argument>(Obj)) {
- if (isa<PointerType>(Arg->getType()) &&
- Arg->hasNoAliasAttr())
+ if (isa<PointerType>(Arg->getType()) && Arg->hasNoAliasAttr())
return false;
}
}
|
@choikwa What is the driving testcase for this change ? |
The testcase is from composable kernel. I've attached the module in question, but the log dump is far too large to attach so I've attached cmd in lieu. Some excerpts that demonstrate my findings: |
Can you create a reduced testcase that shows the wanted effect ? (llvm-ir based). This is needed any way if you want this change to be acceptable. As you have access to a compiler with and without your change, you could use creduce and/or llvm-reduce to help you out for this. |
Thanks for the testcase. Did you happen to try this out with the Full Restrict version. My understanding is that that should keep the 'restrict' information just fine ? |
I've tried to run llvm-reduce on CK module in the past, but even after running for 1.5day, it would fail and the reduced llir was still close to original size. Instead, I was luckily able to cook up a simpler example that demonstrates aforementioned behavior in the testcase commit. log dump: The problem starts with IPSCCP replacing %p in the callee body with @arr, ignoring the fact that %p is noalias ptr argument : |
I have not. I can try with the full_restrict-update-20231215-02_ptr_provenance branch. |
Not sure why but the loads and store have !noalias !(p, q, r) |
Not sure if the full restrict branch handles this case, but one thing seems certain -- if IPSCCP (in the trunk form) is allowed to run before the lowering of noalias arg attribute, it could potentially replace uses of them losing the noalias info. I'm not aware if this issue was seen and addressed already, but it seems like it needs either phase ordering change or passes like IPSCCP needs to be made aware of losing noalias info. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to create a PhaseOrdering test that shows this causes end-to-end optimization issues? Just losing noalias metadata by itself may not be a problem, as accesses on globals have better analysis capabilities than random pointers (especially with GlobalsAA in use).
I tried with a simple unrolled body, but most backends were able to schedule stores to the end. And it looks like the reason may be that ScopedNoAliasAAResult::alias implementation does allow for asymmetry (ie. missing scope in store) when determining NoAlias. For my specific case, the issue was related to the existence of scope metadata on the instruction (Using LDS, a special cache-like memory). AMDGPU would pessimize the load from LDS by forcing stalls to wait for all but one of the memory traffic to finish before continuing. So yes, some backends like AMDGPU may have quirks that may be more pessimistic when asymmetry is found, but the general case seems to be unaffected. |
ret void | ||
} | ||
|
||
define void @callee(ptr noalias noundef %p, ptr noalias noundef %q, ptr noalias noundef %r, i32 noundef %len) #1 align 2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the function @callee
in your test case needs to be "internal" before IPSCCP can transform it. (Also, the alignment on the function is redundant for the test.)
FWIW, I think this can cause end-to-end optimization issues. Consider: LINKAGE void loop(int * __restrict a, int *b, int *c) {
for (int i = 0; i < 100; ++i)
a[i] = b[i] + c[i];
}
extern int a[];
void test(int n) {
loop(a, a + n, a + n + n);
} Compiling this with an empty LINKAGE value prevents IPSCCP:
Whereas with LINKAGE=static we enable IPSCPP:
So IPSCCP turns NoAlias load/store relations into MayAlias load/store relations. I don't know if preventing the IPSCCP transformation is the answer though. With the full restrict patches, I think the restrict usage is captured in a noalias intrinsic and it would not show this problem. |
I think with the trunk the only theoretical issue other than AMDGPU backend would be if IPSCCP replaced what would have been asymmetric scoped-noalias relation and stopped producing NoAlias result (but not sure how to generate such test or how likely that is). |
This was motivated from looking at composable kernel benchmark where IPSCCP was observed replacing noalias ptr's and their derivatives with a global alias. Doing so would lose the noalias information and target backend was more pessimistic, emitting unneeded WAITCNT instructions.
Making it a draft as it's unclear if it's beneficially moving the needle.