-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[LV] Stop using the legacy cost model for udiv + friends #152707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: David Sherwood (david-arm) ChangesIn VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and srem we fall back on the legacy cost unnecessarily. At this point we know that the vplan must be functionally correct, i.e. if the divide/remainder is not safe to speculatively execute then we must have either:
For 2) it's necessary to add the select operation to VPInstruction::computeCost so that we mirror the cost of the legacy cost model. The only problem with this is that we also generate selects in vplan for predicated loops with reductions, which aren't accounted for in the legacy cost model. In order to prevent asserts firing I've also added the selects to precomputeCosts to ensure the legacy costs match the vplan costs for reductions. Full diff: https://github.com/llvm/llvm-project/pull/152707.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index be00fd6a416e5..e79ae67b09a12 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4337,6 +4337,9 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
if (!VPI)
continue;
switch (VPI->getOpcode()) {
+ // Selects are not modelled in the legacy cost model if they are
+ // inserted for reductions.
+ case Instruction::Select:
case VPInstruction::ActiveLaneMask:
case VPInstruction::ExplicitVectorLength:
C += VPI->cost(VF, CostCtx);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index e971ba1aac15c..d0406214a6364 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -952,6 +952,15 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
}
switch (getOpcode()) {
+ case Instruction::Select: {
+ // TODO: It may be possible to improve this by analyzing where the
+ // condition operand comes from.
+ CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
+ auto *CondTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF);
+ auto *VecTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(1)), VF);
+ return Ctx.TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy, Pred,
+ Ctx.CostKind);
+ }
case Instruction::ExtractElement:
case VPInstruction::ExtractLane: {
// Add on the cost of extracting the element.
@@ -2007,8 +2016,10 @@ InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
case Instruction::SDiv:
case Instruction::SRem:
case Instruction::URem:
- // More complex computation, let the legacy cost-model handle this for now.
- return Ctx.getLegacyCost(cast<Instruction>(getUnderlyingValue()), VF);
+ // If the div/rem operation isn't safe to speculate and requires
+ // predication, then the only way we can even create a vplan is to insert
+ // a select on the second input operand to ensure we use the value of 1
+ // for the inactive lanes. The select will be costed separately.
case Instruction::Add:
case Instruction::FAdd:
case Instruction::Sub:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable to me.
auto *CondTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF); | ||
auto *VecTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(1)), VF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the type is a scalar type? Maybe we need to consider vputils::onlyFirstLaneUsed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think you're right it could be a scalar. I missed that, thanks for spotting! In tryAddExplicitVectorLength we do:
VPValue *AVLSafe =
Plan.getOrAddLiveIn(ConstantInt::get(CanIVTy, *MaxSafeElements));
VPValue *Cmp = Builder.createICmp(ICmpInst::ICMP_ULT, AVL, AVLSafe);
AVL = Builder.createSelect(Cmp, AVL, AVLSafe, DebugLoc(), "safe_avl");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll run this PR on a set of different configs to check for potential regressions.
Hi @fhahn, did you get a chance to run this PR on some configs? I'd like to land the patch today or tomorrow as part of work to remove the legacy cost model from vplan and ultimately remove the legacy/vplan cost model assert. I figure the worst thing that can happen is it triggers the assert somewhere and the patch needs reverting or tweaking. |
Yeah I hit a number of crashes, just need to clean up the test cases. Will add them tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test that showed a crash should be https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll#L1359. There were a few others I think, but probably the same root cause for all of them |
OK I didn't see any crashes when I tested this, so perhaps something has changed? |
I rebased just now and ran the tests in llvm/test/Transforms/LoopVectorize and they all pass fine with this PR. |
Can you provide steps to reproduce the crash you observed? Do I need to run the test files with different arguments or modify them in any way? |
I pulled the changes from the PR and rebased on current main and it crashed as expected. Not sure what the difference could be, is there any chance you have some other local changes? Could just update the PR to be on top of current main? |
OK, I've reproduced the problem. It's a marathon, not a sprint it seems. :) So basically in order to remove the legacy cost model for the divides from vplan you need to also add the cost of the selects to vplan. Adding the cost of selects to vplan then triggers the legacy cost model assert because of completely unrelated reduction loops that require selects, except the selects in this case aren't modelled in the legacy cost model. That meant I had to add the cost of the selects for reductions back in to the legacy cost model. Doing so now triggers the assert again because the legacy cost model is adding the select cost twice for the divides! I'll have to hack something into |
5ee1232
to
31c6ec4
Compare
In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and srem we fall back on the legacy cost unnecessarily. At this point we know that the vplan must be functionally correct, i.e. if the divide/remainder is not safe to speculatively execute then we must have either: 1. Scalarised the operation, in which case we wouldn't be using a VPWidenRecipe, or 2. We've inserted a select for the second operand to ensure we don't fault through divide-by-zero. For 2) it's necessary to add the select operation to VPInstruction::computeCost so that we mirror the cost of the legacy cost model. The only problem with this is that we also generate selects in vplan for predicated loops with reductions, which *aren't* accounted for in the legacy cost model. In order to prevent asserts firing I've also added the selects to precomputeCosts to ensure the legacy costs match the vplan costs for reductions.
31c6ec4
to
2ccec61
Compare
VPValue *V = | ||
R.getNumDefinedValues() == 1 ? R.getVPSingleValue() : nullptr; | ||
if (V && V->getNumUsers() == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a VPInstruction not always a VPSingleDefRecipe? Does this work
VPValue *V = | |
R.getNumDefinedValues() == 1 ? R.getVPSingleValue() : nullptr; | |
if (V && V->getNumUsers() == 1) { | |
if (VPI->getVPSingleValue()->getNumUsers() == 1) { |
// Selects are not modelled in the legacy cost model if they are | ||
// inserted for reductions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a bit to understand the relation between reductions and divisions below, would it be a bit clearer to frame it something like "Selects are only modelled in the legacy cost model for safe divisors"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates, will check again if there are any crashes remaining
Rebased, ran make check-all and ran the LLVM test suite downstream. All seemed fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be good, I didn't hit any other crashes with the latest version, thanks
This has now been fixed by #152707
In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and srem we fall back on the legacy cost unnecessarily. At this point we know that the vplan must be functionally correct, i.e. if the divide/remainder is not safe to speculatively execute then we must have either:
For 2) it's necessary to add the select operation to VPInstruction::computeCost so that we mirror the cost of the legacy cost model. The only problem with this is that we also generate selects in vplan for predicated loops with reductions, which aren't accounted for in the legacy cost model. In order to prevent asserts firing I've also added the selects to precomputeCosts to ensure the legacy costs match the vplan costs for reductions.