Conversation

david-arm
Contributor

In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and srem we fall back on the legacy cost unnecessarily. At this point we know that the vplan must be functionally correct, i.e. if the divide/remainder is not safe to speculatively execute then we must have either:

  1. Scalarised the operation, in which case we wouldn't be using a VPWidenRecipe, or
  2. We've inserted a select for the second operand to ensure we don't fault through divide-by-zero.

For 2) it's necessary to add the select operation to VPInstruction::computeCost so that we mirror the cost of the legacy cost model. The only problem with this is that we also generate selects in vplan for predicated loops with reductions, which aren't accounted for in the legacy cost model. In order to prevent asserts firing I've also added the selects to precomputeCosts to ensure the legacy costs match the vplan costs for reductions.
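To make 2) concrete, here is a minimal scalar sketch of the safe-divisor idea, assuming an active-lane mask: inactive lanes get a divisor of 1 via a select, so the widened divide can never fault. The function name and signature are made up for illustration; this is not the actual VPlan transformation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of the safe-divisor select: select(Mask, B, 1) guarantees
// the divide never sees a zero divisor on an inactive lane. Results on
// inactive lanes are don't-care values that the vectorizer discards.
std::vector<int64_t> predicatedSDiv(const std::vector<int64_t> &A,
                                    const std::vector<int64_t> &B,
                                    const std::vector<bool> &Mask) {
  std::vector<int64_t> R(A.size(), 0);
  for (std::size_t I = 0; I < A.size(); ++I) {
    int64_t SafeB = Mask[I] ? B[I] : 1; // the inserted select
    R[I] = A[I] / SafeB;                // safe even if B[I] == 0 here
  }
  return R;
}
```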

@llvmbot
Member

llvmbot commented Aug 8, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: David Sherwood (david-arm)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/152707.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+13-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index be00fd6a416e5..e79ae67b09a12 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4337,6 +4337,9 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
           if (!VPI)
             continue;
           switch (VPI->getOpcode()) {
+          // Selects are not modelled in the legacy cost model if they are
+          // inserted for reductions.
+          case Instruction::Select:
           case VPInstruction::ActiveLaneMask:
           case VPInstruction::ExplicitVectorLength:
             C += VPI->cost(VF, CostCtx);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index e971ba1aac15c..d0406214a6364 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -952,6 +952,15 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
   }
 
   switch (getOpcode()) {
+  case Instruction::Select: {
+    // TODO: It may be possible to improve this by analyzing where the
+    // condition operand comes from.
+    CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
+    auto *CondTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF);
+    auto *VecTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(1)), VF);
+    return Ctx.TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy, Pred,
+                                      Ctx.CostKind);
+  }
   case Instruction::ExtractElement:
   case VPInstruction::ExtractLane: {
     // Add on the cost of extracting the element.
@@ -2007,8 +2016,10 @@ InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
   case Instruction::SDiv:
   case Instruction::SRem:
   case Instruction::URem:
-    // More complex computation, let the legacy cost-model handle this for now.
-    return Ctx.getLegacyCost(cast<Instruction>(getUnderlyingValue()), VF);
+    // If the div/rem operation isn't safe to speculate and requires
+    // predication, then the only way we can even create a vplan is to insert
+    // a select on the second input operand to ensure we use the value of 1
+    // for the inactive lanes. The select will be costed separately.
   case Instruction::Add:
   case Instruction::FAdd:
   case Instruction::Sub:

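With the early return gone, the div/rem opcodes simply fall through to the generic widened-arithmetic costing shared with Add, Sub, etc. A standalone sketch of that control flow, with made-up opcodes and a stand-in cost in place of `TTI.getArithmeticInstrCost`:

```cpp
// Toy model of the fall-through introduced above: UDiv/SDiv/URem/SRem no
// longer return early via the legacy cost model; they share the generic
// arithmetic cost path. Opcodes and costs are illustrative, not LLVM's.
enum Opcode { UDiv, SDiv, URem, SRem, Add, Sub };

int widenedArithmeticCost(Opcode Op) {
  switch (Op) {
  case UDiv:
  case SDiv:
  case URem:
  case SRem:
    // Previously: return LegacyCost; removed, because any safe-divisor
    // select is now costed separately as its own VPInstruction.
  case Add:
  case Sub:
    return 1; // stand-in for Ctx.TTI.getArithmeticInstrCost(...)
  }
  return -1; // unreachable for the opcodes above
}
```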
Contributor

@kmclaughlin-arm kmclaughlin-arm left a comment


This seems reasonable to me.

Comment on lines 959 to 960
auto *CondTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF);
auto *VecTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(1)), VF);
Contributor


Could the type be a scalar type? Maybe we need to consider vputils::onlyFirstLaneUsed?

Contributor Author


Hmm, I think you're right it could be a scalar. I missed that, thanks for spotting! In tryAddExplicitVectorLength we do:

    VPValue *AVLSafe =
        Plan.getOrAddLiveIn(ConstantInt::get(CanIVTy, *MaxSafeElements));
    VPValue *Cmp = Builder.createICmp(ICmpInst::ICMP_ULT, AVL, AVLSafe);
    AVL = Builder.createSelect(Cmp, AVL, AVLSafe, DebugLoc(), "safe_avl");

Contributor

@fhahn fhahn left a comment


I'll run this PR on a set of different configs to check for potential regressions.

@david-arm
Contributor Author

I'll run this PR on a set of different configs to check for potential regressions.

Hi @fhahn, did you get a chance to run this PR on some configs? I'd like to land the patch today or tomorrow as part of work to remove the legacy cost model from vplan and ultimately remove the legacy/vplan cost model assert. I figure the worst thing that can happen is it triggers the assert somewhere and the patch needs reverting or tweaking.

@fhahn
Contributor

fhahn commented Aug 18, 2025

Yeah I hit a number of crashes, just need to clean up the test cases. Will add them tomorrow

Contributor

@lukel97 lukel97 left a comment


I accidentally ended up writing the same patch in #154468 to fix #154103 :)

@fhahn
Contributor

fhahn commented Aug 20, 2025

I think the test that showed a crash should be https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll#L1359.

There were a few others I think, but probably the same root cause for all of them

@david-arm
Contributor Author

I think the test that showed a crash should be https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll#L1359.

There were a few others I think, but probably the same root cause for all of them

OK, I didn't see any crashes when I tested this, so perhaps something has changed?

@david-arm
Contributor Author

I rebased just now and ran the tests in llvm/test/Transforms/LoopVectorize and they all pass fine with this PR.

@david-arm
Contributor Author

Can you provide steps to reproduce the crash you observed? Do I need to run the test files with different arguments or modify them in any way?

@fhahn
Contributor

fhahn commented Aug 20, 2025

I rebased just now and ran the tests in llvm/test/Transforms/LoopVectorize and they all pass fine with this PR.

I pulled the changes from the PR and rebased on current main, and it crashed as expected. I'm not sure what the difference could be. Is there any chance you have some other local changes? Could you update the PR to be on top of current main?

@david-arm
Contributor Author

OK, I've reproduced the problem. It's a marathon, not a sprint it seems. :)

So basically, in order to remove the legacy cost model for the divides from vplan you need to also add the cost of the selects to vplan. Adding the cost of selects to vplan then triggers the legacy cost model assert, because of completely unrelated reduction loops that require selects, except the selects in that case aren't modelled in the legacy cost model. That meant I had to add the cost of the selects for reductions back into the legacy cost model. Doing so now triggers the assert again, because the legacy cost model is adding the select cost twice for the divides! I'll have to hack something into LoopVectorizationPlanner::selectVectorizationFactor that only adds the cost of the select to the loop if it's used by something that isn't a divide.
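The double-counting chain above can be sketched with toy numbers (costs are purely illustrative, assuming a widened divide costs 4 and a widened select costs 1; the function names are made up):

```cpp
// Toy accounting for the mismatch: both models must charge the safe-divisor
// select exactly once, or the legacy-vs-vplan cost assert fires.
constexpr int DivCost = 4; // illustrative cost of a widened divide
constexpr int SelCost = 1; // illustrative cost of a widened select

// Legacy model: the select is folded into the divide's cost.
int legacyCostForPredicatedDiv() { return DivCost + SelCost; }

// VPlan model after this patch: divide and select costed as separate recipes.
int vplanCostForPredicatedDiv() { return DivCost + SelCost; }

// The bug described above: adding selects back into the legacy side for
// reductions also re-adds the divide's select, counting it twice.
int legacyCostWithDoubleCountedSelect() {
  return legacyCostForPredicatedDiv() + SelCost;
}
```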

Comment on lines 4284 to 4286
VPValue *V =
R.getNumDefinedValues() == 1 ? R.getVPSingleValue() : nullptr;
if (V && V->getNumUsers() == 1) {
Contributor


Is a VPInstruction not always a VPSingleDefRecipe? Does this work?

Suggested change
VPValue *V =
R.getNumDefinedValues() == 1 ? R.getVPSingleValue() : nullptr;
if (V && V->getNumUsers() == 1) {
if (VPI->getVPSingleValue()->getNumUsers() == 1) {

Comment on lines 4281 to 4282
// Selects are not modelled in the legacy cost model if they are
// inserted for reductions.
Contributor


It took me a bit to understand the relation between reductions and divisions below; would it be clearer to frame it as something like "Selects are only modelled in the legacy cost model for safe divisors"?

Contributor

@fhahn fhahn left a comment


Thanks for the updates, will check again if there are any crashes remaining

@lukel97
Contributor

lukel97 commented Aug 21, 2025

I checked this out locally and can confirm this fixes #154103, and it also builds llvm-test-suite + SPEC CPU 2017 without crashing. I can commit the test case for #154103 after this lands

@david-arm
Contributor Author

Rebased, ran make check-all and ran the LLVM test suite downstream. All seemed fine.

@david-arm david-arm merged commit d606eae into llvm:main Aug 26, 2025
9 checks passed
Contributor

@fhahn fhahn left a comment


This should be good, I didn't hit any other crashes with the latest version, thanks
