Commit d606eae

[LV] Stop using the legacy cost model for udiv + friends (#152707)
In VPWidenRecipe::computeCost we fall back on the legacy cost model unnecessarily for the udiv, sdiv, urem and srem instructions. At this point we know that the VPlan must be functionally correct, i.e. if the divide/remainder is not safe to speculatively execute then we must have either:

1. scalarised the operation, in which case we wouldn't be using a VPWidenRecipe, or
2. inserted a select for the second operand to ensure we don't fault through divide-by-zero.

For 2) it's necessary to add the select operation to VPInstruction::computeCost so that we mirror the cost of the legacy cost model. The only problem with this is that we also generate selects in VPlan for predicated loops with reductions, which *aren't* accounted for in the legacy cost model. To prevent asserts firing I've also added the selects to precomputeCosts so that the legacy costs match the VPlan costs for reductions.
1 parent a1937d2 commit d606eae
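
Case 2 of the commit message can be pictured lane by lane. The following is a minimal sketch, not LLVM code: `maskedUDiv` is a hypothetical helper that emulates a predicated vector `udiv` in plain C++, with the select on the divisor written out explicitly so that inactive lanes divide by 1 instead of by a possibly-zero value.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of LLVM): emulates a predicated vector udiv
// over N lanes. Lanes whose mask bit is false are "inactive".
template <std::size_t N>
std::array<std::uint32_t, N>
maskedUDiv(const std::array<std::uint32_t, N> &Num,
           const std::array<std::uint32_t, N> &Den,
           const std::array<bool, N> &Mask) {
  std::array<std::uint32_t, N> Out{};
  for (std::size_t I = 0; I < N; ++I) {
    // The "select" on the second operand: inactive lanes use a divisor of 1,
    // so a zero in Den[I] cannot fault on a lane that would not have executed.
    std::uint32_t SafeDen = Mask[I] ? Den[I] : 1u;
    Out[I] = Num[I] / SafeDen;
  }
  return Out;
}
```

With numerators `{8, 9, 10, 11}`, divisors `{2, 0, 5, 0}` and mask `{true, false, true, false}`, the zero divisors sit only on inactive lanes, so every lane divides safely. The select is an extra operation in the vector code, which is why it needs its own entry in the cost model.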

2 files changed: +36 -2 lines changed

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Lines changed: 19 additions & 0 deletions
@@ -4286,6 +4286,25 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor() {
       if (!VPI)
         continue;
       switch (VPI->getOpcode()) {
+      // Selects are only modelled in the legacy cost model for safe
+      // divisors.
+      case Instruction::Select: {
+        VPValue *VPV = VPI->getVPSingleValue();
+        if (VPV->getNumUsers() == 1) {
+          if (auto *WR = dyn_cast<VPWidenRecipe>(*VPV->user_begin())) {
+            switch (WR->getOpcode()) {
+            case Instruction::UDiv:
+            case Instruction::SDiv:
+            case Instruction::URem:
+            case Instruction::SRem:
+              continue;
+            default:
+              break;
+            }
+          }
+        }
+        [[fallthrough]];
+      }
       case VPInstruction::ActiveLaneMask:
       case VPInstruction::ExplicitVectorLength:
         C += VPI->cost(VF, CostCtx);
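
The control flow of this hunk can be sketched outside LLVM as follows. Everything here is a stand-in: `Opcode`, `Node` and `precomputeSelectCosts` are hypothetical names, and the check that the single user is a `VPWidenRecipe` is simplified to an opcode test. The point is only the shape of the logic: a select whose sole user is a div/rem is skipped (the legacy model already accounts for it), while any other select falls through and has its cost added alongside the lane-mask recipes.

```cpp
#include <vector>

// Hypothetical stand-ins for VPlan opcodes and recipes (not LLVM types).
enum class Opcode { Select, UDiv, SDiv, URem, SRem, Add, ActiveLaneMask };

struct Node {
  Opcode Op;
  int Cost;                        // made-up cost, for illustration only
  std::vector<const Node *> Users; // recipes that consume this value
};

bool isDivRem(Opcode Op) {
  return Op == Opcode::UDiv || Op == Opcode::SDiv || Op == Opcode::URem ||
         Op == Opcode::SRem;
}

// Sums the costs that the legacy model does not know about.
int precomputeSelectCosts(const std::vector<const Node *> &Nodes) {
  int C = 0;
  for (const Node *N : Nodes) {
    switch (N->Op) {
    case Opcode::Select:
      // A safe-divisor select is already modelled by the legacy cost model,
      // so skip it; `continue` here advances the enclosing for loop.
      if (N->Users.size() == 1 && isDivRem(N->Users.front()->Op))
        continue;
      [[fallthrough]];
    case Opcode::ActiveLaneMask:
      C += N->Cost;
      break;
    default:
      break;
    }
  }
  return C;
}
```

For example, given a select feeding a `UDiv`, a select feeding an `Add`, and an `ActiveLaneMask`, only the last two contribute to the returned total.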

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

Lines changed: 17 additions & 2 deletions
@@ -1034,6 +1034,19 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
   }
 
   switch (getOpcode()) {
+  case Instruction::Select: {
+    // TODO: It may be possible to improve this by analyzing where the
+    // condition operand comes from.
+    CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
+    auto *CondTy = Ctx.Types.inferScalarType(getOperand(0));
+    auto *VecTy = Ctx.Types.inferScalarType(getOperand(1));
+    if (!vputils::onlyFirstLaneUsed(this)) {
+      CondTy = toVectorTy(CondTy, VF);
+      VecTy = toVectorTy(VecTy, VF);
+    }
+    return Ctx.TTI.getCmpSelInstrCost(Instruction::Select, VecTy, CondTy, Pred,
+                                      Ctx.CostKind);
+  }
   case Instruction::ExtractElement:
   case VPInstruction::ExtractLane: {
     if (VF.isScalar()) {
@@ -2128,8 +2141,10 @@ InstructionCost VPWidenRecipe::computeCost(ElementCount VF,
   case Instruction::SDiv:
   case Instruction::SRem:
   case Instruction::URem:
-    // More complex computation, let the legacy cost-model handle this for now.
-    return Ctx.getLegacyCost(cast<Instruction>(getUnderlyingValue()), VF);
+    // If the div/rem operation isn't safe to speculate and requires
+    // predication, then the only way we can even create a vplan is to insert
+    // a select on the second input operand to ensure we use the value of 1
+    // for the inactive lanes. The select will be costed separately.
   case Instruction::FNeg:
   case Instruction::Add:
   case Instruction::FAdd:
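
"The select will be costed separately" amounts to a bookkeeping change, which the sketch below tries to illustrate. It is a toy under stated assumptions: `Op`, `Recipe`, `planCost` and `legacyPredicatedDivCost` are hypothetical names, the cost values are invented (real ones come from TTI), and the legacy model is assumed to fold the safe-divisor select into the divide's cost as a simple sum.

```cpp
#include <vector>

// Hypothetical opcodes and recipes (not LLVM types).
enum class Op { Select, UDiv };

struct Recipe {
  Op Opcode;
  int Cost; // invented per-recipe cost; real values come from TTI
};

// VPlan-style accounting: the select and the divide are separate recipes,
// and each contributes its own cost to the total.
int planCost(const std::vector<Recipe> &Plan) {
  int Total = 0;
  for (const Recipe &R : Plan)
    Total += R.Cost;
  return Total;
}

// Legacy-style accounting (assumed shape): a single number for a predicated
// divide that already includes the safe-divisor select.
int legacyPredicatedDivCost(int DivCost, int SelectCost) {
  return DivCost + SelectCost;
}
```

With a select costed at 1 and a `udiv` at 20, both accountings give 21. If the select were left uncosted on the VPlan side, the totals would disagree, which is the kind of mismatch the commit message says would trigger asserts.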
