-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[DependenceAnalysis] Extending SIV to handle fusable loops #128782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis Author: Alireza Torabian (1997alireza) ChangesWhen there is a dependency between two memory instructions in separate loops, SIV will be able to test them and compute the direction and the distance of the dependency. Patch is 48.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128782.diff 4 Files Affected:
diff --git a/llvm/include/llvm/Analysis/DependenceAnalysis.h b/llvm/include/llvm/Analysis/DependenceAnalysis.h
index 426ac757b4b0d..8e86c091c60a2 100644
--- a/llvm/include/llvm/Analysis/DependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/DependenceAnalysis.h
@@ -97,10 +97,11 @@ namespace llvm {
bool PeelFirst : 1; // Peeling the first iteration will break dependence.
bool PeelLast : 1; // Peeling the last iteration will break the dependence.
bool Splitable : 1; // Splitting the loop will break dependence.
+ bool SeparateLoops : 1; // Is performed across two separate loop nests.
const SCEV *Distance = nullptr; // NULL implies no distance available.
DVEntry()
: Direction(ALL), Scalar(true), PeelFirst(false), PeelLast(false),
- Splitable(false) {}
+ Splitable(false), SeparateLoops(false) {}
};
/// getSrc - Returns the source instruction for this dependence.
@@ -182,6 +183,10 @@ namespace llvm {
/// the dependence.
virtual bool isSplitable(unsigned Level) const { return false; }
+ /// inSeparateLoops - Returns true if this level is performed across
+ /// two separate loop nests.
+ virtual bool inSeparateLoops(unsigned Level) const { return false; }
+
/// isScalar - Returns true if a particular level is scalar; that is,
/// if no subscript in the source or destination mention the induction
/// variable associated with the loop at this level.
@@ -275,6 +280,10 @@ namespace llvm {
/// the dependence.
bool isSplitable(unsigned Level) const override;
+ /// inSeparateLoops - Returns true if this level is performed across
+ /// two separate loop nests.
+ bool inSeparateLoops(unsigned Level) const override;
+
/// isScalar - Returns true if a particular level is scalar; that is,
/// if no subscript in the source or destination mention the induction
/// variable associated with the loop at this level.
@@ -405,7 +414,8 @@ namespace llvm {
const SCEV *A;
const SCEV *B;
const SCEV *C;
- const Loop *AssociatedLoop;
+ const Loop *AssociatedSrcLoop;
+ const Loop *AssociatedDstLoop;
public:
/// isEmpty - Return true if the constraint is of kind Empty.
@@ -449,18 +459,25 @@ namespace llvm {
/// Otherwise assert.
const SCEV *getD() const;
- /// getAssociatedLoop - Returns the loop associated with this constraint.
- const Loop *getAssociatedLoop() const;
+ /// getAssociatedSrcLoop - Returns the source loop associated with this
+ /// constraint.
+ const Loop *getAssociatedSrcLoop() const;
+
+ /// getAssociatedDstLoop - Returns the destination loop associated with
+ /// this constraint.
+ const Loop *getAssociatedDstLoop() const;
/// setPoint - Change a constraint to Point.
- void setPoint(const SCEV *X, const SCEV *Y, const Loop *CurrentLoop);
+ void setPoint(const SCEV *X, const SCEV *Y, const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop);
/// setLine - Change a constraint to Line.
- void setLine(const SCEV *A, const SCEV *B,
- const SCEV *C, const Loop *CurrentLoop);
+ void setLine(const SCEV *A, const SCEV *B, const SCEV *C,
+ const Loop *CurrentSrcLoop, const Loop *CurrentDstLoop);
/// setDistance - Change a constraint to Distance.
- void setDistance(const SCEV *D, const Loop *CurrentLoop);
+ void setDistance(const SCEV *D, const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop);
/// setEmpty - Change a constraint to Empty.
void setEmpty();
@@ -473,6 +490,10 @@ namespace llvm {
void dump(raw_ostream &OS) const;
};
+ /// Returns true if two loops are the same or they have the same upperbound
+ /// and depth
+ bool areLoopsSimilar(const Loop *SrcLoop, const Loop *DstLoop) const;
+
/// establishNestingLevels - Examines the loop nesting of the Src and Dst
/// instructions and establishes their shared loops. Sets the variables
/// CommonLevels, SrcLevels, and MaxLevels.
@@ -523,10 +544,22 @@ namespace llvm {
/// e - 5
/// f - 6
/// g - 7 = MaxLevels
- void establishNestingLevels(const Instruction *Src,
- const Instruction *Dst);
-
- unsigned CommonLevels, SrcLevels, MaxLevels;
+ /// If ConsiderSeparateLoops is true then we also want to consider similar
+ /// seperate loops. Assume that loop nests at level c and e are similar,
+ /// meaning that they have the same upperbound and depth. Then we consider
+ /// them as a common level.
+ /// a - 1
+ /// b - 2
+ /// <c, e> - 3 = CommonLevels
+ /// d - 4 = SrcLevels
+ /// f - 5
+ /// g - 6 = MaxLevels
+ /// SeparateLevels means that how many of the last common levels are
+ /// separated, which is 1 in this case.
+ void establishNestingLevels(const Instruction *Src, const Instruction *Dst,
+ bool ConsiderSeparateLoops = false);
+
+ unsigned CommonLevels, SrcLevels, MaxLevels, SeparateLevels;
/// mapSrcLoop - Given one of the loops containing the source, return
/// its level index in our numbering scheme.
@@ -668,7 +701,8 @@ namespace llvm {
bool strongSIVtest(const SCEV *Coeff,
const SCEV *SrcConst,
const SCEV *DstConst,
- const Loop *CurrentLoop,
+ const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop,
unsigned Level,
FullDependence &Result,
Constraint &NewConstraint) const;
@@ -686,7 +720,8 @@ namespace llvm {
bool weakCrossingSIVtest(const SCEV *SrcCoeff,
const SCEV *SrcConst,
const SCEV *DstConst,
- const Loop *CurrentLoop,
+ const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop,
unsigned Level,
FullDependence &Result,
Constraint &NewConstraint,
@@ -705,7 +740,8 @@ namespace llvm {
const SCEV *DstCoeff,
const SCEV *SrcConst,
const SCEV *DstConst,
- const Loop *CurrentLoop,
+ const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop,
unsigned Level,
FullDependence &Result,
Constraint &NewConstraint) const;
@@ -723,7 +759,8 @@ namespace llvm {
bool weakZeroSrcSIVtest(const SCEV *DstCoeff,
const SCEV *SrcConst,
const SCEV *DstConst,
- const Loop *CurrentLoop,
+ const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop,
unsigned Level,
FullDependence &Result,
Constraint &NewConstraint) const;
@@ -741,7 +778,8 @@ namespace llvm {
bool weakZeroDstSIVtest(const SCEV *SrcCoeff,
const SCEV *SrcConst,
const SCEV *DstConst,
- const Loop *CurrentLoop,
+ const Loop *CurrentSrcLoop,
+ const Loop *CurrentDstLoop,
unsigned Level,
FullDependence &Result,
Constraint &NewConstraint) const;
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index dc0ed22dbcc0b..b947e92a6375b 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -104,6 +104,7 @@ STATISTIC(GCDindependence, "GCD independence");
STATISTIC(BanerjeeApplications, "Banerjee applications");
STATISTIC(BanerjeeIndependence, "Banerjee independence");
STATISTIC(BanerjeeSuccesses, "Banerjee successes");
+STATISTIC(SeparateLoopsConsidered, "Separate loops considered");
static cl::opt<bool>
Delinearize("da-delinearize", cl::init(true), cl::Hidden,
@@ -377,6 +378,13 @@ bool FullDependence::isSplitable(unsigned Level) const {
}
+// Returns true if this level is performed across two separate loop nests.
+bool FullDependence::inSeparateLoops(unsigned Level) const {
+ assert(0 < Level && Level <= Levels && "Level out of range");
+ return DV[Level - 1].SeparateLoops;
+}
+
+
//===----------------------------------------------------------------------===//
// DependenceInfo::Constraint methods
@@ -431,37 +439,52 @@ const SCEV *DependenceInfo::Constraint::getD() const {
}
-// Returns the loop associated with this constraint.
-const Loop *DependenceInfo::Constraint::getAssociatedLoop() const {
+// Returns the source loop associated with this constraint.
+const Loop *DependenceInfo::Constraint::getAssociatedSrcLoop() const {
+ assert((Kind == Distance || Kind == Line || Kind == Point) &&
+ "Kind should be Distance, Line, or Point");
+ return AssociatedSrcLoop;
+}
+
+
+// Returns the destination loop associated with this constraint.
+const Loop *DependenceInfo::Constraint::getAssociatedDstLoop() const {
assert((Kind == Distance || Kind == Line || Kind == Point) &&
"Kind should be Distance, Line, or Point");
- return AssociatedLoop;
+ return AssociatedDstLoop;
}
+
void DependenceInfo::Constraint::setPoint(const SCEV *X, const SCEV *Y,
- const Loop *CurLoop) {
+ const Loop *CurSrcLoop,
+ const Loop *CurDstLoop) {
Kind = Point;
A = X;
B = Y;
- AssociatedLoop = CurLoop;
+ AssociatedSrcLoop = CurSrcLoop;
+ AssociatedDstLoop = CurDstLoop;
}
void DependenceInfo::Constraint::setLine(const SCEV *AA, const SCEV *BB,
- const SCEV *CC, const Loop *CurLoop) {
+ const SCEV *CC, const Loop *CurSrcLoop,
+ const Loop *CurDstLoop) {
Kind = Line;
A = AA;
B = BB;
C = CC;
- AssociatedLoop = CurLoop;
+ AssociatedSrcLoop = CurSrcLoop;
+ AssociatedDstLoop = CurDstLoop;
}
void DependenceInfo::Constraint::setDistance(const SCEV *D,
- const Loop *CurLoop) {
+ const Loop *CurSrcLoop,
+ const Loop *CurDstLoop) {
Kind = Distance;
A = SE->getOne(D->getType());
B = SE->getNegativeSCEV(A);
C = SE->getNegativeSCEV(D);
- AssociatedLoop = CurLoop;
+ AssociatedSrcLoop = CurSrcLoop;
+ AssociatedDstLoop = CurDstLoop;
}
void DependenceInfo::Constraint::setEmpty() { Kind = Empty; }
@@ -608,8 +631,8 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
++DeltaSuccesses;
return true;
}
- if (const SCEVConstant *CUB =
- collectConstantUpperBound(X->getAssociatedLoop(), Prod1->getType())) {
+ if (const SCEVConstant *CUB = collectConstantUpperBound(
+ X->getAssociatedSrcLoop(), Prod1->getType())) {
const APInt &UpperBound = CUB->getAPInt();
LLVM_DEBUG(dbgs() << "\t\tupper bound = " << UpperBound << "\n");
if (Xq.sgt(UpperBound) || Yq.sgt(UpperBound)) {
@@ -620,7 +643,8 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
}
X->setPoint(SE->getConstant(Xq),
SE->getConstant(Yq),
- X->getAssociatedLoop());
+ X->getAssociatedSrcLoop(),
+ X->getAssociatedDstLoop());
++DeltaSuccesses;
return true;
}
@@ -656,6 +680,7 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
// For debugging purposes. Dumps a dependence to OS.
void Dependence::dump(raw_ostream &OS) const {
bool Splitable = false;
+ bool SeparatesStarted = false;
if (isConfused())
OS << "confused";
else {
@@ -672,6 +697,10 @@ void Dependence::dump(raw_ostream &OS) const {
unsigned Levels = getLevels();
OS << " [";
for (unsigned II = 1; II <= Levels; ++II) {
+ if (!SeparatesStarted && inSeparateLoops(II)) {
+ SeparatesStarted = true;
+ OS << "/ ";
+ }
if (isSplitable(II))
Splitable = true;
if (isPeelFirst(II))
@@ -758,6 +787,35 @@ bool isLoadOrStore(const Instruction *I) {
return false;
}
+// Returns true if two loops are the same or they have the same tripcount and
+// depth
+bool DependenceInfo::areLoopsSimilar(const Loop *SrcLoop,
+ const Loop *DstLoop) const {
+ if (SrcLoop == DstLoop)
+ return true;
+
+ if (SrcLoop->getLoopDepth() != DstLoop->getLoopDepth())
+ return false;
+
+ if (!SrcLoop || !SrcLoop->getLoopLatch() || !DstLoop ||
+ !DstLoop->getLoopLatch())
+ return false;
+
+ const SCEV *SrcUB, *DstUP;
+ if (SE->hasLoopInvariantBackedgeTakenCount(SrcLoop))
+ SrcUB = SE->getBackedgeTakenCount(SrcLoop);
+ if (SE->hasLoopInvariantBackedgeTakenCount(DstLoop))
+ DstUP = SE->getBackedgeTakenCount(DstLoop);
+
+ if (SrcUB == nullptr || DstUP == nullptr)
+ return false;
+
+ if (SE->isKnownPredicate(ICmpInst::ICMP_EQ, SrcUB, DstUP))
+ return true;
+
+ return false;
+}
+
// Examines the loop nesting of the Src and Dst
// instructions and establishes their shared loops. Sets the variables
@@ -809,8 +867,21 @@ bool isLoadOrStore(const Instruction *I) {
// e - 5
// f - 6
// g - 7 = MaxLevels
+// If ConsiderSeparateLoops is true then we also want to consider similar
+// seperate loops. Assume that loop nests at level c and e are similar,
+// meaning that they have the same tripcount and depth. Then we consider
+// them as a common level.
+// a - 1
+// b - 2
+// <c, e> - 3 = CommonLevels
+// d - 4 = SrcLevels
+// f - 5
+// g - 6 = MaxLevels
+// SeparateLevels means that how many of the last common levels are
+// separated, which is 1 in this case.
void DependenceInfo::establishNestingLevels(const Instruction *Src,
- const Instruction *Dst) {
+ const Instruction *Dst,
+ bool ConsiderSeparateLoops) {
const BasicBlock *SrcBlock = Src->getParent();
const BasicBlock *DstBlock = Dst->getParent();
unsigned SrcLevel = LI->getLoopDepth(SrcBlock);
@@ -819,6 +890,7 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
const Loop *DstLoop = LI->getLoopFor(DstBlock);
SrcLevels = SrcLevel;
MaxLevels = SrcLevel + DstLevel;
+ SeparateLevels = 0;
while (SrcLevel > DstLevel) {
SrcLoop = SrcLoop->getParentLoop();
SrcLevel--;
@@ -827,11 +899,23 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
DstLoop = DstLoop->getParentLoop();
DstLevel--;
}
- while (SrcLoop != DstLoop) {
- SrcLoop = SrcLoop->getParentLoop();
- DstLoop = DstLoop->getParentLoop();
- SrcLevel--;
- }
+ if (ConsiderSeparateLoops) {
+ while (!areLoopsSimilar(SrcLoop, DstLoop)) {
+ SrcLoop = SrcLoop->getParentLoop();
+ DstLoop = DstLoop->getParentLoop();
+ SrcLevel--;
+ }
+ while (SrcLoop != DstLoop) {
+ SrcLoop = SrcLoop->getParentLoop();
+ DstLoop = DstLoop->getParentLoop();
+ SeparateLevels++;
+ }
+ } else
+ while (SrcLoop != DstLoop) {
+ SrcLoop = SrcLoop->getParentLoop();
+ DstLoop = DstLoop->getParentLoop();
+ SrcLevel--;
+ }
CommonLevels = SrcLevel;
MaxLevels -= CommonLevels;
}
@@ -1223,8 +1307,9 @@ bool DependenceInfo::testZIV(const SCEV *Src, const SCEV *Dst,
//
// Return true if dependence disproved.
bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
- const SCEV *DstConst, const Loop *CurLoop,
- unsigned Level, FullDependence &Result,
+ const SCEV *DstConst, const Loop *CurSrcLoop,
+ const Loop *CurDstLoop, unsigned Level,
+ FullDependence &Result,
Constraint &NewConstraint) const {
LLVM_DEBUG(dbgs() << "\tStrong SIV test\n");
LLVM_DEBUG(dbgs() << "\t Coeff = " << *Coeff);
@@ -1242,7 +1327,8 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
LLVM_DEBUG(dbgs() << ", " << *Delta->getType() << "\n");
// check that |Delta| < iteration count
- if (const SCEV *UpperBound = collectUpperBound(CurLoop, Delta->getType())) {
+ if (const SCEV *UpperBound =
+ collectUpperBound(CurSrcLoop, Delta->getType())) {
LLVM_DEBUG(dbgs() << "\t UpperBound = " << *UpperBound);
LLVM_DEBUG(dbgs() << ", " << *UpperBound->getType() << "\n");
const SCEV *AbsDelta =
@@ -1275,7 +1361,8 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
return true;
}
Result.DV[Level].Distance = SE->getConstant(Distance);
- NewConstraint.setDistance(SE->getConstant(Distance), CurLoop);
+ NewConstraint.setDistance(SE->getConstant(Distance), CurSrcLoop,
+ CurDstLoop);
if (Distance.sgt(0))
Result.DV[Level].Direction &= Dependence::DVEntry::LT;
else if (Distance.slt(0))
@@ -1287,7 +1374,7 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
else if (Delta->isZero()) {
// since 0/X == 0
Result.DV[Level].Distance = Delta;
- NewConstraint.setDistance(Delta, CurLoop);
+ NewConstraint.setDistance(Delta, CurSrcLoop, CurDstLoop);
Result.DV[Level].Direction &= Dependence::DVEntry::EQ;
++StrongSIVsuccesses;
}
@@ -1295,13 +1382,12 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
if (Coeff->isOne()) {
LLVM_DEBUG(dbgs() << "\t Distance = " << *Delta << "\n");
Result.DV[Level].Distance = Delta; // since X/1 == X
- NewConstraint.setDistance(Delta, CurLoop);
+ NewConstraint.setDistance(Delta, CurSrcLoop, CurDstLoop);
}
else {
Result.Consistent = false;
- NewConstraint.setLine(Coeff,
- SE->getNegativeSCEV(Coeff),
- SE->getNegativeSCEV(Delta), CurLoop);
+ NewConstraint.setLine(Coeff, SE->getNegativeSCEV(Coeff),
+ SE->getNegativeSCEV(Delta), CurSrcLoop, CurDstLoop);
}
// maybe we can get a useful direction
@@ -1360,8 +1446,9 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
// Return true if dependence disproved.
bool DependenceInfo::weakCrossingSIVtest(
const SCEV *Coeff, const SCEV *SrcConst, const SCEV *DstConst,
- const Loop *CurLoop, unsigned Level, FullDependence &Result,
- Constraint &NewConstraint, const SCEV *&SplitIter) const {
+ const Loop *CurSrcLoop, const Loop *CurDstLoop, unsigned Level,
+ FullDependence &Result, Constraint &NewConstraint,
+ const SCEV *&SplitIter) const {
LLVM_DEBUG(dbgs() << "\tWeak-Crossing SIV test\n");
LLVM_DEBUG(dbgs() << "\t Coeff = " << *Coeff << "\n");
LLVM_DEBUG(dbgs() << "\t SrcConst = " << *SrcConst << "\n");
@@ -1372,7 +1459,7 @@ bool DependenceInfo::weakCrossingSIVtest(
Result.Consistent = false;
const SCEV *Delta = SE->getMinusSCEV(DstConst, SrcConst);
LLVM_DEBUG(dbgs() << "\t Delta = " << *Delta << "\n");
- NewConstraint.setLine(Coeff, Coeff, Delta, CurLoop);
+ NewConstraint.setLine(Coeff, Coeff, Delta, CurSrcLoop, CurDstLoop);
if (Delta->isZero()) {
Result.DV[Level].Direction &= ~Dependence::DVEntry::LT;
Result.DV[Level].Direction &= ~Dependence::DVEntry::GT;
@@ -1420,7 +1507,8 @@ bool DependenceInfo::weakCrossingSIVtest(
// We're certain that Delta > 0 and ConstCoeff > 0.
// Check Delta/(2*ConstCoeff) against ...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
68d430c
to
f840335
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Effectively, with SeparateLoops=true, this will handle loops as if they are fused.
But it also means that the result is plain wrong for non-fused loops. E.g. the dependence distance d
computed by e.g. strongSIVtest
may result in d > 0
, where in reality all the instances of write occur before the read because their loops are sequential, not nested.
Another concern is that the analysis makes the decision on how to loop fusion occurs. The FuseLoops pass may want to fust loops with non-equal trip count, then it has to make the decision which iterations are executed with which ones. Even in cases where the trip count matches such as
for (int i = 0; i < n; +=1)
A[i+1] += 1;
for (int i = 0; i < n; +=1)
A[i] += 2;
loop fusion would optimally be
for (int i = 0; i < n+1; +=1) {
if (i > 0) A[i] += 1;
if (i < n) A[i] += 2;
}
or after LoopBoundSplitPass etc.
A[0] += 2;
for (int i = 0; i < n+1; +=1)
A[i] += 3;
A[n] += 1;
i.e. not as naive as DA would assume. Ideally, we would chose an approach that allows us to extend FuseLoops over time.
I think instead of a SeparateLoops
parameter, DA should receive the info which loops are considered to be fused from FuseLoops -- otherwise they might be disagree. "SeparateLoops" isn't a good name anyway. It goes pack to the old problem that we want to analyze the result of a optimization without the optimization having been applied first. It would be great if we could leave pass-specific concerns out of DA itself, it does not scale well with the number of passes, but I concede that sometimes it might be a pragmatic solution.
f840335
to
3cfd2e0
Compare
Good point! To avoid the confusion among the original distances or directions and the new information we provide by this patch, now I will provide them in a different array. I added a new array of
Loop fusion or or any other optimization passes that want to use the analysis results can decide how to use it. For the case you mentioned, loop fusion can peel the iterations first to make the trip counts the same and then apply DA.
We prefer the DA to provide information across two loops, enabling loop fusion to identify dependencies before applying the optimization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: I don't yet think this is worth integrating for a special case of loop fusion, especially if we don't even have the corresponding PR that makes use of this yet.
;; for (long int i = 0; i < n; i++) { | ||
;; for (long int j = 0; j < n; j++) { | ||
;; for (long int k = 0; k < n; k++) { | ||
;; A[i][j][k] = i; | ||
;; } | ||
;; for (long int k = 0; k < n; k++) { | ||
;; *B++ = A[i + 3][j + 2][k + 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same case as @p2
from PreliminaryNoValidityCheckFixedSize.ll ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is replaced with another test case in which there are two separate levels now.
// MIV is not handled yet on separate loops; check if there is any MIV test | ||
for (unsigned P = 0; P < Pairs; ++P) { | ||
Pair[P].Loops.resize(MaxLevels + 1); | ||
auto classification = classifyPair( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Style] Local variables start with capital letters; No Almost-Always-Auto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
// a - 1 | ||
// b - 2 = CommonLevels | ||
// <c, e> - 3 : A SeparateLevel | ||
// d - 4 = SrcLevels | ||
// f - 6 | ||
// g - 7 = MaxLevels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened with level 5?
There are lots of references to the loop numbering scheme other than SIV that don't account for the change, e.g. getSplitIteration
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I updated my changes such that it does not affect the loop numbering scheme anymore. Now this api will only provide an additional info which is SeparateLevels
.
auto classification = classifyPair( | ||
Pair[P].Src, LI->getLoopFor(Src->getParent()), Pair[P].Dst, | ||
LI->getLoopFor(Dst->getParent()), Pair[P].Loops); | ||
if (classification == Subscript::MIV) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MIV
gets it special treatment here, with reverting to old scheme, but what about RDIV, ZIV?
classifyPair
itself relies on the old counting scheme, and will classify e.g. what was RDIV before to SIV.
There is also the case that with non-fused interpretation, we actually might be able to resolve dependencies, but do not with ConsiderSeparateLoops. What ensures that does not happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIV, RDIV or ZIV are able to handle pairs from different loops. RDIV and ZIV could do that before and SIV is able to handle it by this patch.
I cannot see a case where we may miss resolving any dependencies by considering separate loops optimization. Could you please clarify on that?
// the separate level extraction at the end of the depends api we have | ||
// a - 1 | ||
// b - 2 = CommonLevels | ||
// <c, e> - 3 : A SeparateLevel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// <c, e> - 3 : A SeparateLevel | |
// <c, e> - 3 = SeparateLevels |
This introduced 3 different meanings of levels and it is not clear what interpretation at what point it the "correct" one. Better integrate into the text above, where SeparateLevels is zero unless ConsiderSeparateLoops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the confusion as mentioned in another comment above.
assert(Levels < Level && Level <= Levels + SeparateLevels && | ||
"Separate level out of range"); | ||
return DVSeparate[Level - Levels - 1].Direction; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not introduce a helper function that returns the correct index into DVSeparate
so you don't need the if/else here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment. Added.
3cfd2e0
to
9d511f5
Compare
9d511f5
to
8e263a3
Compare
Thanks Michael for your time and comments. I tried to resolve them and provide a more clean patch along with changes applied to the loop fusion pass to use the info provided by DA. Please take a look and let me know what you think. I apologize for the delay, I was caught up with some other projects. |
b3df426
to
4420416
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several unrelated formatting changes in there. If it is easer, we can commit a NFC clang-format change so you don't have to worry about clang-format introducing changes anymore.
With not seeing the loop fusion code, I meant to open a seaparate commit that modifies loop fusion, so this PR does mix two concerns. See https://llvm.org/docs/GitHub.html#stacked-pull-requests.
@@ -825,16 +874,23 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src, | |||
DstLoop = DstLoop->getParentLoop(); | |||
DstLevel--; | |||
} | |||
// find the first separate similar level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This goes from innermost loops to outer ones, and stop at the innermost unequal one. But what if the second innermost is equal just the outermost is not:
for (int i = 0; i < 42; ++i) { // fusable
for (int j = 0; j < 21; ++j) { // not fusable
}
}
for (int i = 0; i < 42; ++i) { // fusable
for (int j = 0; j < 42; ++j) {
}
}
I think the i-loop should be comon levels, just the j-loops not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! It is fixed now in this code fragment.
/// SeparateLevels counts the number of loop levels after the common levels | ||
/// that are not identical but are considered similar. Two levels are | ||
/// considered similar if they have the same trip count and the same | ||
/// nesting depth. | ||
/// For example, if loops `c` and `e` are similar, then they contribute to | ||
/// the SeparateLevels count and SeparateLevels is set to 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not messing up with the level count is a good thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you write some about how the interface works for methods that take a "Separate" argument? In this example, c and e are considered similar, which are loops 3 and 5, but with Separate==true
, to get the result for the fused loop, one passes just 3 (which could be considered either the loop depth which is the same of loop c and e, or the SrcLevels representing the fused loops). The pre-existing use of "level" which is not actually the loop depth (number of surrounding loops it is nested in) but a loop id in depth-first order is a bit unfortunate. Some disambiguation here may help.
[not a change request] Instead of "Separate" I think I would have preferred using something "like AssumeFusedLoops" consistently throughout, but not something that I would required for this review. "Separate" does not convey a lot of meaning, it could be loops from different functions. For methods that don't even make sense for non-fused loop such as getDistance
(hence level
should rather mean depth), the parameter isn't strictly nessary, a client that doesn't know about assumed-fused loops would not call it with argument beyond getLevels() anyways.
/// this loop will break this dependence. | ||
bool isPeelFirst(unsigned Level) const override; | ||
/// this loop will break this dependence. If Separate is set to true, | ||
/// information about a separate level is provided. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing in the header file that even explains what a "separate level" actually is. Rather than having this sentence not saying anything here, consider central documentation whati t means, e.g. as class comment.
Could contain:
- Number of loops inside common loops that are "similar" (+definition of similar)
- Have different
llvm::Loop
objects but can be interpreted as a single fused loop withSeparate=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is added in the header file.
@@ -20,7 +20,7 @@ define void @p2(i64 %n, ptr %A, ptr %B) nounwind uwtable ssp { | |||
; CHECK-NEXT: Src: store i64 %i.011, ptr %arrayidx8, align 8 --> Dst: store i64 %i.011, ptr %arrayidx8, align 8 | |||
; CHECK-NEXT: da analyze - none! | |||
; CHECK-NEXT: Src: store i64 %i.011, ptr %arrayidx8, align 8 --> Dst: %0 = load i64, ptr %arrayidx17, align 8 | |||
; CHECK-NEXT: da analyze - flow [-3 -2]! | |||
; CHECK-NEXT: da analyze - flow [-3 -2 / -1]! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than merging two DA interpretations into the same output, have you considered keeping the separate for better understandability. E.g.:
da analyze - flow [-3 -2]! / assuming 1 fused loop: [-3 -2 -1]!
Because of how FileCheck works, this test wouldn't even needed to be changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied.
17552ab
to
21ef00f
Compare
A separate commit and PR has been created for the changes applied to the loop fusion pass. |
Do you mean 1997alireza#1 ? You already merged it into this PR so https://github.com/llvm/llvm-project/pull/128782/files includes the LoopFuse changes again. The point of Stacked PRs is to not merge them. |
7466249
to
21ef00f
Compare
Reverted the merged PR. Now the loop fusion changes are in another PR #146383. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some bikeshedding left but otherwise the patch looks good
} | ||
|
||
/// inSeparateLoops - Returns true if this level is a separate level, i.e., | ||
/// performed across two separate loop nests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// performed across two separate loop nests. | |
/// performed across two separate loop nests that a treated like a single fused loop. |
/// SeparateLevels counts the number of loop levels after the common levels | ||
/// that are not identical but are considered similar. Two levels are | ||
/// considered similar if they have the same trip count and the same | ||
/// nesting depth. | ||
/// For example, if loops `c` and `e` are similar, then they contribute to | ||
/// the SeparateLevels count and SeparateLevels is set to 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you write some about how the interface works for methods that take a "Separate" argument? In this example, c and e are considered similar, which are loops 3 and 5, but with Separate==true
, to get the result for the fused loop, one passes just 3 (which could be considered either the loop depth which is the same of loop c and e, or the SrcLevels representing the fused loops). The pre-existing use of "level" which is not actually the loop depth (number of surrounding loops it is nested in) but a loop id in depth-first order is a bit unfortunate. Some disambiguation here may help.
[not a change request] Instead of "Separate" I think I would have preferred using something "like AssumeFusedLoops" consistently throughout, but not something that I would required for this review. "Separate" does not convey a lot of meaning, it could be loops from different functions. For methods that don't even make sense for non-fused loop such as getDistance
(hence level
should rather mean depth), the parameter isn't strictly nessary, a client that doesn't know about assumed-fused loops would not call it with argument beyond getLevels() anyways.
But they are not actually fused, and the client of DA is not only LoopFuse. I didn't mean that the result is incorrect, but has changed due to the assumption of loop fusion. This part seems to attempt to split the result into two parts, with or without the assumption. However, the situation isn't that straightforward. The dependencies for |
Yes, in some cases the result may change. My point is that this change will be an improvment as your earlier example demonstrates. We can not stop development of new features because they improve results that other components consume. What I was trying to highlight was that this change (of DA results) will not be incorrect; it will be an improvement. |
Obviously IF this change of DA result, causes something unexpected (a functional or a performance issue) then we need to look into it and decide how to proceed. That would depend on the nature of the problem encountered. But a priori, I don't see any problem with this occasional change of result, when it is actually an improvement. |
My concerns are:
As for the first one, at least the change to What I'm more concerned about is the second question. For example, consider a case where a loop guard exists between the two fusable loops, such as: for (int i = 0; i < N; i++) {
for (int j = 0; j < M; j++)
Stmt0;
if (cond)
for (int j = 0; j < M; j++)
Stmt1;
}
// Case 1
for (int i = 0; i < N; i++)
for (int j = 0; j < M; j++) {
Stmt0;
if (cond)
Stmt1;
}
// Case 2
for (int i = 0; i < N; i++)
if (cond)
for (int j = 0; j < M; j++) {
Stmt0;
Stmt1;
} IIUC, LoopFuse excludes such cases before calling In conclusion, I'm happy with the improvements themselves when there is evidence supporting their soundness. However, at least at this point, I think it has not been given much consideration. Footnotes
|
About (1) I think forcing About (2) I believe this is fine. IIRC, the question of whether two statements from two different loops are dependent or not is not relevant to any existing DA client other than loop fusion. If it is relevant to another pass, then same concerns that you bring up is applicable to the results that |
Let me rephrase my answer to question (2). If a pass asks a question about depdency of two statements from two different loops, then that pass needs to have good answers to the question that you raised. Right now, loop fusion can handle this. Any other pass that wants to do this in future needs to be clear about this as well. |
The comment of
It means that this function returns true if we know the dependence is consistent. Returning false doesn't mean it is inconsistent. Always returning false is permitted. for (int i = 0; i < 100; i++) {
for (int j = 0; j < 100; j++)
A[i][j] = 42;
if (rand() % 2 == 0)
for (int j = 0; j < 100; j++)
A[i][j] = 43;
} The dependence between
If this patch adds new assumptions to the analysis results, we may need to update other existing uses. For example, if the improved results assume the conditions under which "LoopFuse works fine," then we need to replace code like: Dependence *Dep = DI->depends(Src, Dst); with something like: Dependence *Dep = DI->depends(Src, Dst);
if (The same condition as in LoopFuse does not hold)
return; // early exit
// Or, more crudely
if (Dep->getFusableLevel() != 0)
return; Footnotes
|
So, at the very least, I personally think it would be better to insert an assertion like |
I would say this patch adds a new functionality to DA. Personally I don't see any reason to worry about this. I am not exactly sure what can go wrong and how, that is not an existing risk. However, I believe it is possible to add a new flag to DA interface to enable this new functionality. The default value could be false, to turn it off, and in loop fusion we can turn it on. Alternatively we can do something like you suggest. We can add an assertion or condition for other clients of DA that checks source and destination are within one loop. But I am not sure how that would protect a future potential client of DA. In any case, I don't see much benefit to either of these solutions, but if you think it is helpful I have no objections. |
If you think about common customers of DA, whether in llvm or other compilers (loop interchange, loop distribution, loop vectorization) chances of accidentally passing two statements from two different loops to DA is very low. Pretty much any user of DA needs to deal with enough details, that makes such mistakes low probability. But again I don't have objections if you think that is helpful. Right now I skipped the first part of your comment about Thank you very much for taking the time to review and letting us know your opinion. |
4563cf6
to
7918545
Compare
|
000bc7b
to
3e48d86
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think about common customers of DA, whether in llvm or other compilers (loop interchange, loop distribution, loop vectorization) chances of accidentally passing two statements from two different loops to DA is very low. Pretty much any user of DA needs to deal with enough details, that makes such mistakes low probability.
I don't think so. For example, regarding LoopInterchange, the following case isn't currently supported, but I believe it could be extended relatively easily
for (int i = 0; i < N; i++)
for (int j = 0; j < M; j++) {
for (int k = 0; k < O; k++)
Stmt1;
for (int l = 0; l < O; l++)
Stmt2;
}
In this case, LoopInterchange is only concerned with the i-loop and j-loop; the k-loop and l-loop don't really matter, especially since LoopInterchange only checks the first two elements of DV
. In that sense, the current DA works fine, as it essentially analyzes loops whose levels are less than CommonLevels
. However, this patch runs the analysis under the assumption that the k-loop and l-loop are fused, which can affect the dependency results for other loops; "improved results", as mentioned in earlier comments.
So my question is: is it safe to use these "improved results"? Or does LoopInterchange need to verify whether the fusion assumption affects the outcome, and whether it's safe to rely on it? If it's the latter, I think it introduces unnecessary noise. Moreover, I believe this issue isn't limited to LoopInterchange. It likely applies to other loop transformations as well. These transformations may only be concerned with common loops and disregard deeper-level loops entirely, or treat them not as loops but simply as "complex control flow". What I want to emphasize is that this patch appears to impose additional constraints on DA clients. If that's the case, it should at the very least be documented explicitly.
I believe it is possible to add a new flag to DA interface to enable this new functionality. The default value could be false, to turn it off, and in loop fusion we can turn it on.
It may be an optional choice now, but it could become technical debt when we attempt to cache analysis results and reuse them across passes in the future.
isConsistent
is set to false whenever there are fusable levels to be analyzed.
Yes, and it might be annoying for other passes. That said, isConsistent
seems already incorrect in some cases and nobody uses it, so it might be worth removing it altogether.
for (unsigned level = 0; level < CommonLevels; ++level) | ||
DV[level] = Result.DV[level]; | ||
for (unsigned level = 0; level < FusableLevels; ++level) | ||
DVFusable[level] = Result.DV[CommonLevels + level]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (unsigned level = 0; level < CommonLevels; ++level) | |
DV[level] = Result.DV[level]; | |
for (unsigned level = 0; level < FusableLevels; ++level) | |
DVFusable[level] = Result.DV[CommonLevels + level]; | |
for (unsigned Level = 0; Level < CommonLevels; ++Level) | |
DV[Level] = Result.DV[Level]; | |
for (unsigned Level = 0; Level < FusableLevels; ++Level) | |
DVFusable[Level] = Result.DV[CommonLevels + Level]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied.
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" | ||
target triple = "x86_64-apple-macosx10.6.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
/// a - 1 | ||
/// b - 2 | ||
/// c,e - 3 = CommonLevels | ||
/// d - 4 = SrcLevels | ||
/// f - 5 | ||
/// g - 6 = MaxLevels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how @Meinersbur ran the tests, but since DA isn't triggered by default, I assume it's manually registered, e.g., by editing PassBuilderPipelines.cpp
.
I believe there is no need to reset the array
Pair
(Assuming we are talking about this part)
Hm, after thinking it over for a while, I also think the reset is unnecessary, as they are overwritten in the subsequent loop. Maybe using Pair[P].Loops
is confusing. It would be better to use a temporal (local) variable instead of Pair[P].Loops
to keep the code clean. Also, I think it would be better to add a test where CommonLevels
and MaxLevels
are reverted since TestClass
is neither ZIV
, SIV
nor RDIV
.
Your example is not a perfect loop nest (the j-loop has two inner loops) and the legality check for loop interchange assumes a perfect loop nest (see the code snippet below taken from loop interchange source code). I am not sure extension of loop interchange to cover this case will be easy. But anyways, let's use that flag to address this concern.
|
Yes, they are not perfectly nested loops if we take the k-loop and l-loop into account. However, I believe it is sound to ignore them and focus only on the i-loop and j-loop. LoopInterchange currently doesn't have the logic to ignore the k-loop and l-loop, but such functionality could be added in the future. In that case, I think the fusion assumption could lead to something unexpected. In addition, I suspect a similar problem could arise in LoopFuse. What happens when we attempt to fuse the k0-loop and k1-loop in the example below? Is it sound to assume that the l0-loop and l1-loop are fused? for (int i = 0; i < M; i++)
for (int j = 0; j < N; j++) {
for (int k0 = 0; k0 < O; k0++)
for (int l0 = 0; l0 < P; l0++)
Stmt1;
for (int k1 = 0; k1 < O; k1++)
for (int l1 = 0; l1 < P; l1++)
Stmt2;
}
Just in case: I don't request to add such a flag. I think it's also okay if you can prove the soundness of the new assumption. But if adding such a flag is more convenient, then I don't have strong objections at the moment. |
JFYI: the code you’re looking at appears to be slightly outdated. llvm-project/llvm/lib/Transforms/Scalar/LoopInterchange.cpp Lines 321 to 335 in 5479b7e
|
The IR provided in the link appears to be incorrect. Specifically, k.0 should be replaced with k.1 in line 31. With this change, the current DA can already determine that there is no dependency by using the RDIV test. Therefore, my patch does not change the result in this case. |
You are completely correct. I was stupid, sorry about that... for (int i = 0; i < 10; i++) {
for (int j = 1; j < 11; j++)
A[i][j+1][j-1] = 42;
for (int j = 0; j < 10; j++)
A[i][j][j] = 43;
} The current DA produces the following result (godbolt: https://godbolt.org/z/5d6xej3bv):
I also ran this locally with your patch, which produced:
This time it should be correct, and I'm still not sure whether using this result is sound for callers other than LoopFuse. |
Yes, my patch will modify certain cases such as this one. May I ask what your specific objection is? Do you believe that my patch produces incorrect results? If not, I do not see any issue with it. The patch enhances the results and can even detect additional cases where no dependency exists. The point you raise regarding changes to the results could equally apply to the RDIV test. For example, what if someone improved RDIV to address cases similar to the one above? |
Basically yes. For example, consider the following case: for (int i = 0; i < M; i++) {
if (i < 10)
for (int j = 0; j < N; j++)
A[i+10][j] = 42;
for (int j = 0; j < N; j++)
A[i][j] = 43;
} What does "assuming fusion" mean in this case? Doesn't it also imply that Stmt2 is executed only when |
IIUIC, RDIV seems to have similar potential issues as well (unfortunately, DA is already pretty broken...). |
I do not believe such checks are necessary within DA. If you run this code with the original DA, the result is [10], which represents the dependency distance with respect to loop i. With my patch applied, DA is able to capture the dependency for loop j as well, resulting in [10 0]. As you can see, DA does not report no dependency; rather, it now provides the dependency distance as well. The point you mentioned regarding loop-guards is not directly related to what DA is designed to provide. "assuming fusion" means what would be the dependency between two accesses if they were in the same loop like this: for (int i = 0; i < M; i++) {
for (int j = 0; j < N; j++)
A[i+10][j] = 42;
A[i][j] = 43;
} DA doesn't need to know about the loop-guards to provide such information. It is similar to the following case where DA generates [10, 0]. for (int i = 0; i < M; i++) {
for (int j = 0; j < N; j++) {
if (i < 10)
A[i+10][j] = 42;
A[i][j] = 43;
}
} |
When there is a dependency between two memory instructions in separate fusable loops, SIV will be able to test them and compute the direction and the distance of the dependency. Two loop levels are considered fusable if they have the same tripcount and depth.
3e48d86
to
2a56dea
Compare
At the very least, In addition, IIUIC, the following two codes are not considered equivalent by if (i < 10)
for (int j = 0; j < N; j++)
Stmt;
for (int j = 0; j < N; j++)
if (i < 10)
Stmt; |
I appreciate your intention to ensure the utmost quality of the open source compiler. Let me clarify a couple of things. DA gets two mem access instructions in one or two loops. The actual memory accessed may vary depending on the value of the induction variables of the loops. All that DA is responsible for is to check these two instructions may access the same memory for some of the possible values of the induction variables or not. Whether the loops have guards or not, control flow inside the loop, whether the instructions are actually executed for any or all values of the induction variable is irrelevant to DA. There is another issue about fusability of the loops. None of conditions of fusability (loop guards, etc. ) is relevant here. The only thing relevant is that we have two loops with induction variables I and J that takes values |
(Sorry to bother you with my nitpicky comments.) Let me clarify my point, using the same example. for (int i = 0; i < M; i++) {
if (i < 10)
for (int j = 0; j < N; j++)
A[i+10][j] = 42;
for (int j = 0; j < N; j++)
A[i][j] = 43;
} Your argument seems correct if we are analyzing the C language (or other high-level languages). But our target is LLVM IR which does not have explicit loop constructs such as The current patch seems to be saying "let's regard the two fusable loops as the same", in this example the two j-loops. If the second j-loop is "mapped" to the first j-loop, it appears to me that this may potentially imply considering the following code rather than the original: for (int i = 0; i < M; i++) {
if (i < 10)
for (int j = 0; j < N; j++) {
A[i+10][j] = 42;
A[i][j] = 43;
}
}
// Or, written in a form closer to LLVM IR
for (int i = 0; i < M; i++) {
if (i < 10 && N > 0) {
int j = 0;
do {
A[i+10][j] = 42;
A[i][j] = 43;
j++;
} while (j < N)
}
} I've been thinking about why this happens, and I've come to feel that the root cause is trying to map different loops to the same Level. If this patch is modified so that |
When there is a dependency between two memory instructions in separate fusable loops, SIV will be able to test them and compute the direction and the distance of the dependency. Two loop levels are considered fusable if they have the same tripcount and depth.