Skip to content

Conversation

1997alireza
Copy link
Contributor

@1997alireza 1997alireza commented Feb 25, 2025

When there is a dependency between two memory instructions in separate fusable loops, SIV will be able to test them and compute the direction and the distance of the dependency. Two loop levels are considered fusable if they have the same tripcount and depth.

@llvmbot llvmbot added the llvm:analysis Includes value tracking, cost tables and constant folding label Feb 25, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 25, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Alireza Torabian (1997alireza)

Changes

When there is a dependency between two memory instructions in separate loops, SIV will be able to test them and compute the direction and the distance of the dependency.


Patch is 48.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128782.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Analysis/DependenceAnalysis.h (+55-17)
  • (modified) llvm/lib/Analysis/DependenceAnalysis.cpp (+222-92)
  • (modified) llvm/test/Analysis/DependenceAnalysis/PreliminaryNoValidityCheckFixedSize.ll (+1-1)
  • (added) llvm/test/Analysis/DependenceAnalysis/SIVSeparateLoops.ll (+209)
diff --git a/llvm/include/llvm/Analysis/DependenceAnalysis.h b/llvm/include/llvm/Analysis/DependenceAnalysis.h
index 426ac757b4b0d..8e86c091c60a2 100644
--- a/llvm/include/llvm/Analysis/DependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/DependenceAnalysis.h
@@ -97,10 +97,11 @@ namespace llvm {
       bool PeelFirst : 1; // Peeling the first iteration will break dependence.
       bool PeelLast  : 1; // Peeling the last iteration will break the dependence.
       bool Splitable : 1; // Splitting the loop will break dependence.
+      bool SeparateLoops : 1; // Is performed across two separate loop nests.
       const SCEV *Distance = nullptr; // NULL implies no distance available.
       DVEntry()
           : Direction(ALL), Scalar(true), PeelFirst(false), PeelLast(false),
-            Splitable(false) {}
+            Splitable(false), SeparateLoops(false) {}
     };
 
     /// getSrc - Returns the source instruction for this dependence.
@@ -182,6 +183,10 @@ namespace llvm {
     /// the dependence.
     virtual bool isSplitable(unsigned Level) const { return false; }
 
+    /// inSeparateLoops - Returns true if this level is performed across
+    /// two separate loop nests.
+    virtual bool inSeparateLoops(unsigned Level) const { return false; }
+
     /// isScalar - Returns true if a particular level is scalar; that is,
     /// if no subscript in the source or destination mention the induction
     /// variable associated with the loop at this level.
@@ -275,6 +280,10 @@ namespace llvm {
     /// the dependence.
     bool isSplitable(unsigned Level) const override;
 
+    /// inSeparateLoops - Returns true if this level is performed across
+    /// two separate loop nests.
+    bool inSeparateLoops(unsigned Level) const override;
+
     /// isScalar - Returns true if a particular level is scalar; that is,
     /// if no subscript in the source or destination mention the induction
     /// variable associated with the loop at this level.
@@ -405,7 +414,8 @@ namespace llvm {
       const SCEV *A;
       const SCEV *B;
       const SCEV *C;
-      const Loop *AssociatedLoop;
+      const Loop *AssociatedSrcLoop;
+      const Loop *AssociatedDstLoop;
 
     public:
       /// isEmpty - Return true if the constraint is of kind Empty.
@@ -449,18 +459,25 @@ namespace llvm {
       /// Otherwise assert.
       const SCEV *getD() const;
 
-      /// getAssociatedLoop - Returns the loop associated with this constraint.
-      const Loop *getAssociatedLoop() const;
+      /// getAssociatedSrcLoop - Returns the source loop associated with this
+      /// constraint.
+      const Loop *getAssociatedSrcLoop() const;
+
+      /// getAssociatedDstLoop - Returns the destination loop associated with
+      /// this constraint.
+      const Loop *getAssociatedDstLoop() const;
 
       /// setPoint - Change a constraint to Point.
-      void setPoint(const SCEV *X, const SCEV *Y, const Loop *CurrentLoop);
+      void setPoint(const SCEV *X, const SCEV *Y, const Loop *CurrentSrcLoop,
+                    const Loop *CurrentDstLoop);
 
       /// setLine - Change a constraint to Line.
-      void setLine(const SCEV *A, const SCEV *B,
-                   const SCEV *C, const Loop *CurrentLoop);
+      void setLine(const SCEV *A, const SCEV *B, const SCEV *C,
+                   const Loop *CurrentSrcLoop, const Loop *CurrentDstLoop);
 
       /// setDistance - Change a constraint to Distance.
-      void setDistance(const SCEV *D, const Loop *CurrentLoop);
+      void setDistance(const SCEV *D, const Loop *CurrentSrcLoop,
+                       const Loop *CurrentDstLoop);
 
       /// setEmpty - Change a constraint to Empty.
       void setEmpty();
@@ -473,6 +490,10 @@ namespace llvm {
       void dump(raw_ostream &OS) const;
     };
 
+    /// Returns true if two loops are the same or they have the same upperbound
+    /// and depth
+    bool areLoopsSimilar(const Loop *SrcLoop, const Loop *DstLoop) const;
+
     /// establishNestingLevels - Examines the loop nesting of the Src and Dst
     /// instructions and establishes their shared loops. Sets the variables
     /// CommonLevels, SrcLevels, and MaxLevels.
@@ -523,10 +544,22 @@ namespace llvm {
     ///     e - 5
     ///     f - 6
     ///     g - 7 = MaxLevels
-    void establishNestingLevels(const Instruction *Src,
-                                const Instruction *Dst);
-
-    unsigned CommonLevels, SrcLevels, MaxLevels;
+    /// If ConsiderSeparateLoops is true then we also want to consider similar
+    /// seperate loops. Assume that loop nests at level c and e are similar,
+    /// meaning that they have the same upperbound and depth. Then we consider
+    /// them as a common level.
+    ///     a      - 1
+    ///     b      - 2
+    ///     <c, e> - 3 = CommonLevels
+    ///     d      - 4 = SrcLevels
+    ///     f      - 5
+    ///     g      - 6 = MaxLevels
+    /// SeparateLevels means that how many of the last common levels are
+    /// separated, which is 1 in this case.
+    void establishNestingLevels(const Instruction *Src, const Instruction *Dst,
+                                bool ConsiderSeparateLoops = false);
+
+    unsigned CommonLevels, SrcLevels, MaxLevels, SeparateLevels;
 
     /// mapSrcLoop - Given one of the loops containing the source, return
     /// its level index in our numbering scheme.
@@ -668,7 +701,8 @@ namespace llvm {
     bool strongSIVtest(const SCEV *Coeff,
                        const SCEV *SrcConst,
                        const SCEV *DstConst,
-                       const Loop *CurrentLoop,
+                       const Loop *CurrentSrcLoop,
+                       const Loop *CurrentDstLoop,
                        unsigned Level,
                        FullDependence &Result,
                        Constraint &NewConstraint) const;
@@ -686,7 +720,8 @@ namespace llvm {
     bool weakCrossingSIVtest(const SCEV *SrcCoeff,
                              const SCEV *SrcConst,
                              const SCEV *DstConst,
-                             const Loop *CurrentLoop,
+                             const Loop *CurrentSrcLoop,
+                             const Loop *CurrentDstLoop,
                              unsigned Level,
                              FullDependence &Result,
                              Constraint &NewConstraint,
@@ -705,7 +740,8 @@ namespace llvm {
                       const SCEV *DstCoeff,
                       const SCEV *SrcConst,
                       const SCEV *DstConst,
-                      const Loop *CurrentLoop,
+                      const Loop *CurrentSrcLoop,
+                      const Loop *CurrentDstLoop,
                       unsigned Level,
                       FullDependence &Result,
                       Constraint &NewConstraint) const;
@@ -723,7 +759,8 @@ namespace llvm {
     bool weakZeroSrcSIVtest(const SCEV *DstCoeff,
                             const SCEV *SrcConst,
                             const SCEV *DstConst,
-                            const Loop *CurrentLoop,
+                            const Loop *CurrentSrcLoop,
+                            const Loop *CurrentDstLoop,
                             unsigned Level,
                             FullDependence &Result,
                             Constraint &NewConstraint) const;
@@ -741,7 +778,8 @@ namespace llvm {
     bool weakZeroDstSIVtest(const SCEV *SrcCoeff,
                             const SCEV *SrcConst,
                             const SCEV *DstConst,
-                            const Loop *CurrentLoop,
+                            const Loop *CurrentSrcLoop,
+                            const Loop *CurrentDstLoop,
                             unsigned Level,
                             FullDependence &Result,
                             Constraint &NewConstraint) const;
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index dc0ed22dbcc0b..b947e92a6375b 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -104,6 +104,7 @@ STATISTIC(GCDindependence, "GCD independence");
 STATISTIC(BanerjeeApplications, "Banerjee applications");
 STATISTIC(BanerjeeIndependence, "Banerjee independence");
 STATISTIC(BanerjeeSuccesses, "Banerjee successes");
+STATISTIC(SeparateLoopsConsidered, "Separate loops considered");
 
 static cl::opt<bool>
     Delinearize("da-delinearize", cl::init(true), cl::Hidden,
@@ -377,6 +378,13 @@ bool FullDependence::isSplitable(unsigned Level) const {
 }
 
 
+// Returns true if this level is performed across two separate loop nests.
+bool FullDependence::inSeparateLoops(unsigned Level) const {
+  assert(0 < Level && Level <= Levels && "Level out of range");
+  return DV[Level - 1].SeparateLoops;
+}
+
+
 //===----------------------------------------------------------------------===//
 // DependenceInfo::Constraint methods
 
@@ -431,37 +439,52 @@ const SCEV *DependenceInfo::Constraint::getD() const {
 }
 
 
-// Returns the loop associated with this constraint.
-const Loop *DependenceInfo::Constraint::getAssociatedLoop() const {
+// Returns the source loop associated with this constraint.
+const Loop *DependenceInfo::Constraint::getAssociatedSrcLoop() const {
+  assert((Kind == Distance || Kind == Line || Kind == Point) &&
+         "Kind should be Distance, Line, or Point");
+  return AssociatedSrcLoop;
+}
+
+
+// Returns the destination loop associated with this constraint.
+const Loop *DependenceInfo::Constraint::getAssociatedDstLoop() const {
   assert((Kind == Distance || Kind == Line || Kind == Point) &&
          "Kind should be Distance, Line, or Point");
-  return AssociatedLoop;
+  return AssociatedDstLoop;
 }
 
+
 void DependenceInfo::Constraint::setPoint(const SCEV *X, const SCEV *Y,
-                                          const Loop *CurLoop) {
+                                          const Loop *CurSrcLoop,
+                                          const Loop *CurDstLoop) {
   Kind = Point;
   A = X;
   B = Y;
-  AssociatedLoop = CurLoop;
+  AssociatedSrcLoop = CurSrcLoop;
+  AssociatedDstLoop = CurDstLoop;
 }
 
 void DependenceInfo::Constraint::setLine(const SCEV *AA, const SCEV *BB,
-                                         const SCEV *CC, const Loop *CurLoop) {
+                                         const SCEV *CC, const Loop *CurSrcLoop,
+                                         const Loop *CurDstLoop) {
   Kind = Line;
   A = AA;
   B = BB;
   C = CC;
-  AssociatedLoop = CurLoop;
+  AssociatedSrcLoop = CurSrcLoop;
+  AssociatedDstLoop = CurDstLoop;
 }
 
 void DependenceInfo::Constraint::setDistance(const SCEV *D,
-                                             const Loop *CurLoop) {
+                                             const Loop *CurSrcLoop,
+                                             const Loop *CurDstLoop) {
   Kind = Distance;
   A = SE->getOne(D->getType());
   B = SE->getNegativeSCEV(A);
   C = SE->getNegativeSCEV(D);
-  AssociatedLoop = CurLoop;
+  AssociatedSrcLoop = CurSrcLoop;
+  AssociatedDstLoop = CurDstLoop;
 }
 
 void DependenceInfo::Constraint::setEmpty() { Kind = Empty; }
@@ -608,8 +631,8 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
         ++DeltaSuccesses;
         return true;
       }
-      if (const SCEVConstant *CUB =
-          collectConstantUpperBound(X->getAssociatedLoop(), Prod1->getType())) {
+      if (const SCEVConstant *CUB = collectConstantUpperBound(
+              X->getAssociatedSrcLoop(), Prod1->getType())) {
         const APInt &UpperBound = CUB->getAPInt();
         LLVM_DEBUG(dbgs() << "\t\tupper bound = " << UpperBound << "\n");
         if (Xq.sgt(UpperBound) || Yq.sgt(UpperBound)) {
@@ -620,7 +643,8 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
       }
       X->setPoint(SE->getConstant(Xq),
                   SE->getConstant(Yq),
-                  X->getAssociatedLoop());
+                  X->getAssociatedSrcLoop(),
+                  X->getAssociatedDstLoop());
       ++DeltaSuccesses;
       return true;
     }
@@ -656,6 +680,7 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
 // For debugging purposes. Dumps a dependence to OS.
 void Dependence::dump(raw_ostream &OS) const {
   bool Splitable = false;
+  bool SeparatesStarted = false;
   if (isConfused())
     OS << "confused";
   else {
@@ -672,6 +697,10 @@ void Dependence::dump(raw_ostream &OS) const {
     unsigned Levels = getLevels();
     OS << " [";
     for (unsigned II = 1; II <= Levels; ++II) {
+      if (!SeparatesStarted && inSeparateLoops(II)) {
+        SeparatesStarted = true;
+        OS << "/ ";
+      }
       if (isSplitable(II))
         Splitable = true;
       if (isPeelFirst(II))
@@ -758,6 +787,35 @@ bool isLoadOrStore(const Instruction *I) {
   return false;
 }
 
+// Returns true if two loops are the same or they have the same tripcount and
+// depth
+bool DependenceInfo::areLoopsSimilar(const Loop *SrcLoop,
+                                     const Loop *DstLoop) const {
+  if (SrcLoop == DstLoop)
+    return true;
+
+  if (SrcLoop->getLoopDepth() != DstLoop->getLoopDepth())
+    return false;
+
+  if (!SrcLoop || !SrcLoop->getLoopLatch() || !DstLoop ||
+      !DstLoop->getLoopLatch())
+    return false;
+
+  const SCEV *SrcUB, *DstUP;
+  if (SE->hasLoopInvariantBackedgeTakenCount(SrcLoop))
+    SrcUB = SE->getBackedgeTakenCount(SrcLoop);
+  if (SE->hasLoopInvariantBackedgeTakenCount(DstLoop))
+    DstUP = SE->getBackedgeTakenCount(DstLoop);
+
+  if (SrcUB == nullptr || DstUP == nullptr)
+    return false;
+
+  if (SE->isKnownPredicate(ICmpInst::ICMP_EQ, SrcUB, DstUP))
+    return true;
+
+  return false;
+}
+
 
 // Examines the loop nesting of the Src and Dst
 // instructions and establishes their shared loops. Sets the variables
@@ -809,8 +867,21 @@ bool isLoadOrStore(const Instruction *I) {
 //     e - 5
 //     f - 6
 //     g - 7 = MaxLevels
+// If ConsiderSeparateLoops is true then we also want to consider similar
+// seperate loops. Assume that loop nests at level c and e are similar,
+// meaning that they have the same tripcount and depth. Then we consider
+// them as a common level.
+//     a      - 1
+//     b      - 2
+//     <c, e> - 3 = CommonLevels
+//     d      - 4 = SrcLevels
+//     f      - 5
+//     g      - 6 = MaxLevels
+// SeparateLevels means that how many of the last common levels are
+// separated, which is 1 in this case.
 void DependenceInfo::establishNestingLevels(const Instruction *Src,
-                                            const Instruction *Dst) {
+                                            const Instruction *Dst,
+                                            bool ConsiderSeparateLoops) {
   const BasicBlock *SrcBlock = Src->getParent();
   const BasicBlock *DstBlock = Dst->getParent();
   unsigned SrcLevel = LI->getLoopDepth(SrcBlock);
@@ -819,6 +890,7 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
   const Loop *DstLoop = LI->getLoopFor(DstBlock);
   SrcLevels = SrcLevel;
   MaxLevels = SrcLevel + DstLevel;
+  SeparateLevels = 0;
   while (SrcLevel > DstLevel) {
     SrcLoop = SrcLoop->getParentLoop();
     SrcLevel--;
@@ -827,11 +899,23 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
     DstLoop = DstLoop->getParentLoop();
     DstLevel--;
   }
-  while (SrcLoop != DstLoop) {
-    SrcLoop = SrcLoop->getParentLoop();
-    DstLoop = DstLoop->getParentLoop();
-    SrcLevel--;
-  }
+  if (ConsiderSeparateLoops) {
+    while (!areLoopsSimilar(SrcLoop, DstLoop)) {
+      SrcLoop = SrcLoop->getParentLoop();
+      DstLoop = DstLoop->getParentLoop();
+      SrcLevel--;
+    }
+    while (SrcLoop != DstLoop) {
+      SrcLoop = SrcLoop->getParentLoop();
+      DstLoop = DstLoop->getParentLoop();
+      SeparateLevels++;
+    }
+  } else
+    while (SrcLoop != DstLoop) {
+      SrcLoop = SrcLoop->getParentLoop();
+      DstLoop = DstLoop->getParentLoop();
+      SrcLevel--;
+    }
   CommonLevels = SrcLevel;
   MaxLevels -= CommonLevels;
 }
@@ -1223,8 +1307,9 @@ bool DependenceInfo::testZIV(const SCEV *Src, const SCEV *Dst,
 //
 // Return true if dependence disproved.
 bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
-                                   const SCEV *DstConst, const Loop *CurLoop,
-                                   unsigned Level, FullDependence &Result,
+                                   const SCEV *DstConst, const Loop *CurSrcLoop,
+                                   const Loop *CurDstLoop, unsigned Level,
+                                   FullDependence &Result,
                                    Constraint &NewConstraint) const {
   LLVM_DEBUG(dbgs() << "\tStrong SIV test\n");
   LLVM_DEBUG(dbgs() << "\t    Coeff = " << *Coeff);
@@ -1242,7 +1327,8 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
   LLVM_DEBUG(dbgs() << ", " << *Delta->getType() << "\n");
 
   // check that |Delta| < iteration count
-  if (const SCEV *UpperBound = collectUpperBound(CurLoop, Delta->getType())) {
+  if (const SCEV *UpperBound =
+          collectUpperBound(CurSrcLoop, Delta->getType())) {
     LLVM_DEBUG(dbgs() << "\t    UpperBound = " << *UpperBound);
     LLVM_DEBUG(dbgs() << ", " << *UpperBound->getType() << "\n");
     const SCEV *AbsDelta =
@@ -1275,7 +1361,8 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
       return true;
     }
     Result.DV[Level].Distance = SE->getConstant(Distance);
-    NewConstraint.setDistance(SE->getConstant(Distance), CurLoop);
+    NewConstraint.setDistance(SE->getConstant(Distance), CurSrcLoop,
+                              CurDstLoop);
     if (Distance.sgt(0))
       Result.DV[Level].Direction &= Dependence::DVEntry::LT;
     else if (Distance.slt(0))
@@ -1287,7 +1374,7 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
   else if (Delta->isZero()) {
     // since 0/X == 0
     Result.DV[Level].Distance = Delta;
-    NewConstraint.setDistance(Delta, CurLoop);
+    NewConstraint.setDistance(Delta, CurSrcLoop, CurDstLoop);
     Result.DV[Level].Direction &= Dependence::DVEntry::EQ;
     ++StrongSIVsuccesses;
   }
@@ -1295,13 +1382,12 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
     if (Coeff->isOne()) {
       LLVM_DEBUG(dbgs() << "\t    Distance = " << *Delta << "\n");
       Result.DV[Level].Distance = Delta; // since X/1 == X
-      NewConstraint.setDistance(Delta, CurLoop);
+      NewConstraint.setDistance(Delta, CurSrcLoop, CurDstLoop);
     }
     else {
       Result.Consistent = false;
-      NewConstraint.setLine(Coeff,
-                            SE->getNegativeSCEV(Coeff),
-                            SE->getNegativeSCEV(Delta), CurLoop);
+      NewConstraint.setLine(Coeff, SE->getNegativeSCEV(Coeff),
+                            SE->getNegativeSCEV(Delta), CurSrcLoop, CurDstLoop);
     }
 
     // maybe we can get a useful direction
@@ -1360,8 +1446,9 @@ bool DependenceInfo::strongSIVtest(const SCEV *Coeff, const SCEV *SrcConst,
 // Return true if dependence disproved.
 bool DependenceInfo::weakCrossingSIVtest(
     const SCEV *Coeff, const SCEV *SrcConst, const SCEV *DstConst,
-    const Loop *CurLoop, unsigned Level, FullDependence &Result,
-    Constraint &NewConstraint, const SCEV *&SplitIter) const {
+    const Loop *CurSrcLoop, const Loop *CurDstLoop, unsigned Level,
+    FullDependence &Result, Constraint &NewConstraint,
+    const SCEV *&SplitIter) const {
   LLVM_DEBUG(dbgs() << "\tWeak-Crossing SIV test\n");
   LLVM_DEBUG(dbgs() << "\t    Coeff = " << *Coeff << "\n");
   LLVM_DEBUG(dbgs() << "\t    SrcConst = " << *SrcConst << "\n");
@@ -1372,7 +1459,7 @@ bool DependenceInfo::weakCrossingSIVtest(
   Result.Consistent = false;
   const SCEV *Delta = SE->getMinusSCEV(DstConst, SrcConst);
   LLVM_DEBUG(dbgs() << "\t    Delta = " << *Delta << "\n");
-  NewConstraint.setLine(Coeff, Coeff, Delta, CurLoop);
+  NewConstraint.setLine(Coeff, Coeff, Delta, CurSrcLoop, CurDstLoop);
   if (Delta->isZero()) {
     Result.DV[Level].Direction &= ~Dependence::DVEntry::LT;
     Result.DV[Level].Direction &= ~Dependence::DVEntry::GT;
@@ -1420,7 +1507,8 @@ bool DependenceInfo::weakCrossingSIVtest(
 
   // We're certain that Delta > 0 and ConstCoeff > 0.
   // Check Delta/(2*ConstCoeff) against ...
[truncated]

Copy link

github-actions bot commented Feb 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch from 68d430c to f840335 Compare February 25, 2025 22:50
Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effectively, with SeparateLoops=true, this will handle loops as if they are fused.

But it also means that the result is plain wrong for non-fused loops. E.g. the dependence distance d computed by e.g. strongSIVtest may result in d > 0, where in reality all the instances of write occur before the read because their loops are sequential, not nested.

Another concern is that the analysis makes the decision on how to loop fusion occurs. The FuseLoops pass may want to fust loops with non-equal trip count, then it has to make the decision which iterations are executed with which ones. Even in cases where the trip count matches such as

for (int i = 0; i < n; +=1)
  A[i+1] += 1;
for (int i = 0; i < n; +=1)
  A[i] += 2;

loop fusion would optimally be

for (int i = 0; i < n+1; +=1) {
  if (i > 0) A[i] += 1;
  if (i < n) A[i] += 2;
}

or after LoopBoundSplitPass etc.

A[0] += 2;
for (int i = 0; i < n+1; +=1) 
  A[i] += 3;
A[n] +=  1;

i.e. not as naive as DA would assume. Ideally, we would chose an approach that allows us to extend FuseLoops over time.

I think instead of a SeparateLoops parameter, DA should receive the info which loops are considered to be fused from FuseLoops -- otherwise they might be disagree. "SeparateLoops" isn't a good name anyway. It goes pack to the old problem that we want to analyze the result of a optimization without the optimization having been applied first. It would be great if we could leave pass-specific concerns out of DA itself, it does not scale well with the number of passes, but I concede that sometimes it might be a pragmatic solution.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch from f840335 to 3cfd2e0 Compare March 10, 2025 21:01
@1997alireza
Copy link
Contributor Author

Effectively, with SeparateLoops=true, this will handle loops as if they are fused.

But it also means that the result is plain wrong for non-fused loops. E.g. the dependence distance d computed by e.g. strongSIVtest may result in d > 0, where in reality all the instances of write occur before the read because their loops are sequential, not nested.

Good point! To avoid the confusion among the original distances or directions and the new information we provide by this patch, now I will provide them in a different array. I added a new array of DVEntry named DVSeparate to include the DVEntry for separate levels and keep the DV array unchanged.

Another concern is that the analysis makes the decision on how to loop fusion occurs. The FuseLoops pass may want to fust loops with non-equal trip count, then it has to make the decision which iterations are executed with which ones. Even in cases where the trip count matches such as

for (int i = 0; i < n; +=1)
  A[i+1] += 1;
for (int i = 0; i < n; +=1)
  A[i] += 2;

loop fusion would optimally be

for (int i = 0; i < n+1; +=1) {
  if (i > 0) A[i] += 1;
  if (i < n) A[i] += 2;
}

or after LoopBoundSplitPass etc.

A[0] += 2;
for (int i = 0; i < n+1; +=1) 
  A[i] += 3;
A[n] +=  1;

i.e. not as naive as DA would assume. Ideally, we would chose an approach that allows us to extend FuseLoops over time.

Loop fusion or or any other optimization passes that want to use the analysis results can decide how to use it. For the case you mentioned, loop fusion can peel the iterations first to make the trip counts the same and then apply DA.

I think instead of a SeparateLoops parameter, DA should receive the info which loops are considered to be fused from FuseLoops -- otherwise they might be disagree. "SeparateLoops" isn't a good name anyway. It goes pack to the old problem that we want to analyze the result of a optimization without the optimization having been applied first. It would be great if we could leave pass-specific concerns out of DA itself, it does not scale well with the number of passes, but I concede that sometimes it might be a pragmatic solution.

We prefer the DA to provide information across two loops, enabling loop fusion to identify dependencies before applying the optimization.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: I don't yet think this is worth integrating for a special case of loop fusion, especially if we don't even have the corresponding PR that makes use of this yet.

Comment on lines 7 to 13
;; for (long int i = 0; i < n; i++) {
;; for (long int j = 0; j < n; j++) {
;; for (long int k = 0; k < n; k++) {
;; A[i][j][k] = i;
;; }
;; for (long int k = 0; k < n; k++) {
;; *B++ = A[i + 3][j + 2][k + 1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same case as @p2 from PreliminaryNoValidityCheckFixedSize.ll ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is replaced with another test case in which there are two separate levels now.

// MIV is not handled yet on separate loops; check if there is any MIV test
for (unsigned P = 0; P < Pairs; ++P) {
Pair[P].Loops.resize(MaxLevels + 1);
auto classification = classifyPair(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Style] Local variables start with capital letters; No Almost-Always-Auto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 912 to 917
// a - 1
// b - 2 = CommonLevels
// <c, e> - 3 : A SeparateLevel
// d - 4 = SrcLevels
// f - 6
// g - 7 = MaxLevels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened with level 5?

There are lots of references to the loop numbering scheme other than SIV that don't account for the change, e.g. getSplitIteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I updated my changes such that it does not affect the loop numbering scheme anymore. Now this api will only provide an additional info which is SeparateLevels.

auto classification = classifyPair(
Pair[P].Src, LI->getLoopFor(Src->getParent()), Pair[P].Dst,
LI->getLoopFor(Dst->getParent()), Pair[P].Loops);
if (classification == Subscript::MIV) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MIV gets it special treatment here, with reverting to old scheme, but what about RDIV, ZIV?

classifyPair itself relies on the old counting scheme, and will classify e.g. what was RDIV before to SIV.

There is also the case that with non-fused interpretation, we actually might be able to resolve dependencies, but do not with ConsiderSeparateLoops. What ensures that does not happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIV, RDIV or ZIV are able to handle pairs from different loops. RDIV and ZIV could do that before and SIV is able to handle it by this patch.

I cannot see a case where we may miss resolving any dependencies by considering separate loops optimization. Could you please clarify on that?

// the separate level extraction at the end of the depends api we have
// a - 1
// b - 2 = CommonLevels
// <c, e> - 3 : A SeparateLevel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// <c, e> - 3 : A SeparateLevel
// <c, e> - 3 = SeparateLevels

This introduced 3 different meanings of levels and it is not clear what interpretation at what point it the "correct" one. Better integrate into the text above, where SeparateLevels is zero unless ConsiderSeparateLoops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the confusion as mentioned in another comment above.

Comment on lines 342 to 344
assert(Levels < Level && Level <= Levels + SeparateLevels &&
"Separate level out of range");
return DVSeparate[Level - Levels - 1].Direction;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not introduce a helper function that returns the correct index into DVSeparate so you don't need the if/else here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Added.

@1997alireza
Copy link
Contributor Author

1997alireza commented Apr 22, 2025

To be clear: I don't yet think this is worth integrating for a special case of loop fusion, especially if we don't even have the corresponding PR that makes use of this yet.

Thanks Michael for your time and comments. I tried to resolve them and provide a more clean patch along with changes applied to the loop fusion pass to use the info provided by DA. Please take a look and let me know what you think. I apologize for the delay, I was caught up with some other projects.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch 4 times, most recently from b3df426 to 4420416 Compare April 23, 2025 15:24
@CongzheUalberta CongzheUalberta requested review from fhahn and sebpop April 25, 2025 20:20
Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several unrelated formatting changes in there. If it is easer, we can commit a NFC clang-format change so you don't have to worry about clang-format introducing changes anymore.

With not seeing the loop fusion code, I meant to open a seaparate commit that modifies loop fusion, so this PR does mix two concerns. See https://llvm.org/docs/GitHub.html#stacked-pull-requests.

@@ -825,16 +874,23 @@ void DependenceInfo::establishNestingLevels(const Instruction *Src,
DstLoop = DstLoop->getParentLoop();
DstLevel--;
}
// find the first separate similar level
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes from innermost loops to outer ones, and stop at the innermost unequal one. But what if the second innermost is equal just the outermost is not:

for (int i = 0; i < 42; ++i) { // fusable
  for (int j = 0; j < 21; ++j) { // not fusable
  }
}
for (int i = 0; i < 42; ++i) { // fusable
  for (int j = 0; j < 42; ++j) {
  }
}

I think the i-loop should be comon levels, just the j-loops not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! It is fixed now in this code fragment.

Comment on lines 597 to 625
/// SeparateLevels counts the number of loop levels after the common levels
/// that are not identical but are considered similar. Two levels are
/// considered similar if they have the same trip count and the same
/// nesting depth.
/// For example, if loops `c` and `e` are similar, then they contribute to
/// the SeparateLevels count and SeparateLevels is set to 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not messing up with the level count is a good thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write some about how the interface works for methods that take a "Separate" argument? In this example, c and e are considered similar, which are loops 3 and 5, but with Separate==true, to get the result for the fused loop, one passes just 3 (which could be considered either the loop depth which is the same of loop c and e, or the SrcLevels representing the fused loops). The pre-existing use of "level" which is not actually the loop depth (number of surrounding loops it is nested in) but a loop id in depth-first order is a bit unfortunate. Some disambiguation here may help.

[not a change request] Instead of "Separate" I think I would have preferred using something "like AssumeFusedLoops" consistently throughout, but not something that I would required for this review. "Separate" does not convey a lot of meaning, it could be loops from different functions. For methods that don't even make sense for non-fused loop such as getDistance (hence level should rather mean depth), the parameter isn't strictly nessary, a client that doesn't know about assumed-fused loops would not call it with argument beyond getLevels() anyways.

/// this loop will break this dependence.
bool isPeelFirst(unsigned Level) const override;
/// this loop will break this dependence. If Separate is set to true,
/// information about a separate level is provided.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing in the header file that even explains what a "separate level" actually is. Rather than having this sentence not saying anything here, consider central documentation whati t means, e.g. as class comment.

Could contain:

  • Number of loops inside common loops that are "similar" (+definition of similar)
  • Have different llvm::Loop objects but can be interpreted as a single fused loop with Separate=true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is added in the header file.

@@ -20,7 +20,7 @@ define void @p2(i64 %n, ptr %A, ptr %B) nounwind uwtable ssp {
; CHECK-NEXT: Src: store i64 %i.011, ptr %arrayidx8, align 8 --> Dst: store i64 %i.011, ptr %arrayidx8, align 8
; CHECK-NEXT: da analyze - none!
; CHECK-NEXT: Src: store i64 %i.011, ptr %arrayidx8, align 8 --> Dst: %0 = load i64, ptr %arrayidx17, align 8
; CHECK-NEXT: da analyze - flow [-3 -2]!
; CHECK-NEXT: da analyze - flow [-3 -2 / -1]!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than merging two DA interpretations into the same output, have you considered keeping the separate for better understandability. E.g.:

da analyze - flow [-3 -2]! / assuming 1 fused loop: [-3 -2 -1]!

Because of how FileCheck works, this test wouldn't even needed to be changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch 4 times, most recently from 17552ab to 21ef00f Compare May 30, 2025 18:42
@1997alireza
Copy link
Contributor Author

There are several unrelated formatting changes in there. If it is easer, we can commit a NFC clang-format change so you don't have to worry about clang-format introducing changes anymore.

With not seeing the loop fusion code, I meant to open a seaparate commit that modifies loop fusion, so this PR does mix two concerns. See https://llvm.org/docs/GitHub.html#stacked-pull-requests.

A separate commit and PR has been created for the changes applied to the loop fusion pass.

@Meinersbur
Copy link
Member

A separate commit and PR has been created for the changes applied to the loop fusion pass.

Do you mean 1997alireza#1 ? You already merged it into this PR so https://github.com/llvm/llvm-project/pull/128782/files includes the LoopFuse changes again. The point of Stacked PRs is to not merge them.

@1997alireza
Copy link
Contributor Author

A separate commit and PR has been created for the changes applied to the loop fusion pass.

Do you mean 1997alireza#1 ? You already merged it into this PR so https://github.com/llvm/llvm-project/pull/128782/files includes the LoopFuse changes again. The point of Stacked PRs is to not merge them.

Reverted the merged PR. Now the loop fusion changes are in another PR #146383.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some bikeshedding left but otherwise the patch looks good

}

/// inSeparateLoops - Returns true if this level is a separate level, i.e.,
/// performed across two separate loop nests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// performed across two separate loop nests.
/// performed across two separate loop nests that a treated like a single fused loop.

Comment on lines 597 to 625
/// SeparateLevels counts the number of loop levels after the common levels
/// that are not identical but are considered similar. Two levels are
/// considered similar if they have the same trip count and the same
/// nesting depth.
/// For example, if loops `c` and `e` are similar, then they contribute to
/// the SeparateLevels count and SeparateLevels is set to 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write some about how the interface works for methods that take a "Separate" argument? In this example, c and e are considered similar, which are loops 3 and 5, but with Separate==true, to get the result for the fused loop, one passes just 3 (which could be considered either the loop depth which is the same of loop c and e, or the SrcLevels representing the fused loops). The pre-existing use of "level" which is not actually the loop depth (number of surrounding loops it is nested in) but a loop id in depth-first order is a bit unfortunate. Some disambiguation here may help.

[not a change request] Instead of "Separate" I think I would have preferred using something "like AssumeFusedLoops" consistently throughout, but not something that I would required for this review. "Separate" does not convey a lot of meaning, it could be loops from different functions. For methods that don't even make sense for non-fused loop such as getDistance (hence level should rather mean depth), the parameter isn't strictly nessary, a client that doesn't know about assumed-fused loops would not call it with argument beyond getLevels() anyways.

@kasuga-fj
Copy link
Contributor

kasuga-fj commented Aug 8, 2025

If my understanding is correct this won't be a problem. Because the patch essentially says: let's assume the inner most loops are fused and then check the dependences

But they are not actually fused, and the client of DA is not only LoopFuse.

I didn't mean that the result is incorrect, but has changed due to the assumption of loop fusion. This part seems to attempt to split the result into two parts, with or without the assumption. However, the situation isn't that straightforward. The dependencies for Levels < CommonLevels would be affected by the assumption of loop-fusion. It might be noisy for clients other than LoopFuse (though I don't have a concrete example to illustrate it).

@amehsan
Copy link
Contributor

amehsan commented Aug 8, 2025

If my understanding is correct this won't be a problem. Because the patch essentially says: let's assume the inner most loops are fused and then check the dependences

But they are not actually fused, and the client of DA is not only LoopFuse.

I didn't mean that the result is incorrect, but has changed due to the assumption of loop fusion. This part seems to attempt to split the result into two parts, with or without the assumption. However, the situation isn't that straightforward. The dependencies for Levels < CommonLevels would be affected by the assumption of loop-fusion. It might be noisy for clients other than LoopFuse (though I don't have a concrete example to illustrate it).

Yes, in some cases the result may change. My point is that this change will be an improvment as your earlier example demonstrates. We can not stop development of new features because they improve results that other components consume. What I was trying to highlight was that this change (of DA results) will not be incorrect; it will be an improvement.

@amehsan
Copy link
Contributor

amehsan commented Aug 8, 2025

If my understanding is correct this won't be a problem. Because the patch essentially says: let's assume the inner most loops are fused and then check the dependences

But they are not actually fused, and the client of DA is not only LoopFuse.

I didn't mean that the result is incorrect, but has changed due to the assumption of loop fusion. This part seems to attempt to split the result into two parts, with or without the assumption. However, the situation isn't that straightforward. The dependencies for Levels < CommonLevels would be affected by the assumption of loop-fusion. It might be noisy for clients other than LoopFuse (though I don't have a concrete example to illustrate it).

Obviously IF this change of DA result, causes something unexpected (a functional or a performance issue) then we need to look into it and decide how to proceed. That would depend on the nature of the problem encountered. But a priori, I don't see any problem with this occasional change of result, when it is actually an improvement.

@kasuga-fj
Copy link
Contributor

My concerns are:

  • Are such changes of the results always improvements?
  • Is it sound for passes other than LoopFuse to use such improved results?

As for the first one, at least the change to isConsistent doesn't seem to be an improvement. Also, I think it’s difficult to manage it correctly while holding an assumption of fusion. That said, it also seems that no one actually uses that information in practice, so it may not matter (or perhaps it would be better to just remove it altogether). Regarding the other results, such as DV, it appears that the changes are always improvements. In the end, it seems to me that these result changes are basically improvements.

What I'm more concerned about is the second question. For example, consider a case where a loop guard exists between the two fusable loops, such as:

for (int i = 0; i < N; i++) {
  for (int j = 0; j < M; j++)
    Stmt0;
  if (cond)
    for (int j = 0; j < M; j++)
      Stmt1;
}

Stmt1 can have information inferred from cond, such as no-wrap flags. Then, what does "let's assume the inner most loops are fused and then check the dependences" mean? At least, there seem to be two possible interpretations:

// Case 1
for (int i = 0; i < N; i++)
  for (int j = 0; j < M; j++) {
    Stmt0;
    if (cond)
      Stmt1;
  }

// Case 2
for (int i = 0; i < N; i++)
  if (cond)
    for (int j = 0; j < M; j++) {
      Stmt0;
      Stmt1;
    }

IIUC, LoopFuse excludes such cases before calling DependenceInfo::depends, so it may not be relevant to LoopFuse. But what about other passes? Do they have to have additional checks like FusableLevels == 0 || isLoopAdjust(...)? Well, it seems to me that areLoopsSimilar is more pessimistic1.

In conclusion, I'm happy with the improvements themselves when there is evidence supporting their soundness. However, at least at this point, I think it has not been given much consideration.

Footnotes

  1. The same might be said of the current implementation. Maybe it should consider a case where Src and Dst are not in the same BB so that they have different contexts.

@amehsan
Copy link
Contributor

amehsan commented Aug 8, 2025

My concerns are:

  • Are such changes of the results always improvements?
  • Is it sound for passes other than LoopFuse to use such improved results?

About (1) I think forcing isConsistent to be false is incorrect. It should be what the algorithm sets it to. Thanks for highlighting the issue.

About (2) I believe this is fine. IIRC, the question of whether two statements from two different loops are dependent or not is not relevant to any existing DA client other than loop fusion. If it is relevant to another pass, then same concerns that you bring up is applicable to the results that DA produces today.

@amehsan
Copy link
Contributor

amehsan commented Aug 8, 2025

Let me rephrase my answer to question (2). If a pass asks a question about depdency of two statements from two different loops, then that pass needs to have good answers to the question that you raised. Right now, loop fusion can handle this. Any other pass that wants to do this in future needs to be clear about this as well.

@kasuga-fj
Copy link
Contributor

kasuga-fj commented Aug 8, 2025

About (1) I think forcing isConsistent to be false is incorrect. It should be what the algorithm sets it to.

The comment of isConsistent says:

/// isConsistent - Returns true if this dependence is consistent
/// (occurs every time the source and destination are executed).

It means that this function returns true if we know the dependence is consistent. Returning false doesn't mean it is inconsistent. Always returning false is permitted.
Using the same value as the current algorithm computes looks risky to me. Consider the following case:

for (int i = 0; i < 100; i++) {
  for (int j = 0; j < 100; j++)
    A[i][j] = 42;
  
  if (rand() % 2 == 0)
    for (int j = 0; j < 100; j++)
      A[i][j] = 43;
}

The dependence between A[i][j] = 42 and A[i][j] = 43 is not consistent. Then, what happens with this patch? I guess that the current algorithm set Consistent to true 1.

Let me rephrase my answer to question (2). If a pass asks a question about depdency of two statements from two different loops, then that pass needs to have good answers to the question that you raised. Right now, loop fusion can handle this. Any other pass that wants to do this in future needs to be clear about this as well.

If this patch adds new assumptions to the analysis results, we may need to update other existing uses. For example, if the improved results assume the conditions under which "LoopFuse works fine," then we need to replace code like:

Dependence *Dep = DI->depends(Src, Dst);

with something like:

Dependence *Dep = DI->depends(Src, Dst);
if (The same condition as in LoopFuse does not hold)
  return;  // early exit

// Or, more crudely
if (Dep->getFusableLevel() != 0)
  return;

Footnotes

  1. Anyway, it's already broken... https://godbolt.org/z/sc3ds9WaM

@kasuga-fj
Copy link
Contributor

kasuga-fj commented Aug 8, 2025

So, at the very least, I personally think it would be better to insert an assertion like assert(Dep->getFusableLevel() == 0) in every place where DA is used to prevent any accidents in the future.

@amehsan
Copy link
Contributor

amehsan commented Aug 10, 2025

If this patch adds new assumptions to the analysis results, we may need to update other existing uses. For example, if the improved results assume the conditions under which "LoopFuse works fine," then we need to replace code like:

Dependence *Dep = DI->depends(Src, Dst);

with something like:

Dependence *Dep = DI->depends(Src, Dst);
if (The same condition as in LoopFuse does not hold)
  return;

I would say this patch adds a new functionality to DA. Personally I don't see any reason to worry about this. I am not exactly sure what can go wrong and how, that is not an existing risk. However, I believe it is possible to add a new flag to DA interface to enable this new functionality. The default value could be false, to turn it off, and in loop fusion we can turn it on.

Alternatively we can do something like you suggest. We can add an assertion or condition for other clients of DA that checks source and destination are within one loop. But I am not sure how that would protect a future potential client of DA.

In any case, I don't see much benefit to either of these solutions, but if you think it is helpful I have no objections.

@amehsan
Copy link
Contributor

amehsan commented Aug 10, 2025

So, at the very least, I personally think it would be better to insert an assertion like assert(Dep->getFusableLevel() == 0) in every place where DA is used to prevent any accidents in the future.

If you think about common customers of DA, whether in llvm or other compilers (loop interchange, loop distribution, loop vectorization) chances of accidentally passing two statements from two different loops to DA is very low. Pretty much any user of DA needs to deal with enough details, that makes such mistakes low probability. But again I don't have objections if you think that is helpful.

Right now I skipped the first part of your comment about isConsistent. Will check it later.

Thank you very much for taking the time to review and letting us know your opinion.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch from 4563cf6 to 7918545 Compare August 11, 2025 23:18
@1997alireza
Copy link
Contributor Author

1997alireza commented Aug 11, 2025

About (1) I think forcing isConsistent to be false is incorrect. It should be what the algorithm sets it to.

The comment of isConsistent says:

/// isConsistent - Returns true if this dependence is consistent
/// (occurs every time the source and destination are executed).

It means that this function returns true if we know the dependence is consistent. Returning false doesn't mean it is inconsistent. Always returning false is permitted. Using the same value as the current algorithm computes looks risky to me. Consider the following case:

for (int i = 0; i < 100; i++) {
  for (int j = 0; j < 100; j++)
    A[i][j] = 42;
  
  if (rand() % 2 == 0)
    for (int j = 0; j < 100; j++)
      A[i][j] = 43;
}

The dependence between A[i][j] = 42 and A[i][j] = 43 is not consistent. Then, what happens with this patch? I guess that the current algorithm set Consistent to true 1.

Let me rephrase my answer to question (2). If a pass asks a question about depdency of two statements from two different loops, then that pass needs to have good answers to the question that you raised. Right now, loop fusion can handle this. Any other pass that wants to do this in future needs to be clear about this as well.

If this patch adds new assumptions to the analysis results, we may need to update other existing uses. For example, if the improved results assume the conditions under which "LoopFuse works fine," then we need to replace code like:

Dependence *Dep = DI->depends(Src, Dst);

with something like:

Dependence *Dep = DI->depends(Src, Dst);
if (The same condition as in LoopFuse does not hold)
  return;  // early exit

// Or, more crudely
if (Dep->getFusableLevel() != 0)
  return;

Footnotes

  1. Anyway, it's already broken... https://godbolt.org/z/sc3ds9WaM

isConsistent is set to false whenever there are fusable levels to be analyzed.

@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch 2 times, most recently from 000bc7b to 3e48d86 Compare August 12, 2025 17:38
Copy link
Contributor

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think about common customers of DA, whether in llvm or other compilers (loop interchange, loop distribution, loop vectorization) chances of accidentally passing two statements from two different loops to DA is very low. Pretty much any user of DA needs to deal with enough details, that makes such mistakes low probability.

I don't think so. For example, regarding LoopInterchange, the following case isn't currently supported, but I believe it could be extended relatively easily

for (int i = 0; i < N; i++)
  for (int j = 0; j < M; j++) {
    for (int k = 0; k < O; k++)
      Stmt1;
    for (int l = 0; l < O; l++)
      Stmt2;
  }

In this case, LoopInterchange is only concerned with the i-loop and j-loop; the k-loop and l-loop don't really matter, especially since LoopInterchange only checks the first two elements of DV. In that sense, the current DA works fine, as it essentially analyzes loops whose levels are less than CommonLevels. However, this patch runs the analysis under the assumption that the k-loop and l-loop are fused, which can affect the dependency results for other loops; "improved results", as mentioned in earlier comments.
So my question is: is it safe to use these "improved results"? Or does LoopInterchange need to verify whether the fusion assumption affects the outcome, and whether it's safe to rely on it? If it's the latter, I think it introduces unnecessary noise. Moreover, I believe this issue isn't limited to LoopInterchange. It likely applies to other loop transformations as well. These transformations may only be concerned with common loops and disregard deeper-level loops entirely, or treat them not as loops but simply as "complex control flow". What I want to emphasize is that this patch appears to impose additional constraints on DA clients. If that's the case, it should at the very least be documented explicitly.

I believe it is possible to add a new flag to DA interface to enable this new functionality. The default value could be false, to turn it off, and in loop fusion we can turn it on.

It may be an optional choice now, but it could become technical debt when we attempt to cache analysis results and reuse them across passes in the future.

isConsistent is set to false whenever there are fusable levels to be analyzed.

Yes, and it might be annoying for other passes. That said, isConsistent seems already incorrect in some cases and nobody uses it, so it might be worth removing it altogether.

Comment on lines 4092 to 4095
for (unsigned level = 0; level < CommonLevels; ++level)
DV[level] = Result.DV[level];
for (unsigned level = 0; level < FusableLevels; ++level)
DVFusable[level] = Result.DV[CommonLevels + level];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (unsigned level = 0; level < CommonLevels; ++level)
DV[level] = Result.DV[level];
for (unsigned level = 0; level < FusableLevels; ++level)
DVFusable[level] = Result.DV[CommonLevels + level];
for (unsigned Level = 0; Level < CommonLevels; ++Level)
DV[Level] = Result.DV[Level];
for (unsigned Level = 0; Level < FusableLevels; ++Level)
DVFusable[Level] = Result.DV[CommonLevels + Level];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied.

Comment on lines 5 to 6
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.6.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Comment on lines 630 to 619
/// a - 1
/// b - 2
/// c,e - 3 = CommonLevels
/// d - 4 = SrcLevels
/// f - 5
/// g - 6 = MaxLevels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how @Meinersbur ran the tests, but since DA isn't triggered by default, I assume it's manually registered, e.g., by editing PassBuilderPipelines.cpp.

I believe there is no need to reset the array Pair

(Assuming we are talking about this part)
Hm, after thinking it over for a while, I also think the reset is unnecessary, as they are overwritten in the subsequent loop. Maybe using Pair[P].Loops is confusing. It would be better to use a temporal (local) variable instead of Pair[P].Loops to keep the code clean. Also, I think it would be better to add a test where CommonLevels and MaxLevels are reverted since TestClass is neither ZIV, SIV nor RDIV.

@amehsan
Copy link
Contributor

amehsan commented Aug 14, 2025

I don't think so. For example, regarding LoopInterchange, the following case isn't currently supported, but I believe it could be extended relatively easily

for (int i = 0; i < N; i++)
  for (int j = 0; j < M; j++) {
    for (int k = 0; k < O; k++)
      Stmt1;
    for (int l = 0; l < O; l++)
      Stmt2;
  }

Your example is not a perfect loop nest (the j-loop has two inner loops) and the legality check for loop interchange assumes a perfect loop nest (see the code snippet below taken from loop interchange source code). I am not sure extension of loop interchange to cover this case will be easy. But anyways, let's use that flag to address this concern.

// After interchanging, check if the direction vector is valid.
// [Theorem] A permutation of the loops in a perfect nest is legal if and only
// if the direction matrix, after the same permutation is applied to its
// columns, has no ">" direction as the leftmost non-"=" direction in any row.
static bool isLexicographicallyPositive(std::vector<char> &DV) {
  for (unsigned char Direction : DV) {
    if (Direction == '<')
      return true;
    if (Direction == '>' || Direction == '*')
      return false;
  }
  return true;
}

@kasuga-fj
Copy link
Contributor

I don't think so. For example, regarding LoopInterchange, the following case isn't currently supported, but I believe it could be extended relatively easily

for (int i = 0; i < N; i++)
  for (int j = 0; j < M; j++) {
    for (int k = 0; k < O; k++)
      Stmt1;
    for (int l = 0; l < O; l++)
      Stmt2;
  }

Your example is not a perfect loop nest (the j-loop has two inner loops) and the legality check for loop interchange assumes a perfect loop nest (see the code snippet below taken from loop interchange source code).

Yes, they are not perfectly nested loops if we take the k-loop and l-loop into account. However, I believe it is sound to ignore them and focus only on the i-loop and j-loop. LoopInterchange currently doesn't have the logic to ignore the k-loop and l-loop, but such functionality could be added in the future. In that case, I think the fusion assumption could lead to something unexpected.

In addition, I suspect a similar problem could arise in LoopFuse. What happens when we attempt to fuse the k0-loop and k1-loop in the example below? Is it sound to assume that the l0-loop and l1-loop are fused?

for (int i = 0; i < M; i++)
  for (int j = 0; j < N; j++) {
    for (int k0 = 0; k0 < O; k0++)
      for (int l0 = 0; l0 < P; l0++)
        Stmt1;
    for (int k1 = 0; k1 < O; k1++)
      for (int l1 = 0; l1 < P; l1++)
        Stmt2;
  }

But anyways, let's use that flag to address this concern.

Just in case: I don't request to add such a flag. I think it's also okay if you can prove the soundness of the new assumption. But if adding such a flag is more convenient, then I don't have strong objections at the moment.

@kasuga-fj
Copy link
Contributor

(see the code snippet below taken from loop interchange source code).

JFYI: the code you’re looking at appears to be slightly outdated.

// Check if a direction vector is lexicographically positive. Return true if it
// is positive, nullopt if it is "zero", otherwise false.
// [Theorem] A permutation of the loops in a perfect nest is legal if and only
// if the direction matrix, after the same permutation is applied to its
// columns, has no ">" direction as the leftmost non-"=" direction in any row.
static std::optional<bool>
isLexicographicallyPositive(ArrayRef<char> DV, unsigned Begin, unsigned End) {
for (unsigned char Direction : DV.slice(Begin, End - Begin)) {
if (Direction == '<')
return true;
if (Direction == '>' || Direction == '*')
return false;
}
return std::nullopt;
}

@1997alireza
Copy link
Contributor Author

1997alireza commented Aug 15, 2025

What happens in a case like the following?

for (int i = 0; i <100; i++)
  for (int j = 0; j < 100; j++) {
    for (int k = 0; k < 10; k++)
      A[i][j][2*k] = 42;
    for (int k = 0; k < 10; k++)
      A[i][j][2*k + 1] = 43;
  }

Currently DA returns [0 0] between the first store and the second one. However, once we fused the k-loops, SIV test will disprove the dependency and it would become none (godbolt: https://godbolt.org/z/4GT49nzdo).

So, this may be an improvement in this case, but at the very least, the result has changed. In general, assuming loops to be fused can affect the outer common loops, though I'm not sure if that's actually problematic.

The IR provided in the link appears to be incorrect. Specifically, k.0 should be replaced with k.1 in line 31. With this change, the current DA can already determine that there is no dependency by using the RDIV test. Therefore, my patch does not change the result in this case.
In general, if RDIV is functioning as intended, there should be no case in which my patch detects no dependency between two different loops where RDIV has not already done so. My patch simply tries to generate information about the distance of the dependencies if possible.

@kasuga-fj
Copy link
Contributor

The IR provided in the link appears to be incorrect. Specifically, k.0 should be replaced with k.1 in line 31. With this change, the current DA can already determine that there is no dependency by using the RDIV test. Therefore, my patch does not change the result in this case.

You are completely correct. I was stupid, sorry about that...
I now understand that the RDIV test already handles loops whose nesting levels are greater than CommonLevels. So, while I still suspect it might be incorrect in some situations, that's not relevant to your patch.
On the other hand, it turns out there are indeed cases where your patch changes the result. Consider the following case:

for (int i = 0; i < 10; i++) {
  for (int j = 1; j < 11; j++)
    A[i][j+1][j-1] = 42;
  for (int j = 0; j < 10; j++)
    A[i][j][j] = 43;
}

The current DA produces the following result (godbolt: https://godbolt.org/z/5d6xej3bv):

Printing analysis 'Dependence Analysis' for function 'f':
Src:  store i32 42, ptr %idx.0, align 4 --> Dst:  store i32 42, ptr %idx.0, align 4
  da analyze - none!
Src:  store i32 42, ptr %idx.0, align 4 --> Dst:  store i32 43, ptr %idx.1, align 4
  da analyze - output [0|<]!
Src:  store i32 43, ptr %idx.1, align 4 --> Dst:  store i32 43, ptr %idx.1, align 4
  da analyze - none!

I also ran this locally with your patch, which produced:

Printing analysis 'Dependence Analysis' for function 'f':
Src:  store i32 42, ptr %idx.0, align 4 --> Dst:  store i32 42, ptr %idx.0, align 4
  da analyze - none!
Src:  store i32 42, ptr %idx.0, align 4 --> Dst:  store i32 43, ptr %idx.1, align 4
  da analyze - none!
Src:  store i32 43, ptr %idx.1, align 4 --> Dst:  store i32 43, ptr %idx.1, align 4
  da analyze - none!

This time it should be correct, and I'm still not sure whether using this result is sound for callers other than LoopFuse.

@1997alireza
Copy link
Contributor Author

The IR provided in the link appears to be incorrect. Specifically, k.0 should be replaced with k.1 in line 31. With this change, the current DA can already determine that there is no dependency by using the RDIV test. Therefore, my patch does not change the result in this case.

You are completely correct. I was stupid, sorry about that... I now understand that the RDIV test already handles loops whose nesting levels are greater than CommonLevels. So, while I still suspect it might be incorrect in some situations, that's not relevant to your patch. On the other hand, it turns out there are indeed cases where your patch changes the result. Consider the following case:

for (int i = 0; i < 10; i++) {
  for (int j = 1; j < 11; j++)
    A[i][j+1][j-1] = 42;
  for (int j = 0; j < 10; j++)
    A[i][j][j] = 43;
}

The current DA produces the following result (godbolt: https://godbolt.org/z/5d6xej3bv):

Printing analysis 'Dependence Analysis' for function 'f':
Src:  store i32 42, ptr %idx.0, align 4 --> Dst:  store i32 42, ptr %idx.0, align 4
  da analyze - none!
Src:  store i32 42, ptr %idx.0, align 4 --> Dst:  store i32 43, ptr %idx.1, align 4
  da analyze - output [0|<]!
Src:  store i32 43, ptr %idx.1, align 4 --> Dst:  store i32 43, ptr %idx.1, align 4
  da analyze - none!

I also ran this locally with your patch, which produced:

Printing analysis 'Dependence Analysis' for function 'f':
Src:  store i32 42, ptr %idx.0, align 4 --> Dst:  store i32 42, ptr %idx.0, align 4
  da analyze - none!
Src:  store i32 42, ptr %idx.0, align 4 --> Dst:  store i32 43, ptr %idx.1, align 4
  da analyze - none!
Src:  store i32 43, ptr %idx.1, align 4 --> Dst:  store i32 43, ptr %idx.1, align 4
  da analyze - none!

This time it should be correct, and I'm still not sure whether using this result is sound for callers other than LoopFuse.

Yes, my patch will modify certain cases such as this one. May I ask what your specific objection is? Do you believe that my patch produces incorrect results? If not, I do not see any issue with it. The patch enhances the results and can even detect additional cases where no dependency exists. The point you raise regarding changes to the results could equally apply to the RDIV test. For example, what if someone improved RDIV to address cases similar to the one above?

@kasuga-fj
Copy link
Contributor

May I ask what your specific objection is? Do you believe that my patch produces incorrect results?

Basically yes. For example, consider the following case:

for (int i = 0; i < M; i++) {
  if (i < 10)
    for (int j = 0; j < N; j++)
      A[i+10][j] = 42;
  for (int j = 0; j < N; j++)
    A[i][j] = 43;
}

What does "assuming fusion" mean in this case? Doesn't it also imply that Stmt2 is executed only when i < 10? If so, i < i + 10 is implied and DA may report no dependency, which is obviously incorrect if M is greater than 10. It seems that this kind of condition isn't considered by DA at the moment, and I'm worried that miscompilations could occur if we try to account for it in the future, or that similar accidents could happen unnoticed due to improvements in ScalaEvolution.
This example, as well as the one in my earlier comment, represents just some specific cases, and I’m not sure when the changed result is actually "correct." Unless we can state that clearly, it would be ideal that the result remains the same, regardless of whether the fusion assumption is applied. Or at the very least, I think it would be better to ensure that there are no individual loop-guards between fusable loops within areLoopsSimilar.

@kasuga-fj
Copy link
Contributor

The point you raise regarding changes to the results could equally apply to the RDIV test. For example, what if someone improved RDIV to address cases similar to the one above?

IIUIC, RDIV seems to have similar potential issues as well (unfortunately, DA is already pretty broken...).

@1997alireza
Copy link
Contributor Author

May I ask what your specific objection is? Do you believe that my patch produces incorrect results?

Basically yes. For example, consider the following case:

for (int i = 0; i < M; i++) {
  if (i < 10)
    for (int j = 0; j < N; j++)
      A[i+10][j] = 42;
  for (int j = 0; j < N; j++)
    A[i][j] = 43;
}

What does "assuming fusion" mean in this case? Doesn't it also imply that Stmt2 is executed only when i < 10? If so, i < i + 10 is implied and DA may report no dependency, which is obviously incorrect if M is greater than 10. It seems that this kind of condition isn't considered by DA at the moment, and I'm worried that miscompilations could occur if we try to account for it in the future, or that similar accidents could happen unnoticed due to improvements in ScalaEvolution. This example, as well as the one in my earlier comment, represents just some specific cases, and I’m not sure when the changed result is actually "correct." Unless we can state that clearly, it would be ideal that the result remains the same, regardless of whether the fusion assumption is applied. Or at the very least, I think it would be better to ensure that there are no individual loop-guards between fusable loops within areLoopsSimilar.

I do not believe such checks are necessary within DA. If you run this code with the original DA, the result is [10], which represents the dependency distance with respect to loop i. With my patch applied, DA is able to capture the dependency for loop j as well, resulting in [10 0].

As you can see, DA does not report no dependency; rather, it now provides the dependency distance as well. The point you mentioned regarding loop-guards is not directly related to what DA is designed to provide. "assuming fusion" means what would be the dependency between two accesses if they were in the same loop like this:

for (int i = 0; i < M; i++) {
    for (int j = 0; j < N; j++)
      A[i+10][j] = 42;
      A[i][j] = 43;
}

DA doesn't need to know about the loop-guards to provide such information. It is similar to the following case where DA generates [10, 0].

for (int i = 0; i < M; i++) {
    for (int j = 0; j < N; j++) {
      if (i < 10)
        A[i+10][j] = 42;
      A[i][j] = 43;
    }
}

When there is a dependency between two memory instructions in separate
fusable loops, SIV will be able to test them and compute the direction
and the distance of the dependency. Two loop levels are considered
fusable if they have the same tripcount and depth.
@1997alireza 1997alireza force-pushed the da-siv-separate-loops branch from 3e48d86 to 2a56dea Compare August 19, 2025 15:08
@kasuga-fj
Copy link
Contributor

At the very least, ScalarEvolution uses loop-guards to prove something, and SCEVAddRecExprs are constructed based on them. In other words, they may contain information inferred from loop-guards, which we might be implicitly leveraging through the ScalarEvolution APIs. Therefore, I believe we need to ensure that we're not depending on properties derived by ScalarEvolution from loop-guards that are present in only one of the loops of the fusable-loops.

In addition, IIUIC, the following two codes are not considered equivalent by ScalarEvolution, even thought the condition in the if-statement is loop-invariant. In the first case, the if condition is a loop-guard and is taken into account when constructing SCEVAddRecExprs for expressions inside Stmt. In contrast, this is not true for the second case.

if (i < 10)
  for (int j = 0; j < N; j++)
    Stmt;

for (int j = 0; j < N; j++)
  if (i < 10)
    Stmt;

@amehsan
Copy link
Contributor

amehsan commented Aug 26, 2025

At the very least, ScalarEvolution uses loop-guards to prove something, and SCEVAddRecExprs are constructed based on them. In other words, they may contain information inferred from loop-guards, which we might be implicitly leveraging through the ScalarEvolution APIs. Therefore, I believe we need to ensure that we're not depending on properties derived by ScalarEvolution from loop-guards that are present in only one of the loops of the fusable-loops.

In addition, IIUIC, the following two codes are not considered equivalent by ScalarEvolution, even thought the condition in the if-statement is loop-invariant. In the first case, the if condition is a loop-guard and is taken into account when constructing SCEVAddRecExprs for expressions inside Stmt. In contrast, this is not true for the second case.

if (i < 10)
  for (int j = 0; j < N; j++)
    Stmt;

for (int j = 0; j < N; j++)
  if (i < 10)
    Stmt;

I appreciate your intention to ensure the utmost quality of the open source compiler. Let me clarify a couple of things.

DA gets two mem access instructions in one or two loops. The actual memory accessed may vary depending on the value of the induction variables of the loops. All that DA is responsible for is to check these two instructions may access the same memory for some of the possible values of the induction variables or not. Whether the loops have guards or not, control flow inside the loop, whether the instructions are actually executed for any or all values of the induction variable is irrelevant to DA.

There is another issue about fusability of the loops. None of conditions of fusability (loop guards, etc. ) is relevant here. The only thing relevant is that we have two loops with induction variables I and J that takes values I_start to I_end to J_stat to J_end. The implication of fusability for DA is that we assume these values change in a specific order. That is all. That is not going to cause any dependence to be reported as non-dependent. It only makes it possible to calculate a distance or directions on induction variables I and J (One that is useful for loop fusion). There won't be any impact on other components of the distance vector (with the exception that we can make them more accurate as discussed before), and no impact on distinguishing dependence and non-dependence.

@kasuga-fj
Copy link
Contributor

(Sorry to bother you with my nitpicky comments.)

Let me clarify my point, using the same example.

for (int i = 0; i < M; i++) {
  if (i < 10)
    for (int j = 0; j < N; j++)
      A[i+10][j] = 42;
  for (int j = 0; j < N; j++)
    A[i][j] = 43;
}

Your argument seems correct if we are analyzing the C language (or other high-level languages). But our target is LLVM IR which does not have explicit loop constructs such as for or while statements. Thus, loop-releated information (e.g., backedge-taken count) must be inferred in some way. Loop guards can be utilized for this purpose, for instance, a predicate like N > 0 may be used to compute the backedge-taken count of the loop. From this perspective, I think it can be said that DA makes use of loop guards. The condition i < 10 in the above example would also be treated as a loop guard. I think they are both equally loop guards and it should be impossible to distinguish information related to backedge-taken count from any other information. Therefore it's safer to think that DA might be indirectly using i < 10 (although I don't know if that condition actually provides useful information).

The current patch seems to be saying "let's regard the two fusable loops as the same", in this example the two j-loops. If the second j-loop is "mapped" to the first j-loop, it appears to me that this may potentially imply considering the following code rather than the original:

for (int i = 0; i < M; i++) {
  if (i < 10)
    for (int j = 0; j < N; j++) {
      A[i+10][j] = 42;
      A[i][j] = 43;
    }
}

// Or, written in a form closer to LLVM IR
for (int i = 0; i < M; i++) {
  if (i < 10 && N > 0) {
    int j = 0;
    do {
      A[i+10][j] = 42;
      A[i][j] = 43;
      j++;
    } while (j < N)
  }
}

I've been thinking about why this happens, and I've come to feel that the root cause is trying to map different loops to the same Level. If this patch is modified so that CommonLevels is not changed due to fusability, then it may also address my concern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants