Skip to content

Conversation

arcbbb
Copy link
Contributor

@arcbbb arcbbb commented Aug 7, 2025

This patch splits out the legality checks from PR #151300, following the landing of PR #128593.

It is a step toward supporting vectorization of early-exit loops that contain potentially faulting loads.
In this commit, a loop is considered legal for vectorization if it satisfies the following criteria:

  1. The target supports first-faulting load intrinsics (e.g., vp.load.ff).
  2. Unbounded loads are unit-stride, which is the only type currently supported by vp.load.ff.

@llvmbot llvmbot added backend:RISC-V vectorizers llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Aug 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 7, 2025

@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Shih-Po Hung (arcbbb)

Changes

This patch splits out the legality checks from PR #151300, following the landing of PR #128593.

It is a step toward supporting vectorization of early-exit loops that contain potentially faulting loads.
In this commit, a loop is considered legal for vectorization if it satisfies the following criteria:

  1. The target supports first-faulting load intrinsics (e.g., vp.load.ff).
  2. All unbounded loads are unit-stride, which is the only type currently supported by vp.load.ff.
  3. All unbounded loads are located in the loop header, ensuring that the header mask change dominates the loop.

Full diff: https://github.com/llvm/llvm-project/pull/152422.diff

9 Files Affected:

  • (modified) llvm/include/llvm/Analysis/Loads.h (+8)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+3)
  • (modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+2)
  • (modified) llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h (+8)
  • (modified) llvm/lib/Analysis/Loads.cpp (+16)
  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+4)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+3)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+28-3)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+7)
diff --git a/llvm/include/llvm/Analysis/Loads.h b/llvm/include/llvm/Analysis/Loads.h
index 84564563de8e3..080757b6d1fe0 100644
--- a/llvm/include/llvm/Analysis/Loads.h
+++ b/llvm/include/llvm/Analysis/Loads.h
@@ -91,6 +91,14 @@ LLVM_ABI bool isDereferenceableReadOnlyLoop(
     Loop *L, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
     SmallVectorImpl<const SCEVPredicate *> *Predicates = nullptr);
 
+/// Return true if the loop \p L cannot fault on any iteration and only
+/// contains read-only memory accesses. Also collect loads that are not
+/// guaranteed to be dereferenceable.
+LLVM_ABI bool isReadOnlyLoopWithSafeOrSpeculativeLoads(
+    Loop *L, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
+    SmallVectorImpl<LoadInst *> *SpeculativeLoads,
+    SmallVectorImpl<const SCEVPredicate *> *Predicates = nullptr);
+
 /// Return true if we know that executing a load from this value cannot trap.
 ///
 /// If DT and ScanFrom are specified this method performs context-sensitive
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index aa4550de455e0..0671ec4f4db01 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1857,6 +1857,9 @@ class TargetTransformInfo {
   /// \returns True if the target supports scalable vectors.
   LLVM_ABI bool supportsScalableVectors() const;
 
+  /// \returns True if the target supports speculative load intrinsics (e.g., vp.load.ff).
+  LLVM_ABI bool supportsSpeculativeLoads() const;
+
   /// \return true when scalable vectorization is preferred.
   LLVM_ABI bool enableScalableVectorization() const;
 
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index abdbca04488db..1df93ecc7ec16 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -1106,6 +1106,8 @@ class TargetTransformInfoImplBase {
 
   virtual bool supportsScalableVectors() const { return false; }
 
+  virtual bool supportsSpeculativeLoads() const { return false; }
+
   virtual bool enableScalableVectorization() const { return false; }
 
   virtual bool hasActiveVectorLength() const { return false; }
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
index 43ff084816d18..3b5638f3f570a 100644
--- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -445,6 +445,11 @@ class LoopVectorizationLegality {
   /// Returns a list of all known histogram operations in the loop.
   bool hasHistograms() const { return !Histograms.empty(); }
 
+  /// Returns the loads that may fault and need to be speculative.
+  const SmallPtrSetImpl<const Instruction *> &getSpeculativeLoads() const {
+    return SpeculativeLoads;
+  }
+
   PredicatedScalarEvolution *getPredicatedScalarEvolution() const {
     return &PSE;
   }
@@ -630,6 +635,9 @@ class LoopVectorizationLegality {
   /// may work on the same memory location.
   SmallVector<HistogramInfo, 1> Histograms;
 
+  /// Hold all loads that need to be speculative.
+  SmallPtrSet<const Instruction *, 4> SpeculativeLoads;
+
   /// BFI and PSI are used to check for profile guided size optimizations.
   BlockFrequencyInfo *BFI;
   ProfileSummaryInfo *PSI;
diff --git a/llvm/lib/Analysis/Loads.cpp b/llvm/lib/Analysis/Loads.cpp
index 78d0887d5d87e..c5a55e9903d41 100644
--- a/llvm/lib/Analysis/Loads.cpp
+++ b/llvm/lib/Analysis/Loads.cpp
@@ -870,3 +870,19 @@ bool llvm::isDereferenceableReadOnlyLoop(
   }
   return true;
 }
+
+bool llvm::isReadOnlyLoopWithSafeOrSpeculativeLoads(
+    Loop *L, ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
+    SmallVectorImpl<LoadInst *> *SpeculativeLoads,
+    SmallVectorImpl<const SCEVPredicate *> *Predicates) {
+  for (BasicBlock *BB : L->blocks()) {
+    for (Instruction &I : *BB) {
+      if (auto *LI = dyn_cast<LoadInst>(&I)) {
+        if (!isDereferenceableAndAlignedInLoop(LI, L, *SE, *DT, AC, Predicates))
+          SpeculativeLoads->push_back(LI);
+      } else if (I.mayReadFromMemory() || I.mayWriteToMemory() || I.mayThrow())
+        return false;
+    }
+  }
+  return true;
+}
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index c7eb2ec18c679..9f05e01d34781 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -1457,6 +1457,10 @@ bool TargetTransformInfo::supportsScalableVectors() const {
   return TTIImpl->supportsScalableVectors();
 }
 
+bool TargetTransformInfo::supportsSpeculativeLoads() const {
+  return TTIImpl->supportsSpeculativeLoads();
+}
+
 bool TargetTransformInfo::enableScalableVectorization() const {
   return TTIImpl->enableScalableVectorization();
 }
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 05d504cbcb6bb..54e9c8346b6e2 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -110,6 +110,9 @@ class RISCVTTIImpl final : public BasicTTIImplBase<RISCVTTIImpl> {
   bool supportsScalableVectors() const override {
     return ST->hasVInstructions();
   }
+  bool supportsSpeculativeLoads() const override {
+    return ST->hasVInstructions();
+  }
   bool enableOrderedReductions() const override { return true; }
   bool enableScalableVectorization() const override {
     return ST->hasVInstructions();
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index c47fd9421fddd..46660866741ea 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1760,16 +1760,41 @@ bool LoopVectorizationLegality::isVectorizableEarlyExitLoop() {
   assert(LatchBB->getUniquePredecessor() == SingleUncountableExitingBlock &&
          "Expected latch predecessor to be the early exiting block");
 
-  // TODO: Handle loops that may fault.
   Predicates.clear();
-  if (!isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC,
-                                     &Predicates)) {
+  SmallVector<LoadInst *, 4> NonDerefLoads;
+  bool HasSafeAccess =
+      TTI->supportsSpeculativeLoads()
+          ? isReadOnlyLoopWithSafeOrSpeculativeLoads(
+                TheLoop, PSE.getSE(), DT, AC, &NonDerefLoads, &Predicates)
+          : isDereferenceableReadOnlyLoop(TheLoop, PSE.getSE(), DT, AC,
+                                          &Predicates);
+  if (!HasSafeAccess) {
     reportVectorizationFailure(
         "Loop may fault",
         "Cannot vectorize potentially faulting early exit loop",
         "PotentiallyFaultingEarlyExitLoop", ORE, TheLoop);
     return false;
   }
+  // Speculative loads need to be unit-stride.
+  for (LoadInst *LI : NonDerefLoads) {
+    if (LI->getParent() != TheLoop->getHeader()) {
+      reportVectorizationFailure("Cannot vectorize predicated speculative load",
+                                 "SpeculativeLoadNeedsPredication", ORE,
+                                 TheLoop);
+      return false;
+    }
+    int Stride = isConsecutivePtr(LI->getType(), LI->getPointerOperand());
+    if (Stride != 1) {
+      reportVectorizationFailure("Loop contains non-unit-stride load",
+                                 "Cannot vectorize early exit loop with "
+                                 "speculative non-unit-stride load",
+                                 "SpeculativeNonUnitStrideLoadEarlyExitLoop",
+                                 ORE, TheLoop);
+      return false;
+    }
+    SpeculativeLoads.insert(LI);
+    LLVM_DEBUG(dbgs() << "LV: Found speculative load: " << *LI << "\n");
+  }
 
   [[maybe_unused]] const SCEV *SymbolicMaxBTC =
       PSE.getSymbolicMaxBackedgeTakenCount();
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 9667b506e594f..790a5236d4f04 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -10041,6 +10041,13 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     return false;
   }
 
+  if (!LVL.getSpeculativeLoads().empty()) {
+    reportVectorizationFailure("Auto-vectorization of loops with speculative "
+                               "load is not supported",
+                               "SpeculativeLoadsNotSupported", ORE, L);
+    return false;
+  }
+
   // Entrance to the VPlan-native vectorization path. Outer loops are processed
   // here. They may require CFG and instruction level transformations before
   // even evaluating whether vectorization is profitable. Since we cannot modify

Copy link
Member

@alexey-bataev alexey-bataev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests?

Comment on lines 883 to 884
} else if (I.mayReadFromMemory() || I.mayWriteToMemory() || I.mayThrow())
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Braces

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks!

@arcbbb
Copy link
Contributor Author

arcbbb commented Aug 9, 2025

Tests?

Added, and updated the criteria to limit the number of unbound accesses to one.

@david-arm
Copy link
Contributor

If you're interested I tried to solve this in a different way (#120603) using loop versioning based on alignment of the loads in the loop. However, this was rejected due to an undocumented feature of normal IR loads. Using different intrinsics with their own semantics certainly solves this problem!

@arcbbb
Copy link
Contributor Author

arcbbb commented Aug 19, 2025

ping

@arcbbb
Copy link
Contributor Author

arcbbb commented Sep 2, 2025

gentle ping

Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return false;
}
// Check non-dereferenceable loads if any.
for (LoadInst *LI : NonDerefLoads) {
// Only support unit-stride access for now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question about potential AArch64 support, SVE has first-fault gathers IIUC which could be used for strided accesses. I presume we don't need to check for alignment there? From quickly scanning the docs it looks like an unaligned access non-first-fault will be handled the same as any other non-first-fault.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, SVE's first faulting gathers have the same alignment rules as normal gathers, so no extra checks required.

Comment on lines 448 to 451
/// Returns the loads that need to be fault-only-first.
const SmallPtrSetImpl<const Instruction *> &getFaultOnlyFirstLoads() const {
return FaultOnlyFirstLoads;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me what fault-only-first means here. Those are just unit strided loads that may not be dereferenceable and may not be aligned?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would getFaultingLoads be an accurate description? Loads that may fault due to not being deferenceable or alignment and so need to be handled via fault-only-first instructions. They may be non-unit strided in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed it to PotentiallyFaultingLoads. Hope this makes it clearer.

Comment on lines 638 to 639
/// Hold all loads that need to be fault-only-first.
SmallPtrSet<const Instruction *, 4> FaultOnlyFirstLoads;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above regarding naming

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:RISC-V llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms vectorizers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants