Skip to content

Conversation

artagnon
Copy link
Contributor

@artagnon artagnon commented Jun 6, 2025

Optimize CRC loops using a Sarwate table-lookup by using the results of HashRecognize in LoopIdiomRecognize. The optimization is checked for correctness using the SingleSource/UnitTests/HashRecognize tests in llvm-test-suite.

@artagnon
Copy link
Contributor Author

artagnon commented Jun 7, 2025

Results from LLVM compile time tracker look okay, although there is a crash in ThinLTO.

The results from llvm-opt-benchmark indicate that there are quite a few opportunities successfully exploited by HashRecognize, although the results look a bit strange.

Copy link
Contributor

@pfusik pfusik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round. :)

@artagnon
Copy link
Contributor Author

Thanks for the review, and sorry about the slow response: I'm currently traveling, but should get back to work next week. I will clarify that I put up this draft mainly for results from LLVM Compile Time Tracker/llvm-opt-benchmark: it currently suffers from correctness issues.

@artagnon
Copy link
Contributor Author

artagnon commented Jun 17, 2025

Okay, the correctness issues should be fixed now: I've checked the output on different examples I wrote by hand. The diff from llvm-opt-benchmark also looks sound. LLVM Compile Time tracker results look clean. Within SingleSource/MultiSource, LLVM Test Suite has a single example of CRC which was optimized correctly: GCC-C-execute-pr84524.

Since the pass is enabled by default in this PR, some pass-pipeline printing tests failed. Also, there's a CRC optimizer in Hexagon that runs later in the pipeline, and should probably be stripped: the test is written with O2, and hence fails. I'm not sure we want to enable it by default in this PR, which is why I didn't update the tests. My biggest question is how we can rigorously test the analysis/transform: the pattern is too specific, and I think the best we can do is to check in a bunch of hand-written CRC examples into llvm-test-suite.

@artagnon artagnon force-pushed the perf/lir-hashrecognize branch from ea6c264 to fa60b28 Compare June 18, 2025 15:03
artagnon added a commit to artagnon/llvm-test-suite that referenced this pull request Jun 18, 2025
In preparation to optimize cyclic-redundancy-check loops using
table-lookup using the newly introduced HashRecognize analysis, add
tests with CRC loops (which will get optimized), and check the result
against a Sarwate table-lookup (which will not get optimized). The
companion change is llvm/llvm-project#143208.
artagnon added a commit to llvm/llvm-test-suite that referenced this pull request Jun 18, 2025
In preparation to optimize cyclic-redundancy-check loops using
table-lookup using the newly introduced HashRecognize analysis, add
tests with CRC loops (which will get optimized), and check the result
against a Sarwate table-lookup (which will not get optimized). The
companion change is llvm/llvm-project#143208.
@artagnon artagnon force-pushed the perf/lir-hashrecognize branch from be15c64 to 4b1419f Compare June 18, 2025 20:58
@artagnon
Copy link
Contributor Author

Okay, all the outstanding issues have been resolved. Pending the merge of #144742, and rebase, this PR is ready to review. There could be two sources of miscompiles:

  1. CRC is optimized incorrectly, due to the transform's logic. Apart from the casting logic which needs review, the rest should be exactly what's shown in crc_table functions in [HashRecognize] Add tests to verify CRC optz llvm-test-suite#258.
  2. Incorrect CRC is optimized to correct CRC, due to the analysis being too lax. No fuzzer can help us here, and we just have to wait for reports from developers writing buggy CRCs and seeing it get optimized incorrectly. This should not affect production code, so it's somewhat less severe.

@artagnon artagnon marked this pull request as ready for review June 18, 2025 21:10
@llvmbot llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Jun 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Ramkumar Ramachandra (artagnon)

Changes

Optimize CRC loops using a Sarwate table-lookup by using the results of HashRecognize in LoopIdiomRecognize. The optimization is checked for correctness using the SingleSource/UnitTests/HashRecognize tests in llvm-test-suite.

-- 8< --
Contains changes from #144742.


Patch is 54.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143208.diff

6 Files Affected:

  • (modified) llvm/include/llvm/Analysis/HashRecognize.h (+7-14)
  • (modified) llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h (+3)
  • (modified) llvm/lib/Analysis/HashRecognize.cpp (+17-16)
  • (modified) llvm/lib/Passes/PassRegistry.def (-1)
  • (modified) llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp (+173-1)
  • (added) llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check.ll (+390)
diff --git a/llvm/include/llvm/Analysis/HashRecognize.h b/llvm/include/llvm/Analysis/HashRecognize.h
index c169383bf7b08..a00f27ac22b1d 100644
--- a/llvm/include/llvm/Analysis/HashRecognize.h
+++ b/llvm/include/llvm/Analysis/HashRecognize.h
@@ -66,6 +66,10 @@ struct PolynomialInfo {
   // Set to true in the case of big-endian.
   bool ByteOrderSwapped;
 
+  // The generated Sarwate lookup-table, which can be used to optimize CRC in
+  // the absence of target-specific instructions.
+  function_ref<CRCTable(const APInt &, bool)> GenSarwateTable;
+
   // An optional auxiliary checksum that augments the LHS. In the case of CRC,
   // it is XOR'ed with the LHS, so that the computation's final remainder is
   // zero.
@@ -73,6 +77,7 @@ struct PolynomialInfo {
 
   PolynomialInfo(unsigned TripCount, Value *LHS, const APInt &RHS,
                  Value *ComputedValue, bool ByteOrderSwapped,
+                 function_ref<CRCTable(const APInt &, bool)> GenSarwateTable,
                  Value *LHSAux = nullptr);
 };
 
@@ -84,12 +89,9 @@ class HashRecognize {
 public:
   HashRecognize(const Loop &L, ScalarEvolution &SE);
 
-  // The main analysis entry point.
+  // The main analysis entry points.
   std::variant<PolynomialInfo, ErrBits, StringRef> recognizeCRC() const;
-
-  // Auxilary entry point after analysis to interleave the generating polynomial
-  // and return a 256-entry CRC table.
-  CRCTable genSarwateTable(const APInt &GenPoly, bool ByteOrderSwapped) const;
+  std::optional<PolynomialInfo> getResult() const;
 
   void print(raw_ostream &OS) const;
 
@@ -107,15 +109,6 @@ class HashRecognizePrinterPass
   PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
                         LoopStandardAnalysisResults &AR, LPMUpdater &);
 };
-
-class HashRecognizeAnalysis : public AnalysisInfoMixin<HashRecognizeAnalysis> {
-  friend AnalysisInfoMixin<HashRecognizeAnalysis>;
-  static AnalysisKey Key;
-
-public:
-  using Result = HashRecognize;
-  Result run(Loop &L, LoopAnalysisManager &AM, LoopStandardAnalysisResults &AR);
-};
 } // namespace llvm
 
 #endif
diff --git a/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h b/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h
index 241a3fc109360..109b4520878cb 100644
--- a/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h
+++ b/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h
@@ -40,6 +40,9 @@ struct DisableLIRP {
 
   /// When true, Wcslen is disabled.
   static bool Wcslen;
+
+  /// When true, HashRecognize is disabled.
+  static bool HashRecognize;
 };
 
 /// Performs Loop Idiom Recognize Pass.
diff --git a/llvm/lib/Analysis/HashRecognize.cpp b/llvm/lib/Analysis/HashRecognize.cpp
index d11602f921872..850b6be412d16 100644
--- a/llvm/lib/Analysis/HashRecognize.cpp
+++ b/llvm/lib/Analysis/HashRecognize.cpp
@@ -442,11 +442,13 @@ getRecurrences(BasicBlock *LoopLatch, const PHINode *IndVar, const Loop &L) {
   return std::make_pair(SimpleRecurrence, ConditionalRecurrence);
 }
 
-PolynomialInfo::PolynomialInfo(unsigned TripCount, Value *LHS, const APInt &RHS,
-                               Value *ComputedValue, bool ByteOrderSwapped,
-                               Value *LHSAux)
+PolynomialInfo::PolynomialInfo(
+    unsigned TripCount, Value *LHS, const APInt &RHS, Value *ComputedValue,
+    bool ByteOrderSwapped,
+    function_ref<CRCTable(const APInt &, bool)> GenSarwateTable, Value *LHSAux)
     : TripCount(TripCount), LHS(LHS), RHS(RHS), ComputedValue(ComputedValue),
-      ByteOrderSwapped(ByteOrderSwapped), LHSAux(LHSAux) {}
+      ByteOrderSwapped(ByteOrderSwapped), GenSarwateTable(GenSarwateTable),
+      LHSAux(LHSAux) {}
 
 /// In the big-endian case, checks the bottom N bits against CheckFn, and that
 /// the rest are unknown. In the little-endian case, checks the top N bits
@@ -471,8 +473,7 @@ static bool checkExtractBits(const KnownBits &Known, unsigned N,
 /// Generate a lookup table of 256 entries by interleaving the generating
 /// polynomial. The optimization technique of table-lookup for CRC is also
 /// called the Sarwate algorithm.
-CRCTable HashRecognize::genSarwateTable(const APInt &GenPoly,
-                                        bool ByteOrderSwapped) const {
+static CRCTable genSarwateTable(const APInt &GenPoly, bool ByteOrderSwapped) {
   unsigned BW = GenPoly.getBitWidth();
   CRCTable Table;
   Table[0] = APInt::getZero(BW);
@@ -625,7 +626,7 @@ HashRecognize::recognizeCRC() const {
 
   Value *LHSAux = SimpleRecurrence ? SimpleRecurrence.Start : nullptr;
   return PolynomialInfo(TC, ConditionalRecurrence.Start, GenPoly, ComputedValue,
-                        *ByteOrderSwapped, LHSAux);
+                        *ByteOrderSwapped, genSarwateTable, LHSAux);
 }
 
 void CRCTable::print(raw_ostream &OS) const {
@@ -679,13 +680,20 @@ void HashRecognize::print(raw_ostream &OS) const {
     OS << "\n";
   }
   OS.indent(2) << "Computed CRC lookup table:\n";
-  genSarwateTable(Info.RHS, Info.ByteOrderSwapped).print(OS);
+  Info.GenSarwateTable(Info.RHS, Info.ByteOrderSwapped).print(OS);
 }
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 void HashRecognize::dump() const { print(dbgs()); }
 #endif
 
+std::optional<PolynomialInfo> HashRecognize::getResult() const {
+  auto Res = HashRecognize(L, SE).recognizeCRC();
+  if (std::holds_alternative<PolynomialInfo>(Res))
+    return std::get<PolynomialInfo>(Res);
+  return std::nullopt;
+}
+
 HashRecognize::HashRecognize(const Loop &L, ScalarEvolution &SE)
     : L(L), SE(SE) {}
 
@@ -693,13 +701,6 @@ PreservedAnalyses HashRecognizePrinterPass::run(Loop &L,
                                                 LoopAnalysisManager &AM,
                                                 LoopStandardAnalysisResults &AR,
                                                 LPMUpdater &) {
-  AM.getResult<HashRecognizeAnalysis>(L, AR).print(OS);
+  HashRecognize(L, AR.SE).print(OS);
   return PreservedAnalyses::all();
 }
-
-HashRecognize HashRecognizeAnalysis::run(Loop &L, LoopAnalysisManager &AM,
-                                         LoopStandardAnalysisResults &AR) {
-  return {L, AR.SE};
-}
-
-AnalysisKey HashRecognizeAnalysis::Key;
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index f761d0dab09a8..ec14c6a9211d9 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -672,7 +672,6 @@ LOOPNEST_PASS("no-op-loopnest", NoOpLoopNestPass())
 #define LOOP_ANALYSIS(NAME, CREATE_PASS)
 #endif
 LOOP_ANALYSIS("ddg", DDGAnalysis())
-LOOP_ANALYSIS("hash-recognize", HashRecognizeAnalysis())
 LOOP_ANALYSIS("iv-users", IVUsersAnalysis())
 LOOP_ANALYSIS("no-op-loop", NoOpLoopAnalysis())
 LOOP_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
diff --git a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
index 0967e90e24c5f..86e62311f7771 100644
--- a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
@@ -39,6 +39,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/CmpInstAnalysis.h"
+#include "llvm/Analysis/HashRecognize.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/LoopPass.h"
 #include "llvm/Analysis/MemoryLocation.h"
@@ -144,6 +145,14 @@ static cl::opt<bool, true>
                      cl::location(DisableLIRP::Wcslen), cl::init(false),
                      cl::ReallyHidden);
 
+bool DisableLIRP::HashRecognize;
+static cl::opt<bool, true>
+    DisableLIRPHashRecognize("disable-" DEBUG_TYPE "-hashrecognize",
+                             cl::desc("Proceed with loop idiom recognize pass, "
+                                      "but do not optimize CRC loops."),
+                             cl::location(DisableLIRP::HashRecognize),
+                             cl::init(false), cl::ReallyHidden);
+
 static cl::opt<bool> UseLIRCodeSizeHeurs(
     "use-lir-code-size-heurs",
     cl::desc("Use loop idiom recognition code size heuristics when compiling "
@@ -238,6 +247,7 @@ class LoopIdiomRecognize {
                                   const SCEV *BECount);
   bool avoidLIRForMultiBlockLoop(bool IsMemset = false,
                                  bool IsLoopMemset = false);
+  bool optimizeCRCLoop(const PolynomialInfo &Info);
 
   /// @}
   /// \name Noncountable Loop Idiom Handling
@@ -283,6 +293,8 @@ PreservedAnalyses LoopIdiomRecognizePass::run(Loop &L, LoopAnalysisManager &AM,
   // but ORE cannot be preserved (see comment before the pass definition).
   OptimizationRemarkEmitter ORE(L.getHeader()->getParent());
 
+  std::optional<PolynomialInfo> HR;
+
   LoopIdiomRecognize LIR(&AR.AA, &AR.DT, &AR.LI, &AR.SE, &AR.TLI, &AR.TTI,
                          AR.MSSA, DL, ORE);
   if (!LIR.runOnLoop(&L))
@@ -326,7 +338,7 @@ bool LoopIdiomRecognize::runOnLoop(Loop *L) {
   HasMemsetPattern = TLI->has(LibFunc_memset_pattern16);
   HasMemcpy = TLI->has(LibFunc_memcpy);
 
-  if (HasMemset || HasMemsetPattern || HasMemcpy)
+  if (HasMemset || HasMemsetPattern || HasMemcpy || !DisableLIRP::HashRecognize)
     if (SE->hasLoopInvariantBackedgeTakenCount(L))
       return runOnCountableLoop();
 
@@ -369,6 +381,12 @@ bool LoopIdiomRecognize::runOnCountableLoop() {
 
     MadeChange |= runOnLoopBlock(BB, BECount, ExitBlocks);
   }
+
+  // Optimize a CRC loop if HashRecognize found one.
+  if (!DisableLIRP::HashRecognize)
+    if (auto Res = HashRecognize(*CurLoop, *SE).getResult())
+      optimizeCRCLoop(*Res);
+
   return MadeChange;
 }
 
@@ -1473,6 +1491,160 @@ bool LoopIdiomRecognize::avoidLIRForMultiBlockLoop(bool IsMemset,
   return false;
 }
 
+bool LoopIdiomRecognize::optimizeCRCLoop(const PolynomialInfo &Info) {
+  // FIXME: Hexagon has a special HexagonLoopIdiom that optimizes CRC using
+  // carry-less multiplication instructions, which is more efficient than our
+  // Sarwate table-lookup optimization. Hence, until we're able to emit
+  // target-specific instructions for Hexagon, subsuming HexagonLoopIdiom,
+  // disable the optimization for Hexagon.
+  Module &M = *CurLoop->getHeader()->getModule();
+  Triple TT(M.getTargetTriple());
+  if (TT.getArch() == Triple::hexagon)
+    return false;
+
+  // First, create a new GlobalVariable corresponding to the
+  // Sarwate-lookup-table.
+  Type *CRCTy = Info.LHS->getType();
+  unsigned CRCBW = CRCTy->getIntegerBitWidth();
+  std::array<Constant *, 256> CRCConstants;
+  transform(Info.GenSarwateTable(Info.RHS, Info.ByteOrderSwapped),
+            CRCConstants.begin(),
+            [CRCTy](const APInt &E) { return ConstantInt::get(CRCTy, E); });
+  Constant *ConstArray =
+      ConstantArray::get(ArrayType::get(CRCTy, 256), CRCConstants);
+  GlobalVariable *GV =
+      new GlobalVariable(M, ConstArray->getType(), true,
+                         GlobalValue::PrivateLinkage, ConstArray, ".crctable");
+
+  PHINode *IV = CurLoop->getCanonicalInductionVariable();
+  SmallVector<PHINode *, 2> Cleanup;
+
+  // Next, mark all PHIs for removal except IV.
+  {
+    for (PHINode &PN : CurLoop->getHeader()->phis()) {
+      if (&PN == IV)
+        continue;
+      PN.replaceAllUsesWith(PoisonValue::get(PN.getType()));
+      Cleanup.push_back(&PN);
+    }
+  }
+
+  // Next, fix up the trip count.
+  {
+    unsigned NewBTC = (Info.TripCount / 8) - 1;
+    BasicBlock *LoopBlk = CurLoop->getLoopLatch();
+    BranchInst *BrInst = cast<BranchInst>(LoopBlk->getTerminator());
+    CmpPredicate ExitPred = BrInst->getSuccessor(0) == LoopBlk
+                                ? ICmpInst::Predicate::ICMP_NE
+                                : ICmpInst::Predicate::ICMP_EQ;
+    Instruction *ExitCond = CurLoop->getLatchCmpInst();
+    Value *ExitLimit = ConstantInt::get(IV->getType(), NewBTC);
+    IRBuilder<> Builder(ExitCond);
+    Value *NewExitCond =
+        Builder.CreateICmp(ExitPred, IV, ExitLimit, "exit.cond");
+    ExitCond->replaceAllUsesWith(NewExitCond);
+    deleteDeadInstruction(ExitCond);
+  }
+
+  // Finally, fill the loop with the Sarwate-table-lookup logic, and replace all
+  // uses of ComputedValue.
+  //
+  // Little-endian:
+  //   crc = (crc >> 8) ^ tbl[(iv'th byte of data) ^ (bottom byte of crc)]
+  // Big-Endian:
+  //   crc = (crc << 8) ^ tbl[(iv'th byte of data) ^ (top byte of crc)]
+  {
+    auto GetLoByte = [](IRBuilderBase &Builder, Value *Op, const Twine &Name) {
+      Type *OpTy = Op->getType();
+      unsigned OpBW = OpTy->getIntegerBitWidth();
+      return OpBW > 8
+                 ? Builder.CreateAnd(Op, ConstantInt::get(OpTy, 0XFF), Name)
+                 : Op;
+    };
+    auto GetHiByte = [](IRBuilderBase &Builder, Value *Op, const Twine &Name) {
+      Type *OpTy = Op->getType();
+      unsigned OpBW = OpTy->getIntegerBitWidth();
+      return OpBW > 8 ? Builder.CreateLShr(Op, ConstantInt::get(OpTy, OpBW - 8),
+                                           Name)
+                      : Op;
+    };
+
+    IRBuilder<> Builder(CurLoop->getHeader(),
+                        CurLoop->getHeader()->getFirstNonPHIIt());
+
+    // Create the CRC PHI, and initialize its incoming value to the initial
+    // value of CRC.
+    PHINode *CRCPhi = Builder.CreatePHI(CRCTy, 2, "crc");
+    CRCPhi->addIncoming(Info.LHS, CurLoop->getLoopPreheader());
+
+    // CRC is now an evolving variable, initialized to the PHI.
+    Value *CRC = CRCPhi;
+
+    // TableIndexer = ((top|bottom) byte of CRC). It is XOR'ed with (iv'th byte
+    // of LHSAux), if LHSAux is non-nullptr.
+    Value *Indexer = Info.ByteOrderSwapped
+                         ? GetHiByte(Builder, CRC, "crc.indexer.hi")
+                         : GetLoByte(Builder, CRC, "crc.indexer.lo");
+
+    if (Value *Data = Info.LHSAux) {
+      Type *DataTy = Data->getType();
+
+      // To index into the (iv'th byte of LHSAux), we multiply iv by 8, and we
+      // shift right by that amount, and take the lo-byte (in the little-endian
+      // case), or shift left by that amount, and take the hi-byte (in the
+      // big-endian case).
+      Value *IVBits = Builder.CreateZExtOrTrunc(
+          Builder.CreateShl(IV, 3, "iv.bits"), DataTy, "iv.indexer");
+      Value *DataIndexer =
+          Info.ByteOrderSwapped
+              ? GetHiByte(Builder,
+                          Builder.CreateShl(Data, IVBits, "data.indexer"),
+                          "data.indexer.hi")
+              : GetLoByte(Builder,
+                          Builder.CreateLShr(Data, IVBits, "data.indexer"),
+                          "data.indexer.lo");
+      Indexer = Builder.CreateXor(
+          DataIndexer,
+          Builder.CreateZExtOrTrunc(Indexer, DataTy, "crc.indexer.cast"),
+          "crc.data.indexer");
+    }
+
+    if (Indexer->getType()->getIntegerBitWidth() == 8)
+      // If the indexer is of i8 type, the GEP could be interpreted with a
+      // negative offset.
+      Indexer = Builder.CreateZExt(Indexer, Type::getInt64Ty(SE->getContext()),
+                                   "indexer.ext");
+
+    // CRCTableLd = CRCTable[(iv'th byte of data) ^ (top|bottom) byte of CRC].
+    Value *CRCTableGEP =
+        Builder.CreateInBoundsGEP(CRCTy, GV, Indexer, "tbl.ptradd");
+    Value *CRCTableLd = Builder.CreateLoad(CRCTy, CRCTableGEP, "tbl.ld");
+
+    // CRCNext = (CRC (<<|>>) 8) ^ CRCTableLd, or simply CRCTableLd in case of
+    // CRC-8.
+    Value *CRCNext = CRCTableLd;
+    if (CRCBW > 8) {
+      Value *CRCShift = Info.ByteOrderSwapped
+                            ? Builder.CreateShl(CRC, 8, "crc.be.shift")
+                            : Builder.CreateLShr(CRC, 8, "crc.le.shift");
+      CRCNext = Builder.CreateXor(CRCShift, CRCTableLd, "crc.next");
+    }
+
+    // Connect the back-edge for the loop, and RAUW the ComputedValue.
+    CRCPhi->addIncoming(CRCNext, CurLoop->getLoopLatch());
+    Info.ComputedValue->replaceUsesOutsideBlock(CRCNext,
+                                                CurLoop->getLoopLatch());
+  }
+
+  // Cleanup.
+  {
+    for (PHINode *PN : Cleanup)
+      RecursivelyDeleteDeadPHINode(PN);
+    SE->forgetLoop(CurLoop);
+  }
+  return true;
+}
+
 bool LoopIdiomRecognize::runOnNoncountableLoop() {
   LLVM_DEBUG(dbgs() << DEBUG_TYPE " Scanning: F["
                     << CurLoop->getHeader()->getParent()->getName()
diff --git a/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check.ll b/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check.ll
new file mode 100644
index 0000000000000..266b78a5ab03e
--- /dev/null
+++ b/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check.ll
@@ -0,0 +1,390 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 5
+; RUN: opt -passes=loop-idiom -S %s| FileCheck %s
+
+;.
+; CHECK: @.crctable = private constant [256 x i16] [i16 0, i16 -16191, i16 -15999, i16 320, i16 -15615, i16 960, i16 640, i16 -15807, i16 -14847, i16 1728, i16 1920, i16 -14527, i16 1280, i16 -14911, i16 -15231, i16 1088, i16 -13311, i16 3264, i16 3456, i16 -12991, i16 3840, i16 -12351, i16 -12671, i16 3648, i16 2560, i16 -13631, i16 -13439, i16 2880, i16 -14079, i16 2496, i16 2176, i16 -14271, i16 -10239, i16 6336, i16 6528, i16 -9919, i16 6912, i16 -9279, i16 -9599, i16 6720, i16 7680, i16 -8511, i16 -8319, i16 8000, i16 -8959, i16 7616, i16 7296, i16 -9151, i16 5120, i16 -11071, i16 -10879, i16 5440, i16 -10495, i16 6080, i16 5760, i16 -10687, i16 -11775, i16 4800, i16 4992, i16 -11455, i16 4352, i16 -11839, i16 -12159, i16 4160, i16 -4095, i16 12480, i16 12672, i16 -3775, i16 13056, i16 -3135, i16 -3455, i16 12864, i16 13824, i16 -2367, i16 -2175, i16 14144, i16 -2815, i16 13760, i16 13440, i16 -3007, i16 15360, i16 -831, i16 -639, i16 15680, i16 -255, i16 16320, i16 16000, i16 -447, i16 -1535, i16 15040, i16 15232, i16 -1215, i16 14592, i16 -1599, i16 -1919, i16 14400, i16 10240, i16 -5951, i16 -5759, i16 10560, i16 -5375, i16 11200, i16 10880, i16 -5567, i16 -4607, i16 11968, i16 12160, i16 -4287, i16 11520, i16 -4671, i16 -4991, i16 11328, i16 -7167, i16 9408, i16 9600, i16 -6847, i16 9984, i16 -6207, i16 -6527, i16 9792, i16 8704, i16 -7487, i16 -7295, i16 9024, i16 -7935, i16 8640, i16 8320, i16 -8127, i16 -24575, i16 24768, i16 24960, i16 -24255, i16 25344, i16 -23615, i16 -23935, i16 25152, i16 26112, i16 -22847, i16 -22655, i16 26432, i16 -23295, i16 26048, i16 25728, i16 -23487, i16 27648, i16 -21311, i16 -21119, i16 27968, i16 -20735, i16 28608, i16 28288, i16 -20927, i16 -22015, i16 27328, i16 27520, i16 -21695, i16 26880, i16 -22079, i16 -22399, i16 26688, i16 30720, i16 -18239, i16 -18047, i16 31040, i16 -17663, i16 31680, i16 31360, i16 -17855, i16 -16895, i16 32448, i16 32640, i16 -16575, i16 32000, i16 -16959, i16 -17279, i16 31808, i16 -19455, i16 29888, i16 30080, i16 -19135, i16 30464, i16 -18495, i16 -18815, i16 30272, i16 29184, i16 -19775, i16 -19583, i16 29504, i16 -20223, i16 29120, i16 28800, i16 -20415, i16 20480, i16 -28479, i16 -28287, i16 20800, i16 -27903, i16 21440, i16 21120, i16 -28095, i16 -27135, i16 22208, i16 22400, i16 -26815, i16 21760, i16 -27199, i16 -27519, i16 21568, i16 -25599, i16 23744, i16 23936, i16 -25279, i16 24320, i16 -24639, i16 -24959, i16 24128, i16 23040, i16 -25919, i16 -25727, i16 23360, i16 -26367, i16 22976, i16 22656, i16 -26559, i16 -30719, i16 18624, i16 18816, i16 -30399, i16 19200, i16 -29759, i16 -30079, i16 19008, i16 19968, i16 -28991, i16 -28799, i16 20288, i16 -29439, i16 19904, i16 19584, i16 -29631, i16 17408, i16 -31551, i16 -31359, i16 17728, i16 -30975, i16 18368, i16 18048, i16 -31167, i16 -32255, i16 17088, i16 17280, i16 -31935, i16 16640, i16 -32319, i16 -32639, i16 16448]
+; CHECK: @.crctable.1 = private constant [256 x i16] [i16 0, i16 -16191, i16 -15999, i16 320, i16 -15615, i16 960, i16 640, i16 -15807, i16 -14847, i16 1728, i16 1920, i16 -14527, i16 1280, i16 -14911, i16 -15231, i16 1088, i16 -13311, i16 3264, i16 3456, i16 -12991, i16 3840, i16 -12351, i16 -12671, i16 3648, i16 2560, i16 -13631, i16 -13439, i16 2880, i16 -14079, i16 2496, i16 2176, i16 -14271, i16 -10239, i16 6336, i16 6528, i16 -9919, i16 6912, i16 -9279, i16 -9599, i16 6720, i16 7680, i16 -8511, i16 -8319, i16 8000, i16 -8959, i16 7616, i16 7296, i16 -9151, i16 5120, i16 -11071, i16 -10879, i16 5440, i16 -10495, i16 6080, i16 5760, i16 -10687, i16 -11775, i16 4800, i16 4992, i16 -11455, i16 4352, i16 -11839, i16 -12159, i16 4160, i16 -4095, i16 12480, i16 12672, i16 -3775, i16 13056, i16 -313...
[truncated]

@artagnon
Copy link
Contributor Author

We now test type-mismatches in llvm-test-suite! llvm/llvm-test-suite#261. I found a couple of weaknesses in the analysis, and some errors in the transform: I have now verified that, with #144878 and #144881, the newly-added tests in llvm-test-suite are optimized correctly. I'm feeling more confident about turning this optimization on by default: kindly let me know if you have other testing-related suggestions, so we can increase our confidence further.

[CRCTy](const APInt &E) { return ConstantInt::get(CRCTy, E); });
Constant *ConstArray =
ConstantArray::get(ArrayType::get(CRCTy, 256), CRCConstants);
GlobalVariable *GV =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we created two identical LUTs in two translation units, does ICF work during linking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect Identical COMDAT folding to work on any two identical global variables, and I'm not sure what that has to do with the patch? Does it depend on some parameter of the GlobalVariable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "identical" ? Constant arrays cannot be merged if the program compares array addresses. So I'd expect that's an option. Also, same-translation-unit case would be nice to test, in case someone implements CRC as a C macro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm not sure how we'd write a test for this: possibly an LLD test?

@artagnon artagnon force-pushed the perf/lir-hashrecognize branch from 85467c5 to 66dbf1c Compare July 6, 2025 18:40
Copy link
Contributor

@pfusik pfusik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the input loop did other things aside from calculating the CRC (such as calling extern functions)? Will we abandon this transform or preserve the non-CRC operations? Do we have a test for that?

I think we want to skip this transform if optimizing for size.

@artagnon artagnon force-pushed the perf/lir-hashrecognize branch from 66dbf1c to 70d2960 Compare July 9, 2025 14:49
@artagnon
Copy link
Contributor Author

artagnon commented Jul 9, 2025

What if the input loop did other things aside from calculating the CRC (such as calling extern functions)? Will we abandon this transform or preserve the non-CRC operations?

We will abandon the transform, since HashRecognize will always abort on stray operations. Added a test.

@artagnon artagnon force-pushed the perf/lir-hashrecognize branch from 70d2960 to 1339e55 Compare July 9, 2025 15:27
@artagnon
Copy link
Contributor Author

artagnon commented Jul 9, 2025

What if the input loop did other things aside from calculating the CRC (such as calling extern functions)? Will we abandon this transform or preserve the non-CRC operations?

We will abandon the transform, since HashRecognize will always abort on stray operations. Added a test.

Actually, if the call isn't in the use-def chain (say call void @foo()), it will still remain in the transformed loop as ValueEvolution doesn't see it. However, if the call is side-effecting, the transform would be unsound, as the number of loop iterations have changed: this is probably a bug in HashRecognize, and I'll look into this.

Copy link
Contributor

@pfusik pfusik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if the call isn't in the use-def chain (say call void @foo()), it will still remain in the transformed loop as ValueEvolution doesn't see it. However, if the call is side-effecting, the transform would be unsound, as the number of loop iterations have changed: this is probably a bug in HashRecognize, and I'll look into this.

We need that potential miscompile fixed.

Copy link
Contributor

@pfusik pfusik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need that potential miscompile fixed.

Now that #148620 is merged, I'm ok with this PR.

@artagnon artagnon force-pushed the perf/lir-hashrecognize branch from 1339e55 to 6ca643c Compare September 5, 2025 10:04
@artagnon
Copy link
Contributor Author

artagnon commented Sep 5, 2025

Running some final checks.

cl::desc("Proceed with loop idiom recognize pass, "
"but do not optimize CRC loops."),
cl::location(DisableLIRP::HashRecognize),
cl::init(false), cl::ReallyHidden);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, everything seems to be in order: will merge. Quick note for reviewers in case regressions are found: prefer flipping this cl::init to true while the regression is being investigated, over reverting the patch.

@artagnon artagnon merged commit c5da190 into llvm:main Sep 5, 2025
9 checks passed
@artagnon artagnon deleted the perf/lir-hashrecognize branch September 5, 2025 11:19
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 5, 2025

LLVM Buildbot has detected a new failure on builder lldb-aarch64-windows running on linaro-armv8-windows-msvc-05 while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/141/builds/11316

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: tools/lldb-dap/step/TestDAP_step.py (1240 of 2301)
PASS: lldb-api :: tools/lldb-dap/stop-hooks/TestDAP_stop_hooks.py (1241 of 2301)
UNSUPPORTED: lldb-api :: tools/lldb-dap/terminated-event/TestDAP_terminatedEvent.py (1242 of 2301)
PASS: lldb-api :: tools/lldb-dap/threads/TestDAP_threads.py (1243 of 2301)
UNSUPPORTED: lldb-api :: tools/lldb-dap/variables/TestDAP_variables.py (1244 of 2301)
PASS: lldb-api :: tools/lldb-dap/variables/children/TestDAP_variables_children.py (1245 of 2301)
UNSUPPORTED: lldb-api :: tools/lldb-server/TestAppleSimulatorOSType.py (1246 of 2301)
PASS: lldb-api :: tools/lldb-server/TestGdbRemoteAttach.py (1247 of 2301)
UNSUPPORTED: lldb-api :: tools/lldb-server/TestGdbRemoteAuxvSupport.py (1248 of 2301)
UNRESOLVED: lldb-api :: tools/lldb-dap/stepInTargets/TestDAP_stepInTargets.py (1249 of 2301)
******************** TEST 'lldb-api :: tools/lldb-dap/stepInTargets/TestDAP_stepInTargets.py' FAILED ********************
Script:
--
C:/Users/tcwg/scoop/apps/python/current/python.exe C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/llvm-project/lldb\test\API\dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/./lib --env LLVM_INCLUDE_DIR=C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/include --env LLVM_TOOLS_DIR=C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/./bin --arch aarch64 --build-dir C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/lldb-test-build.noindex --lldb-module-cache-dir C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/lldb-test-build.noindex/module-cache-lldb\lldb-api --clang-module-cache-dir C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/lldb-test-build.noindex/module-cache-clang\lldb-api --executable C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/./bin/lldb.exe --compiler C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/./bin/clang.exe --dsymutil C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/./bin/dsymutil.exe --make C:/Users/tcwg/scoop/shims/make.exe --llvm-tools-dir C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/./bin --lldb-obj-root C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/tools/lldb --lldb-libs-dir C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/./lib --cmake-build-type Release --skip-category=watchpoint C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\API\tools\lldb-dap\stepInTargets -p TestDAP_stepInTargets.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision c5da190b6e762057fd121d936fff0859eb633f61)
  clang revision c5da190b6e762057fd121d936fff0859eb633f61
  llvm revision c5da190b6e762057fd121d936fff0859eb633f61
Skipping the following test categories: ['watchpoint', 'libc++', 'libstdcxx', 'dwo', 'dsym', 'gmodules', 'debugserver', 'objc', 'fork', 'pexpect']


--
Command Output (stderr):
--
UNSUPPORTED: LLDB (C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\build\bin\clang.exe-aarch64) :: test_basic (TestDAP_stepInTargets.TestDAP_stepInTargets.test_basic) (skipping due to the following parameter(s): architecture) 

========= DEBUG ADAPTER PROTOCOL LOGS =========

1757075743.373081684 (stdio) --> {"command":"initialize","type":"request","arguments":{"adapterID":"lldb-native","clientID":"vscode","columnsStartAt1":true,"linesStartAt1":true,"locale":"en-us","pathFormat":"path","supportsRunInTerminalRequest":true,"supportsVariablePaging":true,"supportsVariableType":true,"supportsStartDebuggingRequest":true,"supportsProgressReporting":true,"$__lldb_sourceInitFile":false},"seq":1}

1757075743.373259544 (stdio) queued (command=initialize seq=1)

1757075743.382948875 (stdio) <-- {"body":{"$__lldb_version":"lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision c5da190b6e762057fd121d936fff0859eb633f61)\n  clang revision c5da190b6e762057fd121d936fff0859eb633f61\n  llvm revision c5da190b6e762057fd121d936fff0859eb633f61","completionTriggerCharacters":["."," ","\t"],"exceptionBreakpointFilters":[{"description":"C++ Catch","filter":"cpp_catch","label":"C++ Catch","supportsCondition":true},{"description":"C++ Throw","filter":"cpp_throw","label":"C++ Throw","supportsCondition":true},{"description":"Objective-C Catch","filter":"objc_catch","label":"Objective-C Catch","supportsCondition":true},{"description":"Objective-C Throw","filter":"objc_throw","label":"Objective-C Throw","supportsCondition":true}],"supportTerminateDebuggee":true,"supportsBreakpointLocationsRequest":true,"supportsCancelRequest":true,"supportsCompletionsRequest":true,"supportsConditionalBreakpoints":true,"supportsConfigurationDoneRequest":true,"supportsDataBreakpoints":true,"supportsDelayedStackTraceLoading":true,"supportsDisassembleRequest":true,"supportsEvaluateForHovers":true,"supportsExceptionFilterOptions":true,"supportsExceptionInfoRequest":true,"supportsFunctionBreakpoints":true,"supportsHitConditionalBreakpoints":true,"supportsInstructionBreakpoints":true,"supportsLogPoints":true,"supportsModuleSymbolsRequest":true,"supportsModulesRequest":true,"supportsReadMemoryRequest":true,"supportsSetVariable":true,"supportsSteppingGranularity":true,"supportsValueFormattingOptions":true,"supportsWriteMemoryRequest":true},"command":"initialize","request_seq":1,"seq":0,"success":true,"type":"response"}

1757075743.383727312 (stdio) --> {"command":"launch","type":"request","arguments":{"program":"C:\\Users\\tcwg\\llvm-worker\\lldb-aarch64-windows\\build\\lldb-test-build.noindex\\tools\\lldb-dap\\stepInTargets\\TestDAP_stepInTargets.test_supported_capability_other_archs\\a.out","initCommands":["settings clear --all","settings set symbols.enable-external-lookup false","settings set target.inherit-tcc true","settings set target.disable-aslr false","settings set target.detach-on-error false","settings set target.auto-apply-fixits false","settings set plugin.process.gdb-remote.packet-timeout 60","settings set symbols.clang-modules-cache-path \"C:/Users/tcwg/llvm-worker/lldb-aarch64-windows/build/lldb-test-build.noindex/module-cache-lldb\\lldb-api\"","settings set use-color false","settings set show-statusline false","settings set target.env-vars PATH="],"disableASLR":false,"enableAutoVariableSummaries":false,"enableSyntheticChildDebugging":false,"displayExtendedBacktrace":false},"seq":2}

1757075743.383789301 (stdio) queued (command=launch seq=2)

1757075743.384266376 (stdio) <-- {"body":{"category":"console","output":"Running initCommands:\n"},"event":"output","seq":0,"type":"event"}

1757075743.384316444 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings clear --all\n"},"event":"output","seq":0,"type":"event"}

1757075743.384345770 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set symbols.enable-external-lookup false\n"},"event":"output","seq":0,"type":"event"}

1757075743.384373188 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set target.inherit-tcc true\n"},"event":"output","seq":0,"type":"event"}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding llvm:ir llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants