-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[LoopIdiom] Use HashRecognize to optimize CRC #143208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Results from LLVM compile time tracker look okay, although there is a crash in ThinLTO. The results from llvm-opt-benchmark indicate that there are quite a few opportunities successfully exploited by HashRecognize, although the results look a bit strange. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round. :)
Thanks for the review, and sorry about the slow response: I'm currently traveling, but should get back to work next week. I will clarify that I put up this draft mainly for results from LLVM Compile Time Tracker/llvm-opt-benchmark: it currently suffers from correctness issues. |
6d1c6c9
to
ea6c264
Compare
Okay, the correctness issues should be fixed now: I've checked the output on different examples I wrote by hand. The diff from llvm-opt-benchmark also looks sound. LLVM Compile Time tracker results look clean. Within SingleSource/MultiSource, LLVM Test Suite has a single example of CRC which was optimized correctly: GCC-C-execute-pr84524. Since the pass is enabled by default in this PR, some pass-pipeline printing tests failed. Also, there's a CRC optimizer in Hexagon that runs later in the pipeline, and should probably be stripped: the test is written with O2, and hence fails. I'm not sure we want to enable it by default in this PR, which is why I didn't update the tests. My biggest question is how we can rigorously test the analysis/transform: the pattern is too specific, and I think the best we can do is to check in a bunch of hand-written CRC examples into llvm-test-suite. |
ea6c264
to
fa60b28
Compare
In preparation to optimize cyclic-redundancy-check loops using table-lookup using the newly introduced HashRecognize analysis, add tests with CRC loops (which will get optimized), and check the result against a Sarwate table-lookup (which will not get optimized). The companion change is llvm/llvm-project#143208.
In preparation to optimize cyclic-redundancy-check loops using table-lookup using the newly introduced HashRecognize analysis, add tests with CRC loops (which will get optimized), and check the result against a Sarwate table-lookup (which will not get optimized). The companion change is llvm/llvm-project#143208.
be15c64
to
4b1419f
Compare
Okay, all the outstanding issues have been resolved. Pending the merge of #144742, and rebase, this PR is ready to review. There could be two sources of miscompiles:
|
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-llvm-analysis Author: Ramkumar Ramachandra (artagnon) ChangesOptimize CRC loops using a Sarwate table-lookup by using the results of HashRecognize in LoopIdiomRecognize. The optimization is checked for correctness using the SingleSource/UnitTests/HashRecognize tests in llvm-test-suite. -- 8< -- Patch is 54.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143208.diff 6 Files Affected:
diff --git a/llvm/include/llvm/Analysis/HashRecognize.h b/llvm/include/llvm/Analysis/HashRecognize.h
index c169383bf7b08..a00f27ac22b1d 100644
--- a/llvm/include/llvm/Analysis/HashRecognize.h
+++ b/llvm/include/llvm/Analysis/HashRecognize.h
@@ -66,6 +66,10 @@ struct PolynomialInfo {
// Set to true in the case of big-endian.
bool ByteOrderSwapped;
+ // The generated Sarwate lookup-table, which can be used to optimize CRC in
+ // the absence of target-specific instructions.
+ function_ref<CRCTable(const APInt &, bool)> GenSarwateTable;
+
// An optional auxiliary checksum that augments the LHS. In the case of CRC,
// it is XOR'ed with the LHS, so that the computation's final remainder is
// zero.
@@ -73,6 +77,7 @@ struct PolynomialInfo {
PolynomialInfo(unsigned TripCount, Value *LHS, const APInt &RHS,
Value *ComputedValue, bool ByteOrderSwapped,
+ function_ref<CRCTable(const APInt &, bool)> GenSarwateTable,
Value *LHSAux = nullptr);
};
@@ -84,12 +89,9 @@ class HashRecognize {
public:
HashRecognize(const Loop &L, ScalarEvolution &SE);
- // The main analysis entry point.
+ // The main analysis entry points.
std::variant<PolynomialInfo, ErrBits, StringRef> recognizeCRC() const;
-
- // Auxilary entry point after analysis to interleave the generating polynomial
- // and return a 256-entry CRC table.
- CRCTable genSarwateTable(const APInt &GenPoly, bool ByteOrderSwapped) const;
+ std::optional<PolynomialInfo> getResult() const;
void print(raw_ostream &OS) const;
@@ -107,15 +109,6 @@ class HashRecognizePrinterPass
PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
LoopStandardAnalysisResults &AR, LPMUpdater &);
};
-
-class HashRecognizeAnalysis : public AnalysisInfoMixin<HashRecognizeAnalysis> {
- friend AnalysisInfoMixin<HashRecognizeAnalysis>;
- static AnalysisKey Key;
-
-public:
- using Result = HashRecognize;
- Result run(Loop &L, LoopAnalysisManager &AM, LoopStandardAnalysisResults &AR);
-};
} // namespace llvm
#endif
diff --git a/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h b/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h
index 241a3fc109360..109b4520878cb 100644
--- a/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h
+++ b/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h
@@ -40,6 +40,9 @@ struct DisableLIRP {
/// When true, Wcslen is disabled.
static bool Wcslen;
+
+ /// When true, HashRecognize is disabled.
+ static bool HashRecognize;
};
/// Performs Loop Idiom Recognize Pass.
diff --git a/llvm/lib/Analysis/HashRecognize.cpp b/llvm/lib/Analysis/HashRecognize.cpp
index d11602f921872..850b6be412d16 100644
--- a/llvm/lib/Analysis/HashRecognize.cpp
+++ b/llvm/lib/Analysis/HashRecognize.cpp
@@ -442,11 +442,13 @@ getRecurrences(BasicBlock *LoopLatch, const PHINode *IndVar, const Loop &L) {
return std::make_pair(SimpleRecurrence, ConditionalRecurrence);
}
-PolynomialInfo::PolynomialInfo(unsigned TripCount, Value *LHS, const APInt &RHS,
- Value *ComputedValue, bool ByteOrderSwapped,
- Value *LHSAux)
+PolynomialInfo::PolynomialInfo(
+ unsigned TripCount, Value *LHS, const APInt &RHS, Value *ComputedValue,
+ bool ByteOrderSwapped,
+ function_ref<CRCTable(const APInt &, bool)> GenSarwateTable, Value *LHSAux)
: TripCount(TripCount), LHS(LHS), RHS(RHS), ComputedValue(ComputedValue),
- ByteOrderSwapped(ByteOrderSwapped), LHSAux(LHSAux) {}
+ ByteOrderSwapped(ByteOrderSwapped), GenSarwateTable(GenSarwateTable),
+ LHSAux(LHSAux) {}
/// In the big-endian case, checks the bottom N bits against CheckFn, and that
/// the rest are unknown. In the little-endian case, checks the top N bits
@@ -471,8 +473,7 @@ static bool checkExtractBits(const KnownBits &Known, unsigned N,
/// Generate a lookup table of 256 entries by interleaving the generating
/// polynomial. The optimization technique of table-lookup for CRC is also
/// called the Sarwate algorithm.
-CRCTable HashRecognize::genSarwateTable(const APInt &GenPoly,
- bool ByteOrderSwapped) const {
+static CRCTable genSarwateTable(const APInt &GenPoly, bool ByteOrderSwapped) {
unsigned BW = GenPoly.getBitWidth();
CRCTable Table;
Table[0] = APInt::getZero(BW);
@@ -625,7 +626,7 @@ HashRecognize::recognizeCRC() const {
Value *LHSAux = SimpleRecurrence ? SimpleRecurrence.Start : nullptr;
return PolynomialInfo(TC, ConditionalRecurrence.Start, GenPoly, ComputedValue,
- *ByteOrderSwapped, LHSAux);
+ *ByteOrderSwapped, genSarwateTable, LHSAux);
}
void CRCTable::print(raw_ostream &OS) const {
@@ -679,13 +680,20 @@ void HashRecognize::print(raw_ostream &OS) const {
OS << "\n";
}
OS.indent(2) << "Computed CRC lookup table:\n";
- genSarwateTable(Info.RHS, Info.ByteOrderSwapped).print(OS);
+ Info.GenSarwateTable(Info.RHS, Info.ByteOrderSwapped).print(OS);
}
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void HashRecognize::dump() const { print(dbgs()); }
#endif
+std::optional<PolynomialInfo> HashRecognize::getResult() const {
+ auto Res = HashRecognize(L, SE).recognizeCRC();
+ if (std::holds_alternative<PolynomialInfo>(Res))
+ return std::get<PolynomialInfo>(Res);
+ return std::nullopt;
+}
+
HashRecognize::HashRecognize(const Loop &L, ScalarEvolution &SE)
: L(L), SE(SE) {}
@@ -693,13 +701,6 @@ PreservedAnalyses HashRecognizePrinterPass::run(Loop &L,
LoopAnalysisManager &AM,
LoopStandardAnalysisResults &AR,
LPMUpdater &) {
- AM.getResult<HashRecognizeAnalysis>(L, AR).print(OS);
+ HashRecognize(L, AR.SE).print(OS);
return PreservedAnalyses::all();
}
-
-HashRecognize HashRecognizeAnalysis::run(Loop &L, LoopAnalysisManager &AM,
- LoopStandardAnalysisResults &AR) {
- return {L, AR.SE};
-}
-
-AnalysisKey HashRecognizeAnalysis::Key;
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index f761d0dab09a8..ec14c6a9211d9 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -672,7 +672,6 @@ LOOPNEST_PASS("no-op-loopnest", NoOpLoopNestPass())
#define LOOP_ANALYSIS(NAME, CREATE_PASS)
#endif
LOOP_ANALYSIS("ddg", DDGAnalysis())
-LOOP_ANALYSIS("hash-recognize", HashRecognizeAnalysis())
LOOP_ANALYSIS("iv-users", IVUsersAnalysis())
LOOP_ANALYSIS("no-op-loop", NoOpLoopAnalysis())
LOOP_ANALYSIS("pass-instrumentation", PassInstrumentationAnalysis(PIC))
diff --git a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
index 0967e90e24c5f..86e62311f7771 100644
--- a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
@@ -39,6 +39,7 @@
#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/CmpInstAnalysis.h"
+#include "llvm/Analysis/HashRecognize.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/MemoryLocation.h"
@@ -144,6 +145,14 @@ static cl::opt<bool, true>
cl::location(DisableLIRP::Wcslen), cl::init(false),
cl::ReallyHidden);
+bool DisableLIRP::HashRecognize;
+static cl::opt<bool, true>
+ DisableLIRPHashRecognize("disable-" DEBUG_TYPE "-hashrecognize",
+ cl::desc("Proceed with loop idiom recognize pass, "
+ "but do not optimize CRC loops."),
+ cl::location(DisableLIRP::HashRecognize),
+ cl::init(false), cl::ReallyHidden);
+
static cl::opt<bool> UseLIRCodeSizeHeurs(
"use-lir-code-size-heurs",
cl::desc("Use loop idiom recognition code size heuristics when compiling "
@@ -238,6 +247,7 @@ class LoopIdiomRecognize {
const SCEV *BECount);
bool avoidLIRForMultiBlockLoop(bool IsMemset = false,
bool IsLoopMemset = false);
+ bool optimizeCRCLoop(const PolynomialInfo &Info);
/// @}
/// \name Noncountable Loop Idiom Handling
@@ -283,6 +293,8 @@ PreservedAnalyses LoopIdiomRecognizePass::run(Loop &L, LoopAnalysisManager &AM,
// but ORE cannot be preserved (see comment before the pass definition).
OptimizationRemarkEmitter ORE(L.getHeader()->getParent());
+ std::optional<PolynomialInfo> HR;
+
LoopIdiomRecognize LIR(&AR.AA, &AR.DT, &AR.LI, &AR.SE, &AR.TLI, &AR.TTI,
AR.MSSA, DL, ORE);
if (!LIR.runOnLoop(&L))
@@ -326,7 +338,7 @@ bool LoopIdiomRecognize::runOnLoop(Loop *L) {
HasMemsetPattern = TLI->has(LibFunc_memset_pattern16);
HasMemcpy = TLI->has(LibFunc_memcpy);
- if (HasMemset || HasMemsetPattern || HasMemcpy)
+ if (HasMemset || HasMemsetPattern || HasMemcpy || !DisableLIRP::HashRecognize)
if (SE->hasLoopInvariantBackedgeTakenCount(L))
return runOnCountableLoop();
@@ -369,6 +381,12 @@ bool LoopIdiomRecognize::runOnCountableLoop() {
MadeChange |= runOnLoopBlock(BB, BECount, ExitBlocks);
}
+
+ // Optimize a CRC loop if HashRecognize found one.
+ if (!DisableLIRP::HashRecognize)
+ if (auto Res = HashRecognize(*CurLoop, *SE).getResult())
+ optimizeCRCLoop(*Res);
+
return MadeChange;
}
@@ -1473,6 +1491,160 @@ bool LoopIdiomRecognize::avoidLIRForMultiBlockLoop(bool IsMemset,
return false;
}
+bool LoopIdiomRecognize::optimizeCRCLoop(const PolynomialInfo &Info) {
+ // FIXME: Hexagon has a special HexagonLoopIdiom that optimizes CRC using
+ // carry-less multiplication instructions, which is more efficient than our
+ // Sarwate table-lookup optimization. Hence, until we're able to emit
+ // target-specific instructions for Hexagon, subsuming HexagonLoopIdiom,
+ // disable the optimization for Hexagon.
+ Module &M = *CurLoop->getHeader()->getModule();
+ Triple TT(M.getTargetTriple());
+ if (TT.getArch() == Triple::hexagon)
+ return false;
+
+ // First, create a new GlobalVariable corresponding to the
+ // Sarwate-lookup-table.
+ Type *CRCTy = Info.LHS->getType();
+ unsigned CRCBW = CRCTy->getIntegerBitWidth();
+ std::array<Constant *, 256> CRCConstants;
+ transform(Info.GenSarwateTable(Info.RHS, Info.ByteOrderSwapped),
+ CRCConstants.begin(),
+ [CRCTy](const APInt &E) { return ConstantInt::get(CRCTy, E); });
+ Constant *ConstArray =
+ ConstantArray::get(ArrayType::get(CRCTy, 256), CRCConstants);
+ GlobalVariable *GV =
+ new GlobalVariable(M, ConstArray->getType(), true,
+ GlobalValue::PrivateLinkage, ConstArray, ".crctable");
+
+ PHINode *IV = CurLoop->getCanonicalInductionVariable();
+ SmallVector<PHINode *, 2> Cleanup;
+
+ // Next, mark all PHIs for removal except IV.
+ {
+ for (PHINode &PN : CurLoop->getHeader()->phis()) {
+ if (&PN == IV)
+ continue;
+ PN.replaceAllUsesWith(PoisonValue::get(PN.getType()));
+ Cleanup.push_back(&PN);
+ }
+ }
+
+ // Next, fix up the trip count.
+ {
+ unsigned NewBTC = (Info.TripCount / 8) - 1;
+ BasicBlock *LoopBlk = CurLoop->getLoopLatch();
+ BranchInst *BrInst = cast<BranchInst>(LoopBlk->getTerminator());
+ CmpPredicate ExitPred = BrInst->getSuccessor(0) == LoopBlk
+ ? ICmpInst::Predicate::ICMP_NE
+ : ICmpInst::Predicate::ICMP_EQ;
+ Instruction *ExitCond = CurLoop->getLatchCmpInst();
+ Value *ExitLimit = ConstantInt::get(IV->getType(), NewBTC);
+ IRBuilder<> Builder(ExitCond);
+ Value *NewExitCond =
+ Builder.CreateICmp(ExitPred, IV, ExitLimit, "exit.cond");
+ ExitCond->replaceAllUsesWith(NewExitCond);
+ deleteDeadInstruction(ExitCond);
+ }
+
+ // Finally, fill the loop with the Sarwate-table-lookup logic, and replace all
+ // uses of ComputedValue.
+ //
+ // Little-endian:
+ // crc = (crc >> 8) ^ tbl[(iv'th byte of data) ^ (bottom byte of crc)]
+ // Big-Endian:
+ // crc = (crc << 8) ^ tbl[(iv'th byte of data) ^ (top byte of crc)]
+ {
+ auto GetLoByte = [](IRBuilderBase &Builder, Value *Op, const Twine &Name) {
+ Type *OpTy = Op->getType();
+ unsigned OpBW = OpTy->getIntegerBitWidth();
+ return OpBW > 8
+ ? Builder.CreateAnd(Op, ConstantInt::get(OpTy, 0XFF), Name)
+ : Op;
+ };
+ auto GetHiByte = [](IRBuilderBase &Builder, Value *Op, const Twine &Name) {
+ Type *OpTy = Op->getType();
+ unsigned OpBW = OpTy->getIntegerBitWidth();
+ return OpBW > 8 ? Builder.CreateLShr(Op, ConstantInt::get(OpTy, OpBW - 8),
+ Name)
+ : Op;
+ };
+
+ IRBuilder<> Builder(CurLoop->getHeader(),
+ CurLoop->getHeader()->getFirstNonPHIIt());
+
+ // Create the CRC PHI, and initialize its incoming value to the initial
+ // value of CRC.
+ PHINode *CRCPhi = Builder.CreatePHI(CRCTy, 2, "crc");
+ CRCPhi->addIncoming(Info.LHS, CurLoop->getLoopPreheader());
+
+ // CRC is now an evolving variable, initialized to the PHI.
+ Value *CRC = CRCPhi;
+
+ // TableIndexer = ((top|bottom) byte of CRC). It is XOR'ed with (iv'th byte
+ // of LHSAux), if LHSAux is non-nullptr.
+ Value *Indexer = Info.ByteOrderSwapped
+ ? GetHiByte(Builder, CRC, "crc.indexer.hi")
+ : GetLoByte(Builder, CRC, "crc.indexer.lo");
+
+ if (Value *Data = Info.LHSAux) {
+ Type *DataTy = Data->getType();
+
+ // To index into the (iv'th byte of LHSAux), we multiply iv by 8, and we
+ // shift right by that amount, and take the lo-byte (in the little-endian
+ // case), or shift left by that amount, and take the hi-byte (in the
+ // big-endian case).
+ Value *IVBits = Builder.CreateZExtOrTrunc(
+ Builder.CreateShl(IV, 3, "iv.bits"), DataTy, "iv.indexer");
+ Value *DataIndexer =
+ Info.ByteOrderSwapped
+ ? GetHiByte(Builder,
+ Builder.CreateShl(Data, IVBits, "data.indexer"),
+ "data.indexer.hi")
+ : GetLoByte(Builder,
+ Builder.CreateLShr(Data, IVBits, "data.indexer"),
+ "data.indexer.lo");
+ Indexer = Builder.CreateXor(
+ DataIndexer,
+ Builder.CreateZExtOrTrunc(Indexer, DataTy, "crc.indexer.cast"),
+ "crc.data.indexer");
+ }
+
+ if (Indexer->getType()->getIntegerBitWidth() == 8)
+ // If the indexer is of i8 type, the GEP could be interpreted with a
+ // negative offset.
+ Indexer = Builder.CreateZExt(Indexer, Type::getInt64Ty(SE->getContext()),
+ "indexer.ext");
+
+ // CRCTableLd = CRCTable[(iv'th byte of data) ^ (top|bottom) byte of CRC].
+ Value *CRCTableGEP =
+ Builder.CreateInBoundsGEP(CRCTy, GV, Indexer, "tbl.ptradd");
+ Value *CRCTableLd = Builder.CreateLoad(CRCTy, CRCTableGEP, "tbl.ld");
+
+ // CRCNext = (CRC (<<|>>) 8) ^ CRCTableLd, or simply CRCTableLd in case of
+ // CRC-8.
+ Value *CRCNext = CRCTableLd;
+ if (CRCBW > 8) {
+ Value *CRCShift = Info.ByteOrderSwapped
+ ? Builder.CreateShl(CRC, 8, "crc.be.shift")
+ : Builder.CreateLShr(CRC, 8, "crc.le.shift");
+ CRCNext = Builder.CreateXor(CRCShift, CRCTableLd, "crc.next");
+ }
+
+ // Connect the back-edge for the loop, and RAUW the ComputedValue.
+ CRCPhi->addIncoming(CRCNext, CurLoop->getLoopLatch());
+ Info.ComputedValue->replaceUsesOutsideBlock(CRCNext,
+ CurLoop->getLoopLatch());
+ }
+
+ // Cleanup.
+ {
+ for (PHINode *PN : Cleanup)
+ RecursivelyDeleteDeadPHINode(PN);
+ SE->forgetLoop(CurLoop);
+ }
+ return true;
+}
+
bool LoopIdiomRecognize::runOnNoncountableLoop() {
LLVM_DEBUG(dbgs() << DEBUG_TYPE " Scanning: F["
<< CurLoop->getHeader()->getParent()->getName()
diff --git a/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check.ll b/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check.ll
new file mode 100644
index 0000000000000..266b78a5ab03e
--- /dev/null
+++ b/llvm/test/Transforms/LoopIdiom/cyclic-redundancy-check.ll
@@ -0,0 +1,390 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 5
+; RUN: opt -passes=loop-idiom -S %s| FileCheck %s
+
+;.
+; CHECK: @.crctable = private constant [256 x i16] [i16 0, i16 -16191, i16 -15999, i16 320, i16 -15615, i16 960, i16 640, i16 -15807, i16 -14847, i16 1728, i16 1920, i16 -14527, i16 1280, i16 -14911, i16 -15231, i16 1088, i16 -13311, i16 3264, i16 3456, i16 -12991, i16 3840, i16 -12351, i16 -12671, i16 3648, i16 2560, i16 -13631, i16 -13439, i16 2880, i16 -14079, i16 2496, i16 2176, i16 -14271, i16 -10239, i16 6336, i16 6528, i16 -9919, i16 6912, i16 -9279, i16 -9599, i16 6720, i16 7680, i16 -8511, i16 -8319, i16 8000, i16 -8959, i16 7616, i16 7296, i16 -9151, i16 5120, i16 -11071, i16 -10879, i16 5440, i16 -10495, i16 6080, i16 5760, i16 -10687, i16 -11775, i16 4800, i16 4992, i16 -11455, i16 4352, i16 -11839, i16 -12159, i16 4160, i16 -4095, i16 12480, i16 12672, i16 -3775, i16 13056, i16 -3135, i16 -3455, i16 12864, i16 13824, i16 -2367, i16 -2175, i16 14144, i16 -2815, i16 13760, i16 13440, i16 -3007, i16 15360, i16 -831, i16 -639, i16 15680, i16 -255, i16 16320, i16 16000, i16 -447, i16 -1535, i16 15040, i16 15232, i16 -1215, i16 14592, i16 -1599, i16 -1919, i16 14400, i16 10240, i16 -5951, i16 -5759, i16 10560, i16 -5375, i16 11200, i16 10880, i16 -5567, i16 -4607, i16 11968, i16 12160, i16 -4287, i16 11520, i16 -4671, i16 -4991, i16 11328, i16 -7167, i16 9408, i16 9600, i16 -6847, i16 9984, i16 -6207, i16 -6527, i16 9792, i16 8704, i16 -7487, i16 -7295, i16 9024, i16 -7935, i16 8640, i16 8320, i16 -8127, i16 -24575, i16 24768, i16 24960, i16 -24255, i16 25344, i16 -23615, i16 -23935, i16 25152, i16 26112, i16 -22847, i16 -22655, i16 26432, i16 -23295, i16 26048, i16 25728, i16 -23487, i16 27648, i16 -21311, i16 -21119, i16 27968, i16 -20735, i16 28608, i16 28288, i16 -20927, i16 -22015, i16 27328, i16 27520, i16 -21695, i16 26880, i16 -22079, i16 -22399, i16 26688, i16 30720, i16 -18239, i16 -18047, i16 31040, i16 -17663, i16 31680, i16 31360, i16 -17855, i16 -16895, i16 32448, i16 32640, i16 -16575, i16 32000, i16 -16959, i16 -17279, i16 31808, i16 -19455, i16 29888, i16 30080, i16 -19135, i16 30464, i16 -18495, i16 -18815, i16 30272, i16 29184, i16 -19775, i16 -19583, i16 29504, i16 -20223, i16 29120, i16 28800, i16 -20415, i16 20480, i16 -28479, i16 -28287, i16 20800, i16 -27903, i16 21440, i16 21120, i16 -28095, i16 -27135, i16 22208, i16 22400, i16 -26815, i16 21760, i16 -27199, i16 -27519, i16 21568, i16 -25599, i16 23744, i16 23936, i16 -25279, i16 24320, i16 -24639, i16 -24959, i16 24128, i16 23040, i16 -25919, i16 -25727, i16 23360, i16 -26367, i16 22976, i16 22656, i16 -26559, i16 -30719, i16 18624, i16 18816, i16 -30399, i16 19200, i16 -29759, i16 -30079, i16 19008, i16 19968, i16 -28991, i16 -28799, i16 20288, i16 -29439, i16 19904, i16 19584, i16 -29631, i16 17408, i16 -31551, i16 -31359, i16 17728, i16 -30975, i16 18368, i16 18048, i16 -31167, i16 -32255, i16 17088, i16 17280, i16 -31935, i16 16640, i16 -32319, i16 -32639, i16 16448]
+; CHECK: @.crctable.1 = private constant [256 x i16] [i16 0, i16 -16191, i16 -15999, i16 320, i16 -15615, i16 960, i16 640, i16 -15807, i16 -14847, i16 1728, i16 1920, i16 -14527, i16 1280, i16 -14911, i16 -15231, i16 1088, i16 -13311, i16 3264, i16 3456, i16 -12991, i16 3840, i16 -12351, i16 -12671, i16 3648, i16 2560, i16 -13631, i16 -13439, i16 2880, i16 -14079, i16 2496, i16 2176, i16 -14271, i16 -10239, i16 6336, i16 6528, i16 -9919, i16 6912, i16 -9279, i16 -9599, i16 6720, i16 7680, i16 -8511, i16 -8319, i16 8000, i16 -8959, i16 7616, i16 7296, i16 -9151, i16 5120, i16 -11071, i16 -10879, i16 5440, i16 -10495, i16 6080, i16 5760, i16 -10687, i16 -11775, i16 4800, i16 4992, i16 -11455, i16 4352, i16 -11839, i16 -12159, i16 4160, i16 -4095, i16 12480, i16 12672, i16 -3775, i16 13056, i16 -313...
[truncated]
|
4b1419f
to
85467c5
Compare
We now test type-mismatches in llvm-test-suite! llvm/llvm-test-suite#261. I found a couple of weaknesses in the analysis, and some errors in the transform: I have now verified that, with #144878 and #144881, the newly-added tests in llvm-test-suite are optimized correctly. I'm feeling more confident about turning this optimization on by default: kindly let me know if you have other testing-related suggestions, so we can increase our confidence further. |
[CRCTy](const APInt &E) { return ConstantInt::get(CRCTy, E); }); | ||
Constant *ConstArray = | ||
ConstantArray::get(ArrayType::get(CRCTy, 256), CRCConstants); | ||
GlobalVariable *GV = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we created two identical LUTs in two translation units, does ICF work during linking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect Identical COMDAT folding to work on any two identical global variables, and I'm not sure what that has to do with the patch? Does it depend on some parameter of the GlobalVariable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "identical" ? Constant arrays cannot be merged if the program compares array addresses. So I'd expect that's an option. Also, same-translation-unit case would be nice to test, in case someone implements CRC as a C macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'm not sure how we'd write a test for this: possibly an LLD test?
85467c5
to
66dbf1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the input loop did other things aside from calculating the CRC (such as calling extern
functions)? Will we abandon this transform or preserve the non-CRC operations? Do we have a test for that?
I think we want to skip this transform if optimizing for size.
66dbf1c
to
70d2960
Compare
We will abandon the transform, since HashRecognize will always abort on stray operations. Added a test. |
70d2960
to
1339e55
Compare
Actually, if the call isn't in the use-def chain (say |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if the call isn't in the use-def chain (say
call void @foo()
), it will still remain in the transformed loop as ValueEvolution doesn't see it. However, if the call is side-effecting, the transform would be unsound, as the number of loop iterations have changed: this is probably a bug in HashRecognize, and I'll look into this.
We need that potential miscompile fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need that potential miscompile fixed.
Now that #148620 is merged, I'm ok with this PR.
1339e55
to
6ca643c
Compare
Running some final checks. |
cl::desc("Proceed with loop idiom recognize pass, " | ||
"but do not optimize CRC loops."), | ||
cl::location(DisableLIRP::HashRecognize), | ||
cl::init(false), cl::ReallyHidden); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, everything seems to be in order: will merge. Quick note for reviewers in case regressions are found: prefer flipping this cl::init to true while the regression is being investigated, over reverting the patch.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/141/builds/11316 Here is the relevant piece of the build log for the reference
|
Optimize CRC loops using a Sarwate table-lookup by using the results of HashRecognize in LoopIdiomRecognize. The optimization is checked for correctness using the SingleSource/UnitTests/HashRecognize tests in llvm-test-suite.