-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[MCA] Enable customization of individual instructions #155420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…to change latency via special comments * Added customizations to header * Added customization to implementation * Use default small vector size for custom instructions * Typo fixed * Use std::vector for custom descriptors * Revert to SmallVector * Erroneous dereference removed * Introduced annotated instructions * Updated getter method * Updated getter usage * Fixed type usage * Revert to MCInst for regions * Keep annotations in separate map * Removed usage of annotated instruction * Use std::optional * Add setter/getter for explicit latency * Set instruction latency according to annotation * Simplify default return * Revert to explicit default value * Add missing brace * Added missing field name * Typo fixed * Added missing namespace tags * Whitespace removed * Index annotations by smloc * Search latency by SMLoc * Use pointer as a map key * Missing const added * Added optional annotation to MCStreamerWrapper * Added interface to annotate instructions * Move annotations to CodeRegons * Get explicit latency from Regions * Erroneous dereference removed * Add interface to pass latency to MCStreamerWrapper * Fixed incorrect code placement * Make Streamer available to CommentConsumer * Move MCStreamerWrapper definition to make it visible in comment consumers * Parse latency comment * Fix drop_front usage * Fix whitespace skipping on latency annotation * Added test for latency annotation * Additional test for latency annotations * Use load in test instruction * Use load in test instruction * Added instruction builder customization * Added customized instruction creation * Pass builder to custom creation functor * Added header with InstrBuilder declaration * Typo fixed * Added base for instruction customizations test * Added missing result object * Hardcode name for summary view * Explicitly model single instruction * Add actual custom latency test * Typo fixed * Pass latency to lambda body * Adding missing const * Specify lambda return type * Erroneous argument removed * Removed unnecessary lambda return type * Code cleanup * Code cleanup
@llvm/pr-subscribers-tools-llvm-mca @llvm/pr-subscribers-backend-x86 Author: Roman Belenov (r-belenov) ChangesCurrently MCA takes instruction properties from scheduling model. However, some instructions may execute differently depending on external factors - for example, latency of memory instructions may vary differently depending on whether the load comes from L1 cache, L2 or DRAM. While MCA as a static analysis tool cannot model such differences (and currently takes some static decision, e.g. all memory ops are treated as L1 accesses), it makes sense to allow manual modification of instruction properties to model different behavior (e.g. sensitivity of code performance to cache misses in particular load instruction). This patch addresses this need. The library modification is intentionally generic - arbitrary modifications to InstrDesc are allowed. The tool support is currently limited to changing instruction latencies (single number applies to all output arguments and MaxLatency) via coments in the input assembler code; the format is the like this: add (%eax), eax // LLVM-MCA-LATENCY:100 Users of MCA library can already make additional customizations; command line tool can be extended in the future. Note that InstructionView currently shows per-instruction information according to scheduling model and is not affected by this change. See #133429 for additional clarifications (including explanation why existing customization mechanisms do not provide required functionality) Full diff: https://github.com/llvm/llvm-project/pull/155420.diff 11 Files Affected:
diff --git a/llvm/include/llvm/MCA/InstrBuilder.h b/llvm/include/llvm/MCA/InstrBuilder.h
index e0949d975fa99..9bb8c5b28fdad 100644
--- a/llvm/include/llvm/MCA/InstrBuilder.h
+++ b/llvm/include/llvm/MCA/InstrBuilder.h
@@ -25,6 +25,7 @@
#include "llvm/MCA/Support.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/Error.h"
+#include <functional>
namespace llvm {
namespace mca {
@@ -78,6 +79,10 @@ class InstrBuilder {
DenseMap<std::pair<hash_code, unsigned>, std::unique_ptr<const InstrDesc>>
VariantDescriptors;
+ // These descriptors are customized for particular instructions and cannot
+ // be reused
+ SmallVector<std::unique_ptr<const InstrDesc>> CustomDescriptors;
+
bool FirstCallInst;
bool FirstReturnInst;
unsigned CallLatency;
@@ -87,7 +92,8 @@ class InstrBuilder {
Expected<unsigned> getVariantSchedClassID(const MCInst &MCI, unsigned SchedClassID);
Expected<const InstrDesc &>
- createInstrDescImpl(const MCInst &MCI, const SmallVector<Instrument *> &IVec);
+ createInstrDescImpl(const MCInst &MCI, const SmallVector<Instrument *> &IVec,
+ std::function<void(InstrDesc&)> Customizer = {});
Expected<const InstrDesc &>
getOrCreateInstrDesc(const MCInst &MCI,
const SmallVector<Instrument *> &IVec);
@@ -116,7 +122,8 @@ class InstrBuilder {
void setInstRecycleCallback(InstRecycleCallback CB) { InstRecycleCB = CB; }
LLVM_ABI Expected<std::unique_ptr<Instruction>>
- createInstruction(const MCInst &MCI, const SmallVector<Instrument *> &IVec);
+ createInstruction(const MCInst &MCI, const SmallVector<Instrument *> &IVec,
+ std::function<void(InstrDesc&)> Customizer = {});
};
} // namespace mca
} // namespace llvm
diff --git a/llvm/lib/MCA/InstrBuilder.cpp b/llvm/lib/MCA/InstrBuilder.cpp
index cad25a6ddd3f5..4a0f09c4160d9 100644
--- a/llvm/lib/MCA/InstrBuilder.cpp
+++ b/llvm/lib/MCA/InstrBuilder.cpp
@@ -558,7 +558,8 @@ Expected<unsigned> InstrBuilder::getVariantSchedClassID(const MCInst &MCI,
Expected<const InstrDesc &>
InstrBuilder::createInstrDescImpl(const MCInst &MCI,
- const SmallVector<Instrument *> &IVec) {
+ const SmallVector<Instrument *> &IVec,
+ std::function<void(InstrDesc&)> Customizer) {
assert(STI.getSchedModel().hasInstrSchedModel() &&
"Itineraries are not yet supported!");
@@ -632,6 +633,12 @@ InstrBuilder::createInstrDescImpl(const MCInst &MCI,
return std::move(Err);
// Now add the new descriptor.
+
+ if (Customizer) {
+ Customizer(*ID);
+ return *CustomDescriptors.emplace_back(std::move(ID));
+ }
+
bool IsVariadic = MCDesc.isVariadic();
if ((ID->IsRecyclable = !IsVariadic && !IsVariant)) {
auto DKey = std::make_pair(MCI.getOpcode(), SchedClassID);
@@ -675,8 +682,10 @@ STATISTIC(NumVariantInst, "Number of MCInsts that doesn't have static Desc");
Expected<std::unique_ptr<Instruction>>
InstrBuilder::createInstruction(const MCInst &MCI,
- const SmallVector<Instrument *> &IVec) {
- Expected<const InstrDesc &> DescOrErr = getOrCreateInstrDesc(MCI, IVec);
+ const SmallVector<Instrument *> &IVec,
+ std::function<void(InstrDesc&)> Customizer) {
+ Expected<const InstrDesc &> DescOrErr = Customizer? createInstrDescImpl(MCI, IVec, Customizer) :
+ getOrCreateInstrDesc(MCI, IVec);
if (!DescOrErr)
return DescOrErr.takeError();
const InstrDesc &D = *DescOrErr;
diff --git a/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s b/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s
new file mode 100644
index 0000000000000..9115ffadbdaee
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s
@@ -0,0 +1,10 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=10 %s 2>&1 | FileCheck %s
+
+add (%eax), %eax // LLVM-MCA-LATENCY : 100
+mov %eax, (%ebx)
+
+# CHECK: Iterations: 10
+# CHECK-NEXT: Instructions: 20
+# CHECK-NEXT: Total Cycles: 1004
+# CHECK-NEXT: Total uOps: 20
diff --git a/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s b/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s
new file mode 100644
index 0000000000000..41da9f4b4cef1
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s
@@ -0,0 +1,10 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=10 %s 2>&1 | FileCheck %s
+
+add (%eax), %eax // LLVM-MCA-LATENCY : 100
+mov %eax, (%ebx) // LLVM-MCA-LATENCY : 100
+
+# CHECK: Iterations: 10
+# CHECK-NEXT: Instructions: 20
+# CHECK-NEXT: Total Cycles: 1103
+# CHECK-NEXT: Total uOps: 20
diff --git a/llvm/tools/llvm-mca/CodeRegion.h b/llvm/tools/llvm-mca/CodeRegion.h
index c04ea51169cfb..d16f976fe101e 100644
--- a/llvm/tools/llvm-mca/CodeRegion.h
+++ b/llvm/tools/llvm-mca/CodeRegion.h
@@ -69,10 +69,15 @@
#include "llvm/Support/SMLoc.h"
#include "llvm/Support/SourceMgr.h"
#include <vector>
+#include <utility>
namespace llvm {
namespace mca {
+struct InstAnnotation {
+ std::optional<unsigned> Latency;
+};
+
/// A region of assembly code.
///
/// It identifies a sequence of machine instructions.
@@ -155,6 +160,9 @@ class CodeRegions {
llvm::StringMap<unsigned> ActiveRegions;
bool FoundErrors;
+ // Annotations specified in comments, indexed by SMLoc value
+ llvm::DenseMap<const char*, InstAnnotation> Annotations;
+
public:
CodeRegions(llvm::SourceMgr &S) : SM(S), FoundErrors(false) {}
virtual ~CodeRegions() = default;
@@ -170,6 +178,16 @@ class CodeRegions {
void addInstruction(const llvm::MCInst &Instruction);
llvm::SourceMgr &getSourceMgr() const { return SM; }
+ void Annotate(llvm::SMLoc Loc, const InstAnnotation& A) { Annotations[Loc.getPointer()] = A; }
+ std::optional<unsigned> getExplicitLatency(llvm::SMLoc Loc) const {
+ const auto It = Annotations.find(Loc.getPointer());
+ if (It != Annotations.end()) {
+ return It->second.Latency;
+ } else {
+ return {};
+ }
+ }
+
llvm::ArrayRef<llvm::MCInst> getInstructionSequence(unsigned Idx) const {
return Regions[Idx]->getInstructions();
}
diff --git a/llvm/tools/llvm-mca/CodeRegionGenerator.cpp b/llvm/tools/llvm-mca/CodeRegionGenerator.cpp
index f7f929e49ead9..767f92bf6f935 100644
--- a/llvm/tools/llvm-mca/CodeRegionGenerator.cpp
+++ b/llvm/tools/llvm-mca/CodeRegionGenerator.cpp
@@ -95,6 +95,18 @@ void AnalysisRegionCommentConsumer::HandleComment(SMLoc Loc,
return;
Comment = Comment.drop_front(Position);
+ if (Comment.starts_with("LLVM-MCA-LATENCY")) {
+ auto Parts = Comment.split(':');
+ Position = Parts.second.find_first_not_of(" \t");
+ if (Position >= Parts.second.size())
+ return;
+ auto LatStr = Parts.second.drop_front(Position);
+ unsigned Latency = 0;
+ if (!LatStr.getAsInteger(10, Latency))
+ Streamer.AddLatencyAnnotation(Latency);
+ return;
+ }
+
if (Comment.consume_front("LLVM-MCA-END")) {
// Skip spaces and tabs.
Position = Comment.find_first_not_of(" \t");
diff --git a/llvm/tools/llvm-mca/CodeRegionGenerator.h b/llvm/tools/llvm-mca/CodeRegionGenerator.h
index a48c67a22f27b..2c632be5e7f53 100644
--- a/llvm/tools/llvm-mca/CodeRegionGenerator.h
+++ b/llvm/tools/llvm-mca/CodeRegionGenerator.h
@@ -31,6 +31,50 @@
namespace llvm {
namespace mca {
+// This class provides the callbacks that occur when parsing input assembly.
+class MCStreamerWrapper : public MCStreamer {
+protected:
+ CodeRegions &Regions;
+ std::optional<InstAnnotation> CurrentAnnotation;
+
+public:
+ MCStreamerWrapper(MCContext &Context, mca::CodeRegions &R)
+ : MCStreamer(Context), Regions(R) {}
+
+ // We only want to intercept the emission of new instructions.
+ void emitInstruction(const MCInst &Inst,
+ const MCSubtargetInfo & /* unused */) override {
+ Regions.addInstruction(Inst);
+ if (CurrentAnnotation) {
+ Regions.Annotate(Inst.getLoc(), *CurrentAnnotation);
+ CurrentAnnotation = {};
+ }
+ }
+
+ void AddLatencyAnnotation(unsigned Lat) {
+ if (!CurrentAnnotation) CurrentAnnotation = InstAnnotation();
+ CurrentAnnotation->Latency = Lat;
+ }
+
+ bool emitSymbolAttribute(MCSymbol *Symbol, MCSymbolAttr Attribute) override {
+ return true;
+ }
+
+ void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
+ Align ByteAlignment) override {}
+ void emitZerofill(MCSection *Section, MCSymbol *Symbol = nullptr,
+ uint64_t Size = 0, Align ByteAlignment = Align(1),
+ SMLoc Loc = SMLoc()) override {}
+ void beginCOFFSymbolDef(const MCSymbol *Symbol) override {}
+ void emitCOFFSymbolStorageClass(int StorageClass) override {}
+ void emitCOFFSymbolType(int Type) override {}
+ void endCOFFSymbolDef() override {}
+
+ ArrayRef<MCInst> GetInstructionSequence(unsigned Index) const {
+ return Regions.getInstructionSequence(Index);
+ }
+};
+
class MCACommentConsumer : public AsmCommentConsumer {
protected:
bool FoundError = false;
@@ -44,9 +88,10 @@ class MCACommentConsumer : public AsmCommentConsumer {
/// A comment consumer that parses strings. The only valid tokens are strings.
class AnalysisRegionCommentConsumer : public MCACommentConsumer {
AnalysisRegions &Regions;
+ MCStreamerWrapper &Streamer;
public:
- AnalysisRegionCommentConsumer(AnalysisRegions &R) : Regions(R) {}
+ AnalysisRegionCommentConsumer(AnalysisRegions &R, MCStreamerWrapper &S) : Regions(R), Streamer(S) {}
/// Parses a comment. It begins a new region if it is of the form
/// LLVM-MCA-BEGIN. It ends a region if it is of the form LLVM-MCA-END.
@@ -82,40 +127,6 @@ class InstrumentRegionCommentConsumer : public MCACommentConsumer {
InstrumentManager &getInstrumentManager() { return IM; }
};
-// This class provides the callbacks that occur when parsing input assembly.
-class MCStreamerWrapper : public MCStreamer {
-protected:
- CodeRegions &Regions;
-
-public:
- MCStreamerWrapper(MCContext &Context, mca::CodeRegions &R)
- : MCStreamer(Context), Regions(R) {}
-
- // We only want to intercept the emission of new instructions.
- void emitInstruction(const MCInst &Inst,
- const MCSubtargetInfo & /* unused */) override {
- Regions.addInstruction(Inst);
- }
-
- bool emitSymbolAttribute(MCSymbol *Symbol, MCSymbolAttr Attribute) override {
- return true;
- }
-
- void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
- Align ByteAlignment) override {}
- void emitZerofill(MCSection *Section, MCSymbol *Symbol = nullptr,
- uint64_t Size = 0, Align ByteAlignment = Align(1),
- SMLoc Loc = SMLoc()) override {}
- void beginCOFFSymbolDef(const MCSymbol *Symbol) override {}
- void emitCOFFSymbolStorageClass(int StorageClass) override {}
- void emitCOFFSymbolType(int Type) override {}
- void endCOFFSymbolDef() override {}
-
- ArrayRef<MCInst> GetInstructionSequence(unsigned Index) const {
- return Regions.getInstructionSequence(Index);
- }
-};
-
class InstrumentMCStreamer : public MCStreamerWrapper {
InstrumentManager &IM;
@@ -210,15 +221,15 @@ class AsmCodeRegionGenerator : public virtual CodeRegionGenerator {
class AsmAnalysisRegionGenerator final : public AnalysisRegionGenerator,
public AsmCodeRegionGenerator {
- AnalysisRegionCommentConsumer CC;
MCStreamerWrapper Streamer;
+ AnalysisRegionCommentConsumer CC;
public:
AsmAnalysisRegionGenerator(const Target &T, llvm::SourceMgr &SM, MCContext &C,
const MCAsmInfo &A, const MCSubtargetInfo &S,
const MCInstrInfo &I)
: AnalysisRegionGenerator(SM), AsmCodeRegionGenerator(T, C, A, S, I),
- CC(Regions), Streamer(Ctx, Regions) {}
+ Streamer(Ctx, Regions), CC(Regions, Streamer) {}
MCACommentConsumer *getCommentConsumer() override { return &CC; };
CodeRegions &getRegions() override { return Regions; };
diff --git a/llvm/tools/llvm-mca/llvm-mca.cpp b/llvm/tools/llvm-mca/llvm-mca.cpp
index 4b0611186d21f..5f653ffcd83ed 100644
--- a/llvm/tools/llvm-mca/llvm-mca.cpp
+++ b/llvm/tools/llvm-mca/llvm-mca.cpp
@@ -633,7 +633,11 @@ int main(int argc, char **argv) {
const SmallVector<mca::Instrument *> Instruments =
InstrumentRegions.getActiveInstruments(Loc);
- Expected<std::unique_ptr<mca::Instruction>> Inst =
+ auto Latency = Regions.getExplicitLatency(Loc);
+ Expected<std::unique_ptr<mca::Instruction>> Inst = Latency ?
+ IB.createInstruction(MCI, Instruments, [=](llvm::mca::InstrDesc& ID) {
+ for (auto& W : ID.Writes) W.Latency = *Latency;
+ ID.MaxLatency = *Latency; }) :
IB.createInstruction(MCI, Instruments);
if (!Inst) {
if (auto NewE = handleErrors(
diff --git a/llvm/unittests/tools/llvm-mca/MCATestBase.cpp b/llvm/unittests/tools/llvm-mca/MCATestBase.cpp
index cd3258ef96d64..96413168b8f5e 100644
--- a/llvm/unittests/tools/llvm-mca/MCATestBase.cpp
+++ b/llvm/unittests/tools/llvm-mca/MCATestBase.cpp
@@ -61,7 +61,8 @@ void MCATestBase::SetUp() {
Error MCATestBase::runBaselineMCA(json::Object &Result, ArrayRef<MCInst> Insts,
ArrayRef<mca::View *> Views,
- const mca::PipelineOptions *PO) {
+ const mca::PipelineOptions *PO,
+ Builder B) {
mca::Context MCA(*MRI, *STI);
// Default InstrumentManager
@@ -72,7 +73,7 @@ Error MCATestBase::runBaselineMCA(json::Object &Result, ArrayRef<MCInst> Insts,
SmallVector<std::unique_ptr<mca::Instruction>> LoweredInsts;
for (const auto &MCI : Insts) {
Expected<std::unique_ptr<mca::Instruction>> Inst =
- IB.createInstruction(MCI, Instruments);
+ B ? B(IB, MCI, Instruments) : IB.createInstruction(MCI, Instruments);
if (!Inst) {
if (auto NewE =
handleErrors(Inst.takeError(),
diff --git a/llvm/unittests/tools/llvm-mca/MCATestBase.h b/llvm/unittests/tools/llvm-mca/MCATestBase.h
index 66e20a45c96ce..399a8e892bc28 100644
--- a/llvm/unittests/tools/llvm-mca/MCATestBase.h
+++ b/llvm/unittests/tools/llvm-mca/MCATestBase.h
@@ -24,6 +24,7 @@
#include "llvm/MC/MCTargetOptions.h"
#include "llvm/MC/TargetRegistry.h"
#include "llvm/MCA/Context.h"
+#include "llvm/MCA/InstrBuilder.h"
#include "llvm/TargetParser/SubtargetFeature.h"
#include "llvm/TargetParser/Triple.h"
@@ -70,12 +71,16 @@ class MCATestBase : public ::testing::Test {
void SetUp() override;
+ using Builder = std::function<Expected<std::unique_ptr<mca::Instruction>>
+ (mca::InstrBuilder&, const MCInst&, const SmallVector<mca::Instrument *>&)>;
+
/// Utility function to run MCA with (nearly) the same configuration as the
/// `llvm-mca` tool to verify result correctness.
/// This function only displays on SummaryView by default.
virtual Error runBaselineMCA(json::Object &Result, ArrayRef<MCInst> Insts,
ArrayRef<mca::View *> Views = {},
- const mca::PipelineOptions *PO = nullptr);
+ const mca::PipelineOptions *PO = nullptr,
+ Builder B = {});
};
} // end namespace mca
diff --git a/llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp b/llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp
index 1a14c687295ca..cf66460b11fc8 100644
--- a/llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp
+++ b/llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp
@@ -234,3 +234,29 @@ TEST_F(X86TestBase, TestVariantInstructionsSameAddress) {
Expected<unsigned> Cycles = P->run();
ASSERT_TRUE(static_cast<bool>(Cycles));
}
+
+TEST_F(X86TestBase, TestInstructionCustomization) {
+ const unsigned ExplicitLatency = 100;
+ SmallVector<MCInst> MCIs;
+ MCInst InstructionToAdd = MCInstBuilder(X86::XOR64rr)
+ .addReg(X86::RAX)
+ .addReg(X86::RAX)
+ .addReg(X86::RAX);
+ MCIs.push_back(InstructionToAdd);
+
+ // Run the baseline.
+ json::Object BaselineResult;
+ auto E = runBaselineMCA(BaselineResult, MCIs, {}, nullptr,
+ [=](InstrBuilder& IB, const MCInst& MCI, const SmallVector<Instrument*>& Instruments) {
+ return IB.createInstruction(MCI, Instruments,
+ [=](InstrDesc& ID) {
+ for (auto& W : ID.Writes) W.Latency = ExplicitLatency;
+ ID.MaxLatency = ExplicitLatency;
+ });
+ });
+ auto *BaselineObj = BaselineResult.getObject("SummaryView");
+ auto V = BaselineObj->getInteger("TotalCycles");
+ ASSERT_TRUE(V);
+ // Additional 3 cycles for Dispatch, Executed and Retired states
+ ASSERT_EQ(unsigned(*V), ExplicitLatency + 3) << "Total cycles do not match";
+}
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add another new callback in InstrumentManager
to implement the customizer, and move the parsing logics of LLVM-MCA-LATENCY
from CodeRegionGenerator.* to InstrumentManager (either putting it in X86's custom behavior or in InstrumentManager's base implementation) because InstrumentManager was created exactly for supporting custom annotations like LLVM-MCA-LATENCY
.
llvm/include/llvm/MCA/InstrBuilder.h
Outdated
@@ -87,7 +92,8 @@ class InstrBuilder { | |||
|
|||
Expected<unsigned> getVariantSchedClassID(const MCInst &MCI, unsigned SchedClassID); | |||
Expected<const InstrDesc &> | |||
createInstrDescImpl(const MCInst &MCI, const SmallVector<Instrument *> &IVec); | |||
createInstrDescImpl(const MCInst &MCI, const SmallVector<Instrument *> &IVec, | |||
std::function<void(InstrDesc &)> Customizer = {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use llvm::function_ref
: https://www.llvm.org/docs/ProgrammersManual.html#passing-functions-and-other-callable-objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
llvm/include/llvm/MCA/InstrBuilder.h
Outdated
@@ -116,7 +122,8 @@ class InstrBuilder { | |||
void setInstRecycleCallback(InstRecycleCallback CB) { InstRecycleCB = CB; } | |||
|
|||
LLVM_ABI Expected<std::unique_ptr<Instruction>> | |||
createInstruction(const MCInst &MCI, const SmallVector<Instrument *> &IVec); | |||
createInstruction(const MCInst &MCI, const SmallVector<Instrument *> &IVec, | |||
std::function<void(InstrDesc &)> Customizer = {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* Switch to function_ref * Switch to function_ref * Check pipeline execution status * Switch to function_ref
Extended InstrumentManager to allow InstrDesc modification with instruments; added common instrument that can modify instruction latencies.
Refactored code to use InstrumentManager (not quite polished yet). |
Fixed formatting issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your current solution could be (greatly) simplified, please see my inline comment
@@ -143,19 +146,27 @@ class LLVM_ABI InstrumentManager { | |||
protected: | |||
const MCSubtargetInfo &STI; | |||
const MCInstrInfo &MCII; | |||
bool EnableDefaults; | |||
std::unique_ptr<InstrumentManager> TargetIM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InstrumentManager is already target-specific, why do you need TargetIM here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have InstrDesc customization generic (e.g. latency change code is generic, other things I do locally also do not require target details) and coexist with target-specific instrumentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have InstrDesc customization generic (e.g. latency change code is generic, other things I do locally also do not require target details) and coexist with target-specific instrumentation.
I think my concern here is that this additional layer of pointer indirection could be avoid if we could simply put this latency updating logics in the base IM and called it from the target-specific, derived IM when needed, as I described in the sibling comment
llvm/lib/MCA/CustomBehaviour.cpp
Outdated
@@ -42,19 +43,91 @@ CustomBehaviour::getEndViews(llvm::MCInstPrinter &IP, | |||
return std::vector<std::unique_ptr<View>>(); | |||
} | |||
|
|||
class CustomInstrument : public Instrument { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be solved in a much simpler way: given a line "LLVM-MCA-LATENCY: 100", the default Instrument
could already provide you Desc
and Data
of "LLVM-MCA-LATENCY:" and "100" respectively. So I don't think you need the CustomInstrument
here.
And then for supportsInstrumentType
, canCustomize
and customize
-- you could just put the implementation directly in the function, and call this base implementation for the target-specific override, if a target overrides it. Such that you don't need the slightly confusing TargetIM
indirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given a line "LLVM-MCA-LATENCY: 100", the default Instrument could already provide you Desc and Data of "LLVM-MCA-LATENCY:" and "100" respectively. So I don't think you need the CustomInstrument here.
Ok, I can move the logic to base Instrument implementation
Such that you don't need the slightly confusing TargetIM indirection.
Without TargetIM generic customization will be disabled in case target-specific IM is used (since it will override createInstrument), unless all target IMs call base class createInstrument for unrecognized types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, supportsIntrumentType() is normally overridden in target-specific IMs, while we need to check for both common and target specific instruments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added target IM creation into default IM constructor to simplify the logic in the tool code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless all target IMs call base class createInstrument for unrecognized types.
yes, thats what i meant when I said "call this base implementation for the target-specific override, if a target overrides it". Calling parent class's base implementation in an override function is a pretty common pattern in LLVM, and I would prefer that pattern to be used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll change the code to leave the logic in the base IM only without TargetIM indirection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS Pushed the changes that remove targetIM
|
||
Instrument() : Instrument("", "") {} | ||
|
||
virtual ~Instrument() = default; | ||
|
||
virtual bool canCustomize() const { return bool(Latency); } | ||
virtual void customize(InstrDesc &ID) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not adding this virtual function here and implement the customization logics directly in InstrumentManager::customize
instead.
With such approach, we can also avoid storing Latency in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latency must be stored in the instrument since it may be (and normally is) different in different instrumentation regions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping - any additional comments?
@@ -143,19 +146,27 @@ class LLVM_ABI InstrumentManager { | |||
protected: | |||
const MCSubtargetInfo &STI; | |||
const MCInstrInfo &MCII; | |||
bool EnableDefaults; | |||
std::unique_ptr<InstrumentManager> TargetIM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to have InstrDesc customization generic (e.g. latency change code is generic, other things I do locally also do not require target details) and coexist with target-specific instrumentation.
I think my concern here is that this additional layer of pointer indirection could be avoid if we could simply put this latency updating logics in the base IM and called it from the target-specific, derived IM when needed, as I described in the sibling comment
* Update CustomBehaviour.h * Update CustomBehaviour.cpp * Update llvm-mca.cpp
Currently MCA takes instruction properties from scheduling model. However, some instructions may execute differently depending on external factors - for example, latency of memory instructions may vary differently depending on whether the load comes from L1 cache, L2 or DRAM. While MCA as a static analysis tool cannot model such differences (and currently takes some static decision, e.g. all memory ops are treated as L1 accesses), it makes sense to allow manual modification of instruction properties to model different behavior (e.g. sensitivity of code performance to cache misses in particular load instruction). This patch addresses this need.
The library modification is intentionally generic - arbitrary modifications to InstrDesc are allowed. The tool support is currently limited to changing instruction latencies (single number applies to all output arguments and MaxLatency) via coments in the input assembler code; the format is the like this:
add (%eax), eax // LLVM-MCA-LATENCY:100
Users of MCA library can already make additional customizations; command line tool can be extended in the future.
Note that InstructionView currently shows per-instruction information according to scheduling model and is not affected by this change.
See #133429 for additional clarifications (including explanation why existing customization mechanisms do not provide required functionality)