-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[MemProf] Extend MemProfUse pass to make use of data access profiles to partition data #151238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
57068ea
to
9d475d4
Compare
9d475d4
to
925e9fd
Compare
925e9fd
to
6311383
Compare
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-pgo Author: Mingming Liu (mingmingl-llvm) Changesf3f2832 introduces the data access profile format as a payload inside memprof, and the MemProfUse pass reads the memprof payload. This change extends the MemProfUse pass to read the data access profiles to annotate global variables' section prefix.
Full diff: https://github.com/llvm/llvm-project/pull/151238.diff 4 Files Affected:
diff --git a/llvm/include/llvm/ProfileData/InstrProfReader.h b/llvm/include/llvm/ProfileData/InstrProfReader.h
index deb5cd17d8fd9..bccbc2006b898 100644
--- a/llvm/include/llvm/ProfileData/InstrProfReader.h
+++ b/llvm/include/llvm/ProfileData/InstrProfReader.h
@@ -729,6 +729,11 @@ class IndexedMemProfReader {
LLVM_ABI DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
getMemProfCallerCalleePairs() const;
+ // Returns non-owned pointer to data access profile data.
+ memprof::DataAccessProfData *getDataAccessProfileData() const {
+ return DataAccessProfileData.get();
+ }
+
// Return the entire MemProf profile.
LLVM_ABI memprof::AllMemProfData getAllMemProfData() const;
@@ -900,6 +905,12 @@ class LLVM_ABI IndexedInstrProfReader : public InstrProfReader {
return MemProfReader.getSummary();
}
+ /// Returns non-owned pointer to the data access profile data.
+ /// Will be null if unavailable (version < 4).
+ memprof::DataAccessProfData *getDataAccessProfileData() const {
+ return MemProfReader.getDataAccessProfileData();
+ }
+
Error readBinaryIds(std::vector<llvm::object::BuildID> &BinaryIds) override;
Error printBinaryIds(raw_ostream &OS) override;
};
diff --git a/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h b/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h
index 6170bf48e4695..c11333bf8ce5b 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h
@@ -14,6 +14,7 @@
#include "llvm/ADT/IntrusiveRefCntPtr.h"
#include "llvm/IR/PassManager.h"
+#include "llvm/ProfileData/DataAccessProf.h"
#include "llvm/ProfileData/MemProf.h"
#include "llvm/Support/Compiler.h"
@@ -36,6 +37,10 @@ class MemProfUsePass : public PassInfoMixin<MemProfUsePass> {
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
private:
+ // Annotate global variables' section prefix based on data access profile.
+ bool
+ annotateGlobalVariables(Module &M,
+ const memprof::DataAccessProfData *DataAccessProf);
std::string MemoryProfileFileName;
IntrusiveRefCntPtr<vfs::FileSystem> FS;
};
diff --git a/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp b/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
index a9a0731f16d90..3c1e41f8eb374 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
@@ -22,6 +22,7 @@
#include "llvm/IR/Function.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Module.h"
+#include "llvm/ProfileData/DataAccessProf.h"
#include "llvm/ProfileData/InstrProf.h"
#include "llvm/ProfileData/InstrProfReader.h"
#include "llvm/ProfileData/MemProfCommon.h"
@@ -75,6 +76,17 @@ static cl::opt<unsigned> MinMatchedColdBytePercent(
"memprof-matching-cold-threshold", cl::init(100), cl::Hidden,
cl::desc("Min percent of cold bytes matched to hint allocation cold"));
+static cl::opt<bool> AnnotationStaticDataPrefix(
+ "annotate-static-data-prefix", cl::init(false), cl::Hidden,
+ cl::desc("If true, annotate the static data section prefix"));
+
+static cl::opt<bool>
+ PrintStaticDataPrefix("print-static-data-prefix", cl::init(false),
+ cl::Hidden,
+ cl::desc("If true, print the static data section "
+ "prefix in errs(). This option is "
+ "meant for debugging."));
+
// Matching statistics
STATISTIC(NumOfMemProfMissing, "Number of functions without memory profile.");
STATISTIC(NumOfMemProfMismatch,
@@ -674,8 +686,9 @@ MemProfUsePass::MemProfUsePass(std::string MemoryProfileFile,
}
PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
- // Return immediately if the module doesn't contain any function.
- if (M.empty())
+ // Return immediately if the module doesn't contain any function or global
+ // variables.
+ if (M.empty() && M.globals().empty())
return PreservedAnalyses::all();
LLVM_DEBUG(dbgs() << "Read in memory profile:");
@@ -703,6 +716,14 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
return PreservedAnalyses::all();
}
+ const bool Changed =
+ annotateGlobalVariables(M, MemProfReader->getDataAccessProfileData());
+
+ // If the module doesn't contain any function, return after we process all
+ // global variables.
+ if (M.empty())
+ return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
+
auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
TargetLibraryInfo &TLI = FAM.getResult<TargetLibraryAnalysis>(*M.begin());
@@ -752,3 +773,58 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
return PreservedAnalyses::none();
}
+
+bool MemProfUsePass::annotateGlobalVariables(
+ Module &M, const memprof::DataAccessProfData *DataAccessProf) {
+ if (!AnnotationStaticDataPrefix || M.globals().empty() || !DataAccessProf)
+ return false;
+
+ bool Changed = false;
+ for (GlobalVariable &GVar : M.globals()) {
+ assert(!GVar.getSectionPrefix().has_value() &&
+ "GVar shouldn't have section prefix yet");
+ if (GVar.isDeclarationForLinker())
+ continue;
+
+ StringRef Name = GVar.getName();
+ // Skip string literals whose mangled names doesn't stay stable across
+ // binary releases.
+ // TODO: Track string content hash in the profiles and compute it inside the
+ // compiler to categeorize the hotness string literals.
+ if (Name.starts_with(".str"))
+ continue;
+
+ // DataAccessProfRecord's look-up methods will canonicalize the variable
+ // name before looking up methods, so optimizer doesn't need to do it.
+ std::optional<DataAccessProfRecord> Record =
+ DataAccessProf->getProfileRecord(Name);
+ // Annotate a global variable as hot if it has non-zero sampled count, and
+ // annotate it as cold if it's seen in the profiled binary
+ // file but doesn't have any access sample.
+ if (Record && Record->AccessCount > 0) {
+ GVar.setSectionPrefix("hot");
+ Changed = true;
+ } else if (DataAccessProf->isKnownColdSymbol(Name)) {
+ GVar.setSectionPrefix("unlikely");
+ Changed = true;
+ }
+ }
+
+ // Optimization remark emitter requires a llvm::Function, but it's not well
+ // defined to associate a global variable with a function. So we just print
+ // out the static data section prefix in errs().
+ if (PrintStaticDataPrefix) {
+ for (GlobalVariable &GVar : M.globals()) {
+ if (GVar.isDeclarationForLinker())
+ continue;
+ StringRef Name = GVar.getName();
+ auto SectionPrefix = GVar.getSectionPrefix();
+ if (SectionPrefix.has_value())
+ errs() << "Global variable " << Name
+ << " has section prefix: " << SectionPrefix.value() << "\n";
+ else
+ errs() << "Global variable " << Name << " has no section prefix\n";
+ }
+ }
+ return Changed;
+}
diff --git a/llvm/test/Transforms/PGOProfile/data-access-profile.ll b/llvm/test/Transforms/PGOProfile/data-access-profile.ll
new file mode 100644
index 0000000000000..91eaa934374b3
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/data-access-profile.ll
@@ -0,0 +1,87 @@
+; RUN: rm -rf %t && split-file %s %t && cd %t
+
+;; Read a text profile and merge it into indexed profile.
+; RUN: llvm-profdata merge --memprof-version=4 memprof.yaml -o memprof.profdata
+
+;; Run optimizer pass on the IR, and check the section prefix.
+; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \
+; RUN: -S input.ll -o - 2>&1 | FileCheck %s
+
+;; Repeat the command line above and enable -print-static-data-prefix. Test both IR and log.
+; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \
+; RUN: -print-static-data-prefix -S input.ll -o - 2>&1 | FileCheck %s --check-prefixes=LOG,CHECK
+
+; LOG: Global variable .str has no section prefix
+; LOG: Global variable var1 has section prefix: hot
+; LOG: Global variable var2.llvm.125 has section prefix: hot
+; LOG: Global variable foo has section prefix: unlikely
+; LOG: Global variable bar has no section prefix
+
+;; String literals are not annotated.
+; CHECK: @.str = unnamed_addr constant [5 x i8] c"abcde"
+; CHECK-NOT: section_prefix
+; CHECK: @var1 = global i32 123, !section_prefix !0
+
+;; @var.llvm.125 will be canonicalized to @var2 for profile look-up.
+; CHECK-NEXT: @var2.llvm.125 = global i64 0, !section_prefix !0
+; CHECK-NEXT: @foo = global i8 2, !section_prefix !1
+
+;; @bar is not seen in hot symbol or known symbol set, so it doesn't get
+;; a section prefix. It's up to the linker to decide how to map input sections
+;; to output, and one conservative practice is to map unlikely-prefixed ones to
+;; unlikely output section, and map the rest (hot-prefixed or prefix-less) to
+;; the canonical output section.
+; CHECK-NEXT: @bar = global i16 3
+
+; CHECK: !0 = !{!"section_prefix", !"hot"}
+; CHECK-NEXT: !1 = !{!"section_prefix", !"unlikely"}
+
+;--- memprof.yaml
+---
+HeapProfileRecords:
+ - GUID: 0xdeadbeef12345678
+ AllocSites:
+ - Callstack:
+ - { Function: 0x1111111111111111, LineOffset: 11, Column: 10, IsInlineFrame: true }
+ - { Function: 0x2222222222222222, LineOffset: 22, Column: 20, IsInlineFrame: false }
+ MemInfoBlock:
+ AllocCount: 111
+ TotalSize: 222
+ TotalLifetime: 333
+ TotalLifetimeAccessDensity: 444
+ CallSites:
+ - Frames:
+ - { Function: 0x5555555555555555, LineOffset: 55, Column: 50, IsInlineFrame: true }
+ - { Function: 0x6666666666666666, LineOffset: 66, Column: 60, IsInlineFrame: false }
+ CalleeGuids: [ 0x100, 0x200 ]
+DataAccessProfiles:
+ SampledRecords:
+ - Symbol: var1
+ AccessCount: 1000
+ - Symbol: var2
+ AccessCount: 5
+ - Hash: 101010
+ AccessCount: 145
+ KnownColdSymbols:
+ - foo
+ KnownColdStrHashes: [ 999, 1001 ]
+...
+;--- input.ll
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+@.str = unnamed_addr constant [5 x i8] c"abcde"
+@var1 = global i32 123
+@var2.llvm.125 = global i64 0
+@foo = global i8 2
+@bar = global i16 3
+
+define i32 @func() {
+ %a = load i32, ptr @var1
+ %b = load i32, ptr @var2.llvm.125
+ %ret = call i32 (...) @func_taking_arbitrary_param(i32 %a, i32 %b)
+ ret i32 %ret
+}
+
+declare i32 @func_taking_arbitrary_param(...)
|
@@ -729,6 +729,11 @@ class IndexedMemProfReader { | |||
LLVM_ABI DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>> | |||
getMemProfCallerCalleePairs() const; | |||
|
|||
// Returns non-owned pointer to data access profile data. | |||
memprof::DataAccessProfData *getDataAccessProfileData() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might need an LLVM_ABI tag like the other interfaces (this is for Windows DLL apparently).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
continue; | ||
|
||
StringRef Name = GVar.getName(); | ||
// Skip string literals whose mangled names doesn't stay stable across |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/doesn't/don't/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also might be clearer to s/whose/as their/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
if (Name.starts_with(".str")) | ||
continue; | ||
|
||
// DataAccessProfRecord's look-up methods will canonicalize the variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment seems a little circular, and as a result I don't know quite what this means: "look-up methods .... before looking up methods".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised this. PTAL.
StringRef Name = GVar.getName(); | ||
auto SectionPrefix = GVar.getSectionPrefix(); | ||
if (SectionPrefix.has_value()) | ||
errs() << "Global variable " << Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just emit these messages during the above loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Also incorporated Snehasish's feedback above to use LLVM_DEBUG for both strings and the rest.
;; to output, and one conservative practice is to map unlikely-prefixed ones to | ||
;; unlikely output section, and map the rest (hot-prefixed or prefix-less) to | ||
;; the canonical output section. | ||
; CHECK-NEXT: @bar = global i16 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need a check for EOL to ensure there isn't a section prefix here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done by testing that there is no section_prefix between @bar
and the object after it (i.e., @foo
in the updated test)
|
||
;; @bar is not seen in hot symbol or known symbol set, so it doesn't get | ||
;; a section prefix. It's up to the linker to decide how to map input sections | ||
;; to output, and one conservative practice is to map unlikely-prefixed ones to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This second sentence seems like it belongs in the code and not in the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's much more visible to document such considerations as code comment, and I wouldn't try find it in a regression test myself..
Moved it to cpp.
--- | ||
HeapProfileRecords: | ||
- GUID: 0xdeadbeef12345678 | ||
AllocSites: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the AllocSites and CallSites needed for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically yaml parser implementation requires HeapProfileRecords
currently by mapRequired
at
llvm-project/llvm/include/llvm/ProfileData/MemProfYAML.h
Lines 264 to 272 in e3823a6
template <> struct MappingTraits<memprof::AllMemProfData> { | |
static void mapping(IO &Io, memprof::AllMemProfData &Data) { | |
Io.mapRequired("HeapProfileRecords", Data.HeapProfileRecords); | |
// Map data access profiles if reading input, or if writing output && | |
// the struct is populated. | |
if (!Io.outputting() || !Data.YamlifiedDataAccessProfiles.isEmpty()) | |
Io.mapOptional("DataAccessProfiles", Data.YamlifiedDataAccessProfiles); | |
} | |
}; |
Io.mapOptional("DataAccessProfiles", ...
at L270 as a sibling field.
I wonder if we want to make both fields optional at this point if the regression tests are happy. Also cc @kazutakahirata for some thoughts and opinions on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes ideally that could be changed. Also it looks like both the AllocSites and CallSites are required, which isn't necessarily true in a real profile for each function. For now you could trim it a bit by only having one frame for each. Also add a comment that this is currently required by the yaml reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. I trimmed the heap profiles to preserve one entry for the required fields and subfields. PTAL.
cc @kazutakahirata I want to make sure I don't miss some considerations about mapRequired
, but otherwise I'd be glad to make the change from mapRequired
to mapOptional
. Just lmk ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to mapOptional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mingmingl-llvm I agree with @teresajohnson. I've posted #155671. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for putting together #155671!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mingmingl-llvm I just merged #155671. You should be able to drop CallSites
and other heap-profile-specific items from your tests. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! Now the text profile is trimmed now.
; RUN: llvm-profdata merge --memprof-version=4 memprof.yaml -o memprof.profdata | ||
|
||
;; Run optimizer pass on the IR, and check the section prefix. | ||
; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a run to ensure that without this option (or with it explicitly off) we don't get the prefixes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a RUN line below with --implicit-check-not="section_prefix"
, also updated the 'CHECK' (default prefix) to 'PREFIX' (a custom one). This way, it's very explicit that the added RUN line won't match 'PREFIX' lines.
; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \ | ||
; RUN: -S input.ll -o - 2>&1 | FileCheck %s | ||
|
||
;; Repeat the command line above and enable -print-static-data-prefix. Test both IR and log. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably can combine this with the above opt invocation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, and added REQUIRES: asserts
to check LLVM_DEBUG lines.
@@ -75,6 +76,17 @@ static cl::opt<unsigned> MinMatchedColdBytePercent( | |||
"memprof-matching-cold-threshold", cl::init(100), cl::Hidden, | |||
cl::desc("Min percent of cold bytes matched to hint allocation cold")); | |||
|
|||
static cl::opt<bool> AnnotationStaticDataPrefix( | |||
"annotate-static-data-prefix", cl::init(false), cl::Hidden, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a "memprof-" prefix to the option name to be consistent with the others.
Also I think we can make this default true since we can also control whether the static data profile is generated as part of the memprof profile generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the option prefix.
If we make the default true, the DiagnosticInfoPGOProfile
warning (which also makes sense) will show up in the build logs unless static data profile is populated. This may also confuse the compiler users. From this perspective, I wonder if any of the following two options might be preferred
- keep the default option value false and update [Clang] Add clang driver option -fpartition-static-data-sections #124991 to set this LLVM option to true
- keep the default option value true, and do something like
if (!AnnotateStaticDataSectionPrefix || M.globals().empty())
return false;
// Check NumOccurrences before emit warning since the option
// value is true by default.
if (!DataAccessProf && AnnotateStaticDataSectionPrefix.getNumOccurrences()) {
M.getContext().diagnose(DiagnosticInfoPGOProfile(
MemoryProfileFileName.data(),
StringRef("Data access profiles not found in memprof. Ignore "
"-memprof-annotate-static-data-prefix."),
DS_Warning));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the default should be true with that warning, until and unless we expect to always have that profile section.
|
||
bool MemProfUsePass::annotateGlobalVariables( | ||
Module &M, const memprof::DataAccessProfData *DataAccessProf) { | ||
if (!AnnotationStaticDataPrefix || M.globals().empty() || !DataAccessProf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we emit a warning if the user wanted to annotate the static data but the memprof didn't have the info? It would catch mismatched expectations during a build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. We can follow up in https://github.com/llvm/llvm-project/pull/151238/files#r2298880473 on whether we also want to check option's NumOccurrences
.
|
||
bool Changed = false; | ||
for (GlobalVariable &GVar : M.globals()) { | ||
assert(!GVar.getSectionPrefix().has_value() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen here for globals that the user has explicitly assigned a section using e.g. https://clang.llvm.org/docs/AttributeReference.html#section-declspec-allocate?
I'd like to see a test for this attribute too to ensure we don't break user annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen here for globals that the user has explicitly assigned a section using e.g. https://clang.llvm.org/docs/AttributeReference.html#section-declspec-allocate
Basically, with profile generator's filtering and linker's mapping, the chance of partitioning custom-section global variables inadvertently should be very minimal, as described below. But I agree it's good to not annotate them in the first place, so implemented that.
-
From profile generator perspective, it will only retrieve symbols and their sample counts from the 4 x 2 relevant static data sections, and it will not read symbols from custom sections. This way, data access profile payload won't have profiles for custom section global variables -- however, it's possible that internal-linkage global variables have the same name, and intermediate object files (not the executable) sees custom sections with prefix.
-
From linker's input -> output section mapping perspective, the
keep-data-section-prefix
option targets the section with one of known prefix ([lld][ELF] Introduce an option to keep data section prefix. #148985).
I'd like to see a test for this attribute too to ensure we don't break user annotations.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
291bae7 implements the change and test.
https://gcc.godbolt.org/z/3c59K3rqM is a small C++ example to illustrate how source code section names (section attribute and #pragma clang section
directive) affects section name in asm. From the asm output, we can see that a
and b
has sec1
and sec2
respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for feedback! PTAL.
@@ -729,6 +729,11 @@ class IndexedMemProfReader { | |||
LLVM_ABI DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>> | |||
getMemProfCallerCalleePairs() const; | |||
|
|||
// Returns non-owned pointer to data access profile data. | |||
memprof::DataAccessProfData *getDataAccessProfileData() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -75,6 +76,17 @@ static cl::opt<unsigned> MinMatchedColdBytePercent( | |||
"memprof-matching-cold-threshold", cl::init(100), cl::Hidden, | |||
cl::desc("Min percent of cold bytes matched to hint allocation cold")); | |||
|
|||
static cl::opt<bool> AnnotationStaticDataPrefix( | |||
"annotate-static-data-prefix", cl::init(false), cl::Hidden, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the option prefix.
If we make the default true, the DiagnosticInfoPGOProfile
warning (which also makes sense) will show up in the build logs unless static data profile is populated. This may also confuse the compiler users. From this perspective, I wonder if any of the following two options might be preferred
- keep the default option value false and update [Clang] Add clang driver option -fpartition-static-data-sections #124991 to set this LLVM option to true
- keep the default option value true, and do something like
if (!AnnotateStaticDataSectionPrefix || M.globals().empty())
return false;
// Check NumOccurrences before emit warning since the option
// value is true by default.
if (!DataAccessProf && AnnotateStaticDataSectionPrefix.getNumOccurrences()) {
M.getContext().diagnose(DiagnosticInfoPGOProfile(
MemoryProfileFileName.data(),
StringRef("Data access profiles not found in memprof. Ignore "
"-memprof-annotate-static-data-prefix."),
DS_Warning));
}
StringRef Name = GVar.getName(); | ||
auto SectionPrefix = GVar.getSectionPrefix(); | ||
if (SectionPrefix.has_value()) | ||
errs() << "Global variable " << Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Also incorporated Snehasish's feedback above to use LLVM_DEBUG for both strings and the rest.
; RUN: llvm-profdata merge --memprof-version=4 memprof.yaml -o memprof.profdata | ||
|
||
;; Run optimizer pass on the IR, and check the section prefix. | ||
; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a RUN line below with --implicit-check-not="section_prefix"
, also updated the 'CHECK' (default prefix) to 'PREFIX' (a custom one). This way, it's very explicit that the added RUN line won't match 'PREFIX' lines.
; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \ | ||
; RUN: -S input.ll -o - 2>&1 | FileCheck %s | ||
|
||
;; Repeat the command line above and enable -print-static-data-prefix. Test both IR and log. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, and added REQUIRES: asserts
to check LLVM_DEBUG lines.
;; to output, and one conservative practice is to map unlikely-prefixed ones to | ||
;; unlikely output section, and map the rest (hot-prefixed or prefix-less) to | ||
;; the canonical output section. | ||
; CHECK-NEXT: @bar = global i16 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done by testing that there is no section_prefix between @bar
and the object after it (i.e., @foo
in the updated test)
|
||
;; @bar is not seen in hot symbol or known symbol set, so it doesn't get | ||
;; a section prefix. It's up to the linker to decide how to map input sections | ||
;; to output, and one conservative practice is to map unlikely-prefixed ones to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's much more visible to document such considerations as code comment, and I wouldn't try find it in a regression test myself..
Moved it to cpp.
// file but doesn't have any access sample. | ||
// For logging, optimization remark emitter requires a llvm::Function, but | ||
// it's not well defined how to associate a global variable with a function. | ||
// So we just print out the static data section prefix in LLVM_DEBUG. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only downside with that is the need for a debug built compiler to get any information out. Maybe at least add STATISTICs to enable getting a count of hot and unlikely annotated gvs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm onboard with adding counters so implemented it; though if I remember correctly stats are not always enabled in a release build compiler, and there is a DLLVM_ENABLE_STATS
flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, shoot, well the stats are helpful in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a statistic sounds good to me too.
--- | ||
HeapProfileRecords: | ||
- GUID: 0xdeadbeef12345678 | ||
AllocSites: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes ideally that could be changed. Also it looks like both the AllocSites and CallSites are required, which isn't necessarily true in a real profile for each function. For now you could trim it a bit by only having one frame for each. Also add a comment that this is currently required by the yaml reader.
@@ -75,6 +76,17 @@ static cl::opt<unsigned> MinMatchedColdBytePercent( | |||
"memprof-matching-cold-threshold", cl::init(100), cl::Hidden, | |||
cl::desc("Min percent of cold bytes matched to hint allocation cold")); | |||
|
|||
static cl::opt<bool> AnnotationStaticDataPrefix( | |||
"annotate-static-data-prefix", cl::init(false), cl::Hidden, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the default should be true with that warning, until and unless we expect to always have that profile section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for reviews!
--- | ||
HeapProfileRecords: | ||
- GUID: 0xdeadbeef12345678 | ||
AllocSites: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. I trimmed the heap profiles to preserve one entry for the required fields and subfields. PTAL.
cc @kazutakahirata I want to make sure I don't miss some considerations about mapRequired
, but otherwise I'd be glad to make the change from mapRequired
to mapOptional
. Just lmk ;)
// file but doesn't have any access sample. | ||
// For logging, optimization remark emitter requires a llvm::Function, but | ||
// it's not well defined how to associate a global variable with a function. | ||
// So we just print out the static data section prefix in LLVM_DEBUG. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm onboard with adding counters so implemented it; though if I remember correctly stats are not always enabled in a release build compiler, and there is a DLLVM_ENABLE_STATS
flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
// file but doesn't have any access sample. | ||
// For logging, optimization remark emitter requires a llvm::Function, but | ||
// it's not well defined how to associate a global variable with a function. | ||
// So we just print out the static data section prefix in LLVM_DEBUG. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a statistic sounds good to me too.
--- | ||
HeapProfileRecords: | ||
- GUID: 0xdeadbeef12345678 | ||
AllocSites: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to mapOptional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change lgtm except I think the option should be off by default if this will cause spurious diagnostics to be generated about missing data access profiles
// file but doesn't have any access sample. | ||
// For logging, optimization remark emitter requires a llvm::Function, but | ||
// it's not well defined how to associate a global variable with a function. | ||
// So we just print out the static data section prefix in LLVM_DEBUG. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, shoot, well the stats are helpful in any case.
…rely on default value
done. And update one RUN line to rely on the default value being false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trimmed the text profile input.
--- | ||
HeapProfileRecords: | ||
- GUID: 0xdeadbeef12345678 | ||
AllocSites: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! Now the text profile is trimmed now.
f3f2832 introduces the data access profile format as a payload inside memprof, and the MemProfUse pass reads the memprof payload.
This change extends the MemProfUse pass to read the data access profiles to annotate global variables' section prefix.
Introduce an option
annotate-static-data-prefix
to flag-gate the global-variable annotation path, and make it false by default. #155337 is the (WIP) draft change to "reconcile" two sources of hotness.