Skip to content

Conversation

mingmingl-llvm
Copy link
Contributor

@mingmingl-llvm mingmingl-llvm commented Jul 29, 2025

f3f2832 introduces the data access profile format as a payload inside memprof, and the MemProfUse pass reads the memprof payload.

This change extends the MemProfUse pass to read the data access profiles to annotate global variables' section prefix.

  1. If there are samples for a global variable, it's annotated as hot.
  2. If a global variable is seen in the profiled binary file but doesn't have access samples, it's annotated as unlikely.

Introduce an option annotate-static-data-prefix to flag-gate the global-variable annotation path, and make it false by default. #155337 is the (WIP) draft change to "reconcile" two sources of hotness.

@mingmingl-llvm mingmingl-llvm force-pushed the users/mingmingl-llvm/memprofuse branch from 57068ea to 9d475d4 Compare July 29, 2025 21:54
@mingmingl-llvm mingmingl-llvm force-pushed the users/mingmingl-llvm/memprofuse branch from 9d475d4 to 925e9fd Compare August 20, 2025 16:52
@mingmingl-llvm mingmingl-llvm force-pushed the users/mingmingl-llvm/memprofuse branch from 925e9fd to 6311383 Compare August 25, 2025 07:39
@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review August 25, 2025 15:32
@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:transforms labels Aug 25, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 25, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-pgo

Author: Mingming Liu (mingmingl-llvm)

Changes

f3f2832 introduces the data access profile format as a payload inside memprof, and the MemProfUse pass reads the memprof payload.

This change extends the MemProfUse pass to read the data access profiles to annotate global variables' section prefix.

  1. If there are samples for a global variable, it's annotated as hot.
  2. If a global variable is seen in the profiled binary file but doesn't have access samples, it's annotated as unlikely.

Full diff: https://github.com/llvm/llvm-project/pull/151238.diff

4 Files Affected:

  • (modified) llvm/include/llvm/ProfileData/InstrProfReader.h (+11)
  • (modified) llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h (+5)
  • (modified) llvm/lib/Transforms/Instrumentation/MemProfUse.cpp (+78-2)
  • (added) llvm/test/Transforms/PGOProfile/data-access-profile.ll (+87)
diff --git a/llvm/include/llvm/ProfileData/InstrProfReader.h b/llvm/include/llvm/ProfileData/InstrProfReader.h
index deb5cd17d8fd9..bccbc2006b898 100644
--- a/llvm/include/llvm/ProfileData/InstrProfReader.h
+++ b/llvm/include/llvm/ProfileData/InstrProfReader.h
@@ -729,6 +729,11 @@ class IndexedMemProfReader {
   LLVM_ABI DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
   getMemProfCallerCalleePairs() const;
 
+  // Returns non-owned pointer to data access profile data.
+  memprof::DataAccessProfData *getDataAccessProfileData() const {
+    return DataAccessProfileData.get();
+  }
+
   // Return the entire MemProf profile.
   LLVM_ABI memprof::AllMemProfData getAllMemProfData() const;
 
@@ -900,6 +905,12 @@ class LLVM_ABI IndexedInstrProfReader : public InstrProfReader {
     return MemProfReader.getSummary();
   }
 
+  /// Returns non-owned pointer to the data access profile data.
+  /// Will be null if unavailable (version < 4).
+  memprof::DataAccessProfData *getDataAccessProfileData() const {
+    return MemProfReader.getDataAccessProfileData();
+  }
+
   Error readBinaryIds(std::vector<llvm::object::BuildID> &BinaryIds) override;
   Error printBinaryIds(raw_ostream &OS) override;
 };
diff --git a/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h b/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h
index 6170bf48e4695..c11333bf8ce5b 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/MemProfUse.h
@@ -14,6 +14,7 @@
 
 #include "llvm/ADT/IntrusiveRefCntPtr.h"
 #include "llvm/IR/PassManager.h"
+#include "llvm/ProfileData/DataAccessProf.h"
 #include "llvm/ProfileData/MemProf.h"
 #include "llvm/Support/Compiler.h"
 
@@ -36,6 +37,10 @@ class MemProfUsePass : public PassInfoMixin<MemProfUsePass> {
   LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
 
 private:
+  // Annotate global variables' section prefix based on data access profile.
+  bool
+  annotateGlobalVariables(Module &M,
+                          const memprof::DataAccessProfData *DataAccessProf);
   std::string MemoryProfileFileName;
   IntrusiveRefCntPtr<vfs::FileSystem> FS;
 };
diff --git a/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp b/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
index a9a0731f16d90..3c1e41f8eb374 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProfUse.cpp
@@ -22,6 +22,7 @@
 #include "llvm/IR/Function.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Module.h"
+#include "llvm/ProfileData/DataAccessProf.h"
 #include "llvm/ProfileData/InstrProf.h"
 #include "llvm/ProfileData/InstrProfReader.h"
 #include "llvm/ProfileData/MemProfCommon.h"
@@ -75,6 +76,17 @@ static cl::opt<unsigned> MinMatchedColdBytePercent(
     "memprof-matching-cold-threshold", cl::init(100), cl::Hidden,
     cl::desc("Min percent of cold bytes matched to hint allocation cold"));
 
+static cl::opt<bool> AnnotationStaticDataPrefix(
+    "annotate-static-data-prefix", cl::init(false), cl::Hidden,
+    cl::desc("If true, annotate the static data section prefix"));
+
+static cl::opt<bool>
+    PrintStaticDataPrefix("print-static-data-prefix", cl::init(false),
+                          cl::Hidden,
+                          cl::desc("If true, print the static data section "
+                                   "prefix in errs(). This option is "
+                                   "meant for debugging."));
+
 // Matching statistics
 STATISTIC(NumOfMemProfMissing, "Number of functions without memory profile.");
 STATISTIC(NumOfMemProfMismatch,
@@ -674,8 +686,9 @@ MemProfUsePass::MemProfUsePass(std::string MemoryProfileFile,
 }
 
 PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
-  // Return immediately if the module doesn't contain any function.
-  if (M.empty())
+  // Return immediately if the module doesn't contain any function or global
+  // variables.
+  if (M.empty() && M.globals().empty())
     return PreservedAnalyses::all();
 
   LLVM_DEBUG(dbgs() << "Read in memory profile:");
@@ -703,6 +716,14 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
     return PreservedAnalyses::all();
   }
 
+  const bool Changed =
+      annotateGlobalVariables(M, MemProfReader->getDataAccessProfileData());
+
+  // If the module doesn't contain any function, return after we process all
+  // global variables.
+  if (M.empty())
+    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
+
   auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
 
   TargetLibraryInfo &TLI = FAM.getResult<TargetLibraryAnalysis>(*M.begin());
@@ -752,3 +773,58 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
 
   return PreservedAnalyses::none();
 }
+
+bool MemProfUsePass::annotateGlobalVariables(
+    Module &M, const memprof::DataAccessProfData *DataAccessProf) {
+  if (!AnnotationStaticDataPrefix || M.globals().empty() || !DataAccessProf)
+    return false;
+
+  bool Changed = false;
+  for (GlobalVariable &GVar : M.globals()) {
+    assert(!GVar.getSectionPrefix().has_value() &&
+           "GVar shouldn't have section prefix yet");
+    if (GVar.isDeclarationForLinker())
+      continue;
+
+    StringRef Name = GVar.getName();
+    // Skip string literals whose mangled names doesn't stay stable across
+    // binary releases.
+    // TODO: Track string content hash in the profiles and compute it inside the
+    // compiler to categeorize the hotness string literals.
+    if (Name.starts_with(".str"))
+      continue;
+
+    // DataAccessProfRecord's look-up methods will canonicalize the variable
+    // name before looking up methods, so optimizer doesn't need to do it.
+    std::optional<DataAccessProfRecord> Record =
+        DataAccessProf->getProfileRecord(Name);
+    // Annotate a global variable as hot if it has non-zero sampled count, and
+    // annotate it as cold if it's seen in the profiled binary
+    // file but doesn't have any access sample.
+    if (Record && Record->AccessCount > 0) {
+      GVar.setSectionPrefix("hot");
+      Changed = true;
+    } else if (DataAccessProf->isKnownColdSymbol(Name)) {
+      GVar.setSectionPrefix("unlikely");
+      Changed = true;
+    }
+  }
+
+  // Optimization remark emitter requires a llvm::Function, but it's not well
+  // defined to associate a global variable with a function. So we just print
+  // out the static data section prefix in errs().
+  if (PrintStaticDataPrefix) {
+    for (GlobalVariable &GVar : M.globals()) {
+      if (GVar.isDeclarationForLinker())
+        continue;
+      StringRef Name = GVar.getName();
+      auto SectionPrefix = GVar.getSectionPrefix();
+      if (SectionPrefix.has_value())
+        errs() << "Global variable " << Name
+               << " has section prefix: " << SectionPrefix.value() << "\n";
+      else
+        errs() << "Global variable " << Name << " has no section prefix\n";
+    }
+  }
+  return Changed;
+}
diff --git a/llvm/test/Transforms/PGOProfile/data-access-profile.ll b/llvm/test/Transforms/PGOProfile/data-access-profile.ll
new file mode 100644
index 0000000000000..91eaa934374b3
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/data-access-profile.ll
@@ -0,0 +1,87 @@
+; RUN: rm -rf %t && split-file %s %t && cd %t
+
+;; Read a text profile and merge it into indexed profile.
+; RUN: llvm-profdata merge --memprof-version=4 memprof.yaml -o memprof.profdata
+
+;; Run optimizer pass on the IR, and check the section prefix.
+; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \
+; RUN: -S input.ll -o - 2>&1 | FileCheck %s
+
+;; Repeat the command line above and enable -print-static-data-prefix. Test both IR and log.
+; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \
+; RUN: -print-static-data-prefix -S input.ll -o - 2>&1 | FileCheck %s --check-prefixes=LOG,CHECK
+
+; LOG: Global variable .str has no section prefix
+; LOG: Global variable var1 has section prefix: hot
+; LOG: Global variable var2.llvm.125 has section prefix: hot
+; LOG: Global variable foo has section prefix: unlikely
+; LOG: Global variable bar has no section prefix
+
+;; String literals are not annotated.
+; CHECK: @.str = unnamed_addr constant [5 x i8] c"abcde"
+; CHECK-NOT: section_prefix
+; CHECK: @var1 = global i32 123, !section_prefix !0
+
+;; @var.llvm.125 will be canonicalized to @var2 for profile look-up.
+; CHECK-NEXT: @var2.llvm.125 = global i64 0, !section_prefix !0
+; CHECK-NEXT: @foo = global i8 2, !section_prefix !1
+
+;; @bar is not seen in hot symbol or known symbol set, so it doesn't get
+;; a section prefix. It's up to the linker to decide how to map input sections
+;; to output, and one conservative practice is to map unlikely-prefixed ones to
+;; unlikely output section, and map the rest (hot-prefixed or prefix-less) to
+;; the canonical output section.
+; CHECK-NEXT: @bar = global i16 3
+
+; CHECK: !0 = !{!"section_prefix", !"hot"}
+; CHECK-NEXT: !1 = !{!"section_prefix", !"unlikely"}
+
+;--- memprof.yaml
+---
+HeapProfileRecords:
+  - GUID:            0xdeadbeef12345678
+    AllocSites:
+      - Callstack:
+          - { Function: 0x1111111111111111, LineOffset: 11, Column: 10, IsInlineFrame: true }
+          - { Function: 0x2222222222222222, LineOffset: 22, Column: 20, IsInlineFrame: false }
+        MemInfoBlock:
+          AllocCount:      111
+          TotalSize:       222
+          TotalLifetime:   333
+          TotalLifetimeAccessDensity: 444
+    CallSites:
+      - Frames:
+        - { Function: 0x5555555555555555, LineOffset: 55, Column: 50, IsInlineFrame: true }
+        - { Function: 0x6666666666666666, LineOffset: 66, Column: 60, IsInlineFrame: false }
+        CalleeGuids: [ 0x100, 0x200 ]
+DataAccessProfiles:
+  SampledRecords:
+    - Symbol:          var1
+      AccessCount:     1000
+    - Symbol:          var2
+      AccessCount:     5
+    - Hash:            101010
+      AccessCount:     145
+  KnownColdSymbols:
+    - foo
+  KnownColdStrHashes: [ 999, 1001 ]
+...
+;--- input.ll
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+@.str = unnamed_addr constant [5 x i8] c"abcde"
+@var1 = global i32 123
+@var2.llvm.125 = global i64 0 
+@foo = global i8 2
+@bar = global i16 3
+
+define i32 @func() {
+  %a = load i32, ptr @var1
+  %b = load i32, ptr @var2.llvm.125
+  %ret = call i32 (...) @func_taking_arbitrary_param(i32 %a, i32 %b)
+  ret i32 %ret
+}
+
+declare i32 @func_taking_arbitrary_param(...)

@@ -729,6 +729,11 @@ class IndexedMemProfReader {
LLVM_ABI DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
getMemProfCallerCalleePairs() const;

// Returns non-owned pointer to data access profile data.
memprof::DataAccessProfData *getDataAccessProfileData() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need an LLVM_ABI tag like the other interfaces (this is for Windows DLL apparently).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

continue;

StringRef Name = GVar.getName();
// Skip string literals whose mangled names doesn't stay stable across
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/doesn't/don't/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also might be clearer to s/whose/as their/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

if (Name.starts_with(".str"))
continue;

// DataAccessProfRecord's look-up methods will canonicalize the variable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment seems a little circular, and as a result I don't know quite what this means: "look-up methods .... before looking up methods".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I revised this. PTAL.

StringRef Name = GVar.getName();
auto SectionPrefix = GVar.getSectionPrefix();
if (SectionPrefix.has_value())
errs() << "Global variable " << Name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just emit these messages during the above loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Also incorporated Snehasish's feedback above to use LLVM_DEBUG for both strings and the rest.

;; to output, and one conservative practice is to map unlikely-prefixed ones to
;; unlikely output section, and map the rest (hot-prefixed or prefix-less) to
;; the canonical output section.
; CHECK-NEXT: @bar = global i16 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need a check for EOL to ensure there isn't a section prefix here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done by testing that there is no section_prefix between @bar and the object after it (i.e., @foo in the updated test)


;; @bar is not seen in hot symbol or known symbol set, so it doesn't get
;; a section prefix. It's up to the linker to decide how to map input sections
;; to output, and one conservative practice is to map unlikely-prefixed ones to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second sentence seems like it belongs in the code and not in the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's much more visible to document such considerations as code comment, and I wouldn't try find it in a regression test myself..
Moved it to cpp.

---
HeapProfileRecords:
- GUID: 0xdeadbeef12345678
AllocSites:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the AllocSites and CallSites needed for this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically yaml parser implementation requires HeapProfileRecords currently by mapRequired at

template <> struct MappingTraits<memprof::AllMemProfData> {
static void mapping(IO &Io, memprof::AllMemProfData &Data) {
Io.mapRequired("HeapProfileRecords", Data.HeapProfileRecords);
// Map data access profiles if reading input, or if writing output &&
// the struct is populated.
if (!Io.outputting() || !Data.YamlifiedDataAccessProfiles.isEmpty())
Io.mapOptional("DataAccessProfiles", Data.YamlifiedDataAccessProfiles);
}
};
, which is implemented before we add Io.mapOptional("DataAccessProfiles", ... at L270 as a sibling field.

I wonder if we want to make both fields optional at this point if the regression tests are happy. Also cc @kazutakahirata for some thoughts and opinions on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ideally that could be changed. Also it looks like both the AllocSites and CallSites are required, which isn't necessarily true in a real profile for each function. For now you could trim it a bit by only having one frame for each. Also add a comment that this is currently required by the yaml reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. I trimmed the heap profiles to preserve one entry for the required fields and subfields. PTAL.

cc @kazutakahirata I want to make sure I don't miss some considerations about mapRequired, but otherwise I'd be glad to make the change from mapRequired to mapOptional. Just lmk ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to mapOptional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mingmingl-llvm I agree with @teresajohnson. I've posted #155671. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for putting together #155671!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mingmingl-llvm I just merged #155671. You should be able to drop CallSites and other heap-profile-specific items from your tests. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! Now the text profile is trimmed now.

; RUN: llvm-profdata merge --memprof-version=4 memprof.yaml -o memprof.profdata

;; Run optimizer pass on the IR, and check the section prefix.
; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a run to ensure that without this option (or with it explicitly off) we don't get the prefixes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a RUN line below with --implicit-check-not="section_prefix", also updated the 'CHECK' (default prefix) to 'PREFIX' (a custom one). This way, it's very explicit that the added RUN line won't match 'PREFIX' lines.

; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \
; RUN: -S input.ll -o - 2>&1 | FileCheck %s

;; Repeat the command line above and enable -print-static-data-prefix. Test both IR and log.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably can combine this with the above opt invocation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, and added REQUIRES: asserts to check LLVM_DEBUG lines.

@@ -75,6 +76,17 @@ static cl::opt<unsigned> MinMatchedColdBytePercent(
"memprof-matching-cold-threshold", cl::init(100), cl::Hidden,
cl::desc("Min percent of cold bytes matched to hint allocation cold"));

static cl::opt<bool> AnnotationStaticDataPrefix(
"annotate-static-data-prefix", cl::init(false), cl::Hidden,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a "memprof-" prefix to the option name to be consistent with the others.

Also I think we can make this default true since we can also control whether the static data profile is generated as part of the memprof profile generation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the option prefix.

If we make the default true, the DiagnosticInfoPGOProfile warning (which also makes sense) will show up in the build logs unless static data profile is populated. This may also confuse the compiler users. From this perspective, I wonder if any of the following two options might be preferred

  1. keep the default option value false and update [Clang] Add clang driver option -fpartition-static-data-sections  #124991 to set this LLVM option to true
  2. keep the default option value true, and do something like
if (!AnnotateStaticDataSectionPrefix || M.globals().empty())
    return false;

// Check NumOccurrences before emit warning since the option
// value is true by default.
  if (!DataAccessProf && AnnotateStaticDataSectionPrefix.getNumOccurrences()) {
    M.getContext().diagnose(DiagnosticInfoPGOProfile(
        MemoryProfileFileName.data(),
        StringRef("Data access profiles not found in memprof. Ignore "
                  "-memprof-annotate-static-data-prefix."),
        DS_Warning));

}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the default should be true with that warning, until and unless we expect to always have that profile section.


bool MemProfUsePass::annotateGlobalVariables(
Module &M, const memprof::DataAccessProfData *DataAccessProf) {
if (!AnnotationStaticDataPrefix || M.globals().empty() || !DataAccessProf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we emit a warning if the user wanted to annotate the static data but the memprof didn't have the info? It would catch mismatched expectations during a build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. We can follow up in https://github.com/llvm/llvm-project/pull/151238/files#r2298880473 on whether we also want to check option's NumOccurrences.


bool Changed = false;
for (GlobalVariable &GVar : M.globals()) {
assert(!GVar.getSectionPrefix().has_value() &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen here for globals that the user has explicitly assigned a section using e.g. https://clang.llvm.org/docs/AttributeReference.html#section-declspec-allocate?

I'd like to see a test for this attribute too to ensure we don't break user annotations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen here for globals that the user has explicitly assigned a section using e.g. https://clang.llvm.org/docs/AttributeReference.html#section-declspec-allocate

Basically, with profile generator's filtering and linker's mapping, the chance of partitioning custom-section global variables inadvertently should be very minimal, as described below. But I agree it's good to not annotate them in the first place, so implemented that.

  • From profile generator perspective, it will only retrieve symbols and their sample counts from the 4 x 2 relevant static data sections, and it will not read symbols from custom sections. This way, data access profile payload won't have profiles for custom section global variables -- however, it's possible that internal-linkage global variables have the same name, and intermediate object files (not the executable) sees custom sections with prefix.

  • From linker's input -> output section mapping perspective, the keep-data-section-prefix option targets the section with one of known prefix ([lld][ELF] Introduce an option to keep data section prefix. #148985).

I'd like to see a test for this attribute too to ensure we don't break user annotations.

done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

291bae7 implements the change and test.

https://gcc.godbolt.org/z/3c59K3rqM is a small C++ example to illustrate how source code section names (section attribute and #pragma clang section directive) affects section name in asm. From the asm output, we can see that a and b has sec1 and sec2 respectively.

Copy link
Contributor Author

@mingmingl-llvm mingmingl-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for feedback! PTAL.

@@ -729,6 +729,11 @@ class IndexedMemProfReader {
LLVM_ABI DenseMap<uint64_t, SmallVector<memprof::CallEdgeTy, 0>>
getMemProfCallerCalleePairs() const;

// Returns non-owned pointer to data access profile data.
memprof::DataAccessProfData *getDataAccessProfileData() const {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -75,6 +76,17 @@ static cl::opt<unsigned> MinMatchedColdBytePercent(
"memprof-matching-cold-threshold", cl::init(100), cl::Hidden,
cl::desc("Min percent of cold bytes matched to hint allocation cold"));

static cl::opt<bool> AnnotationStaticDataPrefix(
"annotate-static-data-prefix", cl::init(false), cl::Hidden,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the option prefix.

If we make the default true, the DiagnosticInfoPGOProfile warning (which also makes sense) will show up in the build logs unless static data profile is populated. This may also confuse the compiler users. From this perspective, I wonder if any of the following two options might be preferred

  1. keep the default option value false and update [Clang] Add clang driver option -fpartition-static-data-sections  #124991 to set this LLVM option to true
  2. keep the default option value true, and do something like
if (!AnnotateStaticDataSectionPrefix || M.globals().empty())
    return false;

// Check NumOccurrences before emit warning since the option
// value is true by default.
  if (!DataAccessProf && AnnotateStaticDataSectionPrefix.getNumOccurrences()) {
    M.getContext().diagnose(DiagnosticInfoPGOProfile(
        MemoryProfileFileName.data(),
        StringRef("Data access profiles not found in memprof. Ignore "
                  "-memprof-annotate-static-data-prefix."),
        DS_Warning));

}

StringRef Name = GVar.getName();
auto SectionPrefix = GVar.getSectionPrefix();
if (SectionPrefix.has_value())
errs() << "Global variable " << Name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Also incorporated Snehasish's feedback above to use LLVM_DEBUG for both strings and the rest.

; RUN: llvm-profdata merge --memprof-version=4 memprof.yaml -o memprof.profdata

;; Run optimizer pass on the IR, and check the section prefix.
; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a RUN line below with --implicit-check-not="section_prefix", also updated the 'CHECK' (default prefix) to 'PREFIX' (a custom one). This way, it's very explicit that the added RUN line won't match 'PREFIX' lines.

; RUN: opt -passes='memprof-use<profile-filename=memprof.profdata>' -annotate-static-data-prefix \
; RUN: -S input.ll -o - 2>&1 | FileCheck %s

;; Repeat the command line above and enable -print-static-data-prefix. Test both IR and log.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, and added REQUIRES: asserts to check LLVM_DEBUG lines.

;; to output, and one conservative practice is to map unlikely-prefixed ones to
;; unlikely output section, and map the rest (hot-prefixed or prefix-less) to
;; the canonical output section.
; CHECK-NEXT: @bar = global i16 3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done by testing that there is no section_prefix between @bar and the object after it (i.e., @foo in the updated test)


;; @bar is not seen in hot symbol or known symbol set, so it doesn't get
;; a section prefix. It's up to the linker to decide how to map input sections
;; to output, and one conservative practice is to map unlikely-prefixed ones to
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's much more visible to document such considerations as code comment, and I wouldn't try find it in a regression test myself..
Moved it to cpp.

// file but doesn't have any access sample.
// For logging, optimization remark emitter requires a llvm::Function, but
// it's not well defined how to associate a global variable with a function.
// So we just print out the static data section prefix in LLVM_DEBUG.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only downside with that is the need for a debug built compiler to get any information out. Maybe at least add STATISTICs to enable getting a count of hot and unlikely annotated gvs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm onboard with adding counters so implemented it; though if I remember correctly stats are not always enabled in a release build compiler, and there is a DLLVM_ENABLE_STATS flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, shoot, well the stats are helpful in any case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a statistic sounds good to me too.

---
HeapProfileRecords:
- GUID: 0xdeadbeef12345678
AllocSites:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ideally that could be changed. Also it looks like both the AllocSites and CallSites are required, which isn't necessarily true in a real profile for each function. For now you could trim it a bit by only having one frame for each. Also add a comment that this is currently required by the yaml reader.

@@ -75,6 +76,17 @@ static cl::opt<unsigned> MinMatchedColdBytePercent(
"memprof-matching-cold-threshold", cl::init(100), cl::Hidden,
cl::desc("Min percent of cold bytes matched to hint allocation cold"));

static cl::opt<bool> AnnotationStaticDataPrefix(
"annotate-static-data-prefix", cl::init(false), cl::Hidden,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the default should be true with that warning, until and unless we expect to always have that profile section.

Copy link
Contributor Author

@mingmingl-llvm mingmingl-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for reviews!

---
HeapProfileRecords:
- GUID: 0xdeadbeef12345678
AllocSites:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. I trimmed the heap profiles to preserve one entry for the required fields and subfields. PTAL.

cc @kazutakahirata I want to make sure I don't miss some considerations about mapRequired, but otherwise I'd be glad to make the change from mapRequired to mapOptional. Just lmk ;)

// file but doesn't have any access sample.
// For logging, optimization remark emitter requires a llvm::Function, but
// it's not well defined how to associate a global variable with a function.
// So we just print out the static data section prefix in LLVM_DEBUG.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm onboard with adding counters so implemented it; though if I remember correctly stats are not always enabled in a release build compiler, and there is a DLLVM_ENABLE_STATS flag.

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

// file but doesn't have any access sample.
// For logging, optimization remark emitter requires a llvm::Function, but
// it's not well defined how to associate a global variable with a function.
// So we just print out the static data section prefix in LLVM_DEBUG.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a statistic sounds good to me too.

---
HeapProfileRecords:
- GUID: 0xdeadbeef12345678
AllocSites:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to mapOptional.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change lgtm except I think the option should be off by default if this will cause spurious diagnostics to be generated about missing data access profiles

// file but doesn't have any access sample.
// For logging, optimization remark emitter requires a llvm::Function, but
// it's not well defined how to associate a global variable with a function.
// So we just print out the static data section prefix in LLVM_DEBUG.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, shoot, well the stats are helpful in any case.

@mingmingl-llvm
Copy link
Contributor Author

mingmingl-llvm commented Aug 27, 2025

Change lgtm except I think the option should be off by default if this will cause spurious diagnostics to be generated about missing data access profiles

done. And update one RUN line to rely on the default value being false.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor Author

@mingmingl-llvm mingmingl-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trimmed the text profile input.

---
HeapProfileRecords:
- GUID: 0xdeadbeef12345678
AllocSites:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! Now the text profile is trimmed now.

@mingmingl-llvm mingmingl-llvm merged commit c93b3a3 into main Aug 28, 2025
9 checks passed
@mingmingl-llvm mingmingl-llvm deleted the users/mingmingl-llvm/memprofuse branch August 28, 2025 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants