Skip to content

Conversation

mingmingl-llvm
Copy link
Contributor

@mingmingl-llvm mingmingl-llvm commented Jul 10, 2025

An RFC is in https://discourse.llvm.org/t/rfc-vtable-type-profiling-for-samplefdo/87283

This change extends to process perf data with Intel MEM_INST_RETIRED.ALL_LOADS samples and produce sample profiles with vtable information for non context-sensitive SampleFDO profiles.

  • For feature parity across different hardwares, future work could incorporate support for AMD Instruction-Based Sampling (IBS) and Arm Statistical Profiling Extension (SPE).

…enerate vtable profiles for non context-sensitive AFDO profiles
@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review July 10, 2025 22:48
@llvmbot llvmbot added the PGO Profile Guided Optimizations label Jul 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 10, 2025

@llvm/pr-subscribers-pgo

Author: Mingming Liu (mingmingl-llvm)

Changes

An RFC is in https://discourse.llvm.org/t/rfc-vtable-type-profiling-for-samplefdo/87283

This change extends to process perf data with Intel MEM_INST_RETIRED.ALL_LOADS samples and produce sample profiles with vtable information for non context-sensitive SampleFDO profiles.

  • For feature parity across different hardwares, future work could incorporate support for AMD Instruction-Based Sampling (IBS) and Arm Statistical Profiling Extension (SPE).

Patch is 186.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148013.diff

10 Files Affected:

  • (added) llvm/test/tools/llvm-profgen/Inputs/dap-perf-trace.txt (+37)
  • (added) llvm/test/tools/llvm-profgen/Inputs/dap.perfbin ()
  • (added) llvm/test/tools/llvm-profgen/Inputs/lbr-perf-for-dap.script (+175)
  • (added) llvm/test/tools/llvm-profgen/afdo-with-vtable.test (+18)
  • (modified) llvm/tools/llvm-profgen/PerfReader.cpp (+68-4)
  • (modified) llvm/tools/llvm-profgen/PerfReader.h (+21-13)
  • (modified) llvm/tools/llvm-profgen/ProfileGenerator.cpp (+16)
  • (modified) llvm/tools/llvm-profgen/ProfiledBinary.cpp (+40)
  • (modified) llvm/tools/llvm-profgen/ProfiledBinary.h (+35)
  • (modified) llvm/tools/llvm-profgen/llvm-profgen.cpp (+12)
diff --git a/llvm/test/tools/llvm-profgen/Inputs/dap-perf-trace.txt b/llvm/test/tools/llvm-profgen/Inputs/dap-perf-trace.txt
new file mode 100644
index 0000000000000..28f15b1ff199d
--- /dev/null
+++ b/llvm/test/tools/llvm-profgen/Inputs/dap-perf-trace.txt
@@ -0,0 +1,37 @@
+0 0x7b10 [0x88]: PERF_RECORD_MMAP2 3446532/3446532: [0x200000(0x60000) @ 0 08:01 527501 0]: r--p /path/to/dap.perfbin
+0 0x7b98 [0x88]: PERF_RECORD_MMAP2 3446532/3446532: [0x260000(0x153000) @ 0x5f000 08:01 527501 0]: r-xp /path/to/dap.perfbin
+0 0x7c20 [0x88]: PERF_RECORD_MMAP2 3446532/3446532: [0x3b3000(0xc000) @ 0x1b1000 08:01 527501 0]: r--p /path/to/dap.perfbin
+0 0x7ca8 [0x88]: PERF_RECORD_MMAP2 3446532/3446532: [0x3bf000(0x3000) @ 0x1bc000 08:01 527501 0]: rw-p /path/to/dap.perfbin
+1282514021937402 0x8660 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514022939813 0x87b0 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3fb0
+1282514023932029 0x8a00 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3fb0
+1282514024937981 0x8d48 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3fb0
+1282514028925828 0x94c0 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3fb0
+1282514028934870 0x9678 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3fc0
+1282514029934094 0x9830 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3fb0
+1282514040934785 0xb1d0 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3fc0
+1282514052924510 0xcbb8 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514053932406 0xcfb0 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3fc0
+1282514063928248 0xe5c8 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514073928057 0xfd20 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514081925013 0x10f28 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514084927335 0x11678 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514088926926 0x11f90 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514089929492 0x12270 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514119919997 0x16610 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514120924169 0x16920 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514145923603 0x1a338 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514146917708 0x1a428 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514173914003 0x1e1b0 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514188915199 0x20488 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514210915866 0x236d8 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514212908181 0x23a50 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608a2 period: 233 addr: 0x3b3f70
+1282514480886012 0x4a098 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3f80
+1282514840855333 0x7dd48 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3f80
+1282514955835364 0x8e380 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3f80
+1282514967839429 0x8fef8 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3f80
+1282515023830209 0x97f98 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3f80
+1282515356804308 0xc7b28 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3f80
+1282515410794371 0xcf590 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3f80
+1282515541786485 0xe2280 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3f80
+1282515703761203 0xf93c0 [0x60]: PERF_RECORD_SAMPLE(IP, 0x4002): 3446532/3446532: 0x2608ac period: 233 addr: 0x3b3f80
diff --git a/llvm/test/tools/llvm-profgen/Inputs/dap.perfbin b/llvm/test/tools/llvm-profgen/Inputs/dap.perfbin
new file mode 100755
index 0000000000000..e69de29bb2d1d
diff --git a/llvm/test/tools/llvm-profgen/Inputs/lbr-perf-for-dap.script b/llvm/test/tools/llvm-profgen/Inputs/lbr-perf-for-dap.script
new file mode 100644
index 0000000000000..34a2f5121b3c6
--- /dev/null
+++ b/llvm/test/tools/llvm-profgen/Inputs/lbr-perf-for-dap.script
@@ -0,0 +1,175 @@
+PERF_RECORD_MMAP2 3446532/3446532: [0x260000(0x153000) @ 0x5f000 08:01 527501 0]: r-xp /path/to/dap.perfbin
+PERF_RECORD_MMAP2 3446532/3446532: [0x7fff5ff28000(0x2000) @ 0 00:00 0 0]: r-xp [vdso]
+PERF_RECORD_MMAP2 3446532/3446532: [0xffffffffff600000(0x1000) @ 0 00:00 0 0]: --xp [vsyscall]
+           350fd4 0x260832/0x260894/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//- 
+           260884 0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26087a/0x260be0/P/-/-/2//-  0x2608ac/0x260870/P/-/-/1//-  0x2607fc/0x2608a4/P/-/-/2//-  0x2608a2/0x2607f0/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x260c52/0x260827/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/2//- 
+           2608a4 0x260c47/0x350850/P/-/-/2//-  0x260822/0x260c30/P/-/-/1//-  0x260808/0x26081d/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//- 
+           35104c 0x2608ac/0x260850/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//- 
+           260800 0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26087a/0x260be0/P/-/-/2//-  0x2608ac/0x260870/P/-/-/1//-  0x2607fc/0x2608a4/P/-/-/2//-  0x2608a2/0x2607f0/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x260c52/0x260827/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/2//-  0x260822/0x260c30/P/-/-/1//-  0x260808/0x26081d/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//- 
+           2608a4 0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//- 
+           35109d 0x26085a/0x260be0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26087a/0x260be0/P/-/-/2//-  0x2608ac/0x260870/P/-/-/1//-  0x2607fc/0x2608a4/P/-/-/2//-  0x2608a2/0x2607f0/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x260c52/0x260827/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/2//-  0x260822/0x260c30/P/-/-/1//-  0x260808/0x26081d/P/-/-/1//- 
+           260c30 0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//- 
+           260bf0 0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//- 
+           2608af 0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26087a/0x260be0/P/-/-/2//-  0x2608ac/0x260870/P/-/-/1//-  0x2607fc/0x2608a4/P/-/-/2//-  0x2608a2/0x2607f0/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x260c52/0x260827/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/2//-  0x260822/0x260c30/P/-/-/1//-  0x260808/0x26081d/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//- 
+           350866 0x351059/0x351098/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//- 
+           350f40 0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26087a/0x260be0/P/-/-/2//-  0x2608ac/0x260870/P/-/-/1//-  0x2607fc/0x2608a4/P/-/-/2//-  0x2608a2/0x2607f0/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x260c52/0x260827/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/2//-  0x260822/0x260c30/P/-/-/1//-  0x260808/0x26081d/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//- 
+           26090f 0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//- 
+           260814 0x26090a/0x260880/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26087a/0x260be0/P/-/-/2//-  0x2608ac/0x260870/P/-/-/1//-  0x2607fc/0x2608a4/P/-/-/2//- 
+           350f57 0x350879/0x350887/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/2//-  0x260822/0x260c30/P/-/-/1//-  0x260808/0x26081d/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350887/P/-/-/3//-  0x260c47/0x350850/P/-/-/3//-  0x26080f/0x260c30/P/-/-/1//-  0x26088f/0x260800/P/-/-/3//-  0x26090a/0x260880/P/-/-/1//-  0x26091c/0x260900/P/-/-/1//-  0x2608bb/0x26090f/P/-/-/2//-  0x3510ab/0x2608af/P/-/-/3//-  0x351059/0x351098/P/-/-/7//-  0x350f8c/0x350fb4/P/-/-/4//- 
+           260880 0x350f8c/0x350fb4/P/-/-/4//-  0x260bf4/0x350f40/P/-/-/1//-  0x260be4/0x260bf0/P/-/-/1//-  0x26085a/0x260be0/P/-/-/2//-  0x2608ac/0x260850/P/-/-/1//-  0x26084c/0x2608a4/P/-/-/2//-  0x2608a2/0x260840/P/-/-/2//-  0x260832/0x260894/P/-/-/1//-  0x26081b/0x26082e/P/-/-/1//-  0x260c52/0x260814/P/-/-/1//-  0x3508da/0x260c4c/P/-/-/5//-  0x350879/0x350...
[truncated]

@WenleiHe WenleiHe requested a review from apolloww July 16, 2025 05:42
Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly cosmetic comments..

Copy link
Contributor Author

@mingmingl-llvm mingmingl-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for reviews! PTAL.

}

// Given a runtime address, canonicalize it to the virtual address in the
// binary.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment to mention 'non-text' and added a TODO to consider unifying text vs non-text. I think it should be possible to do some refactoring, but not sure if we can completely get rid of a data-vs-text hint from caller side (e.g. executable's linking option may affect the code/data to segment mapping).

@@ -946,6 +978,14 @@ SampleContextFrameVector ProfiledBinary::symbolize(const InstructionPointer &IP,
return CallStack;
}

StringRef ProfiledBinary::symbolizeDataAddress(uint64_t Address) {
DIGlobal DataDIGlobal = unwrapOrError(
Symbolizer->symbolizeData(SymbolizerPath.str(), {Address, 0}),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVMSymbolizer's symbolize* interfaces [1] require a struct of object::SectionedAddress [2] to symbolize an address, like, they don't take an integer of address alone. Chasing down the calls [3], sectionIndex field isn't used so its value doesn't matter [4].

Upon this question, it's better to use UndefSection [5] here. Added a static helper function getSectionedAddress for this purpose, and use it for both code and data inside ProfiledBinary class.

[1] for data and code

[2]

struct SectionedAddress {
const static uint64_t UndefSection = UINT64_MAX;
uint64_t Address = 0;
uint64_t SectionIndex = UndefSection;
};

[3]

LLVMSymbolizer::symbolizeDataCommon(const T &ModuleSpecifier,
object::SectionedAddress ModuleOffset) {
auto InfoOrErr = getOrCreateModuleInfo(ModuleSpecifier);
if (!InfoOrErr)
return InfoOrErr.takeError();
SymbolizableModule *Info = *InfoOrErr;
// A null module means an error has already been reported. Return an empty
// result.
if (!Info)
return DIGlobal();
// If the user is giving us relative addresses, add the preferred base of
// the object to the offset before we do the query. It's what DIContext
// expects.
if (Opts.RelativeAddresses)
ModuleOffset.Address += Info->getModulePreferredBase();
DIGlobal Global = Info->symbolizeData(ModuleOffset);
if (Opts.Demangle)
Global.Name = DemangleName(Global.Name, Info);
return Global;
}

[4]

std::optional<DILineInfo>
DWARFContext::getLineInfoForDataAddress(object::SectionedAddress Address) {
DILineInfo Result;
DWARFCompileUnit *CU = getCompileUnitForDataAddress(Address.Address);
if (!CU)
return Result;
if (DWARFDie Die = CU->getVariableForAddress(Address.Address)) {
Result.FileName = Die.getDeclFile(FileLineInfoKind::AbsoluteFilePath);
Result.Line = Die.getDeclLine();
}
return Result;
}

[5]

struct SectionedAddress {
const static uint64_t UndefSection = UINT64_MAX;

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of requests regarding testing --

  1. I think the current dap.bin is non-PIE, can you add a PIE variant test case too?
  2. Can you trim the lbr-perf-for-dap.script test input to preserve only whats needed for a minimal test. I think a lot of the samples are repeated and we can just test for a lower count.

@@ -603,6 +640,16 @@ class ProfiledBinary {
return ProbeDecoder.getInlinerDescForProbe(Probe);
}

void addMMapNonTextEvent(MMapEvent MMap) {
MMapNonTextEvents.push_back(MMap);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that multiple mmap events can happen on the same segment? If so, would it be better to use a map here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think even if there were mutliple mmap events for the same segment we would still only want the last one. For the things we care about .text* .data*, we shouldn't have multiple copies in the address space.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Using vector here, I think there is a chance an address could be matched to an old map during canonicalization. If using map with offsets as keys, we only keep the most recent one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it's rare for the virtual address ranges (i.e., the intervals formed by [MMapEvent.Address, MMapEvent.Address + MMap.Size)) of mmap events to overlap with each other. Assuming this is true, I think there are two implementation options here:

  1. Make llvm-profgen to validate mmap events don't have overlapping virtual address (and report 'unimplemented error' otherwise). This way, the implementation can use a map to do efficient address canonicalization and handle the common case. The updated change implements this option.

  2. llvm-profgen doesn't validate address interval and cannot assume non-overlapping address ranges. The implementation uses a vector to record the mmap events in the order they are added and reverse iterate the map to do data address canonicalization. While a linear reverse iteration is probably acceptable given the number of non-text mmap events is observed to be 3 in most cases, I think option 1) does the better trade-off.

Let me know your thoughts and I'd be glad to follow up.

Copy link
Contributor

@apolloww apolloww Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

Sorry for not making it more clearer. I think this won't be an issue if we only consider the main executable right now. Its sections should just be loaded once and stay the same. I was thinking about if we are going to support dynamic loading/reloading of shared libraries (dlopen/dlclose), which may make the same DSO appear at different addresses. Maybe this isn't something we need to consider right now. The current map solution can help surface such issue in case we want to go that direction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification @apolloww !

I think this won't be an issue if we only consider the main executable right now. Its sections should just be loaded once and stay the same. I was thinking about if we are going to support dynamic loading/reloading of shared libraries (dlopen/dlclose), which may make the same DSO appear at different addresses. Maybe this isn't something we need to consider right now. The current map solution can help surface such issue in case we want to go that direction.

Acknowledged that dynamic loading of shared libraries can create overlapping intervals and that producing vtable symbols from shared DSOs is not something we want to pursue in this version. Added some comment to call out the considerations behind the current implementation. PTAL.

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Just adding a few nits.

can you add a PIE variant test case too?

IIUC any PIE/PIC code is not yet supported, possibly due to the mapping done below?
Or is it to ensure tests are deterministic (ie we have materialized dap-perf-trace.txt, lbr-perf-for-dap.script) ?

StringRef DataSymbol = Binary->symbolizeDataAddress(
Binary->CanonicalizeNonTextAddress(DataAddress));
if (DataSymbol.starts_with("_ZTV")) {
uint64_t IP = 0;
Fields[2].getAsInteger(0, IP);
Counter.recordDataAccessCount(Binary->canonicalizeVirtualAddress(IP),

@@ -1027,6 +1027,20 @@ class FunctionSamples {
return VirtualCallsiteTypeCounts[mapIRLocToProfileLoc(Loc)];
}

// At location \p Loc, add a type sample for the given \p Type with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could use triple ///.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

uint64_t DataAddress = 0;
Fields[3].getAsInteger(0, DataAddress);

StringRef DataSymbol = Binary->symbolizeDataAddress(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth adding a comment that the memory accesses pointing to the vtables (based on the ABI convention) are recorded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented that vtable accesses are used and Itanium C++ ABI is assumed to find out vtable accesses.

mingmingl-llvm and others added 2 commits July 25, 2025 13:38
Co-authored-by: Paschalis Mpeis <paschalis.mpeis@arm.com>
Copy link
Contributor Author

@mingmingl-llvm mingmingl-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for reviews! PTAL.

@@ -1027,6 +1027,20 @@ class FunctionSamples {
return VirtualCallsiteTypeCounts[mapIRLocToProfileLoc(Loc)];
}

// At location \p Loc, add a type sample for the given \p Type with
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

uint64_t DataAddress = 0;
Fields[3].getAsInteger(0, DataAddress);

StringRef DataSymbol = Binary->symbolizeDataAddress(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented that vtable accesses are used and Itanium C++ ABI is assumed to find out vtable accesses.

@@ -603,6 +640,16 @@ class ProfiledBinary {
return ProbeDecoder.getInlinerDescForProbe(Probe);
}

void addMMapNonTextEvent(MMapEvent MMap) {
MMapNonTextEvents.push_back(MMap);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it's rare for the virtual address ranges (i.e., the intervals formed by [MMapEvent.Address, MMapEvent.Address + MMap.Size)) of mmap events to overlap with each other. Assuming this is true, I think there are two implementation options here:

  1. Make llvm-profgen to validate mmap events don't have overlapping virtual address (and report 'unimplemented error' otherwise). This way, the implementation can use a map to do efficient address canonicalization and handle the common case. The updated change implements this option.

  2. llvm-profgen doesn't validate address interval and cannot assume non-overlapping address ranges. The implementation uses a vector to record the mmap events in the order they are added and reverse iterate the map to do data address canonicalization. While a linear reverse iteration is probably acceptable given the number of non-text mmap events is observed to be 3 in most cases, I think option 1) does the better trade-off.

Let me know your thoughts and I'd be glad to follow up.

@mingmingl-llvm
Copy link
Contributor Author

A couple of requests regarding testing --

  1. I think the current dap.bin is non-PIE, can you add a PIE variant test case too?
  2. Can you trim the lbr-perf-for-dap.script test input to preserve only whats needed for a minimal test. I think a lot of the samples are repeated and we can just test for a lower count.

done! It took me sometime to find a good time to tune the profiling period for a non-PIE binary without over-heating the machine, and now sampling at 1,000,003 is a good number :)

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. It would be good to get an approval from llvm-profgen owners.

cc: @WenleiHe @wlei-llvm

uint64_t Count) {
auto &TypeCounts = getTypeSamplesAt(Loc);
bool Overflowed = false;
TypeCounts[Type] = SaturatingMultiplyAdd(Count, /* Weight= */ (uint64_t)1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should insert into the map if it overflowed. Can you check the overflow first and then insert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As clarified offline, the SaturatingMultiplyAdd will clamp the result if overflow happens, and we insert the return value of SaturatingMultiplyAdd. From this perspective, counter_overflow is more of informative warning as opposed to a real error that wraps around a large unsigned integer into another value.

I updated the comment to make this more explicit, and will probably prepare a separate change around the warning handling (in SampleProf or InstrProf).

// mapping between the binary name and its memory layout.
static bool extractMMapEventForBinary(ProfiledBinary *Binary, StringRef Line,
MMapEvent &MMap);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra newline? The existing code doesn't have new lines between decls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -946,6 +978,14 @@ SampleContextFrameVector ProfiledBinary::symbolize(const InstructionPointer &IP,
return CallStack;
}

StringRef ProfiledBinary::symbolizeDataAddress(uint64_t Address) {
DIGlobal DataDIGlobal = unwrapOrError(
Symbolizer->symbolizeData(SymbolizerPath.str(), {Address, 0}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed followup.


const auto &MMapEvent = MMapIter->second;

// If the address is within the non-text mmap event, calculates its file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: s/calculates/calculate/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@mingmingl-llvm
Copy link
Contributor Author

lgtm. It would be good to get an approval from llvm-profgen owners.

cc: @WenleiHe @wlei-llvm

Sure! also cc @apolloww for (more) reviews and comments! Let me know if there are questions about the work of vtable type profiling (this change or #148002 for SamplePGO, or for Instrumented PGO).

Comment on lines 70 to 74
static cl::opt<std::string> DataAccessProfileFilename(
"data-access-profile", cl::value_desc("data-access-profile"),
cl::desc("Path to the data access profile to be generated."),
cl::cat(ProfGenCategory));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand correctly, this is the input perf trace not the generated output, then how about renaming it to something like data-access-perftrace or data-access-perfscript? In llvm-profgen, the "profile" is usually referred to the output of this tool. This also aligns with the parsing function("parseDataAccessPerfTraces").

Also, the comment "Path to the data access profile to be generated." made me think this refers to the output file, which could be confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done by renaming the option name to data-access-perftrace and updating the comment accordingly.

Comment on lines 543 to 558
// For each instruction with vtable accesses, get its symbolized inline
// stack, and add the vtable counters to the function samples.
for (const auto &[IpData, Count] : SC.DataAccessCounter) {
uint64_t InstAddr = IpData.first;
const SampleContextFrameVector &FrameVec =
Binary->getCachedFrameLocationStack(InstAddr, false);
if (!FrameVec.empty()) {
FunctionSamples &FunctionProfile =
getLeafProfileAndAddTotalSamples(FrameVec, 0);
LineLocation Loc(
FrameVec.back().Location.LineOffset,
getBaseDiscriminator(FrameVec.back().Location.Discriminator));
FunctionProfile.addTypeSamplesAt(Loc, FunctionId(IpData.second), Count);
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move the code into a wrapper function, just to be consistent with other populate functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to a helper function ProfileGenerator::populateTypeSamplesForAllFunctions.

If I understand correctly, llvm-profgen supports 4 settings -- {LBR w/o callgraph, LBR w call graph} x {binary using line + discriminator, binary enabling pseudo-probe based mapping}. ProfileGenerator::populateTypeSamplesForAllFunctions only supports LBR w/o callgraph + binary using line + discriminator.

In the updated patch

  • llvm-profgen.cpp will report errors on the 3 other cases
  • afdo-with-vtable.test and afdo-with-vtable-pie.test are updated to test that llvm-profgen will report error and exit with -use-dwarf-correlation=false.

Unfortunately the environment where I can conveniently collect LBR is temporarily not available due to a very recent tech issue so I cannot collect context-sensitive perf traces or provide it. I'll deal with the tech issue and hopefully be able to collect LBR conveniently in the new few days to update this patch.

I'll file a Github issue to track the other three combinations of settings. Meanwhile may I know which setting is preferred the most out of the other 3 settings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github issue is filed in #155438

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The environment issue is fixed (with a manual machine restart after some system maintenances..) , and I can collect LBR. e587089#diff-22e592380f9deced00e432033744b4bbd34f8490b5aeb6f35356e95dd51c821a adds a context-sensitive perf script after running llvm/test/tools/llvm-profgen/Inputs/dap.bin, and verified that llvm-profgen reports error when it sees a context-sensitive script.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the err check and filing the Github issue.

Meanwhile may I know which setting is preferred the most out of the other 3 settings?

In Meta, we mainly use the CS(LBR w call graph})+ pseudo-probe, sometimes we use (LBR w/o call graph}) + pseudo-probe for experiments.

@@ -0,0 +1,19 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove the empty leading line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor Author

@mingmingl-llvm mingmingl-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the reviews!

Comment on lines 70 to 74
static cl::opt<std::string> DataAccessProfileFilename(
"data-access-profile", cl::value_desc("data-access-profile"),
cl::desc("Path to the data access profile to be generated."),
cl::cat(ProfGenCategory));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done by renaming the option name to data-access-perftrace and updating the comment accordingly.

@@ -0,0 +1,19 @@

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link

github-actions bot commented Aug 26, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@mingmingl-llvm mingmingl-llvm changed the title [llvm-profgen] Extend llvm-profgen to generate vtable profiles with data access events. [llvm-profgen] Extend llvm-profgen to generate vtable profiles with data access events for non context-sensitive profiles using debug info Aug 26, 2025
Copy link
Contributor

@wlei-llvm wlei-llvm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants