Skip to content

Conversation

boomanaiden154
Copy link
Contributor

Currently the clang CMake caches use FE PGO for instrumentation (LLVM_BUILD_INSTRUMENTED=ON). However, IRPGO is generally regarded as better for performance. I am measuring about a 1.5% performance gain when building libLLVMSupport.a using this configuration versus what existed before this commit. I would suspect the gains are larger on other platforms like Windows where we cannot subsume any gains using PLO.

Currently the clang CMake caches use FE PGO for instrumentation
(LLVM_BUILD_INSTRUMENTED=ON). However, IRPGO is generally regarded as
better for performance. I am measuring about a 1.5% performance gain
when building libLLVMSupport.a using this configuration versus what
existed before this commit. I would suspect the gains are larger on
other platforms like Windows where we cannot subsume any gains using
PLO.
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Aug 29, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 29, 2025

@llvm/pr-subscribers-clang

Author: Aiden Grossman (boomanaiden154)

Changes

Currently the clang CMake caches use FE PGO for instrumentation (LLVM_BUILD_INSTRUMENTED=ON). However, IRPGO is generally regarded as better for performance. I am measuring about a 1.5% performance gain when building libLLVMSupport.a using this configuration versus what existed before this commit. I would suspect the gains are larger on other platforms like Windows where we cannot subsume any gains using PLO.


Full diff: https://github.com/llvm/llvm-project/pull/155957.diff

1 Files Affected:

  • (modified) clang/cmake/caches/PGO.cmake (+1-1)
diff --git a/clang/cmake/caches/PGO.cmake b/clang/cmake/caches/PGO.cmake
index 15bc755d110d1..d6471160037c1 100644
--- a/clang/cmake/caches/PGO.cmake
+++ b/clang/cmake/caches/PGO.cmake
@@ -5,7 +5,7 @@ set(LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")
 set(LLVM_ENABLE_RUNTIMES "compiler-rt;libcxx;libcxxabi;libunwind" CACHE STRING "")
 
 set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "")
-set(BOOTSTRAP_LLVM_BUILD_INSTRUMENTED ON CACHE BOOL "")
+set(BOOTSTRAP_LLVM_BUILD_INSTRUMENTED IR CACHE BOOL "")
 set(CLANG_BOOTSTRAP_TARGETS
   generate-profdata
   stage2

@boomanaiden154 boomanaiden154 requested a review from aaupov August 29, 2025 01:56
ilovepi added a commit that referenced this pull request Aug 29, 2025
After #155957 pointed it out, we're actually using the frontend PGO in
the Fuchsia cmake cache files.
@boomanaiden154 boomanaiden154 merged commit 7fca1f8 into llvm:main Aug 29, 2025
11 checks passed
@boomanaiden154 boomanaiden154 deleted the ir-pgo-clang-caches branch August 29, 2025 17:06
@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 29, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building clang at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/12477

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libarcher :: races/task-taskwait-nested.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 13
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp  -gdwarf-4 -O1 -fsanitize=thread  -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src   /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp 2>&1 | tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp -gdwarf-4 -O1 -fsanitize=thread -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic
# note: command had no output on stdout or stderr
# executed command: env TSAN_OPTIONS=ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp
# note: command had no output on stdout or stderr
# executed command: tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log
# note: command had no output on stdout or stderr
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# note: command had no output on stdout or stderr
# RUN: at line 14
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp  -gdwarf-4 -O1 -fsanitize=thread  -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src   /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic && env ARCHER_OPTIONS="ignore_serial=1 report_data_leak=1" env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp 2>&1 | tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang -fopenmp -gdwarf-4 -O1 -fsanitize=thread -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests -I /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/runtime/src /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic
# note: command had no output on stdout or stderr
# executed command: env 'ARCHER_OPTIONS=ignore_serial=1 report_data_leak=1' env TSAN_OPTIONS=ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1 /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/deflake.bash /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp
# note: command had no output on stdout or stderr
# executed command: tee /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/runtimes/runtimes-bins/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log
# note: command had no output on stdout or stderr
# executed command: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/FileCheck /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# .---command stderr------------
# | /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c:52:11: error: CHECK: expected string not found in input
# | // CHECK: WARNING: ThreadSanitizer: data race
# |           ^
# | <stdin>:1:1: note: scanning from here
# | DONE
# | ^
# | 
# | Input file: <stdin>
# | Check file: /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/openmp/tools/archer/tests/races/task-taskwait-nested.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |           1: DONE 
# | check:52     X~~~~ error: no match found
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

--

********************


ilovepi added a commit that referenced this pull request Aug 29, 2025
After #155957 pointed it out, we're actually using the frontend PGO in
the Fuchsia cmake cache files.
@boomanaiden154 boomanaiden154 added this to the LLVM 21.x Release milestone Sep 1, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in LLVM Release Status Sep 1, 2025
@boomanaiden154
Copy link
Contributor Author

/cherry-pick 7fca1f8

@llvmbot
Copy link
Member

llvmbot commented Sep 1, 2025

/pull-request #156271

@llvmbot llvmbot moved this from Needs Triage to Done in LLVM Release Status Sep 1, 2025
tru pushed a commit to llvmbot/llvm-project that referenced this pull request Sep 3, 2025
)

Currently the clang CMake caches use FE PGO for instrumentation
(LLVM_BUILD_INSTRUMENTED=ON). However, IRPGO is generally regarded as
better for performance. I am measuring about a 1.5% performance gain
when building libLLVMSupport.a using this configuration versus what
existed before this commit. I would suspect the gains are larger on
other platforms like Windows where we cannot subsume any gains using
PLO.

(cherry picked from commit 7fca1f8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
Development

Successfully merging this pull request may close these issues.

4 participants