[PGO] Add llvm.loop.estimated_trip_count metadata #152775

jdenny-ornl · 2025-08-08T18:31:40Z

This patch implements the llvm.loop.estimated_trip_count metadata discussed in [RFC] Fix Loop Transformations to Preserve Block Frequencies. As the RFC explains, that metadata enables future patches, such as PR #128785, to fix block frequency issues without losing estimated trip counts.

This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As [suggested in the RFC comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4), it adds the new metadata to all loops at the time of profile ingestion and estimates each trip count from the loop's `branch_weights` metadata. As [suggested in the PR#128785 review](#128785 (comment)), it does so via a `PGOEstimateTripCountsPass` pass, which creates the new metadata for the loop but omits the value if it cannot estimate a trip count due to the loop's form. An important observation not previously discussed is that `PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count but later passes can transform the loop in a way that makes it possible. Currently, such passes do not necessarily update the metadata, but eventually that should be fixed. Until then, if the new metadata has no value, `llvm::getLoopEstimatedTripCount` disregards it and tries again to estimate the trip count from the loop's `branch_weights` metadata.

Somehow, on some of my builds, `llvm::` prefixes are dropped from some symbol names in the printed past list.

For estimated trip count metdata.

That was applied after merging this PR last time.

Suggested at <#148758 (comment)>.

jdenny-ornl · 2025-08-08T18:32:21Z

This resurrects PR #148758. I'll address previous issues shortly.

github-actions · 2025-08-08T18:44:46Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jdenny-ornl · 2025-08-08T19:05:48Z

I believe I have addressed most issues raised in PR #148758.

For now, this PR shows what the pass pipeline tests (e.g., llvm/test/Other/new-pm-defaults.ll) look like after recent changes. As @nikic pointed out, there are still changes to where DominatorTreeAnalysis and LoopAnalysis run. LastRunTrackingAnalysis too.

I have not addressed the general issue raised there about whether pgo-estimated-trip-counts is actually worthwhile and whether we should constrain it or disable it by default. What do people think now after seeing the above changes?

nikic · 2025-08-08T19:10:16Z

~~Please rebase out the merge commits so I can test this.~~ Found a way to apply it.

nikic · 2025-08-13T20:29:51Z

I have not addressed the general issue raised there about whether pgo-estimated-trip-counts is actually worthwhile and whether we should constrain it or disable it by default. What do people think now after seeing the above changes?

I think we should at least start by not having the pass and only doing on the fly updates. I assume that this patch is intended to be NFC-ish, but it does cause codegen changes, and as-is it's hard to tell whether that's because of the new loop metadata that is present everywhere, or due to divergence between branch weights and the metadata. (A sample to look at would be libclamav_nsis_LZMADecode.c from llvm-test-suite.)

jdenny-ornl · 2025-08-14T21:17:34Z

I assume that this patch is intended to be NFC-ish, but it does cause codegen changes, and as-is it's hard to tell whether that's because of the new loop metadata that is present everywhere, or due to divergence between branch weights and the metadata. (A sample to look at would be libclamav_nsis_LZMADecode.c from llvm-test-suite.)

Thanks. I took a look at libclamav_nsis_LZMADecode.c, using the results you previously sent as a reference. For -O3, that shows that the previously landed version of this PR reduced the instruction count by 25.84%.

Following the instructions at the About page there, I used valgrind --tool=callgrind to try to reproduce the results.

For the current PR, I saw the following summary:

==2593500== I   refs:      909,055,484

I commented out all runs of PGOEstimateTripCountsPass and saw:

==2600705== I   refs:      1,203,023,203

So running PGOEstimateTripCountsPass still has roughly the same impact as in the previous results: 24.4% reduction.

I then tried to determine what in PGOEstimateTripCountsPass causes the change. In setLoopEstimatedTripCount's addStringMetadataToLoop call, I renamed the loop metadata from llvm.loop.estimated_trip_count to foo, which I believe should be semantically ignored by the rest of LLVM, including getLoopEstimatedTripCount:

==2674918== I   refs:      909,054,727

So, it appears that it is the up-front loop metadata creation that produces the change rather any changes related to estimated trip counts or branch weights.

To rule out anything else from the PR, I went back to the main branch and added a new function pass into the pipeline where this PR adds PGOEstimateTripCountsPass. The new pass merely calls addStringMetadataToLoop to add the bogus "foo" loop metadata to all loops. For any function containing loops, it preserves only CFGAnalyses and LazyCallGraphAnalysis, and it preserves all analyses otherwise.

==2797012== I   refs:      909,223,730

I then removed the addStringMetadataToLoop call from that pass to confirm that it not the analysis invalidation causes the change:

==2800546== I   refs:      1,206,601,523

I think we should at least start by not having the pass and only doing on the fly updates.

Do you think we should remove the pass or just disable it by default, as was previously suggested? I am leery of the latter because I am afraid it will just end up being unused code, unless someone else commits to continuing to investigate it.

nikic · 2025-08-17T15:00:40Z

@jdenny-ornl Thanks for looking into it. So this is indeed related to the fact that merely adding loop metadata can change optimization behavior. A likely culprit for this is this hack:

llvm-project/llvm/lib/Transforms/Utils/Local.cpp

Line 1200 in 65ffa53

// 'BB' and 'BB->Pred' are loop latches, bail out to presrve inner loop

With that in mind, I don't think we should be doing early attachment of this metadata unless this issue can be fixed (as the comment indicates, by introducing block metadata).

Adding it lazily can also cause issues, but I'm hoping it would be to a lesser degree than adding it to every single loop. We should probably look out for the case where the latch is part of two loops though, so we don't mistakenly assign the same trip count to both an inner and an outer loop.

Do you think we should remove the pass or just disable it by default, as was previously suggested? I am leery of the latter because I am afraid it will just end up being unused code, unless someone else commits to continuing to investigate it.

I agree. Based on past experience, either a pass needs to be enabled right away, or it will never get enabled.

jdenny-ornl · 2025-08-18T23:32:35Z

@jdenny-ornl Thanks for looking into it. So this is indeed related to the fact that merely adding loop metadata can change optimization behavior. A likely culprit for this is this hack:

llvm-project/llvm/lib/Transforms/Utils/Local.cpp

Line 1200 in 65ffa53

// 'BB' and 'BB->Pred' are loop latches, bail out to presrve inner loop

With that in mind, I don't think we should be doing early attachment of this metadata unless this issue can be fixed (as the comment indicates, by introducing block metadata).

Thanks for the reference. Fixing that issue seems orthogonal to my BFI fixes, and it would surely be a major distraction from them.

Adding it lazily can also cause issues, but I'm hoping it would be to a lesser degree than adding it to every single loop. We should probably look out for the case where the latch is part of two loops though, so we don't mistakenly assign the same trip count to both an inner and an outer loop.

With or without this PR, {get,set}LoopEstimatedTripCount operate on the branch weights of a loop's only latch (and they reject loops with more latches). My expectation is that merely moving the estimated trip count to loop metadata on that latch for each of the same loops should not worsen the above issue.

Do you think we should remove the pass or just disable it by default, as was previously suggested? I am leery of the latter because I am afraid it will just end up being unused code, unless someone else commits to continuing to investigate it.

I agree. Based on past experience, either a pass needs to be enabled right away, or it will never get enabled.

I have dropped PGOEstimateTripCountsPass for now. If someone implements header block metadata for loops in the future, we can revisit the possibility of PGOEstimateTripCountsPass then.

nikic

Confirming that I don't see any compile-time or codegen changes with the new version.

nikic · 2025-08-19T10:50:57Z

llvm/docs/LangRef.rst

+
+This metadata records an estimated trip count for the loop.  The first operand
+is the string ``llvm.loop.estimated_trip_count``.  The second operand is an
+integer constant of type ``i32`` or smaller specifying the estimate.  For


What's the motivation for allowing "or smaller"?

I set a maximum when I realized the current implementation cannot handle anything wider. For flexibility, I did not set a minimum, but I have no specific use case in mind. Do you prefer i32 only? I think that would be fine for me.

nikic · 2025-08-19T10:53:29Z

llvm/test/Verifier/llvm.loop.estimated_trip_count.ll

+; Test "llvm.loop.estimated_trip_count" validation
+
+; DEFINE: %{RUN} = opt -passes=verify %t -disable-output 2>&1 | \
+; DEFINE:   FileCheck %s -allow-empty -check-prefix


The idiomatic way to do this is split-file. Tests that don't actually contain the tested code are inconvenient.

The idiomatic way to do this is split-file.

Originally I coded it that way, but I did not like the repetition of the boilerplate code. I prefer the way the current version isolates what is unique about each case.

Tests that don't actually contain the tested code are inconvenient.

As a general rule of thumb, sure. But what is the inconvenience in this test file?

nikic · 2025-08-19T11:01:52Z

With or without this PR, {get,set}LoopEstimatedTripCount operate on the branch weights of a loop's only latch (and they reject loops with more latches). My expectation is that merely moving the estimated trip count to loop metadata on that latch for each of the same loops should not worsen the above issue.

To be clear, the concern is not about loops with multiple latches, but two loops sharing one latch. You can have an inner loop latch that exits into an outer loop and is the latch of the outer loop. IIRC that's possible even in loop simplify form.

jdenny-ornl · 2025-08-19T20:13:18Z

With or without this PR, {get,set}LoopEstimatedTripCount operate on the branch weights of a loop's only latch (and they reject loops with more latches). My expectation is that merely moving the estimated trip count to loop metadata on that latch for each of the same loops should not worsen the above issue.

To be clear, the concern is not about loops with multiple latches, but two loops sharing one latch. You can have an inner loop latch that exits into an outer loop and is the latch of the outer loop. IIRC that's possible even in loop simplify form.

I was trying to make a point about how, for a loop with one latch, all this PR is doing is changing which metadata on that latch encodes the estimated trip count.

After further thought, I realize that the branch_weights metadata on a latch distinguishes an outer loop from an inner loop because each loop header is a different successor of the latch. But llvm.loop.estimated_trip_count does not distinguish the loops.

I think I have a workaround. I have adjusted getExpectedExitLoopLatchBranch to guard {get,set}LoopEstimatedTripCount as it does without this PR. getExpectedExitLoopLatchBranch will not recognize the same latch for both loops unless the latch exits both loops and has only two successors. However, to exit both loops, the latch must have at least three successors: the inner loop header, the outer loop header (exit for the inner loop), and an exit for the outer loop.

Does that seem reasonable?

llvm/lib/Transforms/Utils/LoopUtils.cpp

jdenny-ornl · 2025-08-29T14:41:24Z

@nikic Any further thoughts on this PR?

jdenny-ornl added 16 commits July 14, 2025 20:42

Merge branch 'main' into pgo-estimated-trip-count

db5920a

Add PGOEstimateTripCounts in more cases

47fbe85

Add unused initialization

f8097fb

Simplify some test changes

7b27203

Extend verify pass to cover new metadata

4c4669a

Fix test for some builds

0f40efd

Somehow, on some of my builds, `llvm::` prefixes are dropped from some symbol names in the printed past list.

Merge branch 'main' into pgo-estimated-trip-count

2791a1c

Apply some small reviewer suggestions

6148922

Attempt to fix windows pre-commit CI

3a49b43

Merge branch 'main' into pgo-estimated-trip-count

2f7daa8

Merge branch 'main' into pgo-estimated-trip-count

c627fc5

Run update script on new test from last merge

f1fa8d9

For estimated trip count metdata.

Reapply 3a18fe3

38ace1e

That was applied after merging this PR last time.

Convert to function pass, avoid needless pass invalidation

92ddaa0

Suggested at <#148758 (comment)>.

Fix layering violation

a3e0d72

Suggested at <#148758 (comment)>.

jdenny-ornl requested review from MatzeB, arsenm, nikic, fhahn, mtrofin and jdoerfert August 8, 2025 18:31

jdenny-ornl mentioned this pull request Aug 8, 2025

[PGO] Add llvm.loop.estimated_trip_count metadata #148758

Merged

Apply clang-format

67f22cd

jdenny-ornl added 2 commits August 9, 2025 10:18

Merge branch 'main' into pgo-estimated-trip-count

f0ff2e2

Merge branch 'main' into pgo-estimated-trip-count

e7eb1fe

jdenny-ornl force-pushed the users/jdenny-ornl/pgo-estimated-trip-count branch from ff9730e to e7eb1fe Compare August 14, 2025 15:10

jdenny-ornl added 2 commits August 18, 2025 12:42

Merge branch 'main' into pgo-estimated-trip-count

0973ab3

Remove PGOEstimateTripCountsPass and no-value form of metadata

680bdc2

nikic reviewed Aug 19, 2025

View reviewed changes

Fix case where nested loops share latch

59cd184

jdenny-ornl commented Aug 19, 2025

View reviewed changes

llvm/lib/Transforms/Utils/LoopUtils.cpp Outdated Show resolved Hide resolved

jdenny-ornl added 3 commits August 25, 2025 13:43

Merge branch 'main' into pgo-estimated-trip-count

5d00250

Remove redundant code

5719779

Clarify recent comments some

98cab7b

[PGO] Add llvm.loop.estimated_trip_count metadata #152775

Are you sure you want to change the base?

[PGO] Add llvm.loop.estimated_trip_count metadata #152775

Conversation

jdenny-ornl commented Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jdenny-ornl commented Aug 8, 2025

Uh oh!

github-actions bot commented Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jdenny-ornl commented Aug 8, 2025

Uh oh!

nikic commented Aug 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikic commented Aug 13, 2025

Uh oh!

jdenny-ornl commented Aug 14, 2025

Uh oh!

nikic commented Aug 17, 2025

Uh oh!

jdenny-ornl commented Aug 18, 2025

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

nikic Aug 19, 2025

Choose a reason for hiding this comment

Uh oh!

jdenny-ornl Aug 19, 2025

Choose a reason for hiding this comment

Uh oh!

nikic Aug 19, 2025

Choose a reason for hiding this comment

Uh oh!

jdenny-ornl Aug 19, 2025

Choose a reason for hiding this comment

Uh oh!

nikic commented Aug 19, 2025

Uh oh!

jdenny-ornl commented Aug 19, 2025

Uh oh!

Uh oh!

jdenny-ornl commented Aug 29, 2025

Uh oh!

Uh oh!

jdenny-ornl commented Aug 8, 2025 •

edited

Loading

github-actions bot commented Aug 8, 2025 •

edited

Loading

nikic commented Aug 8, 2025 •

edited

Loading