C++: Speed up alias analysis #17062


Merged · 7 commits into github:main · Jul 26, 2024
Conversation

@MathiasVP (Contributor) commented Jul 24, 2024

This PR fixes a performance problem that has always been present in the IR alias analysis, but which became much more likely to happen after merging #16139.

The problem

The core problem is in the hasNonPhiDefinition predicate: it contains a useLocation column whose only restriction is that it overlaps with defLocation. So if many useLocations overlap with a given defLocation, this predicate becomes very large.

The reason this became a problem when we merged #16139 is that many more things started to escape the alias analysis, and thus many more locations started to overlap.
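To see why overlap alone makes the relation explode, here's a minimal sketch (in Python rather than QL; the function names and the interval representation of locations are invented for illustration) of a predicate that pairs every definition with every overlapping use:

```python
# Illustrative sketch only -- the real predicate is written in QL; the
# interval representation of memory locations is a simplifying assumption.
from itertools import product

def overlaps(a, b):
    # Two half-open intervals [start, end) overlap if each starts
    # before the other ends.
    return a[0] < b[1] and b[0] < a[1]

def has_non_phi_definition(defs, uses):
    # Like the QL predicate, the only restriction on the use location
    # is that it overlaps the def location, so the relation grows with
    # the number of overlapping (def, use) pairs.
    return [(d, u) for d, u in product(defs, uses) if overlaps(d, u)]

# A single definition that overlaps 1000 uses contributes 1000 tuples
# all by itself; with many such definitions the relation blows up.
wide_def = [(0, 10_000)]
many_uses = [(i, i + 1) for i in range(1000)]
print(len(has_non_phi_definition(wide_def, many_uses)))  # 1000
```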

The fix

This PR fixes the problem by identifying which memory locations will cause an explosion in hasNonPhiDefinition, and then removing those MemoryLocations from the universe of MemoryLocations. This speeds up alias analysis significantly on some projects, at the cost of not being able to properly resolve def-use information for these memory locations.
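As a rough sketch of the idea (again in Python for illustration; the threshold value matches the discussion below, but the names and the interval representation are invented):

```python
# Illustrative sketch only: drop memory locations whose overlapping-use
# count would make hasNonPhiDefinition explode. The real logic is QL in
# SSAConstruction; these helper names are hypothetical.
OVERLAP_THRESHOLD = 1024

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def number_of_overlapping_uses(def_loc, uses):
    return sum(1 for u in uses if overlaps(def_loc, u))

def pruned_universe(defs, uses):
    # Locations kept here get precise def-use information; the pruned
    # ones trade that precision for predictable performance.
    return [d for d in defs
            if number_of_overlapping_uses(d, uses) <= OVERLAP_THRESHOLD]

uses = [(i, i + 1) for i in range(2000)]
defs = [(0, 2000), (0, 5)]  # the first overlaps 2000 uses, the second 5
print(pruned_universe(defs, uses))  # [(0, 5)]
```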

Test diff

Unfortunately, we need quite a large test to trigger the effect of this PR. So GitHub doesn't really want to show the diff on the .expected changes 😂 I've pasted the relevant diffs to https://www.diffchecker.com/VMt0M8HK/.

@MathiasVP MathiasVP requested a review from a team as a code owner July 24, 2024 13:46
@github-actions github-actions bot added the C++ label Jul 24, 2024
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Jul 24, 2024
@geoffw0 (Contributor) left a comment

Code changes LGTM.

DCA shows a performance improvement at the cost of three results. They look like good results at a glance. Assuming these are lost due to the changes (not wobble), is there anything we can do? Should the threshold (1024) perhaps be a little higher?

What do you make of the changes in CPP IR inconsistencies?

Also should this have a change note, since it can affect results?

@MathiasVP (Contributor, Author) commented

> Code changes LGTM.
>
> DCA shows a performance improvement at the cost of three results. They look like good results at a glance. Assuming these are lost due to the changes (not wobble), is there anything we can do? Should the threshold (1024) perhaps be a little higher?

We could make the threshold higher, but I don't think it's really worth tuning this number too much since the choice was fairly arbitrary.

> What do you make of the changes in CPP IR inconsistencies?

The changes to IR consistencies are all in the b_missingOperandType column. This means that there are some operands for which operand.getType() no longer has a result. An operand gets its type from its defining instruction, and the reason we're getting more consistency errors here is that we're bailing out of resolving all the def-use information in alias analysis when we think it will be too costly to compute. So unfortunately, I think we have to live with these extra consistency errors.

It may be possible to silence the consistency errors arising from this PR by excluding "missing type information caused by bailing out of alias analysis" if we want.

It's a bit of a double-edged sword, though: we'd obviously like to know about these, since they are genuine consistency problems. However, there's also a risk of drowning in "spurious" consistency errors, so that we don't notice new, actual consistency errors.

I could go either way on this. What do you think?

> Also should this have a change note, since it can affect results?

I can write one, sure. It's probably going to be something vague like:

> Improved the performance of the alias analysis of large function bodies. As a result, alerts that depend on alias analysis of large function bodies may no longer be found.

I think this phrasing sounds a little bit too alarming given that we're losing 3 alerts across thousands of alerts on ~30 projects. What do you think?

@geoffw0 (Contributor) commented Jul 25, 2024

> I don't think it's really worth tuning this number too much since the choice was fairly arbitrary.

OK, I'm going to do a brief MRVA investigation to see if this is affecting many results elsewhere. My intuition is that it is worth tuning this number, but that may just be because I'm used to Swift DCA where results are sparse and any change in them can be considered a fairly strong signal - not really the case for CPP.

> It may be possible to silence the consistency errors arising from this PR by excluding "missing type information caused by bailing out of alias analysis" if we want.
> ...
> I could go either way on this. What do you think?

I don't feel strongly about this. Happy with the consistency errors being visible.

>> Improved the performance of the alias analysis of large function bodies. As a result, alerts that depend on alias analysis of large function bodies may no longer be found.
>
> I think this phrasing sounds a little bit too alarming given that we're losing 3 alerts across thousands of alerts on ~30 projects. What do you think?

How about:

> Improved performance of alias analysis of large function bodies. In rare cases, alerts that depend on alias analysis of large function bodies may be affected.

@MathiasVP (Contributor, Author) commented Jul 25, 2024

>> I don't think it's really worth tuning this number too much since the choice was fairly arbitrary.
>
> OK, I'm going to do a brief MRVA investigation to see if this is affecting many results elsewhere. My intuition is that it is worth tuning this number, but that may just be because I'm used to Swift DCA where results are sparse and any change in them can be considered a fairly strong signal - not really the case for CPP.

That's fair. What I suggest doing is to evaluate the numberOfOverlappingUses predicate from SSAConstruction and look at the distribution. I expect that 99% of the values will be way below the 1024 threshold, and we then need to figure out how far above 1024 we can go without reintroducing the performance regression. For the nlohmann/json database I was investigating, the highest numbers were ~4000, so we certainly don't want to go that far up.
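One way to eyeball that distribution (a hypothetical sketch in Python; the sample counts are invented, loosely modelled on the clusters reported later in this thread):

```python
# Hypothetical sketch: bucket per-location overlap counts into
# power-of-two ranges to see where a given threshold would bite.
from collections import Counter

def bucket(n):
    if n < 512:
        return "<512"
    lo = 512
    while n >= lo * 2:
        lo *= 2
    return f"{lo}-{lo * 2 - 1}"

# Sample counts for illustration; real numbers would come from
# evaluating numberOfOverlappingUses in SSAConstruction.
counts = [10, 700, 770, 1105, 2715, 8136]
print(Counter(bucket(c) for c in counts))
```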

>> It may be possible to silence the consistency errors arising from this PR by excluding "missing type information caused by bailing out of alias analysis" if we want.
>> ...
>> I could go either way on this. What do you think?
>
> I don't feel strongly about this. Happy with the consistency errors being visible.

👍

>>> Improved the performance of the alias analysis of large function bodies. As a result, alerts that depend on alias analysis of large function bodies may no longer be found.
>>
>> I think this phrasing sounds a little bit too alarming given that we're losing 3 alerts across thousands of alerts on ~30 projects. What do you think?
>
> How about:
>
> > Improved performance of alias analysis of large function bodies. In rare cases, alerts that depend on alias analysis of large function bodies may be affected.

I like that! I'll add such a change note.

@MathiasVP MathiasVP removed the no-change-note-required This PR does not need a change note label Jul 25, 2024
@jketema (Contributor) left a comment

I'm happy with this, but @geoffw0 should approve too.

@geoffw0 (Contributor) left a comment

Investigation - Performance vs Results vs Threshold Value

We have three (closely related) lost results on DCA. I did some fairly arbitrary before + after query runs on the MRVA-100 to cast a wider net and found no further result changes, which is reassuring - though this wasn't quite the scale I was hoping for:

| query | results prior to change\* | results after change\* (threshold = 1024) |
| --- | --- | --- |
| cpp/bad-strncpy-size (the one that was affected on DCA) | 1 | 1 |
| cpp/double-free | 11 | 11 |
| cpp/suspicious-allocation-size | 6 | 6 |
| cpp/overrunning-write | 26 | 26 |
| cpp/no-space-for-terminator | 2 | 2 |
| cpp/uncontrolled-allocation-size | 558 | 558 |

\* total number of results on MRVA-100


Here's a breakdown of the range of numberOfOverlappingUses values we're seeing, next to the number of projects whose most-overlapping memory location falls in that range. I think it's fair to say that 1024 is towards the lower end of the thresholds we should consider, as quite a few projects are theoretically affected at this level:

| largest numberOfOverlappingUses | MRVA-100 projects |
| --- | --- |
| < 512 | 49 |
| 512-1023 | 15 |
| 1024-2047 | 8 |
| 2048-4095 | 16 |
| 4096-8191 | 4 |
| 8192+ | 6 |

I selected google/libphonenumber as the project from the MRVA-100 with by far the most memory locations in the largest two buckets, and nlohmann/json as the project where we originally saw this issue. nlohmann/json has clusters of exactly 770, 1,105 and 2,715 overlaps - plus a few memory locations with up to 8,136 overlaps. I ran cpp/redundant-null-check-simple on them (a relatively simple query that nevertheless depends on alias analysis) with various overlap thresholds:

| threshold | time on google/libphonenumber | time on nlohmann/json |
| --- | --- | --- |
| 512 | (not run) | 57s |
| 1024 | 26s | 63s |
| 2048 | 27s | 88s |
| 2716 | (not run) | 136s |
| 4096 | 27s | 221s |
| 8192 | 27s | 364s |
| no threshold | 27s | 370s |

Bringing all of that together, I would like to recommend (in line with my earlier intuition) a threshold higher than 1024, so that we don't risk affecting query results more often than necessary. 2048 or even 4096 would be nice. However, the last experiment shows that 2048 already affects performance a little (with a risk of problems in more extreme projects), and 4096 is clearly not acceptable.

So I think the safe option is 1024 and the braver option is 2048, and I approve merging this PR with any value in the range 1024 to 2048 (which includes where we are now).

@MathiasVP (Contributor, Author) commented

Thanks for the detailed analysis @geoffw0 🚀 I share your view that we can probably raise this to slightly more than 1024. However, in order to unblock the customer I'd like to start off at 1024, and we can then ask them to try out a version with a slightly higher threshold. Alternatively, we can also make this an extensibly defined value (i.e., from a .yml file) in the future.

@MathiasVP MathiasVP merged commit c0263be into github:main Jul 26, 2024
16 checks passed