C++: Fix infinite range analysis loop on invalid SSA #19477

MathiasVP · 2025-05-12T19:02:18Z

As part of the IR SSA/alias analysis we compute the which memory allocations overlap. In #17062 we mitigated a problem where, in very large function bodies, the set of overlapping memory allocations was basically a cartesian product.

As a result of this, we don't always have proper SSA information on these large function bodies. It's well-known that even one missing SSA edge can cause the shared range analysis to loop infinitely.

This PR excludes the functions with incomplete SSA information from the new range analysis. Ideally, I would like us to move away from using the IR SSA/alias analysis in the new range analysis library, but that seemed like too big of a task to start right now.

On a customer codebase there were 3 functions in the entire database which contained incomplete SSA and this caused the range analysis to loop infinitely in any of these three functions.

I haven't managed to reproduce a test for this. However, I will make sure that we have testing for this in DCA once this PR is merged.

redsun82

Thanks for this work!

as an external viewer without much QL knowledge, it does seem there is some existing duplication in this code, among different versions of the IR implementation. I wonder if one could factor out some of that duplication?

Also, do you think there's any way to test this, or the code triggering this is too complex to easily put in a test? Can we have this covered in DCA? I think I recall you found this could be related to an endless loop in the SQL repo?

I will leave more knowledgeable QL devs to approve this.

redsun82 · 2025-05-13T06:25:50Z

cpp/ql/lib/semmle/code/cpp/ir/implementation/aliased_ssa/internal/SSAConstruction.qll

@@ -1319,6 +1319,13 @@ predicate canReuseSsaForMemoryResult(Instruction instruction) {
  // We don't support reusing SSA for any location that could create a `Chi` instruction.
 }

+predicate hasIncompleteSsa(IRFunction f) {


I understand this may be more or less an implementation detail, but should we still put a doc comment here?

Happy to do that. Note that this is still a draft since I wanted to check the implications on DCA before I made this more pretty 🙂

I've force-pushed some QLDoc in 510df38

…has correctly modelled SSA information.

MathiasVP · 2025-05-13T10:07:56Z

as an external viewer without much QL knowledge, it does seem there is some existing duplication in this code, among different versions of the IR implementation. I wonder if one could factor out some of that duplication?

The term you're looking for here is "parameterized modules" 😂

The IR implementation precedes parameterized modules and no one has found time to rewrite the IR to use parameterized modules yet. So right now the IR is "pyrameterized" via identical-files.json (so the duplication is actually not that bad since there's a script and a CI job to ensure that they're always identical)

Also, do you think there's any way to test this, or the code triggering this is too complex to easily put in a test? Can we have this covered in DCA? I think I recall you found this could be related to an endless loop in the SQL repo?

Yeah, I plan on bumping a SHA on one of the projects in DCA once this PR goes in to make sure we have a test for this in DCA. I can't do that before this PR goes in since we can't test the "base SHA" since that will hit an infinite loop on the current main.

MathiasVP · 2025-05-13T10:24:04Z

DCA looks uneventful. No performance implications (as expected), and no result changes (since this affects such a small number of function bodies).

jketema · 2025-05-13T10:36:57Z

DCA does seem to show some slowdown? Just over 10% for abseil on Linux, as the worst offender.

MathiasVP · 2025-05-13T10:38:50Z

DCA does seem to show some slowdown? Just over 10% for abseil on Linux, as the worst offender.

Oh! You're right! Why wasn't that in Summary summaries? 😭 Anyway, thanks for the heads up. I'll take a look right away.

I suppose the join order issue highlighted by DCA in RangeAnalysisImpl::ModulusAnalysisInstantiated::ssaModulus is relevant 😅

jketema · 2025-05-13T10:40:11Z

DCA does seem to show some slowdown? Just over 10% for abseil on Linux, as the worst offender.

Oh! You're right! Why wasn't that in Summary summaries? 😭 Anyway, thanks for the heads up. I'll take a look right away.

Because it never is...

redsun82 · 2025-05-13T10:50:42Z

Oh! You're right! Why wasn't that in Summary summaries? 😭 Anyway, thanks for the heads up. I'll take a look right away.

Incidentally, it's the same question I asked yesterday here. As far as I could tell, we swept that kind of landing page alerts under a pretty high threshold in order to avoid wobble. That does seem to make the landing page less useful than it should...

I wonder if the wobble we tried to silence was platform specific (i.e. windows maybe?)

``` Evaluated relational algebra for predicate SsaInternals::getDefImpl/1#1ed4f567@65628fbv with tuple counts: 4935102 ~5% {4} r1 = SCAN `SsaInternals::SsaImpl::Definition.definesAt/3#dispred#7eea4c8f` OUTPUT In.2, In.3, In.0, In.1 104274503 ~1% {3} | JOIN WITH `SsaInternals::DefImpl.hasIndexInBlock/2#dispred#30a6c29f_120#join_rhs` ON FIRST 2 OUTPUT Rhs.2, Lhs.3, Lhs.2 4921319 ~2% {2} | JOIN WITH `SsaInternals::DefImpl.getSourceVariable/0#dispred#72437659` ON FIRST 2 OUTPUT Lhs.2, Lhs.0 return r1 ``` After: ``` Evaluated relational algebra for predicate SsaInternals::SsaImpl::Definition.definesAt/3#dispred#7eea4c8f_1230#join_rhs@b280fb5h with tuple counts: 4935102 ~3% {4} r1 = SCAN `SsaInternals::SsaImpl::Definition.definesAt/3#dispred#7eea4c8f` OUTPUT In.1, In.2, In.3, In.0 return r1 Evaluated relational algebra for predicate SsaInternals::DefImpl.hasIndexInBlock/3#dispred#31d295aa_1230#join_rhs@2be655s4 with tuple counts: 5634706 ~1% {4} r1 = SCAN `SsaInternals::DefImpl.hasIndexInBlock/3#dispred#31d295aa` OUTPUT In.1, In.2, In.3, In.0 return r1 Evaluated relational algebra for predicate SsaInternals::getDefImpl/1#1ed4f567@8afa36uu with tuple counts: 4921319 ~2% {2} r1 = JOIN `SsaInternals::SsaImpl::Definition.definesAt/3#dispred#7eea4c8f_1230#join_rhs` WITH `SsaInternals::DefImpl.hasIndexInBlock/3#dispred#31d295aa_1230#join_rhs` ON FIRST 3 OUTPUT Lhs.3, Rhs.3 return r1 ```

MathiasVP · 2025-05-13T13:26:06Z

DCA does seem to show some slowdown? Just over 10% for abseil on Linux, as the worst offender.

Should hopefully be fixed by 0836f0b. There were two issues:

A small join order issue
I had placed the predicate outside the cached module which added an extra stage along with some re-evaluation.

Additionally, I found another small join order issue that I fixed in f255fc2.

I'll rerun DCA and see if the performance has been fixed 🤞

MathiasVP added 2 commits May 12, 2025 19:43

C++: Expose 'MemoryLocation0'.

e51cb47

C++: Expose 'isBusyDef'.

f1b4e05

github-actions bot added the C++ label May 12, 2025

redsun82 reviewed May 13, 2025

View reviewed changes

MathiasVP added 2 commits May 13, 2025 10:54

C++: Add an 'hasIncompleteSsa' predicate to check whether a function …

510df38

…has correctly modelled SSA information.

C++: Filter out instructions with incomplete SSA in range analysis.

9d2eb3d

MathiasVP force-pushed the fix-infinite-range-analysis-on-incomplete-ssa branch from dfdeeb8 to 9d2eb3d Compare May 13, 2025 10:14

MathiasVP marked this pull request as ready for review May 13, 2025 10:24

MathiasVP requested a review from a team as a code owner May 13, 2025 10:24

C++: Add change note.

c3c18bd

github-actions bot added the documentation label May 13, 2025

MathiasVP added 2 commits May 13, 2025 13:41

C++: Cache and fix join order in 'hasIncompleteSsa'.

0836f0b

jketema approved these changes May 13, 2025

View reviewed changes

MathiasVP merged commit fa79423 into github:main May 13, 2025
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

C++: Fix infinite range analysis loop on invalid SSA #19477

C++: Fix infinite range analysis loop on invalid SSA #19477

Uh oh!

MathiasVP commented May 12, 2025 •

edited

Loading

Uh oh!

redsun82 left a comment

Uh oh!

redsun82 May 13, 2025

Uh oh!

MathiasVP May 13, 2025

Uh oh!

MathiasVP May 13, 2025

Uh oh!

MathiasVP commented May 13, 2025 •

edited

Loading

Uh oh!

MathiasVP commented May 13, 2025

Uh oh!

jketema commented May 13, 2025

Uh oh!

MathiasVP commented May 13, 2025 •

edited

Loading

Uh oh!

jketema commented May 13, 2025

Uh oh!

redsun82 commented May 13, 2025

Uh oh!

MathiasVP commented May 13, 2025

Uh oh!

Uh oh!

Uh oh!

C++: Fix infinite range analysis loop on invalid SSA #19477

C++: Fix infinite range analysis loop on invalid SSA #19477

Uh oh!

Conversation

MathiasVP commented May 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

redsun82 left a comment

Choose a reason for hiding this comment

Uh oh!

redsun82 May 13, 2025

Choose a reason for hiding this comment

Uh oh!

MathiasVP May 13, 2025

Choose a reason for hiding this comment

Uh oh!

MathiasVP May 13, 2025

Choose a reason for hiding this comment

Uh oh!

MathiasVP commented May 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MathiasVP commented May 13, 2025

Uh oh!

jketema commented May 13, 2025

Uh oh!

MathiasVP commented May 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jketema commented May 13, 2025

Uh oh!

redsun82 commented May 13, 2025

Uh oh!

MathiasVP commented May 13, 2025

Uh oh!

Uh oh!

Uh oh!

MathiasVP commented May 12, 2025 •

edited

Loading

MathiasVP commented May 13, 2025 •

edited

Loading

MathiasVP commented May 13, 2025 •

edited

Loading