Skip to content

Shared: Model generator cleanup. #19311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Apr 28, 2025

Conversation

michaelnebel
Copy link
Contributor

@michaelnebel michaelnebel commented Apr 15, 2025

In this PR we

  • Re-factor the shared model generator implementation and put the heuristic queries in its own module.
  • Change the queries */utils/modelgenerator/summary-models to use the implementation from */utils/modelgenerator/mixed-summary-models and then delete the */utils/modelgenerator/mixed-summary-models queries. Also make the same change to the neutral capturing queries.
  • Remove the --with-mixed-* flags from the model generator python script.
  • Re-factor the model generator tests to use explicit summary model tags and change the neutral tests to use the combined neutral generation query.

This effectively means that when genering models with the GenerateFlowModels.py script using the flags --with-summaries and --with-neutrals now generates combined (or mixed as we used to call them) models based on both the content based and heuristic based data flow model generation.

@github-actions github-actions bot added C# Java Rust Pull requests that update Rust code labels Apr 15, 2025
@michaelnebel michaelnebel force-pushed the csharp/generatorcleanup branch 2 times, most recently from 3809598 to 3ef0f89 Compare April 16, 2025 09:12
@michaelnebel michaelnebel marked this pull request as ready for review April 16, 2025 11:49
@Copilot Copilot AI review requested due to automatic review settings April 16, 2025 11:49
@michaelnebel michaelnebel requested review from a team as code owners April 16, 2025 11:49
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the shared model generator by isolating heuristic queries into their own module, updating query annotations in test files, and removing legacy mixed-model queries and flags. Key changes include:

  • Replacing "summary" annotations with "heuristic-summary" in query comments across various test files.
  • Updating change note files to reflect the switch from mixed to combined (heuristic/content-based) models.
  • Removing deprecated flags from the model generator script.

Reviewed Changes

Copilot reviewed 49 out of 64 changed files in this pull request and generated no comments.

File Description
java/ql/src/change-notes/2025-04-16-model-generation.md Updated change note reflecting the query switch for Java models.
csharp/ql/test/utils/modelgenerator/dataflow/Summaries.cs Replaced legacy "summary" tags with "heuristic-summary" where appropriate.
csharp/ql/src/change-notes/2025-04-16-model-generation.md Updated change note reflecting the query switch for C# models.
Files not reviewed (15)
  • csharp/ql/src/utils/modelgenerator/CaptureMixedNeutralModels.ql: Language not supported
  • csharp/ql/src/utils/modelgenerator/CaptureMixedSummaryModels.ql: Language not supported
  • csharp/ql/src/utils/modelgenerator/CaptureNeutralModels.ql: Language not supported
  • csharp/ql/src/utils/modelgenerator/CaptureSinkModels.ql: Language not supported
  • csharp/ql/src/utils/modelgenerator/CaptureSourceModels.ql: Language not supported
  • csharp/ql/src/utils/modelgenerator/CaptureSummaryModels.ql: Language not supported
  • csharp/ql/src/utils/modelgenerator/debug/CaptureSummaryModelsPartialPath.ql: Language not supported
  • csharp/ql/src/utils/modelgenerator/debug/CaptureSummaryModelsPath.ql: Language not supported
  • csharp/ql/test/utils/modelgenerator/dataflow/CaptureHeuristicSummaryModels.ql: Language not supported
  • csharp/ql/test/utils/modelgenerator/dataflow/CaptureNeutralModels.ql: Language not supported
  • csharp/ql/test/utils/modelgenerator/dataflow/CaptureSinkModels.ql: Language not supported
  • csharp/ql/test/utils/modelgenerator/dataflow/CaptureSourceModels.ql: Language not supported
  • java/ql/src/utils/modelgenerator/CaptureMixedNeutralModels.ql: Language not supported
  • java/ql/src/utils/modelgenerator/CaptureMixedSummaryModels.ql: Language not supported
  • java/ql/src/utils/modelgenerator/CaptureNeutralModels.ql: Language not supported
Comments suppressed due to low confidence (2)

java/ql/src/change-notes/2025-04-16-model-generation.md:4

  • [nitpick] There appears to be a potential naming inconsistency: the change note references 'GenerateFlowModel.py', while the PR description uses 'GenerateFlowModels.py'. Consider standardizing the script name for clarity.
* Changes to the MaD model generation infrastructure: Changed the query `java/utils/modelgenerator/summary-models` to use the implementation from `java/utils/modelgenerator/mixed-summary-models`. Removed the now-redundant `java/utils/modelgenerator/mixed-summary-models` query. Similar replacement was made for `java/utils/modelgenerator/neutral-models`. That is, if `GenerateFlowModel.py` is provided with `--with-summaries` combined/mixed models are now generated instead of heuristic models (and similar for `--with-neutrals`).

csharp/ql/src/change-notes/2025-04-16-model-generation.md:4

  • [nitpick] There appears to be a potential naming inconsistency: the change note references 'GenerateFlowModel.py', while the PR description uses 'GenerateFlowModels.py'. Consider standardizing the script name for clarity.
* Changes to the MaD model generation infrastructure: Changed the query `cs/utils/modelgenerator/summary-models` to use the implementation from `cs/utils/modelgenerator/mixed-summary-models`. Removed the now-redundant `cs/utils/modelgenerator/mixed-summary-models` query. Similar replacement was made for `cs/utils/modelgenerator/neutral-models`. That is, if `GenerateFlowModel.py` is provided with `--with-summaries` combined/mixed models are now generated instead of heuristic models (and similar for `--with-neutrals`).

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paldepind FYI as you've been using this in Rust.

---
category: minorAnalysis
---
* Changes to the MaD model generation infrastructure: Changed the query `rust/utils/modelgenerator/summary-models` to use the implementation from `rust/utils/modelgenerator/mixed-summary-models`. Removed the now-redundant `rust/utils/modelgenerator/mixed-summary-models` query. Similar replacement was made for `rust/utils/modelgenerator/neutral-models`. That is, if `GenerateFlowModel.py` is provided with `--with-summaries` combined/mixed models are now generated instead of heuristic models (and similar for `--with-neutrals`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not actually doing change notes for Rust yet, because we're still in preview - though the symmetry with other languages is nice and I can't see this one doing any harm.

@@ -943,7 +1041,7 @@ module MakeModelGenerator<
* 2. If content based flow does not yield any summary for an API, then we try and
* generate flow summaries using the non-content based summary generator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to use the word "heuristic" here to refer to what is Heuristic::captureFlow?

Suggested change
* generate flow summaries using the non-content based summary generator.
* generate flow summaries using the heuristic summary generator.

Maybe it would also make sense to change the "heuristic" three lines above to a different word ("approach"?) as it's not the "heuristic" in Heuristic::captureFlow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

/**
* Gets the source model(s) of `api`, if there is flow from an existing known source to the return of `api`.
*/
string captureSource(DataFlowSourceTargetApi api) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do captureSource and captureSink also use heuristics? I was under the impression they just used normal data flow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataflow configuration for both of these implements the isAdditionalFlowStep predicate like

predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
  isRelevantTaintStep(node1, node2)
}

The isRelevantTaintStep includes some read/store logic for fields which causes field conflation. In any case for some languages we can't avoid this, as declaring a field of an Argument or a ReturnValue as a source or sink doesn't work (this is not supported for C#).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a consequence the generated models are somewhat imprecise (so I think it is ok to use the term Heuristic for these as well).

@michaelnebel michaelnebel force-pushed the csharp/generatorcleanup branch from 43f3d5e to af5e36a Compare April 23, 2025 14:59
@michaelnebel
Copy link
Contributor Author

Putting back in draft, as the C++ model generator has been merged - so this needs to be handled in this PR as well.

@michaelnebel michaelnebel marked this pull request as draft April 23, 2025 15:00
@michaelnebel michaelnebel force-pushed the csharp/generatorcleanup branch from af5e36a to 27165ef Compare April 24, 2025 12:11
@github-actions github-actions bot added the C++ label Apr 24, 2025
@michaelnebel michaelnebel marked this pull request as ready for review April 24, 2025 12:45
@michaelnebel michaelnebel requested a review from a team as a code owner April 24, 2025 12:45
Comment on lines 1 to 4
---
category: minorAnalysis
---
* Changes to the MaD model generation infrastructure: Changed the query `cpp/utils/modelgenerator/summary-models` to use the implementation from `cpp/utils/modelgenerator/mixed-summary-models`. Removed the now-redundant `cpp/utils/modelgenerator/mixed-summary-models` query. Similar replacement was made for `cpp/utils/modelgenerator/neutral-models`. That is, if `GenerateFlowModel.py` is provided with `--with-summaries` combined/mixed models are now generated instead of heuristic models (and similar for `--with-neutrals`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can omit this for C++, as we didn't even add a change note when we added the infrastructure. If we do want to have a change note, it should probably say that we've added the infrastructure.

@michaelnebel michaelnebel force-pushed the csharp/generatorcleanup branch from 27165ef to 97d78f4 Compare April 24, 2025 12:54
jketema
jketema previously approved these changes Apr 24, 2025
Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ 👍 I do not have an opinion on any other of the changes.

@michaelnebel
Copy link
Contributor Author

The failing CPP integration test is unrelated.
The PR is ready for review. There is a subsequent PR that builds on top of this one: #19382

@michaelnebel michaelnebel force-pushed the csharp/generatorcleanup branch from 97d78f4 to 97ac804 Compare April 25, 2025 11:34
paldepind
paldepind previously approved these changes Apr 25, 2025
Copy link
Contributor

@paldepind paldepind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust changes LGTM 🎉

…adjust other queries to the re-factored implementation.
@michaelnebel michaelnebel dismissed stale reviews from paldepind and jketema via de12222 April 25, 2025 13:53
@michaelnebel michaelnebel force-pushed the csharp/generatorcleanup branch from 97ac804 to de12222 Compare April 25, 2025 13:53
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java changes look plausible

@michaelnebel
Copy link
Contributor Author

Merging now as it has previously been approved for C++ and Rust as well.

@michaelnebel michaelnebel merged commit 8857f0d into github:main Apr 28, 2025
63 checks passed
@michaelnebel michaelnebel deleted the csharp/generatorcleanup branch April 28, 2025 08:36
MathiasVP added a commit to MathiasVP/ql that referenced this pull request Apr 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C# C++ documentation Java Rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants