-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Shared: Model generator cleanup. #19311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shared: Model generator cleanup. #19311
Conversation
3809598
to
3ef0f89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the shared model generator by isolating heuristic queries into their own module, updating query annotations in test files, and removing legacy mixed-model queries and flags. Key changes include:
- Replacing "summary" annotations with "heuristic-summary" in query comments across various test files.
- Updating change note files to reflect the switch from mixed to combined (heuristic/content-based) models.
- Removing deprecated flags from the model generator script.
Reviewed Changes
Copilot reviewed 49 out of 64 changed files in this pull request and generated no comments.
File | Description |
---|---|
java/ql/src/change-notes/2025-04-16-model-generation.md | Updated change note reflecting the query switch for Java models. |
csharp/ql/test/utils/modelgenerator/dataflow/Summaries.cs | Replaced legacy "summary" tags with "heuristic-summary" where appropriate. |
csharp/ql/src/change-notes/2025-04-16-model-generation.md | Updated change note reflecting the query switch for C# models. |
Files not reviewed (15)
- csharp/ql/src/utils/modelgenerator/CaptureMixedNeutralModels.ql: Language not supported
- csharp/ql/src/utils/modelgenerator/CaptureMixedSummaryModels.ql: Language not supported
- csharp/ql/src/utils/modelgenerator/CaptureNeutralModels.ql: Language not supported
- csharp/ql/src/utils/modelgenerator/CaptureSinkModels.ql: Language not supported
- csharp/ql/src/utils/modelgenerator/CaptureSourceModels.ql: Language not supported
- csharp/ql/src/utils/modelgenerator/CaptureSummaryModels.ql: Language not supported
- csharp/ql/src/utils/modelgenerator/debug/CaptureSummaryModelsPartialPath.ql: Language not supported
- csharp/ql/src/utils/modelgenerator/debug/CaptureSummaryModelsPath.ql: Language not supported
- csharp/ql/test/utils/modelgenerator/dataflow/CaptureHeuristicSummaryModels.ql: Language not supported
- csharp/ql/test/utils/modelgenerator/dataflow/CaptureNeutralModels.ql: Language not supported
- csharp/ql/test/utils/modelgenerator/dataflow/CaptureSinkModels.ql: Language not supported
- csharp/ql/test/utils/modelgenerator/dataflow/CaptureSourceModels.ql: Language not supported
- java/ql/src/utils/modelgenerator/CaptureMixedNeutralModels.ql: Language not supported
- java/ql/src/utils/modelgenerator/CaptureMixedSummaryModels.ql: Language not supported
- java/ql/src/utils/modelgenerator/CaptureNeutralModels.ql: Language not supported
Comments suppressed due to low confidence (2)
java/ql/src/change-notes/2025-04-16-model-generation.md:4
- [nitpick] There appears to be a potential naming inconsistency: the change note references 'GenerateFlowModel.py', while the PR description uses 'GenerateFlowModels.py'. Consider standardizing the script name for clarity.
* Changes to the MaD model generation infrastructure: Changed the query `java/utils/modelgenerator/summary-models` to use the implementation from `java/utils/modelgenerator/mixed-summary-models`. Removed the now-redundant `java/utils/modelgenerator/mixed-summary-models` query. Similar replacement was made for `java/utils/modelgenerator/neutral-models`. That is, if `GenerateFlowModel.py` is provided with `--with-summaries` combined/mixed models are now generated instead of heuristic models (and similar for `--with-neutrals`).
csharp/ql/src/change-notes/2025-04-16-model-generation.md:4
- [nitpick] There appears to be a potential naming inconsistency: the change note references 'GenerateFlowModel.py', while the PR description uses 'GenerateFlowModels.py'. Consider standardizing the script name for clarity.
* Changes to the MaD model generation infrastructure: Changed the query `cs/utils/modelgenerator/summary-models` to use the implementation from `cs/utils/modelgenerator/mixed-summary-models`. Removed the now-redundant `cs/utils/modelgenerator/mixed-summary-models` query. Similar replacement was made for `cs/utils/modelgenerator/neutral-models`. That is, if `GenerateFlowModel.py` is provided with `--with-summaries` combined/mixed models are now generated instead of heuristic models (and similar for `--with-neutrals`).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paldepind FYI as you've been using this in Rust.
--- | ||
category: minorAnalysis | ||
--- | ||
* Changes to the MaD model generation infrastructure: Changed the query `rust/utils/modelgenerator/summary-models` to use the implementation from `rust/utils/modelgenerator/mixed-summary-models`. Removed the now-redundant `rust/utils/modelgenerator/mixed-summary-models` query. Similar replacement was made for `rust/utils/modelgenerator/neutral-models`. That is, if `GenerateFlowModel.py` is provided with `--with-summaries` combined/mixed models are now generated instead of heuristic models (and similar for `--with-neutrals`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not actually doing change notes for Rust yet, because we're still in preview - though the symmetry with other languages is nice and I can't see this one doing any harm.
3e63ea9
to
846eb76
Compare
@@ -943,7 +1041,7 @@ module MakeModelGenerator< | |||
* 2. If content based flow does not yield any summary for an API, then we try and | |||
* generate flow summaries using the non-content based summary generator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to use the word "heuristic" here to refer to what is Heuristic::captureFlow
?
* generate flow summaries using the non-content based summary generator. | |
* generate flow summaries using the heuristic summary generator. |
Maybe it would also make sense to change the "heuristic" three lines above to a different word ("approach"?) as it's not the "heuristic" in Heuristic::captureFlow
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
/** | ||
* Gets the source model(s) of `api`, if there is flow from an existing known source to the return of `api`. | ||
*/ | ||
string captureSource(DataFlowSourceTargetApi api) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do captureSource
and captureSink
also use heuristics? I was under the impression they just used normal data flow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dataflow configuration for both of these implements the isAdditionalFlowStep
predicate like
predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
isRelevantTaintStep(node1, node2)
}
The isRelevantTaintStep
includes some read/store logic for fields which causes field conflation. In any case for some languages we can't avoid this, as declaring a field of an Argument or a ReturnValue as a source or sink doesn't work (this is not supported for C#).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a consequence the generated models are somewhat imprecise (so I think it is ok to use the term Heuristic for these as well).
43f3d5e
to
af5e36a
Compare
Putting back in draft, as the C++ model generator has been merged - so this needs to be handled in this PR as well. |
af5e36a
to
27165ef
Compare
--- | ||
category: minorAnalysis | ||
--- | ||
* Changes to the MaD model generation infrastructure: Changed the query `cpp/utils/modelgenerator/summary-models` to use the implementation from `cpp/utils/modelgenerator/mixed-summary-models`. Removed the now-redundant `cpp/utils/modelgenerator/mixed-summary-models` query. Similar replacement was made for `cpp/utils/modelgenerator/neutral-models`. That is, if `GenerateFlowModel.py` is provided with `--with-summaries` combined/mixed models are now generated instead of heuristic models (and similar for `--with-neutrals`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can omit this for C++, as we didn't even add a change note when we added the infrastructure. If we do want to have a change note, it should probably say that we've added the infrastructure.
27165ef
to
97d78f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ 👍 I do not have an opinion on any other of the changes.
The failing CPP integration test is unrelated. |
97d78f4
to
97ac804
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust changes LGTM 🎉
…n its own module.
…adjust other queries to the re-factored implementation.
…c version instead.
97ac804
to
de12222
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java changes look plausible
Merging now as it has previously been approved for C++ and Rust as well. |
In this PR we
*/utils/modelgenerator/summary-models
to use the implementation from*/utils/modelgenerator/mixed-summary-models
and then delete the*/utils/modelgenerator/mixed-summary-models
queries. Also make the same change to the neutral capturing queries.--with-mixed-*
flags from the model generator python script.This effectively means that when genering models with the
GenerateFlowModels.py
script using the flags--with-summaries
and--with-neutrals
now generates combined (or mixed as we used to call them) models based on both the content based and heuristic based data flow model generation.