Skip to content

Overlay: Enable overlay compilation for Java #19872

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 17, 2025

Conversation

kaspersv
Copy link
Contributor

@kaspersv kaspersv commented Jun 25, 2025

This PR enables overlay compilation for Java. With overlay compilation enabled, the compiler will model the overlay frontier throughout optimisation passes and generate QLX that supports both non-overlay aware evaluation and overlay aware evaluation. The current non-overlay aware evaluation will remain the default. Enabling overlay compilation will not affect semantics under non-overlay aware evaluation, but may affect performance as some previously valid optimisations are no longer valid under overlay compilation.

Previous DCA experiments identified two real regressions stemming from TC bounds and magic bounds that were hampered by the overlay frontier. The TC bound regressions both involved use of DataFlow::localExprFlow and affected 2 queries. The TC bound regressions were fixed client-side by marking the callers overlay[local?]. The magic bounds issue affected all 11 queries that use RefType.getAStrictAncestor. The magic bounds issue was fixed callee-side by marking getAStrictAncestor overlay[caller?], in case any custom queries similarly rely on magic bounds for proper performance of getAStrictAncestor.

With the fixes above, DCA experiments shows an overall regression below 0.6%. apache/commons-* sources show large variability ranging from 25% regressions to 25% improvements. Note that the two latest DCA experiments are equivalent, except with base and variant swapped for the -rev experiment.

DIL size increases are expected, due to the additional HOPs to model the overlay frontier, discard predicates and helpers that can't get inlined across the overlay frontier.

@github-actions github-actions bot added the Java label Jun 25, 2025
@kaspersv kaspersv force-pushed the kaspersv/overlay-java-enable branch from 981ab79 to 2575d8e Compare June 26, 2025 07:24
@kaspersv kaspersv force-pushed the kaspersv/overlay-java-discarding branch 2 times, most recently from a9f1540 to 64f27e2 Compare June 26, 2025 09:36
Base automatically changed from kaspersv/overlay-java-discarding to main June 30, 2025 09:17
@kaspersv kaspersv force-pushed the kaspersv/overlay-java-enable branch from 2575d8e to 57752ca Compare June 30, 2025 09:52
@kaspersv kaspersv force-pushed the kaspersv/overlay-java-enable branch 2 times, most recently from 7c1acbb to b3d61f7 Compare July 15, 2025 11:48
@kaspersv kaspersv force-pushed the kaspersv/overlay-java-enable branch from b3d61f7 to 10a678d Compare July 15, 2025 14:24
@kaspersv kaspersv requested a review from aschackmull July 17, 2025 08:04
@kaspersv kaspersv marked this pull request as ready for review July 17, 2025 08:04
@Copilot Copilot AI review requested due to automatic review settings July 17, 2025 08:04
@kaspersv kaspersv requested a review from a team as a code owner July 17, 2025 08:04
@kaspersv kaspersv added the no-change-note-required This PR does not need a change note label Jul 17, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enables overlay compilation for Java by adding the compileForOverlayEval: true configuration to the qlpack.yml file. The overlay compilation feature allows the compiler to model the overlay frontier throughout optimization passes and generate QLX that supports both non-overlay aware and overlay aware evaluation.

Key Changes:

  • Enables overlay compilation for Java queries through configuration
  • Maintains backward compatibility with non-overlay aware evaluation as default
  • Addresses performance regressions identified in DCA experiments

@aschackmull
Copy link
Contributor

Looks like there's some missing data or something for java/csrf-unprotected-request-type in the dca runs (both version) for multiple repos. That seems odd, do you know what that's about?

@kaspersv
Copy link
Contributor Author

Looks like there's some missing data or something for java/csrf-unprotected-request-type in the dca runs (both version) for multiple repos. That seems odd, do you know what that's about?

The query produces too much data for DCA to analyse for those projects. For openjdk__jdk11u the baseline bqrs result size for java/csrf-unprotected-request-type is ~100MB and the jsonl variant that DCA normally analyses is 1.4GB.

I've seen it in past Java DCA nightly experiments (example).

@aschackmull
Copy link
Contributor

The query produces too much data for DCA to analyse for those projects. For openjdk__jdk11u the baseline bqrs result size for java/csrf-unprotected-request-type is ~100MB and the jsonl variant that DCA normally analyses is 1.4GB.

I've seen it in past Java DCA nightly experiments (example).

Ah! That makes a lot of sense - looking at the query I see that it's a custom path-problem implementation that doesn't prune the graph, which means that the query result gets a huge amount of superfluous graph edges. I should probably just fix that.

@aschackmull
Copy link
Contributor

I'd say dca looks pretty good then - there's one potential regression in apache/flink. It could be noise, but it's flagged in both experiments, so it's worth checking. The stage timings do give some pointers so you can see whether any of those queries reproduce anything locally. OTOH the buildless variant doesn't show the same tendency, which does indicate that it might just be noise.

@kaspersv
Copy link
Contributor Author

I ran the full nightly query suite on apache/flink locally and the variant with overlay compilation spent ~16s. longer evaluating predicates according to eval logs. ~11s. of that was because CommandArguments::isShell/1#cf24db79 (which is surprisingly expensive) happened to be evaluated twice (for java/command-line-injection and java/concatenated-command-line) with overlay compilation enabled as there was a cachaca cache miss and only once without overlay compilation as there was a cachaca cache hit. My interpretation is that this is noise.

Most expensive predicates without overlay compilation for apache/flink vs with overlay compilation.

Copy link
Contributor

@aschackmull aschackmull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we're ready to merge, then!

@kaspersv kaspersv merged commit a807db5 into main Jul 17, 2025
18 of 19 checks passed
@kaspersv kaspersv deleted the kaspersv/overlay-java-enable branch July 17, 2025 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants