Skip to content

HV-1831 Optimize cascading validation for large lists #1331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

marko-bekhta
Copy link
Member

@marko-bekhta marko-bekhta commented Sep 15, 2023

https://hibernate.atlassian.net/browse/HV-1831

this is only a version of #1157 for a main branch with javax->jakarta moved so far.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on licensing, please check here.


Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
6.6% Duplication on New Code (required ≤ 3%)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

@marko-bekhta marko-bekhta force-pushed the feat/HV-1831-experiments-8-0 branch from 39b70af to 33dc3e4 Compare May 5, 2025 06:44
@hibernate-github-bot
Copy link

hibernate-github-bot bot commented May 5, 2025

Thanks for your pull request!

This pull request appears to follow the contribution rules.

› This message was automatically generated.

@marko-bekhta marko-bekhta force-pushed the feat/HV-1831-experiments-8-0 branch from 33dc3e4 to adb2082 Compare July 17, 2025 19:46
@marko-bekhta marko-bekhta force-pushed the feat/HV-1831-experiments-8-0 branch from adb2082 to dcb3188 Compare August 1, 2025 16:39
@marko-bekhta marko-bekhta force-pushed the feat/HV-1831-experiments-8-0 branch 2 times, most recently from d2e9beb to e74474d Compare August 12, 2025 20:17
gsmet and others added 6 commits August 14, 2025 10:01
HV-1831 Enhance ExecutableMetaData with tracking information

HV-1831 Create ProcessedBeansTrackingVoter contract

This contract allows to override the default bean process tracking
behavior without exposing our internal structures.

It needs a bit more love on the config side so that we can define it via
XML too and some documentation.

HV-1831 New zero cost approach to processed bean tracking strategy

I removed it from the traditional VF for now as I would like us to focus
on the case where it is useful first.

We will reintroduce it later once we have validated the approach where
it is the most useful.

I'm a bit unclear right now if we should use the same contract for
traditional and predefined scope VF as we are dealing with different
things and they won't be evaluated at the same moment.

I'm thinking that maybe this needs to be a different contract.

HV-1831 : Wrap a `BeanMetaData` in a `NonTrackedBeanMetaDataImpl` if tracking is not required

HV-1831 Add some guidance about next step

HV-1831 Specific benchmark infrastructure for predefined scope

HV-1831 : Update Cascade tests to use PredefinedScopeHibernateValidator with -p=predefined=true

HV-1831 : Experiment detecting cycles in bean classes

Add test for Map

HV-1831 : Experiment detecting cycles in bean classes

Add support for containers; add tests for List w/ and w/o duplicated values

HV-1831 : Experiment detecting cycles in bean classes

HV-1831 Copy nodes when changing the nature of the leaf

HV-1831 Add the same bean to List twice

HV-1831 Clean up another experiment that shouldn't have been committed

HV-1831 Add a couple of examples illustrating various cases

HV-1831 Unfinished experiments

Signed-off-by: marko-bekhta <marko.prykladna@gmail.com>
since the map won't be able to hold nulls

Signed-off-by: marko-bekhta <marko.prykladna@gmail.com>
Signed-off-by: marko-bekhta <marko.prykladna@gmail.com>
Signed-off-by: marko-bekhta <marko.prykladna@gmail.com>
Signed-off-by: marko-bekhta <marko.prykladna@gmail.com>
Signed-off-by: marko-bekhta <marko.prykladna@gmail.com>
@marko-bekhta marko-bekhta force-pushed the feat/HV-1831-experiments-8-0 branch from a5f215e to 075d180 Compare August 14, 2025 19:28
Copy link

Copy link
Member Author

@marko-bekhta marko-bekhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an implementation for detecting return value/parameters tracking detection and tried to go through more cases for bean tracking detection and adjusted the steps to account for more scenarios I could think of. Instead of looking at the validated type I tried to leverage the cascading metadata instead... my thinking was that we may have something like:

Map<KeyThatHasCycles, @Valid ValueNoCycles> prop;

and if we just inspect the map we'd think that we need tracking, but since there is no @Valid for the key we actually don't. + this seems to simplify the case when the @Valid is nested much deeper in the type arguments.

And some numbers:

with the tracking changes:

Benchmark                                                                                                     Mode  Cnt      Score     Error  Units
CascadedWithLotsOfItemsValidation.testCascadedValidationWithLotsOfItems                                      thrpt   20  16303.592 ± 205.026  ops/s
CascadedWithLotsOfItemsValidation.testCascadedValidationWithLotsOfItems:async                                thrpt             NaN              ---
PredefinedScopeCascadedWithLotsOfItemsValidation.testPredefinedScopeCascadedValidationWithLotsOfItems        thrpt   20  24974.083 ± 276.476  ops/s
PredefinedScopeCascadedWithLotsOfItemsValidation.testPredefinedScopeCascadedValidationWithLotsOfItems:async  thrpt             NaN              ---
Benchmark                                                                                               Mode  Cnt      Score     Error  Units
CascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems                       thrpt   20  10825.431 ± 154.286  ops/s
CascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems:async                 thrpt             NaN              ---
PredefinedScopeCascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems        thrpt   20  10500.602 ± 139.836  ops/s
PredefinedScopeCascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems:async  thrpt             NaN              ---

^ this seems reasonable, as when there are no cycles and we detect that tracking can be skipped in a predefined scope case things look much better, while with cycles in place the results are similar between the predefine scope and regular validator.

btw I also tried to do the tracking-without-maps:

tracking+no maps:

Benchmark                                                                                               Mode  Cnt      Score     Error  Units
CascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems                       thrpt   20  11101.882 ± 113.787  ops/s
CascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems:async                 thrpt             NaN              ---
PredefinedScopeCascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems        thrpt   20  11104.529 ± 109.809  ops/s
PredefinedScopeCascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems:async  thrpt             NaN              ---

and the results are only slightly better than when the maps are used ... will take a closer look at it to see if it can be improved

Comment on lines +31 to +36
// TODO: signature is just name and parameters so that can clash between different beans.
// with that.. do we even need to track it per signature or since
// we already built the `trackingEnabledForBeans` we can just "inspect" the cascadable as we go
// and check against this `trackingEnabledForBeans` to see if tracking is required ?
private final Map<Signature, Boolean> trackingEnabledForReturnValues;
private final Map<Signature, Boolean> trackingEnabledForParameters;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need to prebuild these maps, but I wanted to check with you first before dropping them...
As per that TODO, it seems that we would only check an executable once when building the metadata and I think I got an impl that would rely only on the trackingEnabledForBeans ...
and since the signatures could clash ... I guess we can drop these maps, right?

Comment on lines +62 to +69
this.subtypesMap = CollectionHelper.newHashMap( rawBeanMetaDataMap.size() );
for ( Class<?> beanClass : rawBeanMetaDataMap.keySet() ) {
for ( Class<?> otherBeanClass : rawBeanMetaDataMap.keySet() ) {
if ( beanClass.isAssignableFrom( otherBeanClass ) ) {
subtypesMap.computeIfAbsent( beanClass, k -> new HashSet<>() )
.add( otherBeanClass );
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This subtype map is here to help identify subtypes that can potentially, at runtime, introduce a cycle, and if that's so, we would mark that path as requiring the bean tracking.

Comment on lines +118 to +125
// We also need to account for the case when the subtype is used at runtime that may change the cycles:
// 4) A -> B -> C -> D
// And C1 extends C where C1 -> A
// Hence, at runtime we "may" get:
// A -> B -> C1 -> D
// ^ |
// | |
// -----------
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this example is for that "sub-type" map...

// directCascadedBeanClasses.add( (Class<?>) cascadable.getCascadableType() );
//
// TODO: or be much more cautious and just assume that it can be "anything":
directCascadedBeanClasses.add( Object.class );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts here are that if we are unsure if the tracking could be required or not we should stay on the safe side and assume it would be...
these potentially-container-cascading cases are trouble anyway so I'd hope users would stay away from them ...

btw, this also reminds me of the other "problem", once we drop the support for @Valid List<MyBean>, things with these potentially cascading cases won't really work. I've started adding some thoughts on that here: jakartaee/validation#266

Comment on lines +321 to +329
Set<Class<?>> directCascadedBeanClasses = new HashSet<>();
for ( Cascadable cascadable : returnValueMetaData.getCascadables() ) {
processSingleCascadable( cascadable, directCascadedBeanClasses );
}
for ( Class<?> directCascadedBeanClass : directCascadedBeanClasses ) {
if ( trackingEnabledForBeans.get( directCascadedBeanClass ) ) {
return true;
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea here was to reuse the logic from bean tracking detection ^ and go through cascadables, collect what "top level" classes we can cascade into and then use the trackingEnabledForBeans map to see if tracking is required ...

Comment on lines +152 to +157
ResourceBundle bundle = doGetResourceBundle( localeToPreload );
if ( bundle == null ) {
LOG.resourceBundleNotPreLoaded( localeToPreload );
continue;
}
tmpPreloadedResourceBundles.put( localeToPreload, bundle );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had to do this, as with the change to Map.copyOf nulls are not allowed anymore and the startup would fail ... (noticed it when running the benchmarks)

@gsmet
Copy link
Member

gsmet commented Aug 15, 2025

I had a look at the numbers but I'm not sure to what I should compare them to? Could you clarify?

@gsmet
Copy link
Member

gsmet commented Aug 15, 2025

I think I understand them: could you check if in the case with a cycle, we are not consistently slower? I would have expected things to be mostly transparent?

@gsmet
Copy link
Member

gsmet commented Aug 15, 2025

Also both versions tested include your previous optimization, right?

@marko-bekhta
Copy link
Member Author

marko-bekhta commented Aug 15, 2025

Also both versions tested include your previous optimization, right?

yes, all the runs are on top of main, so the previous improvements are already included.

I had a look at the numbers but I'm not sure to what I should compare them to? Could you clarify?

I haven't initially compared to results from main, just predefined/default, as my thought was that the change is really about disabling the tracking ... here are the result from running the benchmark on main:

Benchmark                                                                                                     Mode  Cnt      Score     Error  Units
CascadedWithLotsOfItemsValidation.testCascadedValidationWithLotsOfItems                                      thrpt   20  15547.480 ± 187.809  ops/s
PredefinedScopeCascadedWithLotsOfItemsValidation.testPredefinedScopeCascadedValidationWithLotsOfItems        thrpt   20  16213.816 ± 180.518  ops/s

so with that compared to the case when we disable tracking:

Benchmark                                                                                                     Mode  Cnt      Score     Error  Units
CascadedWithLotsOfItemsValidation.testCascadedValidationWithLotsOfItems                                      thrpt   20  16303.592 ± 205.026  ops/s
PredefinedScopeCascadedWithLotsOfItemsValidation.testPredefinedScopeCascadedValidationWithLotsOfItems        thrpt   20  24974.083 ± 276.476  ops/s

default case matches the results from main (no tracking improvements), and the predefined is much better as we disable tracking entirely there.

let me also run the one with cycles on main too...

@marko-bekhta
Copy link
Member Author

Here are the results for the case with cycles:

main:

Benchmark                                                                                               Mode  Cnt      Score     Error  Units
CascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems                       thrpt   20  10976.153 ± 139.115  ops/s
PredefinedScopeCascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems        thrpt   20  10829.914 ± 130.121  ops/s

this patch:

Benchmark                                                                                               Mode  Cnt      Score     Error  Units
CascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems                       thrpt   20  10850.865 ± 103.808  ops/s
PredefinedScopeCascadedWithLotsOfItemsAndCyclesValidation.testCascadedValidationWithLotsOfItems        thrpt   20  10835.561 ± 100.276  ops/s

It's slower than CascadedWithLotsOfItemsValidation, but I guess the reason is that we have to try to cascade into the object that creates a cycle, and that means at minimum getting the value, which involves reflection and so on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants