ENH compute histograms only for allowed features in HGBT #24856
Conversation
Codewise, I suspect this will improve performance. Could you run a quick benchmark to verify?
Summary: This PR clearly reduces fit time with interaction constraints (35% on the Higgs benchmark) and incurs no performance penalty without interaction constraints.

[Benchmark plots: with interaction constraints (no interactions allowed), main at commit 85a8aa6 vs. this PR; without interaction constraints, main vs. this PR.]
LGTM modulo a few suggestions.
```python
has_interaction_cst = allowed_features is not None
if has_interaction_cst:
    n_allowed_features = allowed_features.shape[0]
```

```python
with nogil:
```
This allows reusing threads, potentially reducing OpenMP's overhead.
```diff
-    with nogil:
+    with nogil, parallel(num_threads=n_threads):
```
Could we try this out in another PR?
I did not touch those implementation parts and would like to keep it that way.
```python
for f_idx in prange(
    n_allowed_features, schedule='static', num_threads=n_threads
):
```
If the above-suggested `nogil, parallel` context is used, this `num_threads` must be removed here and in the previous `prange` loops I can't suggest on.
```diff
-    for f_idx in prange(
-        n_allowed_features, schedule='static', num_threads=n_threads
-    ):
+    for f_idx in prange(n_allowed_features, schedule='static'):
```
Note that this pattern
In another PR, maybe. See comment above.
```python
if has_interaction_cst:
    feature_idx = allowed_features[f_idx]
else:
    feature_idx = f_idx
```
Is there a performance benefit to moving the branching outside the `prange` loops and having one `prange` loop per branch?
I tried that, but I did not observe any runtime difference. Usually the number of features is small (fewer than 10, 100, maybe 1000), while the sample size is large. The inner loop is then over n_samples, and branching should be avoided there, not in the outer loop over features.
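The indirection pattern discussed here can be illustrated outside Cython. The sketch below is a hypothetical pure-Python/NumPy analogue (function name `build_histograms` and the input shapes are assumptions, not the actual scikit-learn internals): the loop runs over allowed features only, and the per-feature branch maps the loop index back to the real feature index, exactly as in the quoted hunk.

```python
import numpy as np

def build_histograms(binned_X, gradients, n_bins, allowed_features=None):
    """Sketch: per-feature gradient histograms, restricted to
    allowed_features when interaction constraints are active."""
    has_interaction_cst = allowed_features is not None
    n_features = binned_X.shape[1]
    n_allowed_features = (
        allowed_features.shape[0] if has_interaction_cst else n_features
    )
    histograms = {}
    for f_idx in range(n_allowed_features):
        # Same indirection as in the PR: map the loop index to the
        # actual feature index only when constraints are active.
        if has_interaction_cst:
            feature_idx = allowed_features[f_idx]
        else:
            feature_idx = f_idx
        hist = np.zeros(n_bins)
        # Accumulate gradient sums per bin for this feature.
        np.add.at(hist, binned_X[:, feature_idx], gradients)
        histograms[int(feature_idx)] = hist
    return histograms

binned_X = np.array([[0, 1], [1, 1], [0, 0]])  # 3 samples, 2 binned features
gradients = np.array([1.0, 2.0, 3.0])
only_f1 = build_histograms(binned_X, gradients, n_bins=2,
                           allowed_features=np.array([1]))
```

The per-sample branch is cheap because it happens once per feature, not once per sample; the inner accumulation stays branch-free.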
@jjerphan I hope it's ready now. All issues/merge conflicts are fixed. I also added a whatsnew entry.

Still LGTM, yes. I have just labelled this PR as "Waiting for a Second Reviewer".

Thanks @lorentzenchr!
Reference Issues/PRs
Follow-up of #21020.
What does this implement/fix? Explain your changes.
This PR restricts the computation of histograms in `HistGradientBoostingRegressor` and `HistGradientBoostingClassifier` to features that are allowed to be split on. This gives a boost in performance (fit time).

Any other comments?