Open
Labels
Breaking Change, Needs Benchmarks, Needs Decision, New Feature, Performance, module:ensemble
Description
Related issues: #25210
Current State
`HistGradientBoostingClassifier` and `HistGradientBoostingRegressor` both:
- Calculate the sample size `count` in histograms.
- Use `count` for splitting (mostly excluding split candidates).
- Save the `count` in the final trees and use it in partial dependence computations.
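To make the first two points concrete, here is a minimal pure-Python sketch of a histogram that tracks `count` next to the gradient and hessian sums, and of how the count can rule out split candidates. The names `Bin`, `build_histogram` and `valid_split_bins` are made up for illustration and are not the actual scikit-learn internals; the minimum-samples rule shown is an assumption about what "excluding split candidates" means here.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    # One histogram entry per feature bin (hypothetical layout).
    sum_gradients: float = 0.0
    sum_hessians: float = 0.0
    count: int = 0


def build_histogram(binned_feature, gradients, hessians, n_bins):
    """Accumulate gradient sum, hessian sum and sample count per bin."""
    hist = [Bin() for _ in range(n_bins)]
    for b, g, h in zip(binned_feature, gradients, hessians):
        hist[b].sum_gradients += g
        hist[b].sum_hessians += h
        hist[b].count += 1  # the extra work the proposal wants to evaluate
    return hist


def valid_split_bins(hist, min_samples_leaf):
    """Return bin indices where both children keep at least min_samples_leaf samples."""
    n_total = sum(b.count for b in hist)
    valid, n_left = [], 0
    for idx, b in enumerate(hist[:-1]):  # no split after the last bin
        n_left += b.count
        if n_left >= min_samples_leaf and n_total - n_left >= min_samples_leaf:
            valid.append(idx)
    return valid


binned_feature = [0, 0, 1, 2, 2, 2, 3]
gradients = [0.5, -0.2, 0.1, 0.3, -0.4, 0.2, 0.1]
hessians = [1.0] * 7

hist = build_histogram(binned_feature, gradients, hessians, n_bins=4)
counts = [b.count for b in hist]
candidates = valid_split_bins(hist, min_samples_leaf=2)
print(counts, candidates)
```

Dropping `count` from `Bin` would remove one accumulation per sample and shrink each histogram entry, but `valid_split_bins` would then need another source for the sample sizes.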
Proposition
- Evaluate if removing `count` from the histograms (LightGBM only sums gradient and hessian in histograms, no count) gives a good speed-up. (I measured a roughly 10-20% speed-up.)
  Edit: LightGBM uses an approximate count based on the hessian to check for min sample size. So this might not be what we want.
- Add an option to save counts and sample weights to final trees at the very end of `fit` (where the binned training `X` is still available).
- Use partial dependence `method='recursion'` if the above option was set, else use `method='brute'`.
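The role of the stored counts in the last two points can be illustrated with a toy tree. The sketch below is not scikit-learn code: `Node`, `pd_brute` and `pd_recursion` are hypothetical helpers written under the assumption that the recursion method weights the two children of a non-target split by their stored sample counts, while the brute method re-predicts on the data with the target feature overwritten.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    count: int                      # training samples that reached this node
    value: float = 0.0              # prediction, meaningful for leaves only
    feature: Optional[int] = None   # split feature index, None for a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict(node, x):
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def pd_brute(tree, X, target_feature, grid_value):
    """Average prediction over X with the target feature set to grid_value."""
    total = 0.0
    for x in X:
        x_mod = list(x)
        x_mod[target_feature] = grid_value
        total += predict(tree, x_mod)
    return total / len(X)


def pd_recursion(node, target_feature, grid_value):
    """Traverse the tree once, weighting non-target splits by stored counts."""
    if node.feature is None:
        return node.value
    if node.feature == target_feature:
        child = node.left if grid_value <= node.threshold else node.right
        return pd_recursion(child, target_feature, grid_value)
    w_left = node.left.count / node.count
    return (w_left * pd_recursion(node.left, target_feature, grid_value)
            + (1 - w_left) * pd_recursion(node.right, target_feature, grid_value))


# Toy training data (feature 0, feature 1) and a tree whose counts match it.
X = [(0.2, 1.0), (0.8, 1.0), (0.3, 3.0), (0.9, 3.0), (0.1, 3.0)]
tree = Node(
    count=5, feature=1, threshold=2.0,
    left=Node(count=2, feature=0, threshold=0.5,
              left=Node(count=1, value=1.0), right=Node(count=1, value=2.0)),
    right=Node(count=3, feature=0, threshold=0.5,
               left=Node(count=2, value=3.0), right=Node(count=1, value=4.0)),
)

brute = pd_brute(tree, X, target_feature=0, grid_value=0.4)
recursion = pd_recursion(tree, target_feature=0, grid_value=0.4)
print(brute, recursion)
```

In this small example the two results agree because the only non-target split sits at the root; in general they need not match exactly. The point is that `pd_recursion` never touches `X`, which is why it needs the counts saved in the tree, whereas `pd_brute` works without them at the cost of one prediction pass per grid value.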
Why?
#25431 concluded that adding weights to the trees is too expensive. The above proposition gives users a clear choice: faster training, or faster partial dependence computation afterwards.