FEA add quantile HGBT #21800
Conversation
I suspect we would want to benchmark the new loss implementations and compare them to the existing ones. Edit: I just saw that #20811 does the loss migrations. I guess we want to merge that one first and then work on this one?
Marking as high priority according to https://scikit-learn.fondation-inria.fr/technical-committee-june-2-2021-fr/.
Force-pushed from a8bed32 to 57505de
After merging #20811 and a git rebase, this is an enjoyable, short PR for a new feature!
Thanks for the update!
loss_param : float, default=None
    If loss is "quantile", this parameter specifies which quantile is to be
    estimated and must be between 0 and 1.
I see a few paths forward with passing in more parameters for the loss (a rough sketch of the call styles follows this list):

- If we want to be fully generic, then I think loss_param could become a dictionary, which will be passed down into the loss directly. This way, other parameterized losses can be generically supported by this dictionary. We kind of already do this with metric_kwargs.
- We can copy GradientBoostingRegressor's API and use a single parameter (alpha) that is just for the quantile loss. In the HistGradientBoosting* case, I would call it quantile instead of alpha.
- (Likely out of reach in the near term) Losses become public and users can create a PinballLoss(quantile=0.3) and pass it directly into HistGradient*.
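A rough sketch of what these call styles could look like. The dictionary and loss-object forms are purely hypothetical illustrations of the options above and are shown as comments; the scalar form is the one later agreed on in this thread:

```python
from sklearn.ensemble import HistGradientBoostingRegressor

# Option 2: a dedicated scalar parameter, mirroring GradientBoostingRegressor's
# alpha but named quantile.
est = HistGradientBoostingRegressor(loss="quantile", quantile=0.3)

# Option 1 (hypothetical): a dict forwarded to the loss object.
# est = HistGradientBoostingRegressor(loss="quantile", loss_param={"quantile": 0.3})

# Option 3 (hypothetical): pass a public loss instance directly.
# est = HistGradientBoostingRegressor(loss=PinballLoss(quantile=0.3))
```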
I'd like to add the option: loss_param is a single parameter with a different meaning depending on the loss, i.e. a quantile level for loss="quantile", and the Tweedie power for loss="tweedie" (in a later PR).
Comments:

- loss_param dictionary: This makes a clear distinction of which loss is used with which parameter. As the loss parameter should not be used for hyperparameter tuning, this is OK.
- Special parameter per loss: If we add further losses, this would clutter the API.
- Pass a loss object instance: Not doing this, and instead using simple parameter types like strings and floats, was an original design decision for scikit-learn. It would, however, sometimes make life easier (meaning, I'm not opposed to this direction).
General: I really like consistency and would favor using/adapting the choice we make here for other estimators, e.g. GradientBoostingRegressor, maybe one day DecisionTreeRegressor as well.
loss_param is single parameter with different meaning depending on the loss, i.e. a quantile level for loss="quantile", and tweedie power for loss="tweedie" (in a later PR).
I was thinking of losses with more than one parameter. Although, I think supporting multiple parameters can be considered a non-issue: advanced users can create their own loss object and pass it in, kind of like how we unofficially support passing a Criterion object into the trees:
scikit-learn/sklearn/tree/_classes.py, lines 360 to 363 in e1e8c66:

    else:
        # Make a deepcopy in case the criterion has mutable attributes that
        # might be shared and modified concurrently during parallel fitting
        criterion = copy.deepcopy(criterion)
As a starting point, I am okay with a single loss_param. I think the API discussion is worth bringing up during the next dev meeting.
I'm not sure we already have a design similar to the proposed loss_param in scikit-learn, do we?

Do we already have a clear idea of what future losses we want to support and what parameters they would need, apart from Tweedie? If we don't, using loss_param instead of the more consistent and simpler quantile parameter might be solving a problem we don't really have.
One reason to go towards loss_param being a dictionary is to support losses with more than two parameters. Although traditional losses, e.g. the epsilon-insensitive loss and the Huber loss, also have only one parameter, maybe this could become a restriction in the future?
I'm not sure we already have a design similar to the proposed loss_param in scikit-learn, do we?
Similar to this is param in GenericUnivariateSelect, but it's always seemed quite ugly to me!
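For reference, a minimal sketch of that existing pattern, where param changes meaning depending on mode (number of features for "k_best", a percentile for "percentile", and so on):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import GenericUnivariateSelect, f_classif

X, y = load_iris(return_X_y=True)

# `param` is a single mode-dependent value, analogous to a loss-dependent
# loss_param: here it is the number of features because mode="k_best".
selector = GenericUnivariateSelect(score_func=f_classif, mode="k_best", param=2)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)  # (150, 2)
```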
From the dev meeting, I think the most agreeable and quickest path forward is to change loss_param to quantile and only accept a single float.
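A minimal sketch of what that agreed API would look like for a user (assuming a scikit-learn version with this feature merged):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(500, 2))
y = X[:, 0] + rng.normal(scale=0.1, size=500)

# A single float selects the quantile level of the pinball loss.
est = HistGradientBoostingRegressor(loss="quantile", quantile=0.9)
est.fit(X, y)
y_pred_90 = est.predict(X)

# Roughly 90% of the training targets should fall below the prediction.
print(np.mean(y <= y_pred_90))
```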
LGTM.
Please consider including lorentzenchr#2 if you like it.
Let's wait for the decision at the dev meeting w.r.t. loss_param. For the record, I am fine with the way it is unless we already know of a loss that we plan to add and that would naturally require more than 1 parameter, in which case we can go for the dict option directly.
Note that we already have neighbors models that accept constructor params defined as metric="minkowski", metric_params={"p": 1.5, "w": [0.2, 0.8]}, for instance.
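For reference, a minimal sketch of that existing dict-style pattern (shown here with the mahalanobis metric, whose extra parameter must go through metric_params; the exact weighted-minkowski spelling above depends on the SciPy version):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))

# metric_params is a plain dict forwarded to the distance metric, much like
# a dict-valued loss_param would be forwarded to the loss object.
nn = NearestNeighbors(
    n_neighbors=5,
    metric="mahalanobis",
    metric_params={"VI": np.linalg.inv(np.cov(X.T))},
)
nn.fit(X)
distances, indices = nn.kneighbors(X[:3])
```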
@lorentzenchr brainstorming for a potential follow-up PR: do you think it would be possible to have a quantile loss that could predict multiple quantiles at once (similarly to the multinomial loss that can predict one output per possible class)? That would still require having n times more trees in the ensemble when predicting n quantiles instead of 1 (as for multinomial), but that could be helpful from a usability standpoint. In a follow-up, follow-up PR we could even extend the capabilities of the individual HGBR trees so as to predict n outputs per tree directly and have the split select the highest improvement in the aggregate loss directly. This would have a different inductive bias but could be computationally much more efficient, both for multi-quantile regression and for multi-class classification (and we could also support multi-label in a similar way).
@ogrisel For the (pinball) loss itself, there is no gain (that I see) in enabling several quantile levels at the same time other than user convenience: one pinball loss is tailored for only one quantile level.

My first thought is that a single tree/split for all quantiles could be too simplistic for the whole predictive distribution. The problem is the weighting: should we weight the median more than the 99% quantile or the other way around? It works, however, for quantile regression forests, see Meinshausen 2006. I have to think about it more deeply.
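For context, the user-convenience baseline being discussed is fitting one estimator per quantile level, roughly as in this sketch (assuming the quantile parameter agreed on above):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# One model, and therefore one full set of trees, per quantile level; a
# multi-quantile loss would instead share the ensemble across levels.
models = {
    q: HistGradientBoostingRegressor(loss="quantile", quantile=q).fit(X, y)
    for q in (0.05, 0.5, 0.95)
}
intervals = {q: model.predict(X) for q, model in models.items()}
```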
From the dev meeting, I think the most agreeable and quickest path forward is to change loss_param to quantile and only accept a single float.
works for me
@ogrisel Do you stand by your approval?

@agramfort Thanks for your review. It is good to know that there are other people interested in quantile regression 😄
you're not alone :)
Thanks for the update!
This reverts commit 539c3a2.
Status report: 2 approvals, all lights green and no comments left.
LGTM!
@ogrisel Your approval was from when we still had loss_param. Are you okay with using quantile instead?
Reference Issues/PRs
Follow-up of #20567 and #20811. Solves #17955.
It is based on top of #20811, so it should be merged after that PR. Edit: merged.

What does this implement/fix? Explain your changes.
This PR adds loss="quantile" to HistGradientBoostingRegressor.

Any other comments?
This also introduces the new parameter loss_param for specifying which quantile to model. This is in anticipation of possible other losses with one parameter, such as the Tweedie deviance.