
FEA add quantile HGBT #21800


Merged: 8 commits, Feb 22, 2022

Conversation

@lorentzenchr (Member) commented Nov 26, 2021

Reference Issues/PRs

Follow-up of #20567 and #20811. Solves #17955.

It is based on top of #20811, so it should be merged after that PR. Edit: merged.

What does this implement/fix? Explain your changes.

This PR adds loss="quantile" to HistGradientBoostingRegressor.

Any other comments?

This also introduces the new parameter loss_param for specifying which quantile to model, in anticipation of other possible losses with one parameter, such as the Tweedie deviance.
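
For reference, a minimal usage sketch (editor's illustration, not part of the PR diff). The review discussion below renamed loss_param to quantile before merging, so the sketch uses the merged spelling; the synthetic data and the 0.9 level are arbitrary.

    # Fit the 90% conditional quantile instead of the conditional mean.
    import numpy as np
    from sklearn.ensemble import HistGradientBoostingRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(size=(1000, 2))
    # Heteroscedastic noise so that quantiles differ from a shifted mean.
    y = X[:, 0] + rng.normal(scale=0.1 + X[:, 1], size=1000)

    model = HistGradientBoostingRegressor(loss="quantile", quantile=0.9)
    model.fit(X, y)
    # Empirical coverage on the training data should be roughly 0.9
    # (up to over- or underfitting).
    print(np.mean(y <= model.predict(X)))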

@thomasjpfan (Member) commented Nov 26, 2021

Should we do the loss migration first and then add quantile loss in a follow up? I do not want an API discussion around loss_param or tests for quantile loss to block the loss migration.

I suspect we would want to benchmark the new loss implementations and compare it to main in this PR.

Edit: I just saw that #20811 does the loss migrations. I guess we want to merge that one first and then work on this one?

@lorentzenchr linked an issue Dec 10, 2021 that may be closed by this pull request
@lorentzenchr added the High Priority label Dec 10, 2021
@lorentzenchr (Member, Author)

Marking as high priority according to https://scikit-learn.fondation-inria.fr/technical-committee-june-2-2021-fr/.

@lorentzenchr (Member, Author)

After merging #20811 and a git rebase, this is an enjoyably short PR for a new feature!

@thomasjpfan (Member) left a comment

Thanks for the update!

Comment on lines 1148 to 1150:

    loss_param : float, default=None
        If loss is "quantile", this parameter specifies which quantile is to be
        estimated and must be between 0 and 1.
Member:

I see a few paths forward for passing in more parameters for a loss (sketched below):

  1. If we want to be fully generic, then loss_param could become a dictionary, which would be passed down into the loss directly. This way, other parameterized losses can be supported generically by this dictionary. We kind of already do this with metric_params.

  2. We can copy GradientBoostingRegressor's API and use a single parameter (alpha) that is just for the quantile loss. In the HistGradientBoosting* case, I would call it quantile instead of alpha.

  3. (Likely out of reach in the near term) Losses become public and users can create a PinballLoss(quantile=0.3) and pass it directly into HistGradient*.
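
To make these concrete, hypothetical constructor calls for the three options could look as follows (editor's sketch; none of these signatures existed at this point, and only option 2 matches what was eventually merged):

    from sklearn.ensemble import HistGradientBoostingRegressor

    # 1. Generic dictionary forwarded to the loss (hypothetical signature):
    # HistGradientBoostingRegressor(loss="quantile", loss_param={"quantile": 0.3})

    # 2. Dedicated parameter, mirroring GradientBoostingRegressor's alpha
    #    (this is the form that was eventually merged):
    HistGradientBoostingRegressor(loss="quantile", quantile=0.3)

    # 3. Public loss object passed in directly (hypothetical signature):
    # HistGradientBoostingRegressor(loss=PinballLoss(quantile=0.3))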

@lorentzenchr (Member, Author)

I'd like to add the option (sketched at the end of this comment):

  4. loss_param is a single parameter with a different meaning depending on the loss, i.e. a quantile level for loss="quantile" and the Tweedie power for loss="tweedie" (in a later PR).

Comments:

  1. loss_param dictionary:
     This makes it clear which loss is used with which parameter. As the loss parameters should not be used for hyperparameter tuning, this is OK.
  2. A special parameter per loss:
     If we add further losses, this would clutter the API.
  3. Passing a loss object instance:
     Not doing this, and instead using simple parameter types like strings and floats, was an original design decision of scikit-learn. It would, however, sometimes make life easier (meaning I'm not opposed to this direction).

General: I really like consistency and would favor applying/adapting the choice we make here to other estimators, e.g. GradientBoostingRegressor and maybe one day DecisionTreeRegressor as well.
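
As a hypothetical sketch of option 4 above (editor's illustration; loss="tweedie" and these exact signatures do not exist, and the values are arbitrary):

    # One generic loss_param whose meaning depends on the chosen loss (hypothetical):
    # HistGradientBoostingRegressor(loss="quantile", loss_param=0.9)  # quantile level
    # HistGradientBoostingRegressor(loss="tweedie", loss_param=1.5)   # Tweedie power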

@thomasjpfan (Member) commented Jan 21, 2022

loss_param is a single parameter with a different meaning depending on the loss, i.e. a quantile level for loss="quantile" and the Tweedie power for loss="tweedie" (in a later PR).

I was thinking of losses with more than one parameter. Although I think supporting multiple parameters can be considered a non-issue: advanced users can create their own loss object and pass it in, kind of like how we unofficially support passing a Criterion object into the trees:

    else:
        # Make a deepcopy in case the criterion has mutable attributes that
        # might be shared and modified concurrently during parallel fitting
        criterion = copy.deepcopy(criterion)

As a starting point, I am okay with a single loss_param. I think the API discussion is worth bringing up during the next dev meeting.

Member:

I'm not sure we already have a design similar to the proposed loss_param in scikit-learn, do we?

Do we already have a clear idea of what future losses we want to support and what parameters they would need, apart from Tweedie? If we don't, using loss_param instead of the more consistent and simpler quantile parameter might be solving a problem we don't really have.

Member:

One reason to go towards making loss_param a dictionary is to support losses with more than one parameter. Although traditional losses, e.g. the epsilon-insensitive loss or the Huber loss, also have only one parameter, maybe a single parameter would become a restriction in the future?

Member:

I'm not sure we already have a design similar to the proposed loss_param in scikit-learn, do we?

Similar to this is param in GenericUnivariateSelect but it's always seemed quite ugly to me!

Member:

From the dev meeting, I think the most agreeable and quickest path forward is to change loss_param to quantile and only accept a single float.

@ogrisel self-requested a review January 25, 2022 15:22
@ogrisel (Member) left a comment

LGTM.

Please consider including lorentzenchr#2 if you like it.

Let's wait for the decision at the dev meeting w.r.t. loss_param. For the record I am fine with the way it is unless we already know of a loss that we plan to add and that would naturally require more than 1 parameter, in which case we can go for the dict option directly.

Note that we already have neighbors models that accept constructor params defined as metric="minkowski", metric_params={"p": 1.5, "w": [0.2, 0.8]}, for instance.

@ogrisel (Member) commented Jan 26, 2022

@lorentzenchr, brainstorming for a potential follow-up PR: do you think it would be possible to have a quantile loss that could predict multiple quantiles at once (similarly to the multinomial loss, which can predict one output per possible class)?

That would still require n times more trees in the ensemble when predicting n quantiles instead of 1 (as for the multinomial loss), but it could be helpful from a usability standpoint.

In a follow-up to the follow-up PR, we could even extend the capabilities of the individual HGBR trees so as to predict n outputs per tree directly and have the split select the highest improvement in the aggregate loss. This would have a different inductive bias but could be computationally much more efficient, both for multi-quantile regression and for multi-class classification (and we could also support multi-label in a similar way).

@lorentzenchr (Member, Author)

do you think it would be possible to have a quantile loss that could predict multiple quantiles at once?

@ogrisel For the (pinball) loss itself, there is no gain (that I see) in enabling several quantile levels at the same time, other than user convenience: one pinball loss is tailored to exactly one quantile level.
For quantiles, it happens that the losses for, say, 10 quantile levels can be combined into a single loss as the sum of the 10 corresponding pinball losses. But again, I see no advantage (apart from an additional metric for 2 symmetric quantile levels, alpha/2 and 1 - alpha/2, for user convenience).
Combining several quantile levels seems more promising for linear models, where one could forbid quantile crossings (via inequality constraints in the linprog solver).
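
As a small editor's illustration of this point (using scikit-learn's mean_pinball_loss metric; the numbers are made up): the combined loss over several quantile levels is just the sum of the per-level pinball losses, so nothing new happens at the loss level.

    import numpy as np
    from sklearn.metrics import mean_pinball_loss

    y_true = np.array([1.0, 2.0, 3.0, 4.0])
    # One prediction vector per quantile level, e.g. from separately fitted models.
    preds = {
        0.1: np.array([0.5, 1.5, 2.5, 3.5]),
        0.5: np.array([1.0, 2.1, 2.9, 4.0]),
        0.9: np.array([1.6, 2.7, 3.5, 4.8]),
    }

    per_level = {a: mean_pinball_loss(y_true, p, alpha=a) for a, p in preds.items()}
    combined = sum(per_level.values())  # the "multi-quantile" loss is just this sum
    print(per_level, combined)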

In a follow-up to the follow-up PR, we could even extend the capabilities of the individual HGBR trees so as to predict n outputs per tree directly and have the split select the highest improvement in the aggregate loss.

My first thought is that a single tree/split for all quantiles could be too simplistic for the whole predictive distribution. The problem is the weighting: should we weight the median more than the 99% quantile, or the other way around? It works, however, for quantile regression forests, see Meinshausen 2006. I have to think about it more deeply.

@agramfort (Member) commented Feb 1, 2022 via email

@lorentzenchr (Member, Author)

@ogrisel Do you stand by your approval?

@lorentzenchr (Member, Author)

@agramfort Thanks for your review. It is good to know that there are other people interested in quantile regression 😄

@agramfort (Member) commented Feb 1, 2022 via email

@thomasjpfan (Member) left a comment

Thanks for the update!

@lorentzenchr (Member, Author)

Status report: 2 approvals, all lights green and no comments left.

@thomasjpfan (Member) left a comment

LGTM!

@ogrisel Your approval was given when we still had loss_param. Are you okay with using quantile instead?

Labels: High Priority, module:ensemble, Waiting for Reviewer
Successfully merging this pull request may close: HistGradientBoostingRegressor and quantile loss function
6 participants