API to predict multiple quantiles at once

Classifiers have a `predict_proba` method that makes it possible to quantify probabilistically the certainty in the predictions for a given input `X_i`.

Currently, most regressors in scikit-learn only predict a conditional expectation E[Y|X]. Some have a `return_std` option that also makes it possible to estimate sqrt(Var[Y|X]), which can be used to quantify the certainty when assuming a Gaussian predictive distribution (typically for Gaussian processes, which estimate a Gaussian posterior predictive distribution).
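To make the `return_std` route concrete, here is a minimal sketch of how an array of quantiles can be derived from the predicted mean and standard deviation under the Gaussian assumption. The toy data, the kernel choice and the quantile levels are illustrative assumptions, not part of any proposed API.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy data (assumption, for illustration only).
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=40)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), random_state=0)
gpr.fit(X, y)

X_new = np.linspace(0, 5, 5).reshape(-1, 1)
mean, std = gpr.predict(X_new, return_std=True)

# Under a Gaussian predictive distribution, the q-th quantile is
# mean + std * Phi^{-1}(q), where Phi^{-1} is the standard normal inverse CDF.
quantiles = np.array([0.05, 0.5, 0.95])
y_quantiles = mean[:, None] + std[:, None] * norm.ppf(quantiles)[None, :]
print(y_quantiles.shape)  # (5, 3): one row per sample, one column per quantile
```

This only works for estimators whose predictive distribution is (assumed to be) Gaussian, which is why it does not generalize to the other regressors.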

We do have pointwise quantile estimators (linear models, gradient boosting, histogram gradient boosting) whose `predict` method returns a single point estimate for a target quantile passed as a hyper-parameter, instead of estimating an expectation.
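For reference, this is roughly what the current pointwise usage looks like with `GradientBoostingRegressor`, where the target quantile is set through the `alpha` hyper-parameter. The data and the choice of the 0.9 quantile are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy data with skewed noise (assumption, for illustration only).
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(200, 1))
y = X.ravel() + rng.exponential(scale=1.0, size=200)

# The quantile level is fixed at construction time: one model, one quantile.
q90 = GradientBoostingRegressor(
    loss="quantile", alpha=0.9, n_estimators=50, random_state=0
).fit(X, y)

pred = q90.predict(X)
print(pred.shape)  # (200,): a single estimate per sample

# Fraction of training targets at or below the predicted 0.9 quantile.
coverage = np.mean(y <= pred)
print(coverage)
```

Getting several quantiles this way means fitting and managing one estimator per quantile level by hand.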

Several people have expressed the need for a more generic API that can return an array of quantile estimates for a given input `X_i`.

The goal of this issue is to centralize the discussion of an API extension to do this more uniformly in scikit-learn, either via a meta-estimator that wraps an array of pointwise quantile estimators to turn them into a quantile-array estimator, or by making the base estimators able to do this directly (and sometimes more efficiently).
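As a starting point for the discussion, here is a rough sketch of what the meta-estimator option could look like. The class name `MultiQuantileRegressor`, the `predict_quantiles` method and the `quantile_param` argument are all hypothetical names invented for this example; nothing like this exists in scikit-learn today.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.ensemble import GradientBoostingRegressor


class MultiQuantileRegressor(BaseEstimator):
    """Hypothetical wrapper: fits one clone of `estimator` per quantile."""

    def __init__(self, estimator, quantiles=(0.05, 0.5, 0.95), quantile_param="alpha"):
        self.estimator = estimator
        self.quantiles = quantiles
        # Name of the base estimator's hyper-parameter holding the quantile
        # level ("alpha" for GradientBoostingRegressor).
        self.quantile_param = quantile_param

    def fit(self, X, y):
        self.estimators_ = [
            clone(self.estimator).set_params(**{self.quantile_param: q}).fit(X, y)
            for q in self.quantiles
        ]
        return self

    def predict_quantiles(self, X):
        # Returns an array of shape (n_samples, n_quantiles).
        return np.column_stack([est.predict(X) for est in self.estimators_])


# Toy usage (data is an assumption, for illustration only).
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(200, 1))
y = X.ravel() + rng.exponential(scale=1.0, size=200)

base = GradientBoostingRegressor(loss="quantile", n_estimators=30, random_state=0)
model = MultiQuantileRegressor(base, quantiles=(0.1, 0.5, 0.9)).fit(X, y)
y_quantiles = model.predict_quantiles(X[:5])
print(y_quantiles.shape)  # (5, 3)
```

Note that independently fitted models can produce crossing quantiles, and that this costs one full fit per quantile, which is one reason a native implementation in the base estimators could be preferable.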

A non-exhaustive list of related PRs and issues (feel free to add or suggest new ones):

- https://github.com/scikit-learn/scikit-learn/issues/19851
- https://github.com/scikit-learn/scikit-learn/pull/19754
- https://github.com/scikit-learn/scikit-learn/issues/20964

Also related:

- conformal predictions: https://github.com/scikit-learn-contrib/MAPIE
- XGBoostLSS (location, scale and shape): https://github.com/StatMixedML/XGBoostLSS

Furthermore, for models like Poisson regression that make a specific assumption about the conditional Y|X distribution, it would be possible to return estimates of the inverse-CDF values of the estimated Y|X distribution, for instance. Those could probably also benefit from an expanded API.
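For the Poisson case, a minimal sketch of the idea is to plug the predicted conditional mean into the Poisson inverse CDF from SciPy. The data, the regularization strength and the quantile levels below are assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import poisson
from sklearn.linear_model import PoissonRegressor

# Toy count data (assumption, for illustration only).
rng = np.random.RandomState(0)
X = rng.uniform(0, 2, size=(300, 1))
y = rng.poisson(lam=np.exp(0.5 + X.ravel()))

reg = PoissonRegressor(alpha=1e-3).fit(X, y)

# predict returns the estimated conditional mean E[Y|X], which fully
# determines the estimated Poisson distribution of Y|X.
mu = reg.predict(X[:5])

quantiles = np.array([0.05, 0.5, 0.95])
# Inverse CDF (percent point function), broadcast to (n_samples, n_quantiles).
y_quantiles = poisson.ppf(quantiles[None, :], mu[:, None])
print(y_quantiles.shape)  # (5, 3)
```

The resulting quantiles are integer-valued and only as good as the Poisson assumption itself.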

If we do this, then we have the side question of how to evaluate such multi-quantile models. We could probably extend the pinball_loss scorer to average the pinball scores for an array of quantiles for instance.

/cc @GaelVaroquaux @amueller @lorentzenchr 

API to predict multiple quantiles at once #23334
