Regression Probability Distribution & Multi-Quantile Output API

### Describe the workflow you want to enable

Scikit-learn classifiers have both a `predict` and a `predict_proba` method, but regressors only have a `predict` method, with some estimators offering the option of predicting a quantile. Scikit-learn is adding more quantile output functionality, e.g. [HistGradientBoostingRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html) and [QuantileRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.QuantileRegressor.html#sklearn.linear_model.QuantileRegressor), and no doubt more will come in due course. In both cases, the single `quantile` parameter is set at the class init step.

LightGBM and other packages also follow a similar API.

[MAPIE](https://mapie.readthedocs.io/en/latest/generated/mapie.regression.MapieQuantileRegressor.html#mapie.regression.MapieQuantileRegressor) allows alpha to be set at class init and at predict.

[XGBoost](https://xgboost.readthedocs.io/en/latest/python/examples/quantile_regression.html) also has this option, and additionally allows multiple quantile outputs with e.g. `alpha=np.array([0.05, 0.5, 0.95])`. Currently, this isn't documented in the scikit-learn documentation. Where it is possible, this is clearly preferable to fitting one model per quantile.

Additionally, distributional regression packages take a range of approaches:

[XGBoostLSS](https://statmixedml.github.io/XGBoostLSS/) allows options on the `predict` method such as `pred_type` = `quantiles`, `parameters`, `expectiles`. This returns an m x n array.

[PGBM](https://github.com/elephaint/pgbm) uses `predict`, which returns just the mean by default or, with a `return_std=True` option, the mean and std, as a 1 x n or 2 x n array.

[XGBD](https://github.com/CDonnerer/xgboost-distribution?tab=readme-ov-file) returns the mean and std as a namedtuple.

[NGBoost](https://github.com/stanfordmlgroup/ngboost) has `predict` and `pred_dist`, which return point predictions and the distribution parameters that can be passed to a `scipy.stats` distribution object, e.g. `normal`.

All of these packages use scikit-learn-style APIs or aim to add this as a feature.
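As a small illustration of why distributional and quantile outputs could share one API: once per-sample distribution parameters are available, any set of quantiles follows directly from a `scipy.stats` object. This is only a sketch. The `mu` and `sigma` arrays are made-up stand-ins for what a fitted distributional model would return, and a normal distribution is assumed.

```python
import numpy as np
from scipy.stats import norm

# Made-up per-sample parameters; in practice these would come from a fitted
# distributional model (assumed here to predict a normal distribution).
mu = np.array([1.0, 2.5, 0.3, 4.0])
sigma = np.array([0.5, 1.0, 0.2, 2.0])

levels = np.array([0.05, 0.5, 0.95])

# Frozen distribution with one (loc, scale) pair per sample.
dist = norm(loc=mu, scale=sigma)

# Broadcasting (n_quantiles, 1) against (n_samples,) gives an
# (n_quantiles, n_samples) array of quantile predictions.
quantile_preds = dist.ppf(levels[:, None])
print(quantile_preds.shape)
```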

### Describe your proposed solution

All this is to say, I think scikit-learn has the authority and opportunity to lead the way on unifying an API to cover both distributional outputs and quantile outputs. Therefore, I would like to open a discussion with the following points:

1) Should this be added to the core scikit-learn API/package? Is this within scope?
2) If not, should it be a scikit-learn contrib package, or is this something the "distributional python ML community" should sort out themselves?
3) If this is something scikit-learn would like to take a lead on, what should this API look like? If scikit-learn admins/owners think this is outside of scope but still have insights/opinions on this, it would still be extremely valuable to hear them!

### Describe alternatives you've considered, if relevant

_No response_

### Additional context

_No response_
