Description
Describe the workflow you want to enable
Scikit-learn has `predict` and `predict_proba` methods for classification classes but only a `predict` method for regression, with the option of quantile output in some estimators. Scikit-learn is adding more quantile output functionality, e.g. `HistGradientBoostingRegressor` and `QuantileRegressor` - no doubt more will come in due course. The single `quantile` parameter is set at the class init step.
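To make the current pattern concrete, here is a minimal sketch (assuming scikit-learn >= 1.1, where `HistGradientBoostingRegressor` gained `loss="quantile"`); because the quantile level is fixed at init, one estimator instance is needed per level:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.linear_model import QuantileRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# The quantile level is a constructor argument, so each level needs its own fit.
quantile_models = {
    q: HistGradientBoostingRegressor(loss="quantile", quantile=q).fit(X, y)
    for q in (0.05, 0.5, 0.95)
}
preds = np.column_stack([m.predict(X) for m in quantile_models.values()])  # (200, 3)

# Same init-time pattern for the linear QuantileRegressor.
median_model = QuantileRegressor(quantile=0.5, alpha=0.0).fit(X, y)
```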
LightGBM and other packages also follow a similar API.
MAPIE allows `alpha` to be set at class init and on `predict`.
XGBoost also has this option, but additionally allows multiple outputs, e.g. `alpha=np.array([0.05, 0.5, 0.95])`. Currently, this isn't documented in XGBoost's scikit-learn API documentation. Clearly, returning several quantiles from a single model is far superior functionality where possible.
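For comparison, the multi-quantile pattern described above looks roughly like this (a hedged sketch: the objective name `reg:quantileerror` and the `quantile_alpha` spelling are taken from recent XGBoost releases and may differ by version):

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# One model, several quantiles at once (parameter names assumed from XGBoost >= 2.0).
model = xgb.XGBRegressor(
    objective="reg:quantileerror",
    quantile_alpha=np.array([0.05, 0.5, 0.95]),
)
model.fit(X, y)
pred = model.predict(X)  # expected shape: (n_samples, 3), one column per quantile
```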
Additionally, distributional regression packages take a variety of approaches:

- XGBoostLSS allows options on the `predict` method such as `pred_type="quantiles"`, `"parameters"`, or `"expectiles"`. This returns an m x n array.
- PGBM uses `predict` with just the mean and a `return_std=True` option, returning a 1 x n or 2 x n array.
- XGBD returns the mean and std as a namedtuple.
- NGBoost has `predict` and `pred_dist`, which return point predictions and the distribution parameters that can be passed to a `scipy.stats` distribution object, e.g. `norm` (sketched below).
All of these packages use scikit-learn-style APIs or aim to add this as a feature.
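As a concrete illustration of the distributional style in the list above, here is a hedged NGBoost sketch (assuming the default Normal distribution; the `params` keys `"loc"` and `"scale"` are what that default exposes):

```python
from ngboost import NGBRegressor
from scipy import stats
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

ngb = NGBRegressor().fit(X, y)   # default distribution: Normal
point = ngb.predict(X)           # point predictions
dist = ngb.pred_dist(X)          # fitted distribution object

# The fitted parameters map directly onto a scipy.stats distribution,
# e.g. an upper 95% bound per sample.
upper = stats.norm(loc=dist.params["loc"], scale=dist.params["scale"]).ppf(0.95)
```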
Describe your proposed solution
All this is to say, I think scikit-learn has the authority and opportunity to lead the way in unifying an API that covers both distributional and quantile outputs. Therefore, I would like to open a discussion on the following points:
- Should this be added to the core scikit-learn API/package? Is this within scope?
- If not, should it be a scikit-learn-contrib package, or is this something the "distributional Python ML community" should sort out themselves?
- If this is something scikit-learn would like to take a lead on, what should this API look like? (A purely hypothetical sketch of one possible shape follows this list.) If the scikit-learn maintainers think this is out of scope but still have insights/opinions on it, hearing them would still be extremely valuable!
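To anchor the discussion, here is one purely hypothetical shape such an API could take; the `quantiles` keyword, the `predict_dist` method, and the return conventions below are invented for illustration and are not an existing or agreed scikit-learn interface:

```python
class SomeProbabilisticRegressor:
    """Hypothetical estimator illustrating one possible unified interface."""

    def predict(self, X, quantiles=None):
        """Point predictions by default; if `quantiles` is given (e.g.
        quantiles=[0.05, 0.5, 0.95]), an (n_samples, n_quantiles) array."""
        ...

    def predict_dist(self, X):
        """Per-sample distribution parameters, e.g. {"loc": ..., "scale": ...},
        chosen so they map directly onto a scipy.stats distribution."""
        ...
```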
Describe alternatives you've considered, if relevant
No response
Additional context
No response