Loss function (All-Threshold loss) for ordinal response variables in HistGradientBoosting #16694

#### Describe the workflow you want to enable
If the target is discrete and multiclass but ordinal (ordered) in nature (e.g. Likert scale, user ratings, preference levels), as opposed to nominal, it would be great to have support for a loss function that recognizes the ordinal nature of the target variable.

(Note the difference between "learning-to-rank" and "ordinal classification". Learning-to-rank, as implemented by popular boosting libraries such as XGBoost, LightGBM and CatBoost, produces a relative ordering between items, and its output predictions are not class labels, whereas ordinal classification does produce class labels. Here is a blog post that introduces [loss functions for ordinal classification](http://fa.bianp.net/blog/2013/loss-functions-for-ordinal-regression/).)

#### Describe your proposed solution
The All-Threshold loss is the proposed solution. [Rennie and Srebro](https://ttic.uchicago.edu/~nati/Publications/RennieSrebroIJCAI05.pdf) describe the All-Threshold loss function. Here is an easy-to-understand blog post introducing readers to [loss functions for ordinal classification](http://fa.bianp.net/blog/2013/loss-functions-for-ordinal-regression/).
Fabian Pedregosa's [mord](https://github.com/fabianp/mord) package implements logistic regression with the All-Threshold loss.
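
To make the request concrete, here is a minimal sketch of the All-Threshold loss with a logistic margin penalty, following the construction in Rennie and Srebro: every threshold below the true label should sit under the raw prediction, and every threshold at or above it should sit over the raw prediction. The function names (`all_threshold_loss`, `all_threshold_gradient`, `predict_label`) are hypothetical and not part of scikit-learn or mord, labels are assumed to be integers in `{0, ..., K-1}`, and the thresholds are assumed to be given rather than learned. The gradient with respect to the raw prediction is included because that is the quantity a gradient boosting loss would need to supply.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_penalty(t):
    """Margin penalty f(t) = log(1 + exp(-t)), computed stably."""
    if t > 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))


def all_threshold_loss(z, y, thresholds):
    """All-Threshold loss for one sample (illustrative sketch).

    z          -- real-valued raw prediction
    y          -- true label in {0, ..., K-1}
    thresholds -- K-1 increasing cut points (assumed fixed here)
    """
    loss = 0.0
    for l, theta in enumerate(thresholds):
        # Thresholds below the label should lie under z (sign = -1),
        # thresholds at or above the label should lie over z (sign = +1).
        sign = -1.0 if l < y else 1.0
        loss += logistic_penalty(sign * (theta - z))
    return loss


def all_threshold_gradient(z, y, thresholds):
    """Derivative of all_threshold_loss with respect to z."""
    grad = 0.0
    for l, theta in enumerate(thresholds):
        sign = -1.0 if l < y else 1.0
        margin = sign * (theta - z)
        # d/dz log(1 + exp(-margin)) = sign * sigmoid(-margin)
        grad += sign * sigmoid(-margin)
    return grad


def predict_label(z, thresholds):
    """Predicted class = number of thresholds that z exceeds."""
    return sum(z > theta for theta in thresholds)
```

Because the loss is a sum of K-1 binary margin penalties, its value and gradient for one sample cost O(K) to evaluate, and a prediction that lands several classes away from the true label violates more thresholds and is penalized more heavily than a near miss.
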

#### Describe alternatives you've considered, if relevant
Historically, the popular approach for logistic regression, proposed by McCullagh ([paper](https://www.jstor.org/stable/2984952)), uses the cumulative logit link function. Here is the main disadvantage of this approach:

> Probabilistic models for discrete ordinal response have also been studied in the statistics literature [McCullagh, 1980; Fu and Simpson, 2002]. However, the models suggested are much more complex, and even just evaluating the likelihood of a predictor is not straight-forward.

From page 2 of Rennie, Jason D. M. and Srebro, Nathan. [Loss functions for preference levels: Regression with discrete ordered labels](https://ttic.uchicago.edu/~nati/Publications/RennieSrebroIJCAI05.pdf).

Note that the cumulative logit model is the probabilistic approach proposed by McCullagh in 1980.
The All-Threshold loss is much easier to scale to large datasets, which is precisely the purpose of HistGradientBoosting.
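
For comparison, the cumulative logit (proportional odds) alternative can be sketched as follows. It models `P(y <= k) = sigmoid(theta_k - z)` and obtains class probabilities as differences of adjacent cumulative probabilities. This is only an illustration under the same assumptions as before (integer labels in `{0, ..., K-1}`, fixed increasing thresholds); the function names are hypothetical and do not correspond to any scikit-learn API.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def cumulative_logit_probs(z, thresholds):
    """Class probabilities under the cumulative logit model.

    P(y <= k) = sigmoid(thresholds[k] - z) for k < K-1, and P(y <= K-1) = 1.
    """
    cumulative = [sigmoid(theta - z) for theta in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        # P(y = k) = P(y <= k) - P(y <= k-1)
        probs.append(c - previous)
        previous = c
    return probs


def cumulative_logit_nll(z, y, thresholds):
    """Negative log-likelihood of label y for raw prediction z."""
    return -math.log(cumulative_logit_probs(z, thresholds)[y])
```

Unlike the All-Threshold loss, the per-sample objective here is the log of a difference of two sigmoids rather than a sum of independent margin penalties.
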
#### References
Papers

- [1] Rennie, Jason D. M. and Srebro, Nathan. Loss functions for preference levels: Regression with discrete ordered labels.
- [2] Dembczynski, Krzysztof and Kotlowski, Wojciech and Slowinski, Roman. Ordinal Classification with Decision Rules.
- [3] Pedregosa, Fabian and Bach, Francis R. and Gramfort, Alexandre. On the Consistency of Ordinal Regression Methods.
- [4] Li, Ling and Lin, Hsuan-Tien. Ordinal Regression by Extended Binary Classification.
