Improve documentation: consistent scoring functions  #10584

Closed
@lorentzenchr

Description

Abstract

Improve the documentation of section 3.3, Model Evaluation, by advising users to prefer strictly proper scoring functions.

Explanation

The documentation of scikit-learn is amazing and a strong argument for its usage (and fame). Still, I think one can get a bit lost when choosing the right scoring function or metric out of the many options for model evaluation. There are some influential papers which advocate the use of strictly proper scoring functions:

  1. Making and Evaluating Point Forecasts, alt. link

  2. Strictly Proper Scoring Rules, Prediction, and Estimation, alt. link

For classification and regression, most of the time one is interested in the mean functional of the distribution of the target variable y. In that case, the scoring function used to compare the predictive power of different models (for example, when choosing the regularization strength via cross-validation) should be strictly consistent for the mean functional.

Examples

For binary classification, knowing the mean of y is equivalent to knowing the whole distribution of y. The Brier score (squared error) and the logistic (log) loss are strictly consistent for the mean. Accuracy and 0-1 loss are consistent, but not strictly consistent, for the mean (they are strictly consistent for the mode, which is less informative than the mean for classification). ROC, precision, recall, etc. are not consistent for the mean, afaik.
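To make the difference between consistent and strictly consistent concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the true probability `p_true = 0.7`, the grid of candidate forecasts, the 0.5 decision threshold, and all function names are made up for this example and are not scikit-learn API. It computes the expected score of a constant forecast `q` and collects every forecast on the grid that attains the minimum.

```python
import math

def expected_brier(q, p):
    """Expected squared error of forecast q when P(y = 1) = p."""
    return p * (1 - q) ** 2 + (1 - p) * q ** 2

def expected_log_loss(q, p):
    """Expected logistic loss of forecast q when P(y = 1) = p."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

def expected_zero_one(q, p, threshold=0.5):
    """Expected 0-1 loss after thresholding q (threshold is an assumption)."""
    y_hat = 1 if q >= threshold else 0
    return p * (1 - y_hat) + (1 - p) * y_hat

p_true = 0.7                                # hypothetical true P(y = 1)
grid = [i / 100 for i in range(1, 100)]     # candidate forecasts 0.01 .. 0.99

def argmins(score, tol=1e-12):
    """All grid forecasts whose expected score is within tol of the minimum."""
    values = [score(q, p_true) for q in grid]
    best = min(values)
    return [q for q, v in zip(grid, values) if v - best <= tol]

brier_minimizers = argmins(expected_brier)
log_minimizers = argmins(expected_log_loss)
zero_one_minimizers = argmins(expected_zero_one)

print("Brier minimizers:", brier_minimizers)
print("log loss minimizers:", log_minimizers)
print("number of 0-1 loss minimizers:", len(zero_one_minimizers))
```

On this grid the Brier score and the log loss each have a single minimizer, the true probability, while the 0-1 loss is minimized equally well by every forecast at or above the threshold, so it cannot tell a forecast of 0.7 from one of 0.99.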

For regression, Bregman functions (eq. (18) of paper 1) are strictly consistent for the mean. For targets y on the whole real line, the squared error is one example. For positive-valued targets y, the squared error (b=2), Gamma deviance (b=0), and Poisson deviance (b=1) are examples; see eq. (20) of paper 1.
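The same point can be sketched for regression. The snippet below is an assumption-laden illustration, not code from either paper or from scikit-learn: the sample, the grid, and the helper names are hypothetical, and the deviances are written in their usual unit-deviance form (which I take to match the b=1 and b=0 cases up to a constant factor). It searches for the best constant prediction under each score and compares it with the sample mean and median.

```python
import math
from statistics import mean, median

def squared_error(y, x):
    return (y - x) ** 2

def poisson_deviance(y, x):
    # Unit Poisson deviance, defined here for y > 0 and x > 0.
    return 2 * (y * math.log(y / x) - (y - x))

def gamma_deviance(y, x):
    # Unit Gamma deviance, defined for y > 0 and x > 0.
    return 2 * (y / x - math.log(y / x) - 1)

def absolute_error(y, x):
    # Not a Bregman function; included as a contrast.
    return abs(y - x)

def mean_score(score, sample, x):
    """Average score of the constant prediction x over the sample."""
    return sum(score(y, x) for y in sample) / len(sample)

def best_constant(score, sample, grid):
    """Grid value with the lowest average score."""
    return min(grid, key=lambda x: mean_score(score, sample, x))

sample = [1.0, 2.0, 3.0, 10.0]                # hypothetical positive targets
grid = [i / 100 for i in range(1, 1501)]      # candidate predictions 0.01 .. 15.00

best = {
    score.__name__: best_constant(score, sample, grid)
    for score in (squared_error, poisson_deviance, gamma_deviance)
}

print("sample mean:", mean(sample), "sample median:", median(sample))
print("best constant prediction per score:", best)
print("MAE at median:", mean_score(absolute_error, sample, median(sample)))
print("MAE at mean:", mean_score(absolute_error, sample, mean(sample)))
```

All three Bregman-type scores pick the sample mean as the best constant prediction, whereas the absolute error scores the median better than the mean, which is what one would expect from a score that targets a different functional.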

Maybe one should check, for the many metrics already provided by scikit-learn, whether they are (strictly) consistent for a certain functional and whether some of them are equivalent to one another.

Disclaimer

I do not know the authors of the cited papers and I'm not pushing my own research, just my opinion on how to improve this fantastic library 😏
