I recently came across #12895 (with PR #13467) and the older #6457; this woke up an old topic that I would like to share.
In our team, we needed to provide model performance metrics for regression models. This is a slightly different goal from using metrics for grid search or model selection: the metric is not only used to "select the best model" but also to give users feedback about "how good a model is".
For regression models, I introduced three categories of metrics that turned out to be quite intuitive:
Absolute performance (L2 RMSE, L1 MAE): these metrics can all be interpreted as an "average prediction error" ("average" in the broad sense here), expressed in the unit of the prediction target (e.g. "average error of 12 kWh").
Relative performance (L2 CVRMSE, L1 CVMAE, and per-point relative metrics such as MAPE or MARE, MARES, MAREL...): these metrics can all be interpreted as an "average relative prediction error", expressed as a percentage of the target (e.g. "average error of 10%").
Comparison to a dummy model (L2 RRSE, L1 RAE): these metrics can all be interpreted as the ratio between the performance of the model at hand and the performance of a dummy, constant model (always predicting the average). These need to be inverted to be intuitive, e.g. "20% -> 5 times better than a dummy model" (see the sketch just below this list).
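To make the three categories concrete, here is a minimal numpy sketch using the standard textbook formulas. The helper name and the exact definitions I picked (e.g. CVRMSE as RMSE divided by the mean of y_true) are illustrative assumptions, not an existing scikit-learn API:

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Hypothetical helper illustrating the three metric categories."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    # 1) Absolute performance: in the unit of the target.
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))

    # 2) Relative performance: as a fraction of the target.
    cvrmse = rmse / np.mean(y_true)
    mape = np.mean(np.abs(err / y_true))

    # 3) Comparison to a dummy, constant model always predicting the mean of y_true.
    dummy = np.mean(y_true)
    rrse = np.sqrt(np.sum(err ** 2) / np.sum((y_true - dummy) ** 2))
    rae = np.sum(np.abs(err)) / np.sum(np.abs(y_true - dummy))

    return {"rmse": rmse, "mae": mae, "cvrmse": cvrmse, "mape": mape,
            "rrse": rrse, "rae": rae}
```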
Of course these categories are "applicative": they all make sense from a user point of view. However, as far as model selection is concerned, only two make sense (MAE and RMSE). Not even R², because R² = 1 - RRSE², so it is not a performance metric but a comparison-to-dummy metric (but I don't want to open the debate here, so please refrain from objecting on that one :) ).
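For reference, the R² = 1 - RRSE² relation is easy to check numerically against sklearn.metrics.r2_score, using the RRSE definition sketched above:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# RRSE: root relative squared error w.r.t. a constant model predicting the mean.
rrse = np.sqrt(np.sum((y_true - y_pred) ** 2)
               / np.sum((y_true - y_true.mean()) ** 2))

assert np.isclose(r2_score(y_true, y_pred), 1.0 - rrse ** 2)
```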
Anyway, my question for the core sklearn team is: shall I propose a pull request with all these metrics? I'm ready to shoot, since we've already done it in our private repo, aligned with sklearn's regression.py file. So it is rather a matter of deciding whether this is a good idea. And if so, introducing categories might be needed to help users understand them better.
An alternative might be to create a small independent project containing all the metrics, leaving only mean_absolute_error (L1) and mean_squared_error (L2) in sklearn.
Any thoughts on this?
Oh, and I forgot: related to this, I realized that the r2_score in regression.py is wrong, since it is automatically set to 0 when y_true is constant, instead of -Inf (x/0) or NaN (0/0). This may well be a good approximation for model selection, but it is definitely mathematically wrong if that metric is used to report model performance back to users.
```python
# arbitrary set to zero to avoid -inf scores, having a constant
# y_true is not interesting for scoring a regression anyway
output_scores[nonzero_numerator & ~nonzero_denominator] = 0.
```
In my implementation, I instead keep all NaN and Inf values, but temporarily switch numpy to silent mode for the division-by-zero and NaN-related warnings.
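For illustration, a minimal sketch of that approach (a hypothetical r2_score_raw helper, not the scikit-learn implementation):

```python
import numpy as np

def r2_score_raw(y_true, y_pred):
    # Hypothetical helper: keep the mathematically "raw" result instead of
    # substituting an arbitrary 0 when y_true is constant.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    numerator = np.sum((y_true - y_pred) ** 2)
    denominator = np.sum((y_true - y_true.mean()) ** 2)
    # Temporarily silence numpy's divide-by-zero / invalid-value warnings,
    # so a constant y_true yields -inf (x/0) or NaN (0/0) as expected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return 1.0 - numerator / denominator
```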
And finally: as a user, I would prefer to manipulate first-class citizens for all the base metrics above rather than having to partialize more generic metrics. Of course the implementation can be shared through more generic routines, but a user looking for RMSE should be able to find it directly.
In my implementation I even added aliases for all metrics:
initials: e.g. rmse = root_mean_squared_error
popular alternate naming: e.g. cv_rmsd = cv_rmse.
Indeed the overhead of creating aliases is extremely low compared to the educational payoff: all users would be able to find their favorite metric, and then realize that it is the same as some other.
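A minimal sketch of the aliasing idea (illustrative names, not a proposed final API): since an alias is just another binding to the same function object, the maintenance cost is a single assignment per name.

```python
import numpy as np

def root_mean_squared_error(y_true, y_pred):
    """Long, descriptive name: the canonical entry point."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# initials
rmse = root_mean_squared_error
# popular alternate naming
root_mean_squared_deviation = root_mean_squared_error
rmsd = root_mean_squared_deviation

# All names point to the same function object.
assert rmse([1, 2], [1, 4]) == root_mean_squared_deviation([1, 2], [1, 4])
```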