FEA Regression error characteristic curve and plotting #31380

Open · wants to merge 28 commits into main

Conversation

alexshtf

@alexshtf alexshtf commented May 18, 2025

Description

Computes the regression error characteristic (REC) curve [1], which is essentially the CDF of the regression errors. It plays a role analogous to that of ROC curves: it allows comparing the performance profiles of regressors beyond a single summary statistic such as RMSE or MAE.
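For intuition, the REC curve is just the empirical CDF of the absolute residuals. A minimal NumPy sketch (the helper name `rec_curve` is hypothetical, not part of this PR's API):

```python
import numpy as np

def rec_curve(y_true, y_pred):
    # Empirical CDF of absolute errors: for each error tolerance on the
    # x-axis, the fraction of samples whose absolute error is within it.
    errors = np.sort(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    accuracy = np.arange(1, len(errors) + 1) / len(errors)
    return errors, accuracy

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.0, 2.5, 5.0])
tol, acc = rec_curve(y_true, y_pred)
# tol = [0.0, 0.1, 0.5, 1.0], acc = [0.25, 0.5, 0.75, 1.0]
```

Plotting `acc` against `tol` as a step function gives the REC curve: a curve that rises faster (toward the top left) indicates a regressor whose errors concentrate near zero.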

Examples

Used like this:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import RecCurveDisplay

lr_estimator = LinearRegression()
lr_estimator.fit(X_train, y_train)

RecCurveDisplay.from_estimator(lr_estimator, X_test, y_test, name="Linear regression")

The result looks like this:

[Figure: REC curve of the linear regression model alongside the constant-prediction baseline]

It allows comparing a regressor to a constant-prediction baseline (by default, the median of the test targets).
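The median is a natural choice for the baseline because it is the constant that minimizes the mean absolute error on the test targets, so its REC curve is the profile of the best possible constant predictor. A small illustration (variable names are hypothetical):

```python
import numpy as np

y_test = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Constant predictor: the median of the test targets.
baseline_pred = np.median(y_test)           # 3.0 for this sample
baseline_errors = np.abs(y_test - baseline_pred)
# Sorted absolute errors [0, 1, 2, 2, 2] trace the baseline REC curve.
```

Any regressor whose REC curve does not dominate this baseline's curve is, at the corresponding tolerances, doing no better than always predicting the median.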

We can also compare several regressors:

import matplotlib.pyplot as plt
from sklearn.metrics import RecCurveDisplay, mean_absolute_error

fig, ax = plt.subplots()

RecCurveDisplay.from_predictions(
    y_test,
    pred_lr,
    ax=ax,
    name=f"Linear regression (MAE={mean_absolute_error(y_test, pred_lr):.2f})",
    plot_const_predictor=False,
)

RecCurveDisplay.from_predictions(
    y_test,
    pred_knn,
    ax=ax,
    name=f"KNN (MAE={mean_absolute_error(y_test, pred_knn):.2f})",
    plot_const_predictor=False,
)

# This call also plots the constant predictor for reference - note that
# plot_const_predictor is left at its default (True).
RecCurveDisplay.from_predictions(
    y_test,
    pred_hgbr,
    ax=ax,
    name=f"Gradient Boosting Regressor (MAE={mean_absolute_error(y_test, pred_hgbr):.2f})",
)

fig.show()

This will plot something like this:

[Figure: REC curves of the three regressors and the constant baseline]

Here one regressor clearly dominates the others: its curve lies above theirs at every error tolerance.

Sometimes the curves cross. In that case there is no clear domination, and the performance profiles of the regressors differ: one may be better at small error tolerances, while another may be better at large ones.
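When the curves cross, a scalar summary can still help: Bi and Bennett [1] note that the area over the REC curve (AOC) estimates the expected error, so a smaller AOC is better. A hedged sketch of that summary (the helper `rec_aoc` is hypothetical, and it uses a trapezoidal approximation rather than the exact step-function area):

```python
import numpy as np

def rec_aoc(y_true, y_pred, max_tol=None):
    # Area over the REC curve up to max_tol (defaults to the largest
    # observed error). Approximates the expected error; smaller is better.
    errors = np.sort(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    acc = np.arange(1, len(errors) + 1) / len(errors)
    if max_tol is None:
        max_tol = errors[-1]
    # Trapezoidal area under the accuracy curve, then subtract from the
    # full rectangle [0, max_tol] x [0, 1] to get the area over it.
    area_under = np.sum((errors[1:] - errors[:-1]) * (acc[1:] + acc[:-1]) / 2)
    return max_tol - area_under

aoc = rec_aoc([0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0])
# With errors [1, 2, 3, 4] this evaluates to 2.125, close to the MAE of 2.5.
```

Comparing AOC values gives a single ranking even when neither curve dominates, at the cost of hiding where along the tolerance axis each regressor wins.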

P.S. This is my first ever contribution to this phenomenal library. I hope I haven't missed anything important.


References

[1] Bi, J. and Bennett, K. P., 2003. Regression error characteristic curves. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 43-50.


github-actions bot commented May 18, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: bd3649b.

@alexshtf alexshtf marked this pull request as draft May 18, 2025 19:08
@alexshtf alexshtf marked this pull request as ready for review May 19, 2025 08:01