Regression error characteristic curve #31441


Closed
alexshtf opened this issue May 28, 2025 · 5 comments
Labels
Needs Triage · New Feature

Comments

@alexshtf

alexshtf commented May 28, 2025

Describe the workflow you want to enable

Add more fine-grained diagnostics, similar to ROC or Precision-Recall curves, for regression problems. The library already has many excellent tools for classification, and I believe it would benefit from some additional tools for regression.

Describe your proposed solution

Compute the Regression Error Characteristic (REC) curve [1]: for each error threshold, the percentage of samples whose error is below that threshold. This is essentially the CDF of the regression errors. Its function is similar to that of ROC curves: it allows comparing the performance profiles of regressors beyond a single summary statistic, such as RMSE or MAE.
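
As a sketch of the idea (not the actual API of the PR), the REC curve is just the empirical CDF of the absolute prediction errors; a hypothetical NumPy implementation:

```python
import numpy as np

def rec_curve(y_true, y_pred):
    # Sort absolute errors; the REC curve plots each error threshold
    # against the fraction of samples whose error is at most that threshold.
    errors = np.sort(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    accuracy = np.arange(1, errors.size + 1) / errors.size
    return errors, accuracy

thresholds, accuracy = rec_curve([1.0, 2.0, 3.0, 4.0], [1.1, 2.5, 2.0, 4.0])
```

Plotting `accuracy` against `thresholds` (e.g. as a step plot) gives the REC curve.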

I have already implemented this in a pull request:
#31380

Screenshot from the pull request:

[screenshot omitted]

If you believe this feature is useful, please help me review and merge it.

Describe alternatives you've considered, if relevant

Regression Receiver Operating Characteristic (RROC) curves, proposed in [2], plot over-prediction against under-prediction and are a different form of diagnostic curve for regression. They may also be useful, but we have to begin somewhere, and I believe it is better to begin with REC: the paper has more citations, and REC curves turned out to be very useful for me at work, so I believe they can be similarly useful to other practitioners.

Additional context

References

[1]: Bi, J. and Bennett, K.P., 2003. Regression error characteristic curves. In Proceedings of the 20th international conference on machine learning (ICML-03) (pp. 43-50).
[2]: Hernández-Orallo, J., 2013. ROC curves for regression. Pattern Recognition, 46(12), pp.3395-3411.

@alexshtf alexshtf added New Feature Needs Triage Issue requires triage labels May 28, 2025
@HussainAther

This is a fantastic idea. Thank you for bringing REC curves into the conversation!

Having more nuanced diagnostics for regression models has been a long-standing gap in the standard ML workflow. The analogy to ROC/PR curves is spot-on: one summary metric (e.g., MAE/RMSE) can’t always capture performance tradeoffs, especially across varying error tolerances. The cumulative nature of the REC curve gives a much clearer picture of model robustness in practical contexts.

I also appreciate the references; Bi & Bennett (2003) is a solid foundation. Your submitted PR (#31380) looks very promising at a quick glance. I'd love to see this feature included in future versions of scikit-learn; it could really benefit both academic research and applied ML workflows.

Looking forward to seeing how this progresses! Great work.

@lorentzenchr
Member

@alexshtf Thanks for opening this issue.

I am -1 on this feature for two reasons:

- The proposed functionality is very easy to implement:

  ```python
  import matplotlib.pyplot as plt
  from scipy.stats import ecdf

  errors = my_error_function(y_obs, y_pred)  # e.g. absolute errors
  cdf_errors = ecdf(errors)
  ax = plt.subplot()
  cdf_errors.cdf.plot(ax)
  plt.show()
  ```
- I am not convinced about the insight this generates. A much better diagnostic tool for regression is the CAP curve, see Add sklearn.metrics.cumulative_gain_curve and sklearn.metrics.lift_curve #10003.

Therefore, I am closing this issue. But still, feel free to continue discussion.

@alexshtf
Author

alexshtf commented Jun 2, 2025

@lorentzenchr I believe ease of implementation is not that important; otherwise, why does scikit-learn have the PredictionErrorDisplay visualization?

So I believe the only remaining question is the insight. Regressors are used for many downstream tasks beyond simply predicting a value and showing it to the user, e.g., computing bids for ad auctions. In general, more accurate bids generate better revenue, but simple summary metrics don't always help you understand why a certain predictor behaves as it does. So I thought this might be useful and valuable for other people as well.

You appear to disagree on the diagnostic value, so I should first dig deeper into CAP curves and see whether they could provide similar value for the same use cases.

@lorentzenchr
Member

Counter question: what do you learn about a fitted model by looking at the CDF of some error (loss function or score) of its predictions?

@alexshtf
Author

alexshtf commented Jun 2, 2025

Well, the x axis of the curve serves as an error threshold, so what you observe directly is what portion of the data has errors less than $\epsilon$, for various values of $\epsilon$. This may already be directly relevant for the business, e.g., "we have errors < 0.5 for 95% of the data", and you can read off the coverage visually for any other error threshold as well.
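
For instance, the "errors < 0.5 for 95% of the data" reading is just the empirical CDF of the errors evaluated at that threshold; a small sketch with hypothetical error values:

```python
import numpy as np

# hypothetical absolute errors of some regressor on a test set
errors = np.array([0.1, 0.2, 0.3, 0.4, 0.9])

eps = 0.5
coverage = np.mean(errors <= eps)  # fraction of samples with error <= eps
# here coverage == 0.8, i.e. 80% of the data has error at most 0.5
```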

Secondly, it lets you compare models: in some sense, model A is better than model B if for every threshold $\epsilon$ it has more data whose errors are under $\epsilon$.
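
This dominance relation can be checked numerically by evaluating both empirical CDFs on a common grid of thresholds; a sketch with hypothetical error arrays:

```python
import numpy as np

errors_a = np.array([0.05, 0.1, 0.2, 0.3])  # hypothetical abs. errors, model A
errors_b = np.array([0.1, 0.3, 0.5, 0.8])   # hypothetical abs. errors, model B

def rec_values(errors, thresholds):
    # REC value at each threshold: fraction of errors at most that threshold
    return np.mean(errors[:, None] <= thresholds, axis=0)

grid = np.union1d(errors_a, errors_b)  # common threshold grid
a_dominates_b = bool(np.all(rec_values(errors_a, grid) >= rec_values(errors_b, grid)))
```

Here model A's REC curve lies above model B's at every grid point, so `a_dominates_b` is true.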

Finally, if the curve grows quickly, stops at, say, 0.7, and then grows very slowly towards 1, you learn that you have a long tail of samples with large errors (a kind of upward knee shape). Alternatively, if it grows a bit more slowly but saturates at 1, you see that you do not have a long tail. Plotting curves for several models lets you compare this behavior, because "quickly" and "slowly" are better appreciated by humans relative to some baseline rather than as absolute qualitative concepts. And there is always a baseline on the plot: a constant predictor.

So for me these aspects were very useful. But as I said, I need to understand CAP curves more deeply to understand if similar observations can be made using CAP curves.
