
Feature request: add absolute=True/False to regression metric mean_absolute_error #17853


Closed
raybellwaves opened this issue Jul 6, 2020 · 2 comments


@raybellwaves (Contributor)

Apologies if this has been brought up before. I did a quick check of https://github.com/scikit-learn/scikit-learn/issues?q=is%3Aissue+sort%3Aupdated-desc+mean+error+is%3Aclosed+label%3Amodule%3Ametrics

Describe the workflow you want to enable

I am interested in the sign of the error, and if possible I would rather do this in scikit-learn than in numpy:

    mean_absolute_error(y_true, y_pred, absolute=False)  # returns np.average(y_pred - y_true)
    mean_absolute_error(y_true, y_pred)                  # returns the same as before
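For reference, the plain-numpy workaround this would replace (with example data made up here) is just the signed average of the residuals:

    import numpy as np

    y_true = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred = np.array([2.5, 0.0, 2.0, 8.0])

    # Signed mean error: what absolute=False would return.
    print(np.average(y_pred - y_true))  # 0.25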

Describe your proposed solution

Adding an absolute keyword (defaulting to True, the current behaviour) would update https://github.com/scikit-learn/scikit-learn/blob/fd237278e/sklearn/metrics/_regression.py#L122-L190 to be:

def mean_absolute_error(y_true, y_pred,
                        absolute=True, sample_weight=None,
                        multioutput='uniform_average'):
...
    Parameters
    ----------
    absolute : bool, default=True
        If True (the default), return the mean absolute error (MAE);
        if False, return the signed mean error (ME).
...
    if absolute:
        output_errors = np.average(np.abs(y_pred - y_true),
                                   weights=sample_weight, axis=0)
    else:
        output_errors = np.average(y_pred - y_true,
                                   weights=sample_weight, axis=0)
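A minimal, self-contained sketch of the proposed behaviour, using the same example data as above (the mean_error_sketch name is hypothetical, not actual scikit-learn code):

    import numpy as np

    def mean_error_sketch(y_true, y_pred, absolute=True, sample_weight=None):
        # Hypothetical helper mirroring the proposed absolute flag;
        # not part of scikit-learn.
        diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
        if absolute:
            diff = np.abs(diff)
        return np.average(diff, weights=sample_weight, axis=0)

    y_true = [3.0, -0.5, 2.0, 7.0]
    y_pred = [2.5, 0.0, 2.0, 8.0]
    print(mean_error_sketch(y_true, y_pred))                  # 0.5 (MAE)
    print(mean_error_sketch(y_true, y_pred, absolute=False))  # 0.25 (signed mean error)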

Describe alternatives you've considered, if relevant

I'm not sure whether a new metric is needed instead, i.e. mean_error.

I believe this approach of adding a new argument was applied to mean_squared_error (squared=True); see #12895

Additional context

If this is of interest, I'll be happy to work on it during the SciPy sprint.

@thomasjpfan (Member)

I would be -1 on this.

  1. Adding an absolute=False option to mean_absolute_error would run counter to the function's name.
  2. I think this would be a poor metric because errors can cancel out:
import numpy as np

y_true = np.array([1, -1, 1, -1])
y_pred = np.array([10000, -10000, 10000, -10000])

np.average(y_true - y_pred)
# 0.0
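For contrast, the existing absolute metric keeps the error magnitudes on the same data, so nothing cancels:

    import numpy as np
    from sklearn.metrics import mean_absolute_error

    y_true = np.array([1, -1, 1, -1])
    y_pred = np.array([10000, -10000, 10000, -10000])

    mean_absolute_error(y_true, y_pred)
    # 9999.0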

@raybellwaves (Contributor, Author)

True. I guess it's not really an error metric but rather further analysis of the distribution of the errors (bias is probably a better term for np.average(y_true - y_pred)?).
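As a quick sketch of that framing (example data made up): the signed average reads as a bias, i.e. a systematic over- or under-prediction, which complements MAE rather than replacing it:

    import numpy as np

    # Predictions that systematically over-shoot by 0.5.
    y_true = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred = np.array([3.5, 0.0, 2.5, 7.5])

    # Positive bias: the model over-predicts on average.
    print(np.average(y_pred - y_true))  # 0.5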
