Median absolute error #3761

Merged: 6 commits merged on Oct 12, 2014

FlorianWilhelm
Contributor

The median absolute error is a robust regression metric, as suggested by @GaelVaroquaux in PR #2949 for the Theil-Sen regressor. @arjoly suggested making this a separate PR for a clearer distinction.


\text{MedAE}(y, \hat{y}) = \text{median}(\mid y_1 - \hat{y}_1 \mid, \ldots, \mid y_n - \hat{y}_n \mid).
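
As a quick illustration of the formula above, here is a minimal sketch in plain Python for the single-output case. The sample values and the names `abs_err` and `med_ae` are chosen for illustration only; this is not the implementation from the PR, just the definition evaluated by hand with the standard library.

```python
from statistics import median

# Illustrative single-output targets and predictions (assumed values).
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

# Absolute error |y_i - y_hat_i| for every sample.
abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]

# MedAE is the median of those absolute errors.
med_ae = median(abs_err)
print(med_ae)
```

Because the median ignores the magnitude of the largest errors, a single badly predicted sample would not change the result here, which is what makes the metric robust.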

Here a small example of usage of the :func:`median_absolute_error`
Member

"Here a" -> "Here is a"

@jnothman
Member

Apart from that grammar typo, this LGTM; and thanks for the PEP8 fixes!

larsmans added a commit that referenced this pull request Oct 12, 2014
add median absolute error (MAE) regression metric
@larsmans merged commit 1e4492a into scikit-learn:master on Oct 12, 2014
@larsmans
Member

Merged. Thanks!

@FlorianWilhelm deleted the median_absolute_error branch on October 12, 2014 19:45
@arjoly
Member

arjoly commented Oct 13, 2014

Thanks for your contribution and for making a separate PR!

@arjoly
Member

arjoly commented Oct 13, 2014

@FlorianWilhelm Is this the standard way to compute median_absolute_error in the multi-output case

>>> from sklearn.metrics import median_absolute_error
>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> median_absolute_error(y_true, y_pred)

?

@arjoly
Member

arjoly commented Oct 13, 2014

I am not sure this is consistent with the averaging proposed in #3474.

@FlorianWilhelm
Contributor Author

@arjoly I agree that it is not consistent. What we do here is calculate the median over the flattened array. Functions like mean_absolute_error average on different levels: first along the multi-output vector, then across all samples. The same could be done for the median.
As far as I remember, @GaelVaroquaux contributed the metric in the Theil-Sen PR. What do you think about it? Should we change it to first calculate the median along the multi-output vector and then across the samples?
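
To make the two options in this comment concrete, here is a minimal sketch in plain Python using the multi-output example from the question above. The variable names (`abs_err`, `flat_median`, `two_level_median`) are illustrative assumptions, and neither variant is claimed to be the code in this PR or in PR #3764; the sketch only shows that the two definitions can disagree.

```python
from statistics import median

# Multi-output example from the question above: 3 samples, 2 outputs each.
y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]

# Absolute errors, one row per sample and one column per output.
abs_err = [
    [abs(t - p) for t, p in zip(row_true, row_pred)]
    for row_true, row_pred in zip(y_true, y_pred)
]

# Variant 1: median over the flattened array of all absolute errors.
flat_median = median(e for row in abs_err for e in row)

# Variant 2: median along each sample's output vector first,
# then the median of those per-sample values across samples.
two_level_median = median(median(row) for row in abs_err)

print(flat_median, two_level_median)
```

On this data the flattened median is 1.0 while the two-level median is 0.75, so the choice of definition changes the reported score.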

@arjoly
Member

arjoly commented Oct 13, 2014

If we are unsure how to proceed now, I would raise an error for multi-output data.

@FlorianWilhelm
Contributor Author

@arjoly I added a more consistent definition in PR #3764. As in mean_absolute_error, the metric is first applied component-wise and then across the samples. What do you think?
