A bug in the average precision function #6377

Closed
roadjiang opened this issue Feb 17, 2016 · 7 comments

@roadjiang

The bug is as follows:

import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.zeros(10000, dtype=int)
y_true[0] = 1
y_scores = np.zeros(10000)
average_precision_score(y_true, y_scores)

Out[562]: 0.50004999999999999

It overestimates the true mAP. Think about it: all 10000 samples have the same rank (a dummy ranked list), and it just gets lucky that the first one is the true positive. How can this AP be 0.5?

If this is true, then on every problem a model that outputs nothing (a constant score) gets a respectable AP of 0.5.
In many hard problems, reaching even 0.1-0.2 average precision is difficult.

Basically, you need to shuffle the ranked list before calculating mAP to avoid this overestimation.
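
For illustration, a minimal sketch of that suggestion; ap_from_ranking is a hypothetical helper (not a scikit-learn function) that computes AP directly from an explicitly ordered list:

import numpy as np

def ap_from_ranking(hits):
    # Average precision over an explicit ranking: the mean of precision@k
    # taken at each rank k where a positive appears.
    hits = np.asarray(hits)
    cum_hits = np.cumsum(hits)
    ranks = np.arange(1, len(hits) + 1)
    return (cum_hits / ranks)[hits == 1].mean()

rng = np.random.RandomState(0)
y_true = np.zeros(10000, dtype=int)
y_true[0] = 1
# Shuffle the (all-tied) ranked list many times and average the AP:
aps = [ap_from_ranking(y_true[rng.permutation(10000)]) for _ in range(1000)]
print(np.mean(aps))  # on the order of 0.001, nowhere near 0.5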

@hlin117
Contributor

hlin117 commented Sep 6, 2016

So what do you assert the actual MAP value should be?

amueller added the Bug label Sep 6, 2016
@amueller
Member

amueller commented Sep 6, 2016

Looks like a weird edge case, but I wouldn't say it's very critical. This happens only if the scores are exactly floating-point equal, which is very unlikely in any real application. And it's unclear what the best tie-breaking behavior is. I wouldn't really say the current one is wrong, but it's probably not optimal. If we want to randomize it, we need to somehow keep track of the random state, and I'm not entirely sure how to do that; currently all scores are deterministic.
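
As a rough sketch of what that could look like, one could break ties by jittering the scores under an explicit random_state; randomized_average_precision is a hypothetical wrapper, not an existing scikit-learn API:

import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.utils import check_random_state

def randomized_average_precision(y_true, y_scores, random_state=None):
    # Hypothetical wrapper: break ties by adding noise smaller than the
    # smallest gap between distinct scores, under an explicit random_state.
    rng = check_random_state(random_state)
    s = np.asarray(y_scores, dtype=float)
    gaps = np.diff(np.unique(s))
    eps = gaps.min() / 2 if gaps.size else 1.0
    return average_precision_score(y_true, s + rng.uniform(0, eps, size=s.shape))

y_true = np.zeros(10000, dtype=int)
y_true[0] = 1
print(randomized_average_precision(y_true, np.zeros(10000), random_state=0))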

@roadjiang
Author

> So what do you assert the actual MAP value should be?

I assume the result would be close to the average precision of a random ranking of 10000 elements, definitely less than 0.01.
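
For what it's worth, under the assumption of a single positive placed uniformly at random among N items, the expected AP of a random ranking works out to H_N / N (the positive lands at rank r with probability 1/N and contributes precision 1/r); a quick check:

import numpy as np

# E[AP] = (1/N) * sum_{r=1}^{N} 1/r = H_N / N for one positive among N.
N = 10000
print(np.mean(1.0 / np.arange(1, N + 1)))  # about 0.00098, well below 0.01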

@ndingwall
Contributor

@amueller There's no need to randomize to perform tie-breaking, because random tie-breaking leads (in the limit) to horizontal segments. The step-wise interpolation option introduced by this PR implements this and gives an average precision of 0.0001 for @roadjiang's example above.
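
For reference, a minimal sketch of where that 0.0001 comes from; only precision_recall_curve is a scikit-learn call, and the summation line writes out the step-wise integral by hand:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.zeros(10000, dtype=int)
y_true[0] = 1
y_scores = np.zeros(10000)

# All scores are tied, so the curve collapses to a single threshold:
# recall jumps from 1 to 0 while precision goes from 1/10000 to 1.
precision, recall, _ = precision_recall_curve(y_true, y_scores)
step_ap = -np.sum(np.diff(recall) * precision[:-1])
print(step_ap)  # 0.0001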

I'm not convinced that this is an edge case: a weak model with few features that can only assign one of (say) 10 scores will have artificially inflated average precision under the existing implementation, and so may seem preferable to a stronger model that assigns one of 1000 scores. In this blog post I highlight this by showing that rounding a model's scores to the nearest tenth can actually improve apparent performance, which is clearly crazy!

@amueller
Member

amueller commented Sep 8, 2016

@ndingwall Which definition of average precision are you using here? The Wikipedia one is not defined for ties. The blog post is neat, but it glosses over the distinctions between the "left-stepwise interpolated integral", the Wikipedia definition, and the "smoothed" integral.
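
To make those distinctions concrete, a small sketch computing all three on a toy ranking without ties; the smoothing line follows the standard interpolated-AP definition (at each recall level, take the maximum precision attained at any recall at least that large), and only precision_recall_curve and auc are scikit-learn calls:

import numpy as np
from sklearn.metrics import precision_recall_curve, auc

y_true = np.array([1, 0, 1, 0, 0, 1])
y_scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])

# recall decreases along the returned arrays, ending at (precision=1, recall=0).
precision, recall, _ = precision_recall_curve(y_true, y_scores)

# Step-wise integral: weight each recall step by the precision reached there.
step = -np.sum(np.diff(recall) * precision[:-1])
# Linearly interpolated ("trapezoidal") area under the PR curve.
trapezoid = auc(recall[::-1], precision[::-1])
# "Smoothed" integral: running max of precision toward higher recall.
smoothed = -np.sum(np.diff(recall) * np.maximum.accumulate(precision)[:-1])

print(step, trapezoid, smoothed)  # three different numbers, same ranking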

@amueller
Member

amueller commented Sep 8, 2016

Maybe I'll write my own blog-post rant tomorrow ;)

@GaelVaroquaux
Member

Fixed by #9017
