a bug in average precision function #6377
Comments
So what do you assert should be the actual MAP value?
Looks like a weird edge-case, but I wouldn't say it's very critical. This happens if the scores are floating point equal, which is very unlikely to happen in any real application. And it's unclear what the best tie-breaking behavior is. I wouldn't really say the current one is wrong, but it's probably not optimal. If we want to randomize it, we need to somehow keep track of the random state. I'm not entirely sure how to do this. Currently all scores are deterministic.
I assume the result would be close to the average precision of a random ranking of 10000 elements. Definitely less than 0.01.
@amueller There's no need to randomize to perform tie-breaking, because random tie-breaking leads (in the limit) to horizontal segments. The step-wise interpolation option introduced by this PR implements this and gives an average precision of 0.0001 for @roadjiang's example above. I'm not convinced that this is an edge case: a weak model with few features that can only assign one of (say) 10 scores will have artificially inflated average precision under the existing implementation, and so may seem preferable to a stronger model that assigns one of 1000 scores. In this blog post I highlight this by showing that rounding a model's scores to the nearest tenth can actually improve apparent performance, which is clearly crazy!
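A small NumPy sketch of that step-wise sum for the all-tied example above (this is a hand-computed illustration, not code from the PR; it assumes 10000 samples with identical scores and exactly one positive):

```python
import numpy as np

# 10000 samples with identical scores, exactly one of them positive.
# With a single distinct score there is only one threshold, so the
# precision-recall curve has just one step: recall jumps from 0 to 1
# at precision = 1 / 10000.
n_samples, n_positives = 10000, 1
recall = np.array([0.0, 1.0])
precision = np.array([1.0, n_positives / n_samples])

# Step-wise interpolated AP: sum of (R_n - R_{n-1}) * P_n over the recall steps.
ap_stepwise = np.sum(np.diff(recall) * precision[1:])
print(ap_stepwise)  # 0.0001
```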
@ndingwall Which definition of average precision are you using here? The wikipedia one is not defined for ties. The blog post is neat, but it brushes over the distinctions between the "left-stepwise interpolated integral", the wikipedia definition, and the "smoothed" integral.
Maybe I'll write my own blog-post rant tomorrow ;)
Fixed by #9017
The bug is as follows:
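A minimal reproduction consistent with the description below (10000 samples with identical scores, only the first one positive) — not the original snippet, which is not shown here — would be:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# 10000 samples that all receive the same score; only the first is positive.
y_true = np.zeros(10000)
y_true[0] = 1
y_scores = np.ones(10000)

# scikit-learn releases from around the time of this report gave the value
# shown below; versions that include the fix from #9017 should return
# 0.0001 instead.
average_precision_score(y_true, y_scores)
```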
Out[562]: 0.50004999999999999
It overestimates the true mAP. Think about it: all 10000 samples have the same rank (a dummy ranked list), and it just gets lucky that the first one is the true positive. How can this AP be 0.5?
If this is true, then for any problem, code that outputs nothing informative still gets a pretty good AP of 0.5.
In many hard problems, getting even 0.1–0.2 average precision is pretty hard.
Basically, you need to shuffle the ranked list before calculating mAP to avoid this overestimation.
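As for why the reported value is so close to 0.5: it matches what trapezoidal (linear) interpolation of the two-point precision-recall curve gives. With all scores tied, the curve consists only of the endpoints (recall 0, precision 1) and (recall 1, precision 1/10000), and the implementation at the time apparently integrated linearly between them:

$$\text{AP}_{\text{trapezoid}} = \tfrac{1}{2}\,(1 - 0)\left(1 + \tfrac{1}{10000}\right) = 0.50005,$$

versus 0.0001 from the step-wise sum discussed above.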