A bug in the average precision function #6377

Closed
roadjiang opened this issue Feb 17, 2016 · 7 comments

@roadjiang

The bug is as follows:

import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.zeros(10000, dtype=int)
y_true[0] = 1
y_scores = np.zeros(10000)
average_precision_score(y_true, y_scores)

Out[562]: 0.50004999999999999

It overestimates the true mAP. Think about it: all 10000 samples have the same rank (a dummy ranked list), and it just gets lucky that the first one is the true positive. How can this AP be 0.5?

If this is true, then on every problem a model that outputs nothing (a constant score) gets a respectable AP of 0.5.
In many hard problems, reaching even 0.1-0.2 average precision is difficult.

Basically, you need to shuffle the ranked list before calculating mAP to avoid this overestimation.
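
For illustration, a minimal sketch of that suggestion; ap_from_ranking is a hypothetical helper (not a scikit-learn function) that computes AP directly from an explicitly ordered list:

import numpy as np

def ap_from_ranking(hits):
    # Average precision over an explicit ranking: the mean of precision@k
    # taken at each rank k where a positive appears.
    hits = np.asarray(hits)
    cum_hits = np.cumsum(hits)
    ranks = np.arange(1, len(hits) + 1)
    return (cum_hits / ranks)[hits == 1].mean()

rng = np.random.RandomState(0)
y_true = np.zeros(10000, dtype=int)
y_true[0] = 1
# Shuffle the (all-tied) ranked list many times and average the AP:
aps = [ap_from_ranking(y_true[rng.permutation(10000)]) for _ in range(1000)]
print(np.mean(aps))  # on the order of 0.001, nowhere near 0.5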

@hlin117
Contributor

hlin117 commented Sep 6, 2016

So what do you assert the actual MAP value should be?

amueller added the Bug label Sep 6, 2016
@amueller
Member

amueller commented Sep 6, 2016

Looks like a weird edge case, but I wouldn't say it's very critical. This happens only if the scores are exactly floating-point equal, which is very unlikely in any real application. And it's unclear what the best tie-breaking behavior is. I wouldn't really say the current one is wrong, but it's probably not optimal. If we want to randomize it, we need to somehow keep track of the random state, and I'm not entirely sure how to do that; currently all scores are deterministic.
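
As a rough sketch of what that could look like, one could break ties by jittering the scores under an explicit random_state; randomized_average_precision is a hypothetical wrapper, not an existing scikit-learn API:

import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.utils import check_random_state

def randomized_average_precision(y_true, y_scores, random_state=None):
    # Hypothetical wrapper: break ties by adding noise smaller than the
    # smallest gap between distinct scores, under an explicit random_state.
    rng = check_random_state(random_state)
    s = np.asarray(y_scores, dtype=float)
    gaps = np.diff(np.unique(s))
    eps = gaps.min() / 2 if gaps.size else 1.0
    return average_precision_score(y_true, s + rng.uniform(0, eps, size=s.shape))

y_true = np.zeros(10000, dtype=int)
y_true[0] = 1
print(randomized_average_precision(y_true, np.zeros(10000), random_state=0))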

@roadjiang
Author

> So what do you assert the actual MAP value should be?

I assume the result would be close to the average precision of a random ranking of 10000 elements, definitely less than 0.01.
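
For what it's worth, under the assumption of a single positive placed uniformly at random among N items, the expected AP of a random ranking works out to H_N / N (the positive lands at rank r with probability 1/N and contributes precision 1/r); a quick check:

import numpy as np

# E[AP] = (1/N) * sum_{r=1}^{N} 1/r = H_N / N for one positive among N.
N = 10000
print(np.mean(1.0 / np.arange(1, N + 1)))  # about 0.00098, well below 0.01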

@ndingwall
Contributor

@amueller There's no need to randomize to perform tie-breaking, because random tie-breaking leads (in the limit) to horizontal segments. The step-wise interpolation option introduced by this PR implements this and gives an average precision of 0.0001 for @roadjiang's example above.
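
For reference, a minimal sketch of where that 0.0001 comes from; only precision_recall_curve is a scikit-learn call, and the summation line writes out the step-wise integral by hand:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.zeros(10000, dtype=int)
y_true[0] = 1
y_scores = np.zeros(10000)

# All scores are tied, so the curve collapses to a single threshold:
# recall jumps from 1 to 0 while precision goes from 1/10000 to 1.
precision, recall, _ = precision_recall_curve(y_true, y_scores)
step_ap = -np.sum(np.diff(recall) * precision[:-1])
print(step_ap)  # 0.0001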

I'm not convinced that this is an edge case: a weak model with few features that can only assign one of (say) 10 scores will have artificially inflated average precision under the existing implementation, and so may seem preferable to a stronger model that assigns one of 1000 scores. In this blog post I highlight this by showing that rounding a model's scores to the nearest tenth can actually improve apparent performance, which is clearly crazy!

@amueller
Member

amueller commented Sep 8, 2016

@ndingwall Which definition of average precision are you using here? The Wikipedia one is not defined for ties. The blog post is neat, but it glosses over the distinctions between the "left-stepwise interpolated integral", the Wikipedia definition, and the "smoothed" integral.
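
To make those distinctions concrete, a small sketch computing all three on a toy ranking without ties; the smoothing line follows the standard interpolated-AP definition (at each recall level, take the maximum precision attained at any recall at least that large), and only precision_recall_curve and auc are scikit-learn calls:

import numpy as np
from sklearn.metrics import precision_recall_curve, auc

y_true = np.array([1, 0, 1, 0, 0, 1])
y_scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])

# recall decreases along the returned arrays, ending at (precision=1, recall=0).
precision, recall, _ = precision_recall_curve(y_true, y_scores)

# Step-wise integral: weight each recall step by the precision reached there.
step = -np.sum(np.diff(recall) * precision[:-1])
# Linearly interpolated ("trapezoidal") area under the PR curve.
trapezoid = auc(recall[::-1], precision[::-1])
# "Smoothed" integral: running max of precision toward higher recall.
smoothed = -np.sum(np.diff(recall) * np.maximum.accumulate(precision)[:-1])

print(step, trapezoid, smoothed)  # three different numbers, same ranking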

@amueller
Member

amueller commented Sep 8, 2016

Maybe I'll write my own blog-post rant tomorrow ;)

@GaelVaroquaux
Member

Fixed by #9017
