The bug is as follows:
```python
import numpy as np
from sklearn.metrics import average_precision_score

# all scores are tied at 0; only the first sample is a true positive
y_true = np.array([0 for _ in range(10000)])
y_true[0] = 1
y_scores = np.array([0 for _ in range(10000)])
average_precision_score(y_true, y_scores)
# Out[562]: 0.50004999999999999
```
This overestimates the true mAP. Think about it: all 10000 samples have the same score (a dummy ranked list), and it only happens that the first one is the true positive. How can this AP be 0.5?
If that is the behavior, then on any problem a model that outputs nothing (all-zero scores) gets a fairly good AP of 0.5.
In many hard problems, even an average precision of 0.1-0.2 is difficult to reach.
Basically, you need to shuffle the ranked list before calculating mAP to avoid this overestimation.
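One way to get a less optimistic number when scores contain ties, in the spirit of the shuffling suggestion above, is to break the ties at random and average the AP over several random orderings. Below is a minimal sketch of that idea; `ap_with_random_tie_breaking` is a hypothetical helper written for this issue, not a scikit-learn function:

```python
import numpy as np

def ap_with_random_tie_breaking(y_true, y_scores, n_repeats=100, seed=0):
    """Estimate AP by breaking score ties at random and averaging.

    A sketch of the shuffling idea from this issue, not scikit-learn's
    implementation of average_precision_score.
    """
    rng = np.random.RandomState(seed)
    n = len(y_true)
    aps = []
    for _ in range(n_repeats):
        # Shuffle first, then stable-sort by descending score, so samples
        # with tied scores end up in a random order within each tie group.
        perm = rng.permutation(n)
        order = perm[np.argsort(-y_scores[perm], kind="stable")]
        ranked = y_true[order]
        # Standard (non-interpolated) AP over the explicit ranking.
        hits = np.cumsum(ranked)
        precision_at_k = hits / np.arange(1, n + 1)
        aps.append(np.sum(precision_at_k * ranked) / ranked.sum())
    return np.mean(aps)

y_true = np.zeros(10000, dtype=int)
y_true[0] = 1
y_scores = np.zeros(10000)
print(ap_with_random_tie_breaking(y_true, y_scores))
# ~0.001 here: the single positive lands at a random rank, so the dummy
# ranked list no longer looks like a 0.5 AP classifier.
```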