NDCG score doesn't work with binary relevance and a list of 1 element #21335
See this code example. It works correctly when the number of elements is bigger than 1: https://stackoverflow.com/questions/64303839/how-to-calculate-ndcg-with-binary-relevances-using-sklearn
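A minimal reproduction sketch (assumed, since the original snippet is not shown here; the exact error message may vary by scikit-learn version):

```python
from sklearn.metrics import ndcg_score

# Works: more than one document per query.
print(ndcg_score([[1, 0, 0]], [[0.9, 0.2, 0.1]]))  # 1.0

# Fails: a single document with binary relevance. type_of_target
# infers "binary" for a one-column y_true, which ndcg_score rejects
# with a ValueError.
ndcg_score([[1]], [[1.0]])
```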
It doesn't seem like a well-defined problem in the case of a single input to me. I'm not sure what you'd expect to get.
I'm skipping the computation if there are 0 relevant documents (`any(truths)` is False), since the metric is undefined.
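A sketch of that workaround (a hypothetical helper, not scikit-learn API; assumes one row per query, each with at least two candidates):

```python
import numpy as np
from sklearn.metrics import ndcg_score

def mean_ndcg_skip_empty(y_true, y_score):
    # Skip queries with no relevant documents, where NDCG is 0/0
    # and hence undefined.
    scores = [
        ndcg_score([truths], [preds])
        for truths, preds in zip(y_true, y_score)
        if any(truths)
    ]
    return np.mean(scores) if scores else float("nan")

# The second query has no relevant document and is skipped.
print(mean_ndcg_skip_empty([[1, 0, 0], [0, 0, 0]],
                           [[0.9, 0.2, 0.1], [0.5, 0.4, 0.3]]))
```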
Pinging @jeremiedbb and @jeromedockes, who worked on the implementation.
Which NDCG definition? Could you point to a reference? (I ask because IIRC there is some variability in the definitions people use.) Normalized DCG is the ratio between the DCG obtained for the predicted ranking and the DCG obtained for the true ranking, and in my understanding, when there is only one possible ranking (when there is only one candidate, as in this example), both rankings are the same, so this ratio should be 1. (This is the value we obtain if we disable this check.)

However, ranking a list of length 1 is not meaningful, so if y_true has only one column, it seems more likely that there was a mistake in the formatting/representation of the true gains, or that a user applied this ranking metric to a binary classification task. Raising an error therefore seems reasonable to me, but I guess the message could be improved (although it is hard to guess what the mistake was). Showing a warning and returning 1.0 could also be an option.
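For reference, one common form of the definition this reasoning assumes:

$$\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}, \qquad \mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k},$$

where $\mathrm{IDCG@}k$ is the $\mathrm{DCG@}k$ of the ideal (true) ordering. With a single candidate ($k = 1$), the predicted and ideal orderings coincide, so $\mathrm{DCG@}1 = \mathrm{IDCG@}1 = rel_1$: the ratio is 1 whenever $rel_1 > 0$, and $0/0$ (undefined) when $rel_1 = 0$.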
Note this is a duplicate of #20119, AFAICT.
Hi Jerome, you are right, I made a mistake. I'm using the definition on Wikipedia.
Indeed, when all documents are truly irrelevant, the NDCG is 0 / 0 (undefined), and currently 0 is returned (as seen here). But I still think measuring NDCG for a list of 1 document is not meaningful (regardless of the value of the relevance), so raising an error about the shape of y_true makes sense.
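A quick check of the behavior described above (assuming current scikit-learn; the exact handling could change):

```python
from sklearn.metrics import ndcg_score

# All documents truly irrelevant: both DCG and ideal DCG are 0,
# so NDCG is 0/0; scikit-learn currently returns 0 rather than NaN.
print(ndcg_score([[0, 0, 0]], [[0.3, 0.2, 0.1]]))  # 0.0
```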
So we should improve the error message in this case. |
I am happy to work on this if it hasn’t been assigned yet.
@georged4s I can see that #24482 has been opened but it seems stalled. I think you can claim the issue and propose a fix. You can also look at the review done in the older PR.
Thanks @glemaitre for replying and for the heads up. Cool, I will look into this one. |
I came here as I have run into the same problem: it doesn't support binary targets. Also, it would be great if it could be calculated simultaneously for a batch of users.
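For the batch case, note that `ndcg_score` already accepts 2-D arrays of shape `(n_queries, n_documents)` and averages the score over rows, so scoring many users at once works as long as every user has the same number of candidates. A small sketch:

```python
import numpy as np
from sklearn.metrics import ndcg_score

# One row per user, one column per candidate document.
y_true = np.array([[1, 0, 0],
                   [0, 1, 1]])
y_score = np.array([[0.9, 0.3, 0.1],
                    [0.2, 0.8, 0.7]])

# Mean NDCG across the two users.
print(ndcg_score(y_true, y_score))
```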
Hi, there doesn't seem to be a linked PR (excluding the stalled one). Could I pick it up?
Picking it up as part of the PyLadies "Contribute to scikit-learn" workshop.