
[MRG] Use defined notation for precision and recall #12726


Merged: 2 commits merged on Dec 6, 2018


@gsganden commented Dec 5, 2018

Replace undefined symbols A and B in the definitions of precision, recall, and F-beta with defined symbols y_l and \hat{y}_l.

Reference Issues/PRs

NA

What does this implement/fix? Explain your changes.

The user guide on model evaluation defines precision and recall in terms of A and B without defining those symbols. There does seem to be a convention in the Information Retrieval literature of using A to refer to relevant documents (analogous to positive samples) and B to refer to retrieved documents (analogous to positive predictions). We already have defined symbols for those concepts, namely \hat{y}_l and y_l, respectively, where l is the "positive" class label. When I replace A with \hat{y}_l and B with y_l, the resulting definitions of precision and recall are reversed, so in addition to the simple find-and-replace I correct that.
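For reference, here is a sketch of the set-based definitions under discussion, written with generic sets and assuming the user guide's convention (stated in the second commit message) that y is the set of predicted (sample, label) pairs and \hat{y} the set of true pairs. Under that convention precision divides by the number of predictions and recall by the number of true pairs; the per-label versions proposed in this PR substitute y_l and \hat{y}_l. This is an illustration, not necessarily the exact text that was merged.

```latex
P(y, \hat{y}) := \frac{\left| y \cap \hat{y} \right|}{\left| y \right|}, \qquad
R(y, \hat{y}) := \frac{\left| y \cap \hat{y} \right|}{\left| \hat{y} \right|}, \qquad
F_\beta(y, \hat{y}) := \left(1 + \beta^2\right)
  \frac{P(y, \hat{y}) \, R(y, \hat{y})}{\beta^2 P(y, \hat{y}) + R(y, \hat{y})}
```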

Any other comments?

The more I look at this section of the document, the more apparent problems I find. I think this PR fixes one of them.

Greg Gandenberger added 2 commits December 5, 2018 14:03
Replace undefined "`A`" and "`B`" in definitions of precision, recall, and F-beta with defined symbols "`y_l`" and "`\hat{y}_l`" and correct the resulting definitions.
I missed that the definitions use `\hat{y}` for true labels and `y` for predicted labels, so I had precision and recall reversed.
@qinhanmin2014 (Member) left a comment


thanks @gsganden

@qinhanmin2014 merged commit 1cb56ba into scikit-learn:master on Dec 6, 2018
@jnothman (Member) left a comment


I disagree, and would like to see this reverted.

Those definitions of P and R are generic functions applied to generic sets. I don't mind them being y and \hat{y}, but using _l directly contradicts what is described below regarding different averaging.

Please do raise issues with respect to other errors in the documentation here.
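To make the point about generic set functions concrete, here is a minimal Python sketch. It is not scikit-learn's implementation, and all function names are hypothetical. It assumes the user guide's convention (y = predicted (sample, label) pairs, \hat{y} = true pairs) and, as a further assumption, scores an empty denominator as 0. The same precision and recall functions give micro-averaged scores when applied once to the full pair sets, and macro-averaged scores when applied to each per-label subset (y_l, \hat{y}_l) and averaged, which is why baking the subscript l into the generic definitions is at odds with the averaging section.

```python
# Minimal sketch: precision, recall and F-beta as generic functions on sets of
# (sample_index, label) pairs. Not scikit-learn's implementation; all names
# are hypothetical. Convention assumed from the user guide: y holds predicted
# pairs, y_hat holds true pairs. Empty denominators score 0.0 (an assumption).


def precision(y, y_hat):
    """|y & y_hat| / |y|: share of predicted pairs that are correct."""
    return len(y & y_hat) / len(y) if y else 0.0


def recall(y, y_hat):
    """|y & y_hat| / |y_hat|: share of true pairs that were predicted."""
    return len(y & y_hat) / len(y_hat) if y_hat else 0.0


def f_beta(y, y_hat, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    p, r = precision(y, y_hat), recall(y, y_hat)
    denom = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denom if denom else 0.0


def restrict(pairs, label):
    """Subset of pairs carrying the given label (y_l or y_hat_l)."""
    return {(i, lab) for (i, lab) in pairs if lab == label}


def macro(metric, y, y_hat, labels):
    """Apply the same generic metric per label, then take the unweighted mean."""
    scores = [metric(restrict(y, l), restrict(y_hat, l)) for l in labels]
    return sum(scores) / len(scores)


# Toy data: four samples, two labels.
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
y_hat = set(enumerate(y_true))  # true (sample, label) pairs
y = set(enumerate(y_pred))      # predicted (sample, label) pairs
labels = ["a", "b"]

# Micro-averaging: apply the generic functions once to the full pair sets.
print("micro P/R:", precision(y, y_hat), recall(y, y_hat))
# Macro-averaging: the same functions, applied to each per-label subset.
print("macro P/R:", macro(precision, y, y_hat, labels), macro(recall, y, y_hat, labels))
# Swapping the arguments swaps the roles of precision and recall.
print(precision(y_hat, y) == recall(y, y_hat))
```

On this toy data, micro precision and recall are both 0.75, while macro precision is 5/6: label "a" is predicted once (correctly) and label "b" three times (twice correctly). The last line also shows how easily precision and recall get reversed when the roles of the two sets are mixed up.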

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Dec 9, 2018
adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Jan 7, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
3 participants