-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC Added references to the Matthews correlation coefficient function in the user guide #19958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC Added references to the Matthews correlation coefficient function in the user guide #19958
Conversation
I was under the impression that we add references in our user guide to back-up claims or explanations. While I am not against adding references, I think that it should be more motivated by adding insightful discussions and not just a drop-off of a reference section with no direct links with the discussion. I feel that sometimes our user guide is a kind of a metrics catalogue and might insights such as "which metrics to choose?", "what are the links between metrics?", etc. I think that references could come together with such explanations. I see that @thomasjpfan and @jnothman were inclined to such addition so I was wondering about their thoughts. |
Most of these references are from the docstring: scikit-learn/sklearn/metrics/_classification.py Lines 838 to 839 in c88c89c
Now that I look at the git blame, [1] is for original implementation, [2] is wikipedia, and [3] and [4] are for the multiclass implementation. In this contenxt, I do not think the new reference "The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy.. " should be included here. Maybe the best place to put this reference is a new example that compares MCC with other metrics. |
Hi @glemaitre and @thomasjpfan I think a user who wants to know more about the mathematical foundations of the MCC would be interested in reading them. That's why I suggest their inclusion. Best, -- Davide |
Hi -- Davide |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that last reference would need a separate example/section to be justified. Otheriwise LGTM. I've removed that last one.
WDYT @glemaitre
For me, it would be enough to reference the wikipedia page that itself reference the other paper. The additional sentence does not justify why we add 4 references. |
Whoops, I did not mean to close the PR. |
I don't mind. Removed the extra ones. I guess you can merge this one now @glemaitre |
doc/modules/model_evaluation.rst
Outdated
@@ -1277,6 +1278,13 @@ function: | |||
>>> matthews_corrcoef(y_true, y_pred) | |||
-0.33... | |||
|
|||
.. topic:: References: | |||
|
|||
.. [WikipediaMCC2021] Wikipedia contributors. Detection error tradeoff. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm there is something wrong here since it specify the DET instead of the MCC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Changed that, and using the Phi coefficient page directly since MCC is redirected here.
… in the user guide (scikit-learn#19958) Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
… in the user guide (scikit-learn#19958) Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
… in the user guide (#19958) Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
Reference Issues/PRs
What does this implement/fix? Explain your changes.
I am proposing to include the reference to two articles of mine about the Matthews correlation coefficient because they describessome mathematical properties of MCC, some mathematical relationships between MCC and other measures (F1 score and accuracy; and balanced accuracy, bookmaker informedness, and markedness), and some use cases about how it works and what message it generates.
I think this information might be useful for the scikit-learn users who would like to know more about MCC.
I already proposed to include one of these two references in the MCC code function, but other users suggested me to add it to the user guide instead.
I am available to provide additional information, of course.
Thanks for considering my pull request.
Any other comments?