[MRG] Metric documentation (mainly in classification) #1512
Conversation
Thanks a lot! You're on fire :)
Currently the clustering metrics are documented in the clustering module. I think I would prefer them to be documented in the metrics module. Any opinions on that @arjoly @ogrisel @GaelVaroquaux @robertlayton ?
By the way, there are two metrics that end with

@amueller I have to leave; I will think about it tomorrow. Edit: yes indeed
Lol, I see we will become good friends :)

Maybe for regression the losses are called errors? I am not sure how much thought went into a naming scheme. I think being explicit is important, which is why I don't like `zero_one`. Being consistent is nice, too, but we shouldn't overdo it. Is the rest consistent? Is it worth renaming? Or could we try to have separate naming schemes for classification, regression and clustering metrics?

OT
No strong opinion as long as there are some frequent and highly visible links.

G
I think it's ok to keep the
+1 for being more explicit by deprecating `zero_one` and `zero_one_score`.
+1 for having `accuracy_score`. Generally, if the thing has an explicit name, we should use that. In cases where the same metric has multiple names, this runs into problems. As a thought, how do people feel about having aliases? I.e. after the definition of the canonical function, bind the alternative name to it (see the sketch below).

Oh, and put all metric documentation in the metrics part. Makes sense, that is where I would have expected them to be. With the clustering documentation update I'm (slowly) getting through, the documentation page may get significantly longer.
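For illustration, the alias idea would amount to something like this (a sketch; `accuracy_score` is taken as the canonical definition and `zero_one_score` as the alias under discussion):

```python
def accuracy_score(y_true, y_pred):
    """Fraction of correctly classified samples (canonical name)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / float(len(y_true))

# Alias bound right after the definition: same object, alternative name.
zero_one_score = accuracy_score
```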
Not too excited: there should be one and only one preferred way of doing things.

I'd suggest to put the disambiguation in the docstring and recommend the use of the preferred name.
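A hypothetical docstring along those lines (wording is illustrative, not taken from the code base):

```python
def accuracy_score(y_true, y_pred):
    """Accuracy classification score.

    This metric is also known as the zero-one score; ``accuracy_score``
    is the preferred name and the one new code should use.
    """
```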
If we are lucky then people can google the docs ;)
That's what I thought, but I didn't dare mention it :)
Of course they are. In my defence, I only just had my first coffee of the day.
It would be nice to have everything in one place and a link from the clustering documentation to the module evaluation part.
Will you make it, or should I perform the change? I will initiate the name changes (`zero_one` → `zero_one_loss` and `zero_one_score` → `accuracy_score`).
Yes.
And update the API change notes accordingly.
Currently, I think there are still some typos to hunt down, but what I wanted to achieve is there. So reviews and comments are welcome. :-)
Alright for keeping
You meant
I think that is what he meant.
Indeed... sorry for the confusion.
roc_curve

Others have been extended to the multiclass case:
I would rather say something along the lines of "also work in". For example, a confusion matrix is quite naturally multiclass and doesn't need to be extended.
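To the reviewer's point, `confusion_matrix` handles more than two classes without any extension; a quick check with made-up labels:

```python
>>> from sklearn.metrics import confusion_matrix
>>> confusion_matrix([0, 1, 2, 2], [0, 2, 2, 1])
array([[1, 0, 0],
       [0, 0, 1],
       [0, 1, 1]])
```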
I have rebased on top of master to take into account #1534. @GaelVaroquaux The `deprecated` decorator is now used. To summarize, there are a few remaining questions / remarks to take into account.
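For reference, a sketch of what the decorator-based deprecation could look like, assuming `sklearn.utils.deprecated`; the message and the preserved default are illustrative:

```python
from sklearn.metrics import zero_one_loss
from sklearn.utils import deprecated

@deprecated("zero_one is deprecated; use zero_one_loss instead")
def zero_one(y_true, y_pred, normalize=False):
    # Keep the historical count-based default while delegating to the
    # new, explicitly named function.
    return zero_one_loss(y_true, y_pred, normalize=normalize)
```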
Have you changed the default behavior of the renamed `zero_one`? Could you please test that the old name has the old default behavior and the new name has the new one?
>>> zero_one_loss(y_true, y_pred)
0.25
>>> zero_one_loss(y_true, y_pred, normalize=False)
1
@amueller Check the doctest here.
No.
See the reference. Does it answer your question?
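For concreteness, the behavior under discussion (values mirror the doctest above; the old name keeping a count-based default is the assumption being tested):

```python
from sklearn.metrics import zero_one_loss

y_true = [1, 2, 3, 4]
y_pred = [1, 2, 3, 5]

zero_one_loss(y_true, y_pred)                   # 0.25 -> fraction (new default)
zero_one_loss(y_true, y_pred, normalize=False)  # 1    -> raw error count
```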
Taking your question into account, I have also clarified the API changes. Good catch! Thanks.
Explained variance can be negative for completely random or anti-correlated predictions. Usually, for a real-life, non-dummy model it's positive though (if the signal is not itself completely random).
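An illustrative check with toy values (using `sklearn.metrics.explained_variance_score`; perfectly anti-correlated predictions give a clearly negative score):

```python
>>> from sklearn.metrics import explained_variance_score
>>> explained_variance_score([1, 2, 3, 4], [4, 3, 2, 1])
-3.0
```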
assert_equal(zero_one(y_true, y_pred),
             zero_one(y_pred, y_true))

assert_almost_equal(zero_one(y_true, y_pred, normalize=True),
                    zero_one(y_pred, y_true, normalize=True), 2)
These raise deprecation warnings, right? Could you please catch them?
@arjoly Thanks for the pointers, and sorry for being lazy. Having a busy day. Apart from my comment on the deprecation warnings, I think this is good to go.
@amueller Warnings are caught!
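One standard-library way such tests catch the warnings (a sketch; `zero_one` here stands for the deprecated name exercised by the tests above):

```python
import warnings

from sklearn.metrics import zero_one  # deprecated alias

y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    zero_one(y_true, y_pred)  # triggers the deprecation warning
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```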
Thanks :) I guess you are good to go. Feel free to merge using your preferred method ;)
Merge by rebase! :-) Thanks all for your time and review!!!
A total of 1,306 additions and 215 deletions. That's a huge contribution! Thanks heaps!
+1 :)
I would like to partly tackle the issue in #1508. In this PR, I intend to add definitions to the classification metrics and the remaining regression metrics. Before going into the MRG state, I still have to read the added documentation two or three times to find mistakes. Furthermore, I am not sure about the documentation on `explained_variance_score`; it is new to me.

Questions:

- Why `zero_one` and `zero_one_score` instead of `zero_one_loss` and `accuracy_score`?
- Why does `ClassifierMixin` use a "homebrew" accuracy metric in the `score` member?
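On the second question, the suggestion amounts to something like the following sketch (not the actual scikit-learn implementation):

```python
from sklearn.metrics import accuracy_score

class ClassifierMixin(object):
    """Sketch of a mixin whose default score delegates to the metrics module."""

    def score(self, X, y):
        # Reuse accuracy_score rather than re-implementing ("homebrewing")
        # the accuracy computation inline.
        return accuracy_score(y, self.predict(X))
```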