-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+2] DOC add info to conventions for multi-label fitting #7519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i'm not sure type casting is the proper place for this. It most definitely is a convention, but I don't think it fits cleanly into either category.
@@ -310,6 +310,33 @@ Here, the first ``predict()`` returns an integer array, since ``iris.target`` | |||
(an integer array) was used in ``fit``. The second ``predict()`` returns a string | |||
array, since ``iris.target_names`` was for fitting. | |||
|
|||
Similarly, when using `multiclass classifiers <http://scikit-learn.org/stable/modules/multiclass.html>`_, | |||
the format of the target data used for fitting determines whether multiclass or multilabel predictions will be returned:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line is a bit long, can you split it up into 2?
fwiw seems like there's an existing PR here: #5837 that edits the multiclass documentation. I do think this should be mentioned here as well, though. |
Created a new section in the conventions and tried to make shorter/clearer sentences. I hadn't seen the other PR since it wasn't linked to the issue - should there be some sort of correspondence between the two PRs? |
I suppose they should give similar information and not conflict each other , but besides that they could probably be treated fairly independently in my opinion. |
I think the two PRs are independent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also briefly note that multilabel data can be provided as a sparse matrix? Or does that belong elsewhere? Otherwise LGTM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
When using :class:`multiclass classifiers <sklearn.multiclass>`, | ||
the format of returned predictions is dependent on the format of the target data fit upon. More specifically, a classifier fit on multiclass labels will return multiclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the line shorter to be consistent with the rest of the docs.
The ``predict()`` method therefore provides corresponding multiclass predictions. | ||
In the second case, the classification target is provided to ``fit()`` as | ||
a 2d array of binary label indicators, using the :class:`LabelBinarizer <sklearn.preprocessing.LabelBinarizer>`. | ||
In this case ``predict()`` returns a 2d array representing the corresponding multi-label predictions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might highlight the last two instances receiving no label at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe add that it's possible that multiple ones appear in a row? (or maybe change the example so that that actually happens?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, refer to MultiLabelBinarizer
... Is that too much for a tutorial?
looks good apart from minor comments :) |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
When using :class:`multiclass classifiers <sklearn.multiclass>`, | ||
the format of returned predictions is dependent on the format of the target data fit upon. More specifically, a classifier fit on multiclass labels will return multiclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'd say "the learning and prediction task that is performed is dependent" because not only the output format is changed, what happens in the algorithm is different.
The ``predict()`` method therefore provides corresponding multiclass predictions. | ||
In the second case, the classification target is provided to ``fit()`` as | ||
a 2d array of binary label indicators, using the :class:`LabelBinarizer <sklearn.preprocessing.LabelBinarizer>`. | ||
In this case ``predict()`` returns a 2d array representing the corresponding multi-label predictions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe add that it's possible that multiple ones appear in a row? (or maybe change the example so that that actually happens?)
LGTM |
The lines still seem too long. If we keep lines short (80char), future merges will be easier. |
My mistake, I was misunderstanding what exactly was meant by line lengths. Does this seem better now? |
Thanks! |
Thanks @jkarno! |
sweet, grats @jkarno :) |
…earn#7519) * DOC add info to conventions for multi-label fitting * DOC move to multilabel section and small edits * DOC clean multilabel examples and clean information * DOC fix line lengths for multiclass
…earn#7519) * DOC add info to conventions for multi-label fitting * DOC move to multilabel section and small edits * DOC clean multilabel examples and clean information * DOC fix line lengths for multiclass
…earn#7519) * DOC add info to conventions for multi-label fitting * DOC move to multilabel section and small edits * DOC clean multilabel examples and clean information * DOC fix line lengths for multiclass
Reference Issue
Fixes #4639
What does this implement/fix? Explain your changes.
I added documentation to the Quick Start conventions, documenting how multiclass classifiers will act when fit on classifications vs. binary label indicators.
Any other comments?
This is my first pull request to Scikit-Learn, so please let me know if there is anything that should be done differently. I spoke with @amueller regarding this contribution, and he suggested this location for the new documentation. However, I'm not positive whether it should be under the type casting section.
Andy, please let me know if I misunderstood the issue. I tried to demonstrate the difference you described, but I'm not 100% sure this was what the issue was referring to.