Skip to content

[MRG+2] DOC add info to conventions for multi-label fitting #7519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 6, 2016

Conversation

jkarno
Copy link
Contributor

@jkarno jkarno commented Sep 29, 2016

Reference Issue

Fixes #4639

What does this implement/fix? Explain your changes.

I added documentation to the Quick Start conventions, documenting how multiclass classifiers will act when fit on classifications vs. binary label indicators.

Any other comments?

This is my first pull request to Scikit-Learn, so please let me know if there is anything that should be done differently. I spoke with @amueller regarding this contribution, and he suggested this location for the new documentation. However, I'm not positive whether it should be under the type casting section.

Andy, please let me know if I misunderstood the issue. I tried to demonstrate the difference you described, but I'm not 100% sure this was what the issue was referring to.

Copy link
Contributor

@nelson-liu nelson-liu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i'm not sure type casting is the proper place for this. It most definitely is a convention, but I don't think it fits cleanly into either category.

@@ -310,6 +310,33 @@ Here, the first ``predict()`` returns an integer array, since ``iris.target``
(an integer array) was used in ``fit``. The second ``predict()`` returns a string
array, since ``iris.target_names`` was for fitting.

Similarly, when using `multiclass classifiers <http://scikit-learn.org/stable/modules/multiclass.html>`_,
the format of the target data used for fitting determines whether multiclass or multilabel predictions will be returned::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is a bit long, can you split it up into 2?

@nelson-liu
Copy link
Contributor

nelson-liu commented Sep 29, 2016

fwiw seems like there's an existing PR here: #5837 that edits the multiclass documentation. I do think this should be mentioned here as well, though.

@jkarno
Copy link
Contributor Author

jkarno commented Sep 29, 2016

Created a new section in the conventions and tried to make shorter/clearer sentences.

I hadn't seen the other PR since it wasn't linked to the issue - should there be some sort of correspondence between the two PRs?

@nelson-liu
Copy link
Contributor

I hadn't seen the other PR since it wasn't linked to the issue - should there be some sort of correspondence between the two PRs?

I suppose they should give similar information and not conflict each other , but besides that they could probably be treated fairly independently in my opinion.

@jkarno jkarno changed the title DOC add info to conventions for multi-label fitting [MRG] DOC add info to conventions for multi-label fitting Sep 29, 2016
@amueller
Copy link
Member

I think the two PRs are independent.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also briefly note that multilabel data can be provided as a sparse matrix? Or does that belong elsewhere? Otherwise LGTM

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using :class:`multiclass classifiers <sklearn.multiclass>`,
the format of returned predictions is dependent on the format of the target data fit upon. More specifically, a classifier fit on multiclass labels will return multiclass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make the line shorter to be consistent with the rest of the docs.

The ``predict()`` method therefore provides corresponding multiclass predictions.
In the second case, the classification target is provided to ``fit()`` as
a 2d array of binary label indicators, using the :class:`LabelBinarizer <sklearn.preprocessing.LabelBinarizer>`.
In this case ``predict()`` returns a 2d array representing the corresponding multi-label predictions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might highlight the last two instances receiving no label at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe add that it's possible that multiple ones appear in a row? (or maybe change the example so that that actually happens?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, refer to MultiLabelBinarizer... Is that too much for a tutorial?

@amueller
Copy link
Member

amueller commented Oct 5, 2016

looks good apart from minor comments :)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using :class:`multiclass classifiers <sklearn.multiclass>`,
the format of returned predictions is dependent on the format of the target data fit upon. More specifically, a classifier fit on multiclass labels will return multiclass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'd say "the learning and prediction task that is performed is dependent" because not only the output format is changed, what happens in the algorithm is different.

The ``predict()`` method therefore provides corresponding multiclass predictions.
In the second case, the classification target is provided to ``fit()`` as
a 2d array of binary label indicators, using the :class:`LabelBinarizer <sklearn.preprocessing.LabelBinarizer>`.
In this case ``predict()`` returns a 2d array representing the corresponding multi-label predictions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe add that it's possible that multiple ones appear in a row? (or maybe change the example so that that actually happens?)

@jnothman
Copy link
Member

jnothman commented Oct 6, 2016

LGTM

@jnothman jnothman changed the title [MRG] DOC add info to conventions for multi-label fitting [MRG+1] DOC add info to conventions for multi-label fitting Oct 6, 2016
@NelleV
Copy link
Member

NelleV commented Oct 6, 2016

The lines still seem too long. If we keep lines short (80char), future merges will be easier.

@jkarno
Copy link
Contributor Author

jkarno commented Oct 6, 2016

My mistake, I was misunderstanding what exactly was meant by line lengths. Does this seem better now?

@NelleV
Copy link
Member

NelleV commented Oct 6, 2016

Thanks!

@NelleV NelleV changed the title [MRG+1] DOC add info to conventions for multi-label fitting [MRG+2] DOC add info to conventions for multi-label fitting Oct 6, 2016
@jnothman
Copy link
Member

jnothman commented Oct 6, 2016

Thanks @jkarno!

@jnothman jnothman merged commit 45cb11d into scikit-learn:master Oct 6, 2016
@amueller
Copy link
Member

amueller commented Oct 7, 2016

sweet, grats @jkarno :)

amueller pushed a commit to amueller/scikit-learn that referenced this pull request Oct 14, 2016
…earn#7519)

* DOC add info to conventions for multi-label fitting

* DOC move to multilabel section and small edits

* DOC clean multilabel examples and clean information

* DOC fix line lengths for multiclass
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…earn#7519)

* DOC add info to conventions for multi-label fitting

* DOC move to multilabel section and small edits

* DOC clean multilabel examples and clean information

* DOC fix line lengths for multiclass
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…earn#7519)

* DOC add info to conventions for multi-label fitting

* DOC move to multilabel section and small edits

* DOC clean multilabel examples and clean information

* DOC fix line lengths for multiclass
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Documentation of label-binarizer and multi-label format
5 participants