Skip to content

[MRG] DOC: fix document that fetch_20newsgroups #12783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 23, 2018

Conversation

eamanu
Copy link
Contributor

@eamanu eamanu commented Dec 14, 2018

Reference Issues/PRs

Fix #12777
Related https://github.com/scikit-learn/scikit-learn/pull/12770/files#r241655146

What does this implement/fix? Explain your changes.

Improve doc of fetch_20newsgroups and add target_names on Return on docstring.

Any other comments?

bunch.DESCR: a description of the dataset.
bunch : Bunch object with the following attribute:

bunch.data: list, length [n_samples]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the extra indentation here is actually useful, but maybe this would work?

bunch ...
    - bunch.blahblah ....
    ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!


bunch.DESCR: a description of the dataset.

bunch.target_names: a list of categories containing in the dataset,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the categories of the returned data, not the whole dataset (I may be mistaken)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! you are rigth. So. the length should be [n_samples] right?

@eamanu eamanu changed the title [WIP] DOC: fix document that fetch_20newsgroups [MRG] DOC: fix document that fetch_20newsgroups Dec 14, 2018
bunch.DESCR: a description of the dataset.
bunch : Bunch object with the following attribute:

- bunch.data: list, length [n_samples]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may not need these newlines in between these items.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove it.

- bunch.DESCR: a description of the dataset.

- bunch.target_names: a list of categories of the returned data,
length [n_classes]. This depends of the `categories` parameter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this another one in the itemized list, or is it a second output (in which case it has one extra indentation)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the last commit I improve this

@eamanu
Copy link
Contributor Author

eamanu commented Dec 15, 2018 via email

@eamanu
Copy link
Contributor Author

eamanu commented Dec 15, 2018 via email

bunch.DESCR: a description of the dataset.
bunch : Bunch object with the following attribute:
- bunch.data: list, length [n_samples]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all these blank lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready. I remove all blank lines from the docs. I did not sure about the sphinx behavior, because of that I used the blank line. :-)

bunch : Bunch object with the following attribute:
- bunch.data: list, length [n_samples]
- bunch.target: array, shape [n_samples]
- bunch.filenames: list, length [n_classes]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n_samples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to say that is the target_names of the returned data (similarly to fetch_20newsgroups). Maybe n_samples is the correct word?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

news = fetch_20newsgroups()
len(news.filenames)
# 11314

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eamanu no response here?

- bunch.filenames: list, length [n_classes]
- bunch.DESCR: a description of the dataset.
- bunch.target_names: a list of categories of the returned data,
length [n_classes]. This depends of the `categories` parameter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fix it

bunch : Bunch object with the following attribute:
- bunch.data: sparse matrix, shape [n_samples, n_features]
- bunch.target: array, shape [n_samples]
- bunch.target_names: list, length [n_samples]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n_classes? copy paste your version from fetch_20newsgroups

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will do that

@qinhanmin2014 qinhanmin2014 merged commit 19a7c08 into scikit-learn:master Dec 23, 2018
adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Jan 7, 2019
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Feb 19, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC document that fetch_20newsgroups also returns target_names
3 participants