-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] DOC: fix document that fetch_20newsgroups #12783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bunch.DESCR: a description of the dataset. | ||
bunch : Bunch object with the following attribute: | ||
|
||
bunch.data: list, length [n_samples] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the extra indentation here is actually useful, but maybe this would work?
bunch ...
- bunch.blahblah ....
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
|
||
bunch.DESCR: a description of the dataset. | ||
|
||
bunch.target_names: a list of categories containing in the dataset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the categories of the returned data, not the whole dataset (I may be mistaken)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! you are rigth. So. the length should be [n_samples] right?
bunch.DESCR: a description of the dataset. | ||
bunch : Bunch object with the following attribute: | ||
|
||
- bunch.data: list, length [n_samples] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may not need these newlines in between these items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remove it.
- bunch.DESCR: a description of the dataset. | ||
|
||
- bunch.target_names: a list of categories of the returned data, | ||
length [n_classes]. This depends of the `categories` parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this another one in the itemized list, or is it a second output (in which case it has one extra indentation)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the last commit I improve this
El 14 dic. 2018 7:36 PM, Adrin Jalali <notifications@github.com> escribió:@adrinjalali commented on this pull request.
In sklearn/datasets/twenty_newsgroups.py:
- bunch.data: list, length [n_samples]
- bunch.target: array, shape [n_samples]
- bunch.filenames: list, length [n_classes]
- bunch.DESCR: a description of the dataset.
+ bunch : Bunch object with the following attribute:
+
+ - bunch.data: list, length [n_samples]
+
+ - bunch.target: array, shape [n_samples]
+
+ - bunch.filenames: list, length [n_classes]
+
+ - bunch.DESCR: a description of the dataset.
+
+ - bunch.target_names: a list of categories of the returned data,
+ length [n_classes]. This depends of the `categories` parameter.
Is this another one in the itemized list, or is it a second output (in which case it has one extra indentation)?Is the length of the target_names list. Do I writte it on a different way?
—You are receiving this because you authored the thread.Reply to this email directly, view it on GitHub, or mute the thread.
|
El 14 dic. 2018 7:35 PM, Adrin Jalali <notifications@github.com> escribió:@adrinjalali commented on this pull request.
In sklearn/datasets/twenty_newsgroups.py:
@@ -216,11 +216,18 @@ def fetch_20newsgroups(data_home=None, subset='train', categories=None,
Returns
-------
- bunch : Bunch object
- bunch.data: list, length [n_samples]
- bunch.target: array, shape [n_samples]
- bunch.filenames: list, length [n_classes]
- bunch.DESCR: a description of the dataset.
+ bunch : Bunch object with the following attribute:
+
+ - bunch.data: list, length [n_samples]
I think you may not need these newlines in between these items.Ok! I will clean this
—You are receiving this because you authored the thread.Reply to this email directly, view it on GitHub, or mute the thread.
|
bunch.DESCR: a description of the dataset. | ||
bunch : Bunch object with the following attribute: | ||
- bunch.data: list, length [n_samples] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need all these blank lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready. I remove all blank lines from the docs. I did not sure about the sphinx behavior, because of that I used the blank line. :-)
bunch : Bunch object with the following attribute: | ||
- bunch.data: list, length [n_samples] | ||
- bunch.target: array, shape [n_samples] | ||
- bunch.filenames: list, length [n_classes] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_samples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to say that is the target_names of the returned data (similarly to fetch_20newsgroups
). Maybe n_samples is the correct word?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
news = fetch_20newsgroups()
len(news.filenames)
# 11314
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eamanu no response here?
- bunch.filenames: list, length [n_classes] | ||
- bunch.DESCR: a description of the dataset. | ||
- bunch.target_names: a list of categories of the returned data, | ||
length [n_classes]. This depends of the `categories` parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depends on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fix it
bunch : Bunch object with the following attribute: | ||
- bunch.data: sparse matrix, shape [n_samples, n_features] | ||
- bunch.target: array, shape [n_samples] | ||
- bunch.target_names: list, length [n_samples] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_classes? copy paste your version from fetch_20newsgroups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will do that
scikit-learn#12783)" This reverts commit 29ac841.
scikit-learn#12783)" This reverts commit 29ac841.
Reference Issues/PRs
Fix #12777
Related https://github.com/scikit-learn/scikit-learn/pull/12770/files#r241655146
What does this implement/fix? Explain your changes.
Improve doc of
fetch_20newsgroups
and addtarget_names
on Return on docstring.Any other comments?