Skip to content

[MRG+1] Doc: complete PR #11180 for the reorganization of the dataset loading utilities section #11328

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 2, 2018

Conversation

jeremiedbb
Copy link
Member

This PR is just the continued work started in #11180 for the reorganization of the datasets loading utilities
section in the doc discussed in #11083.

There were TODOs left out in #11180 that this PR aims to fill. It's essentially adding/reworking introductions of the new datasets loading utilities sections.

@jnothman
Copy link
Member

And again, I'm sorry for messing up the merge...

@jeremiedbb
Copy link
Member Author

It's really not a big deal, and thank YOU for reviewing all my PRs so fast !

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some comments. Also please prefix the titles of your PR with [WIP] (Work In Progress) or [MRG] (Ready to merge) to let the reviewers know whether you are still working on improvements to an early PR or not.


*desc*
They can be loaded using the following functions :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No whitespace before ":" in English.

@@ -262,6 +277,8 @@ Generators for decomposition
Loading other datasets
======================

This section gathers different tools to load other kinds of datasets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence brings very little information I would rather remove it to avoid diluting the actual information with boilerplate English connection sentences.

require to download any file from some external website.
scikit-learn comes with a few small standard datasets that do not require to
download any file from some external website. A description of each dataset is
available below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"A description of each dataset is available below." could probably be trimmed to keep the documentation as informative as possible.

some contain ``feature_names`` and ``target_names``. See the dataset
descriptions below for details.

**The dataset generation functions.** They can be used to generate controled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: controlled

@@ -9,49 +9,61 @@ Dataset loading utilities
The ``sklearn.datasets`` package embeds some small toy datasets
as introduced in the :ref:`Getting Started <loading_example_dataset>` section.

This package also features helpers to fetch larger datasets commonly
used by the machine learning community to benchmark algorithm on data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... to benchmark algorithms ...

@jeremiedbb jeremiedbb changed the title DOC: complete PR #11180 for the reorganization of the dataset loading utilities section [MRG] Doc: complete PR #11180 for the reorganization of the dataset loading utilities section Jun 21, 2018
Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @jeremiedbb.

interface, returning a tuple ``(X, y)`` consisting of a ``n_samples`` *
Both loaders and fetchers functions return a dictionary-like object holding
at least two items: an array of shape ``n_samples`` * ``n_features`` with
key ``data``(except for 20newsgroups) and a numpy array of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why but there's formatting issue here. See https://25798-843222-gh.circle-artifacts.com/0/doc/datasets/index.html. I guess we should have a blank before the left bracket

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice, thanks !

@qinhanmin2014 qinhanmin2014 changed the title [MRG] Doc: complete PR #11180 for the reorganization of the dataset loading utilities section [MRG+1] Doc: complete PR #11180 for the reorganization of the dataset loading utilities section Jun 30, 2018
@qinhanmin2014 qinhanmin2014 added this to the 0.20 milestone Jun 30, 2018
@jnothman jnothman merged commit b56fa39 into scikit-learn:master Jul 2, 2018
@qinhanmin2014
Copy link
Member

Thanks @jeremiedbb for the great work :)

@jeremiedbb jeremiedbb deleted the doc-dataset-utilites branch July 12, 2018 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants