6 changes: 3 additions & 3 deletions doc/tutorial/text_analytics/working_with_text_data.rst
@@ -184,7 +184,7 @@ The most intuitive way to do so is the bags of words representation:

 The bags of words representation implies that ``n_features`` is
 the number of distinct words in the corpus: this number is typically
-larger that 100,000.
+larger than 100,000.

 If ``n_samples == 10000``, storing ``X`` as a numpy array of type
 float32 would require 10000 x 100000 x 4 bytes = **4GB in RAM** which
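
As an aside (not part of this patch), the **4GB** figure above is plain arithmetic over the dense array's shape, and it is why bag-of-words vectorizers hand back ``scipy.sparse`` matrices rather than dense arrays. A minimal sketch, using only the sample and feature counts quoted in the paragraph:

    import numpy as np
    from scipy.sparse import csr_matrix

    n_samples, n_features = 10000, 100000   # the figures quoted above

    # A dense float32 array pays 4 bytes per cell, zero or not:
    dense_bytes = n_samples * n_features * np.dtype(np.float32).itemsize
    print(dense_bytes)        # 4000000000 bytes, i.e. the 4GB mentioned above

    # A CSR sparse matrix only stores the non-zero entries plus one row
    # pointer per sample, so a mostly-zero bag-of-words matrix stays small:
    X = csr_matrix((n_samples, n_features), dtype=np.float32)
    print(X.data.nbytes + X.indices.nbytes + X.indptr.nbytes)   # a few tens of kB while empty
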
@@ -443,13 +443,13 @@ to speed up the computation::
 The result of calling ``fit`` on a ``GridSearchCV`` object is a classifier
 that we can use to ``predict``::

->>> twenty_train.target_names[gs_clf.predict(['God is love'])]
+>>> twenty_train.target_names[gs_clf.predict(['God is love'])[0]]
 'soc.religion.christian'

 The object's ``best_score_`` and ``best_params_`` attributes store the best
 mean score and the parameters setting corresponding to that score::

->>> gs_clf.best_score_
+>>> gs_clf.best_score_ # doctest: +ELLIPSIS
 0.900...
 >>> for param_name in sorted(parameters.keys()):
 ... print("%s: %r" % (param_name, gs_clf.best_params_[param_name]))
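
Two notes on the hunk above (a hedged aside, not part of the patch): ``predict`` always returns an array with one label per input document, which is why the plain Python list ``twenty_train.target_names`` has to be indexed with the first element rather than with the whole array, and the ``# doctest: +ELLIPSIS`` directive lets the ``0.900...`` output match whatever exact score a given run produces. A compressed, self-contained restaging of the tutorial's earlier steps to illustrate the call pattern; the import paths follow current scikit-learn and may differ from the version this tutorial targeted:

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    categories = ['alt.atheism', 'soc.religion.christian',
                  'comp.graphics', 'sci.med']
    twenty_train = fetch_20newsgroups(subset='train', categories=categories)

    text_clf = Pipeline([('vect', CountVectorizer()),
                         ('tfidf', TfidfTransformer()),
                         ('clf', SGDClassifier())])
    parameters = {'vect__ngram_range': [(1, 1), (1, 2)],
                  'clf__alpha': (1e-2, 1e-3)}

    # A small slice keeps the grid search quick for illustration purposes:
    gs_clf = GridSearchCV(text_clf, parameters, n_jobs=-1)
    gs_clf.fit(twenty_train.data[:400], twenty_train.target[:400])

    # predict() returns an array of class indices, one per document, so take
    # element 0 before indexing the list of category names:
    predicted = gs_clf.predict(['God is love'])
    print(twenty_train.target_names[predicted[0]])

    # The winning grid point is summarised by best_score_ / best_params_:
    print(gs_clf.best_score_)
    for param_name in sorted(parameters):
        print("%s: %r" % (param_name, gs_clf.best_params_[param_name]))
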
3 changes: 2 additions & 1 deletion sklearn/datasets/twenty_newsgroups.py
@@ -219,7 +219,8 @@ def fetch_20newsgroups(data_home=None, subset='train', categories=None,

     if cache is None:
         if download_if_missing:
-            print('Downloading 20news dataset. This may take a few minutes.')
+            logger.info("Downloading 20news dataset. "
+                        "This may take a few minutes.")
             cache = download_20newsgroups(target_dir=twenty_home,
                                           cache_path=cache_path)
         else:
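
One practical consequence of the hunk above (an aside, not part of the patch): messages sent through ``logger.info`` are silent by default, unlike ``print``, so the download notice only appears when the calling application configures the standard ``logging`` module. A minimal sketch of how a user could opt back in:

    import logging

    # INFO-level records are dropped unless logging is configured; this
    # makes the dataset loader's download notice visible again:
    logging.basicConfig(level=logging.INFO)

    from sklearn.datasets import fetch_20newsgroups

    # Logs "Downloading 20news dataset. This may take a few minutes." the
    # first time, when no cached copy is found (the `if cache is None` branch):
    twenty_train = fetch_20newsgroups(subset='train')
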