Commit 94b8471

MNT Work-around sphinx-gallery UnicodeDecodeError in recommender system (scikit-learn#27969)
1 parent c771b1e commit 94b8471

2 files changed, 6 additions and 6 deletions

examples/linear_model/plot_lasso_and_elasticnet.py

Lines changed: 1 addition & 1 deletion
@@ -245,4 +245,4 @@
 #
 # .. [1] :doi:`"Lasso-type recovery of sparse representations for
 # high-dimensional data" N. Meinshausen, B. Yu - The Annals of Statistics
-# 2009, Vol. 37, No. 1, 246270 <10.1214/07-AOS582>`
+# 2009, Vol. 37, No. 1, 246-270 <10.1214/07-AOS582>`
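
The commit title points to a UnicodeDecodeError in sphinx-gallery, and the replacement lines in this diff use plain ASCII punctuation. As a rough sketch, assuming non-ASCII characters in the example files are what triggers the error, a helper like the following could locate them. The names `find_non_ascii` and `scan_examples` are hypothetical and are not part of scikit-learn or sphinx-gallery.

```python
from pathlib import Path


def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # anything outside the 7-bit ASCII range
                hits.append((line_no, col, ch))
    return hits


def scan_examples(root="examples"):
    """Yield (path, hits) for each .py file under root containing non-ASCII text.

    The default root is an assumption based on the paths shown in this diff.
    """
    for path in sorted(Path(root).rglob("*.py")):
        hits = find_non_ascii(path.read_text(encoding="utf-8"))
        if hits:
            yield path, hits
```

Iterating over `scan_examples()` from the repository root and printing each hit as `path:line:column` would give a list of candidates to replace by hand. Whether every such character needs replacing depends on the build environment.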

examples/text/plot_hashing_vs_dict_vectorizer.py

Lines changed: 5 additions & 5 deletions
@@ -299,7 +299,7 @@ def n_nonzero_columns(X):
 #
 # Now we make a similar experiment with the
 # :func:`~sklearn.feature_extraction.text.HashingVectorizer`, which is
-# equivalent to combining the hashing trick implemented by the
+# equivalent to combining the "hashing trick" implemented by the
 # :func:`~sklearn.feature_extraction.FeatureHasher` class and the text
 # preprocessing and tokenization of the
 # :func:`~sklearn.feature_extraction.text.CountVectorizer`.
@@ -322,15 +322,15 @@ def n_nonzero_columns(X):
 # TfidfVectorizer
 # ---------------
 #
-# In a large text corpus, some words appear with higher frequency (e.g. the,
-# “a”, “is” in English) and do not carry meaningful information about the actual
+# In a large text corpus, some words appear with higher frequency (e.g. "the",
+# "a", "is" in English) and do not carry meaningful information about the actual
 # contents of a document. If we were to feed the word count data directly to a
 # classifier, those very common terms would shadow the frequencies of rarer yet
 # more informative terms. In order to re-weight the count features into floating
 # point values suitable for usage by a classifier it is very common to use the
-# tfidf transform as implemented by the
+# tf-idf transform as implemented by the
 # :func:`~sklearn.feature_extraction.text.TfidfTransformer`. TF stands for
-# "term-frequency" while "tfidf" means term-frequency times inverse
+# "term-frequency" while "tf-idf" means term-frequency times inverse
 # document-frequency.
 #
 # We now benchmark the :func:`~sklearn.feature_extraction.text.TfidfVectorizer`,
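
The comment text in the hunk above describes tf-idf as term-frequency times inverse document-frequency, used so that very common words do not shadow rarer, more informative ones. The following is a minimal sketch of that idea in plain Python, assuming the textbook weighting `tf * log(n_docs / df)`. The function name `tf_idf` and the toy counts are made up for illustration. scikit-learn's `TfidfTransformer` applies smoothing and normalization by default, so its values will differ.

```python
import math


def tf_idf(counts):
    """Re-weight a list of per-document {term: count} dicts.

    Assumption: textbook weighting tf * log(n_docs / df), chosen for
    illustration only; it is not scikit-learn's exact formula.
    """
    n_docs = len(counts)
    # Document frequency: in how many documents each term occurs.
    df = {}
    for doc in counts:
        for term in doc:
            df[term] = df.get(term, 0) + 1
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in doc.items()}
        for doc in counts
    ]


# Toy word counts: "the" occurs in every document, "hashing" in only one.
docs = [
    {"the": 3, "lasso": 1},
    {"the": 2, "hashing": 1},
    {"the": 4, "lasso": 2},
]
weights = tf_idf(docs)
```

In this toy input, "the" receives weight zero in every document because it occurs in all of them, while "hashing" keeps a positive weight even though its raw count is the smallest.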
