MNT Work-around sphinx-gallery UnicodeDecodeError in recommender system (scikit-learn#27969)

Charlie-XIAO · web-flow · commit 94b84718f9a9 · 2023-12-18T11:19:34.000+01:00
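The commit title does not say which codec raised the error. The sketch below assumes that the recommender read example files with a locale-dependent default encoding, such as Windows cp1252, rather than UTF-8. Under that assumption it shows how the replaced characters behave. The closing curly quote is stored in UTF-8 as `E2 80 9D`, and cp1252 has no mapping for byte `0x9D`, so decoding fails. The en dash (`E2 80 93`) happens to decode under cp1252 as mojibake instead of raising, and it fails only under a stricter codec such as ASCII. The ASCII replacements decode identically under every codec tested. The `decodes_with` helper is hypothetical and is not part of sphinx-gallery.

```python
# Sketch under an ASSUMPTION: the failing reader used a locale default codec
# such as cp1252 (common on Windows) instead of UTF-8. Not sphinx-gallery code.


def decodes_with(raw: bytes, codec: str) -> bool:
    """Hypothetical helper: True if `raw` decodes with `codec` without error."""
    try:
        raw.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


# Lines as they were before the commit (non-ASCII punctuation, escaped here).
curly_quotes = "# (e.g. \u201cthe\u201d, \u201ca\u201d, \u201cis\u201d in English)"
en_dash = "#    2009, Vol. 37, No. 1, 246\u2013270"
# The same lines after the commit (pure ASCII).
straight_quotes = '# (e.g. "the", "a", "is" in English)'
hyphen = "#    2009, Vol. 37, No. 1, 246-270"

for label, text in [
    ("curly quotes", curly_quotes),
    ("en dash", en_dash),
    ("straight quotes", straight_quotes),
    ("hyphen", hyphen),
]:
    raw = text.encode("utf-8")  # example files are stored as UTF-8 on disk
    print(
        f"{label:15} utf-8={decodes_with(raw, 'utf-8')} "
        f"cp1252={decodes_with(raw, 'cp1252')} ascii={decodes_with(raw, 'ascii')}"
    )
```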
diff --git a/examples/linear_model/plot_lasso_and_elasticnet.py b/examples/linear_model/plot_lasso_and_elasticnet.py
@@ -245,4 +245,4 @@
 #
 #   .. [1] :doi:`"Lasso-type recovery of sparse representations for
 #    high-dimensional data" N. Meinshausen, B. Yu - The Annals of Statistics
-#    2009, Vol. 37, No. 1, 246–270 <10.1214/07-AOS582>`
+#    2009, Vol. 37, No. 1, 246-270 <10.1214/07-AOS582>`
diff --git a/examples/text/plot_hashing_vs_dict_vectorizer.py b/examples/text/plot_hashing_vs_dict_vectorizer.py
@@ -299,7 +299,7 @@ def n_nonzero_columns(X):
 #
 # Now we make a similar experiment with the
 # :func:`~sklearn.feature_extraction.text.HashingVectorizer`, which is
-# equivalent to combining the “hashing trick” implemented by the
+# equivalent to combining the "hashing trick" implemented by the
 # :func:`~sklearn.feature_extraction.FeatureHasher` class and the text
 # preprocessing and tokenization of the
 # :func:`~sklearn.feature_extraction.text.CountVectorizer`.
@@ -322,15 +322,15 @@ def n_nonzero_columns(X):
 # TfidfVectorizer
 # ---------------
 #
-# In a large text corpus, some words appear with higher frequency (e.g. “the”,
-# “a”, “is” in English) and do not carry meaningful information about the actual
+# In a large text corpus, some words appear with higher frequency (e.g. "the",
+# "a", "is" in English) and do not carry meaningful information about the actual
 # contents of a document. If we were to feed the word count data directly to a
 # classifier, those very common terms would shadow the frequencies of rarer yet
 # more informative terms. In order to re-weight the count features into floating
 # point values suitable for usage by a classifier it is very common to use the
-# tf–idf transform as implemented by the
+# tf-idf transform as implemented by the
 # :func:`~sklearn.feature_extraction.text.TfidfTransformer`. TF stands for
-# "term-frequency" while "tf–idf" means term-frequency times inverse
+# "term-frequency" while "tf-idf" means term-frequency times inverse
 # document-frequency.
 #
 # We now benchmark the :func:`~sklearn.feature_extraction.text.TfidfVectorizer`,
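The substitutions in this diff were made by hand. A hypothetical checker, which is not part of scikit-learn or sphinx-gallery, could instead list any remaining non-ASCII characters and apply the same mapping: the en dash becomes a hyphen, and curly double quotes become straight quotes. The checker only reports characters outside that mapping and leaves them in place, because some non-ASCII text, such as author names, may be legitimate.

```python
import unicodedata

# Hypothetical mapping that mirrors the hand edits in this commit.
ASCII_REPLACEMENTS = str.maketrans({
    "\u2013": "-",  # EN DASH -> hyphen-minus (page ranges, "tf-idf")
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK -> straight quote
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK -> straight quote
})


def find_non_ascii(source: str):
    """Return (line, column, char, Unicode name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if not ch.isascii():
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


def to_ascii(source: str) -> str:
    """Apply the known replacements; anything else is left for manual review."""
    return source.translate(ASCII_REPLACEMENTS)


if __name__ == "__main__":
    before = (
        "# equivalent to combining the \u201chashing trick\u201d\n"
        "# tf\u2013idf transform\n"
    )
    for hit in find_non_ascii(before):
        print(hit)  # each flagged character with its position and name
    after = to_ascii(before)
    print(after)
    assert not find_non_ascii(after)  # nothing left outside ASCII
```

When running this against real files, read each example with an explicit `encoding="utf-8"` so that the checker itself does not depend on the platform's default codec.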
