Commit a09a62e

DOC improve stop_words description w.r.t. max_df range in CountVectorizer (#25489)
1 parent 4c8813e commit a09a62e

File tree

1 file changed: +6 -6 lines


sklearn/feature_extraction/text.py

Lines changed: 6 additions & 6 deletions
@@ -996,9 +996,9 @@ class CountVectorizer(_VectorizerMixin, BaseEstimator):
         will be removed from the resulting tokens.
         Only applies if ``analyzer == 'word'``.

-        If None, no stop words will be used. max_df can be set to a value
-        in the range [0.7, 1.0) to automatically detect and filter stop
-        words based on intra corpus document frequency of terms.
+        If None, no stop words will be used. In this case, setting `max_df`
+        to a higher value, such as in the range (0.7, 1.0), can automatically detect
+        and filter stop words based on intra corpus document frequency of terms.

     token_pattern : str or None, default=r"(?u)\\b\\w\\w+\\b"
         Regular expression denoting what constitutes a "token", only used
@@ -1833,9 +1833,9 @@ class TfidfVectorizer(CountVectorizer):
         will be removed from the resulting tokens.
         Only applies if ``analyzer == 'word'``.

-        If None, no stop words will be used. max_df can be set to a value
-        in the range [0.7, 1.0) to automatically detect and filter stop
-        words based on intra corpus document frequency of terms.
+        If None, no stop words will be used. In this case, setting `max_df`
+        to a higher value, such as in the range (0.7, 1.0), can automatically detect
+        and filter stop words based on intra corpus document frequency of terms.

     token_pattern : str, default=r"(?u)\\b\\w\\w+\\b"
         Regular expression denoting what constitutes a "token", only used
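The behavior described by the new wording can be checked directly. A minimal sketch (the toy corpus and the 0.9 threshold below are illustrative, not taken from the commit): with `stop_words=None`, a float `max_df` drops any term whose document frequency exceeds the threshold, which approximates stop-word filtering without a stop-word list.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus: "the" occurs in every document.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird flew over the house",
]

# With stop_words=None, a float max_df filters terms whose document
# frequency exceeds the threshold: "the" has df = 3/3 = 1.0 > 0.9.
vec = CountVectorizer(stop_words=None, max_df=0.9)
vec.fit(corpus)

assert "the" not in vec.vocabulary_   # filtered like a stop word
assert "cat" in vec.vocabulary_       # df = 2/3, below the cutoff
print(sorted(vec.stop_words_))        # terms removed by max_df
```

After fitting, the terms removed by `max_df` (and `min_df`/`max_features`) are recorded in the estimator's `stop_words_` attribute, which is how the filtering can be inspected.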
