Commit cf42aa8

Minor rewording.
1 parent 9424115 commit cf42aa8

1 file changed: +7 -6 lines changed

sklearn/feature_extraction/text.py

@@ -1147,12 +1147,13 @@ class TfidfTransformer(BaseEstimator, TransformerMixin):
 corpus.
 
 The formula that is used to compute the tf-idf for a term t of a document d
-tf-idf(t, d) = tf(t, d) * idf(t), and the idf is computed as idf(t) = log [
-n / df(t) ] + 1 (if ``smooth_idf=False``), where n is the total number of
-documents and df(t) is the document frequency of t; the document frequency
-is the number of documents that contain the term t. The effect of adding
-"1" to the idf in the equation above is that terms with zero idf, i.e.,
-terms that occur in all documents in a training set, will not be entirely
+in a document set is tf-idf(t, d) = tf(t, d) * idf(t), and the idf is
+computed as idf(t) = log [ n / df(t) ] + 1 (if ``smooth_idf=False``), where
+n is the total number of documents in the document set and df(t) is the
+document frequency of t; the document frequency is the number of documents
+in the document set that contain the term t. The effect of adding "1" to
+the idf in the equation above is that terms with zero idf, i.e., terms
+that occur in all documents in a training set, will not be entirely
 ignored.
 (Note that the idf formula above differs from the standard textbook
 notation that defines the idf as
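
As a rough illustration of the formula in the docstring above, the sketch below evaluates idf(t) = log [ n / df(t) ] + 1 (the ``smooth_idf=False`` form) in plain Python. The toy document set and the helper names `idf_unsmoothed` and `document_frequency` are hypothetical, and the natural logarithm is assumed. It is not the library's implementation, only a check that a term occurring in every document keeps a non-zero weight.

```python
import math


def idf_unsmoothed(n_documents, df):
    """idf(t) = log(n / df(t)) + 1, the smooth_idf=False form (natural log assumed)."""
    return math.log(n_documents / df) + 1.0


# Hypothetical toy document set: each document is represented by its set of terms.
documents = [
    {"the", "cat", "sat"},
    {"the", "dog", "ran"},
    {"the", "cat", "ran"},
    {"the", "bird", "sang"},
]
n = len(documents)


def document_frequency(term):
    """Number of documents in the document set that contain the term."""
    return sum(1 for doc in documents if term in doc)


# "the" occurs in all 4 documents: log(4 / 4) is 0, so the "+ 1" keeps the idf at 1.
idf_the = idf_unsmoothed(n, document_frequency("the"))

# "cat" occurs in 2 of 4 documents: log(4 / 2) + 1.
idf_cat = idf_unsmoothed(n, document_frequency("cat"))

# tf-idf(t, d) = tf(t, d) * idf(t); with a term count of 3, "the" still contributes.
tfidf_the = 3 * idf_the
```

Without the added "1", `idf_the` would be zero and any tf-idf product involving "the" would vanish regardless of the term count.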
