@@ -1147,12 +1147,13 @@ class TfidfTransformer(BaseEstimator, TransformerMixin):
  corpus.

  The formula that is used to compute the tf-idf for a term t of a document d
- tf-idf(t, d) = tf(t, d) * idf(t), and the idf is computed as idf(t) = log [
- n / df(t) ] + 1 (if ``smooth_idf=False``), where n is the total number of
- documents and df(t) is the document frequency of t; the document frequency
- is the number of documents that contain the term t. The effect of adding
- "1" to the idf in the equation above is that terms with zero idf, i.e.,
- terms that occur in all documents in a training set, will not be entirely
+ in a document set is tf-idf(t, d) = tf(t, d) * idf(t), and the idf is
+ computed as idf(t) = log [ n / df(t) ] + 1 (if ``smooth_idf=False``), where
+ n is the total number of documents in the document set and df(t) is the
+ document frequency of t; the document frequency is the number of documents
+ in the document set that contain the term t. The effect of adding "1" to
+ the idf in the equation above is that terms with zero idf, i.e., terms
+ that occur in all documents in a training set, will not be entirely
  ignored.
  (Note that the idf formula above differs from the standard textbook
  notation that defines the idf as
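The formula in the revised docstring can be checked by hand with a minimal sketch. This is not scikit-learn's implementation: the helper names `idf_unsmoothed` and `tfidf` and the toy corpus are hypothetical, tf(t, d) is assumed to be the raw term count, the log is assumed to be natural, and no normalization is applied.

```python
import math
from collections import Counter


def idf_unsmoothed(docs):
    """Return {term: log(n / df(term)) + 1} for a list of tokenized documents.

    Mirrors the docstring's formula for ``smooth_idf=False``; natural log assumed.
    """
    n = len(docs)  # total number of documents in the document set
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Return {term: tf(term, doc) * idf(term)} using raw counts as tf."""
    tf = Counter(doc)
    return {term: tf[term] * idf[term] for term in tf}


# Hypothetical toy corpus: "the" occurs in every document.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "bird", "flew"],
]
idf = idf_unsmoothed(docs)
weights = tfidf(docs[0], idf)
```

Here "the" appears in all three documents, so log [ n / df(t) ] is zero and the added "1" leaves it with an idf of exactly 1 rather than 0, which is the "will not be entirely ignored" effect the docstring describes.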