DOC example for feature_extraction.text.TfidfTransformer #15199

federicopisanu · 2019-10-12T10:35:25Z

Reference Issue

Addresses #3846

What does this implement/fix? Explain your changes.

Adds examples for feature_extraction.text.TfidfTransformer.
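A minimal sketch of what TfidfTransformer computes. It uses a hypothetical 3x3 term-count matrix (not from the PR) and assumes the default parameters (norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False). It checks `idf_` against the smoothed IDF formula from the scikit-learn documentation, and checks each output row against the L2-normalized tf * idf product:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Hypothetical term-count matrix: 3 documents (rows) x 3 terms (columns).
counts = np.array([[3, 0, 1],
                   [2, 0, 0],
                   [3, 0, 0]])

transformer = TfidfTransformer()            # norm='l2', smooth_idf=True by default
tfidf = transformer.fit_transform(counts)   # scipy.sparse matrix, same shape as counts

# With smooth_idf=True the IDF is ln((1 + n) / (1 + df)) + 1, where n is the
# number of documents and df is the number of documents containing each term.
n = counts.shape[0]
df = (counts > 0).sum(axis=0)
manual_idf = np.log((1 + n) / (1 + df)) + 1
print(np.allclose(transformer.idf_, manual_idf))  # True

# Each output row is the raw count times the IDF, rescaled to unit L2 norm.
weighted = counts * manual_idf
expected = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)
print(np.allclose(tfidf.toarray(), expected))  # True
```

The smoothing adds one to both the document count and each term's document frequency. This keeps the IDF finite even for a term that never appears (the all-zero second column here).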

Any other comments?

thomasjpfan

Thank you @federicopisanu for the PR!

thomasjpfan · 2019-10-12T18:56:11Z

sklearn/feature_extraction/text.py

@@ -1342,6 +1342,34 @@ class TfidfTransformer(TransformerMixin, BaseEstimator):
        The inverse document frequency (IDF) vector; only defined
        if  ``use_idf`` is True.

+    Examples
+    --------
+    >>> from sklearn.feature_extraction.text import TfidfTransformer


We can use a pipeline with CountVectorizer to create the count matrix from a custom vocabulary:

    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import Pipeline
    import numpy as np
    corpus = ['this is the first document',
              'this document is the second document',
              'and this is the third one',
              'is this the first document']
    vocabulary = ['this', 'document', 'first', 'is', 'second', 'the', 'and', 'one']
    pipe = Pipeline([('count', CountVectorizer(vocabulary=vocabulary)),
                     ('tfid', TfidfTransformer())])
    pipe.fit(corpus)
    pipe['count'].transform(corpus).toarray()
    pipe['tfid'].idf_
    pipe.transform(corpus).shape
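A self-contained sketch of the suggested pipeline that can be run as written. It reuses the corpus, vocabulary and step names ('count', 'tfid') from the suggestion. It shows two things. First, the count-matrix columns follow the order of the custom vocabulary. Second, TfidfVectorizer with the same vocabulary and default settings yields the same tf-idf matrix as the two-step pipeline:

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer)
from sklearn.pipeline import Pipeline

corpus = ['this is the first document',
          'this document is the second document',
          'and this is the third one',
          'is this the first document']
vocabulary = ['this', 'document', 'first', 'is', 'second', 'the', 'and', 'one']

pipe = Pipeline([('count', CountVectorizer(vocabulary=vocabulary)),
                 ('tfid', TfidfTransformer())])
X = pipe.fit_transform(corpus)  # sparse tf-idf matrix, one column per vocabulary word

# Columns follow the custom vocabulary order, so column 1 counts 'document'.
# Words outside the vocabulary (e.g. 'third') are ignored.
counts = pipe['count'].transform(corpus).toarray()
print(counts[1])  # [1 2 0 1 1 1 0 0]

# TfidfVectorizer chains both steps; with the same vocabulary and default
# settings it produces the same matrix as the pipeline above.
direct = TfidfVectorizer(vocabulary=vocabulary).fit_transform(corpus)
print(np.allclose(X.toarray(), direct.toarray()))  # True
```

The explicit pipeline is useful when the intermediate count matrix or the fitted `idf_` must be inspected separately. TfidfVectorizer is the more compact choice otherwise.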

Yeah, sure! I'll work on this.

I think it's done!

rth

LGTM, thanks @federicopisanu !

rth · 2019-10-13T12:07:37Z

Merging with an implicit +1 from Thomas above.

federicopisanu and others added 2 commits October 12, 2019 12:18

added example for feature_extraction.text.TfidfTransformer

e225eb6

fix for CircleCI test

b30ab8a

thomasjpfan reviewed Oct 12, 2019


federicopisanu and others added 5 commits October 12, 2019 23:35

Pipeline with CountVectorizer used for the TfidfTransformer example

a595bdc

fix for CircleCI

5c9b7e4

more fixes for CircleCI test

d7b27b3

more fixes for CircleCI test

39c2657

fixes for azure-pipelines test

2bada62

rth approved these changes Oct 13, 2019


rth changed the title ~~[MRG] DOC example for feature_extraction.text.TfidfTransformer~~ DOC example for feature_extraction.text.TfidfTransformer Oct 13, 2019

rth merged commit 1edecd6 into scikit-learn:master Oct 13, 2019

he7d3r mentioned this pull request Oct 6, 2021

DOC Remove unused import from example #21253

Merged
