DOC example for feature_extraction.text.TfidfTransformer #15199

Conversation

federicopisanu
Contributor

@federicopisanu federicopisanu commented Oct 12, 2019

Reference Issue

Addresses #3846

What does this implement/fix? Explain your changes.

Adds examples for feature_extraction.text.TfidfTransformer.

Any other comments?

Member

@thomasjpfan thomasjpfan left a comment

Thank you @federicopisanu for the PR!

@@ -1342,6 +1342,34 @@ class TfidfTransformer(TransformerMixin, BaseEstimator):
The inverse document frequency (IDF) vector; only defined
if ``use_idf`` is True.

Examples
--------
>>> from sklearn.feature_extraction.text import TfidfTransformer
Member

We can use a pipeline with CountVectorizer to create the count matrix with a custom vocabulary:

from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
corpus = ['this is the first document',
          'this document is the second document',
          'and this is the third one',
          'is this the first document']
# Passing a vocabulary fixes the column order of the count matrix.
vocabulary = ['this', 'document', 'first', 'is', 'second', 'the',
              'and', 'one']
pipe = Pipeline([('count', CountVectorizer(vocabulary=vocabulary)),
                 ('tfid', TfidfTransformer())])
pipe.fit(corpus)
# Raw term counts: one row per document, one column per vocabulary term.
pipe['count'].transform(corpus).toarray()
# Learned inverse document frequency weight for each vocabulary term.
pipe['tfid'].idf_
# The tf-idf matrix has the same shape as the count matrix: (4, 8).
pipe.transform(corpus).shape
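
For readers wondering what `idf_` should contain for this corpus, here is a minimal sketch that recomputes it by hand. It assumes the default `smooth_idf=True`, for which the scikit-learn documentation gives idf(t) = ln((1 + n) / (1 + df(t))) + 1. The names `n_docs`, `doc_freq`, and `expected_idf` are illustrative and not part of the PR.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

corpus = ['this is the first document',
          'this document is the second document',
          'and this is the third one',
          'is this the first document']
vocabulary = ['this', 'document', 'first', 'is', 'second', 'the',
              'and', 'one']

# Dense count matrix: one row per document, one column per vocabulary term.
counts = CountVectorizer(vocabulary=vocabulary).fit_transform(corpus).toarray()
tfidf = TfidfTransformer().fit(counts)

# Document frequency: number of documents in which each term appears.
n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)

# Smoothed IDF, assuming the default smooth_idf=True.
expected_idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1

print(np.allclose(tfidf.idf_, expected_idf))
```

Terms that occur in every document, such as 'this', get the minimum weight of 1, while terms seen in a single document, such as 'second', get the largest weight.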

Contributor Author

Yeah, sure! I'll work on this.

Contributor Author

I think it's done!

Member

@rth rth left a comment

LGTM, thanks @federicopisanu!

Member

rth commented Oct 13, 2019

Merging with an implicit +1 from Thomas above.

@rth rth changed the title from "[MRG] DOC example for feature_extraction.text.TfidfTransformer" to "DOC example for feature_extraction.text.TfidfTransformer" Oct 13, 2019
@rth rth merged commit 1edecd6 into scikit-learn:master Oct 13, 2019