DOC example for feature_extraction.text.TfidfTransformer #15199
Thank you @federicopisanu for the PR!
@@ -1342,6 +1342,34 @@ class TfidfTransformer(TransformerMixin, BaseEstimator):
The inverse document frequency (IDF) vector; only defined
if ``use_idf`` is True.

Examples
--------
>>> from sklearn.feature_extraction.text import TfidfTransformer
We can use a pipeline with CountVectorizer to create the count matrix
with a custom vocabulary:
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
import numpy as np
corpus = ['this is the first document',
          'this document is the second document',
          'and this is the third one',
          'is this the first document']
vocabulary = ['this', 'document', 'first', 'is', 'second', 'the',
              'and', 'one']
pipe = Pipeline([('count', CountVectorizer(vocabulary=vocabulary)),
                 ('tfid', TfidfTransformer())])
pipe.fit(corpus)
pipe['count'].transform(corpus).toarray()
pipe['tfid'].idf_
pipe.transform(corpus).shape
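As a cross-check on what the suggested pipeline should report, here is a minimal sketch in plain Python that rebuilds the count matrix and the IDF vector by hand. It is not the library's implementation. It assumes the default `smooth_idf=True` setting, under which scikit-learn documents the weight as `ln((1 + n) / (1 + df)) + 1`, and it uses whitespace splitting as a stand-in for `CountVectorizer`'s tokenizer, which gives the same tokens for this particular corpus. The names `counts`, `doc_freq`, and `idf` are illustrative only.

```python
import math

corpus = ['this is the first document',
          'this document is the second document',
          'and this is the third one',
          'is this the first document']
vocabulary = ['this', 'document', 'first', 'is', 'second', 'the',
              'and', 'one']

# Count matrix: one row per document, one column per vocabulary term.
# Assumption: str.split() matches CountVectorizer's tokens for this corpus.
counts = [[doc.split().count(term) for term in vocabulary]
          for doc in corpus]

n_docs = len(corpus)

# Document frequency: number of documents containing each term.
doc_freq = [sum(1 for row in counts if row[j] > 0)
            for j in range(len(vocabulary))]

# Smoothed IDF, assuming the default smooth_idf=True.
idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]

for term, df, weight in zip(vocabulary, doc_freq, idf):
    print(f"{term:>8}  df={df}  idf={weight:.8f}")
```

Terms that appear in every document ('this', 'is', 'the') get an IDF of exactly 1, while terms that appear in a single document ('second', 'and', 'one') get the largest weight, which is what `pipe['tfid'].idf_` is expected to reflect.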
Yeah, sure! I'll work on this.
I think it's done!
LGTM, thanks @federicopisanu !
Merging with an implicit +1 from Thomas above.
Reference Issue
Addresses #3846
What does this implement/fix? Explain your changes.
Adds examples for feature_extraction.text.TfidfTransformer.
Any other comments?