-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
DOC Update plot_column_transformer to notebook style #17028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -39,95 +33,130 @@ | |||
from sklearn.compose import ColumnTransformer | |||
from sklearn.svm import LinearSVC | |||
|
|||
############################################################################## |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lucyleeow there is a discussion about changing this syntax starting from 0.23 (#17068). Maybe, keep an eye on it and be ready to change this one if it is merged. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, easy change!
Looks good. I was a bit confused that the results change but then saw you also change the newsgroups. Maybe @ogrisel wants to have a look, he might have written the original? |
Thanks @amueller!
This was because I wanted to print an example of a sample but the posts in 'alt.atheism' and 'talk.religion.misc' were a little controversial... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much nicer. Thank you @lucyleeow !
############################################################################## | ||
# We will also create a transformer that extracts the | ||
# length of the text and the number of sentences. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: do we need two new lines here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember why I put 2 lines now. Lint complains if it's only 1.
return [{'length': len(text), | ||
'num_sentences': text.count('.')} | ||
for text in posts] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Do we need two new lines here?
for text in posts] | ||
|
||
|
||
TextStats = FunctionTransformer(text_stats) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is an instance:
TextStats = FunctionTransformer(text_stats) | |
text_stats_transformer = FunctionTransformer(text_stats) |
('body_stats', Pipeline([ | ||
('stats', TextStats()), # returns a list of dicts | ||
('stats', TextStats), # returns a list of dicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
('stats', TextStats), # returns a list of dicts | |
('stats', text_stats_transformer), # returns a list of dicts |
|
||
SubjectBodyExtractor = FunctionTransformer(subject_body_extractor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is an instance:
SubjectBodyExtractor = FunctionTransformer(subject_body_extractor) | |
subject_body_extractor = FunctionTransformer(subject_body_extractor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subject_body_transformer
so it's not the same name as the function? Not that this matters, though if we make it the same name we should also do this for text_stats_transformer
to be consistent.
|
||
pipeline = Pipeline([ | ||
# Extract subject & body | ||
('subjectbody', SubjectBodyExtractor), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
('subjectbody', SubjectBodyExtractor), | |
('subjectbody', subject_body_extractor), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @lucyleeow
return_X_y=True) | ||
############################################################################## | ||
# Finally, we fit our pipeline on the training data and use it to predict | ||
# topics for ``X_test``. Performance metrics of our pipeline is then printed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# topics for ``X_test``. Performance metrics of our pipeline is then printed. | |
# topics for ``X_test``. Performance metrics of our pipeline are then printed. |
Reference Issues/PRs
None
What does this implement/fix? Explain your changes.
plot_column_transformer.py
to use notebook styleFunctionTransformer
as stateless transformers defined hereAny other comments?