Skip to content

Conversation

lucyleeow
Copy link
Member

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

  • Update plot_column_transformer.py to use notebook style
  • Use FunctionTransformer as stateless transformers defined here

Any other comments?

@lucyleeow lucyleeow changed the title WIP DOC Update plot_column_transformer to notebook style DOC Update plot_column_transformer to notebook style Apr 25, 2020
@@ -39,95 +33,130 @@
from sklearn.compose import ColumnTransformer
from sklearn.svm import LinearSVC

##############################################################################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lucyleeow there is a discussion about changing this syntax starting from 0.23 (#17068). Maybe, keep an eye on it and be ready to change this one if it is merged. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, easy change!

@amueller
Copy link
Member

Looks good. I was a bit confused that the results change but then saw you also change the newsgroups. Maybe @ogrisel wants to have a look, he might have written the original?

@lucyleeow
Copy link
Member Author

Thanks @amueller!

Looks good. I was a bit confused that the results change but then saw you also change the newsgroups

This was because I wanted to print an example of a sample but the posts in 'alt.atheism' and 'talk.religion.misc' were a little controversial...

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much nicer. Thank you @lucyleeow !

##############################################################################
# We will also create a transformer that extracts the
# length of the text and the number of sentences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: do we need two new lines here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember why I put 2 lines now. Lint complains if it's only 1.

return [{'length': len(text),
'num_sentences': text.count('.')}
for text in posts]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Do we need two new lines here?

for text in posts]


TextStats = FunctionTransformer(text_stats)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an instance:

Suggested change
TextStats = FunctionTransformer(text_stats)
text_stats_transformer = FunctionTransformer(text_stats)

('body_stats', Pipeline([
('stats', TextStats()), # returns a list of dicts
('stats', TextStats), # returns a list of dicts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
('stats', TextStats), # returns a list of dicts
('stats', text_stats_transformer), # returns a list of dicts


SubjectBodyExtractor = FunctionTransformer(subject_body_extractor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an instance:

Suggested change
SubjectBodyExtractor = FunctionTransformer(subject_body_extractor)
subject_body_extractor = FunctionTransformer(subject_body_extractor)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subject_body_transformer so it's not the same name as the function? Not that this matters, though if we make it the same name we should also do this for text_stats_transformer to be consistent.


pipeline = Pipeline([
# Extract subject & body
('subjectbody', SubjectBodyExtractor),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
('subjectbody', SubjectBodyExtractor),
('subjectbody', subject_body_extractor),

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @lucyleeow

return_X_y=True)
##############################################################################
# Finally, we fit our pipeline on the training data and use it to predict
# topics for ``X_test``. Performance metrics of our pipeline is then printed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# topics for ``X_test``. Performance metrics of our pipeline is then printed.
# topics for ``X_test``. Performance metrics of our pipeline are then printed.

@thomasjpfan thomasjpfan merged commit a497523 into scikit-learn:master May 17, 2020
adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request May 18, 2020
viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
@lucyleeow lucyleeow deleted the plot_col_trans branch October 21, 2023 03:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants