-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC: change CountVectorizer(...lambda..) to OneHotEncoder() in ColumnTransformer examples #13212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
doc/modules/compose.rst
Outdated
... ('title_bow', CountVectorizer(), 'title')], | ||
... remainder='drop') | ||
|
||
column_trans.fit(X) | ||
column_trans.get_feature_names() | ||
column_trans.transform(X).toarray() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is a left-over that can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup. done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
doc/modules/compose.rst
Outdated
>>> column_trans = ColumnTransformer( | ||
... [('city_category', CountVectorizer(analyzer=lambda x: [x]), 'city'), | ||
... [('city_category', OneHotEncoder(dtype='int'),(['city'])), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... [('city_category', OneHotEncoder(dtype='int'),(['city'])), | |
... [('city_category', OneHotEncoder(dtype='int'), ['city']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maikia Could accept this suggestion as well. I forgot to add it to the previous one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small changes for PEP8 and removing unnecessary parenthesis.
input and therefore the columns were specified as a string (``'city'``). | ||
However, other transformers generally expect 2D data, and in that case you need | ||
input and therefore the columns were specified as a string (``'title'``). | ||
However, :class:`preprocessing.OneHotEncoder <sklearn.preprocessing.OneHotEncoder>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the stuff in the angle brackets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was there originally. @jorisvandenbossche reading those lines yesterday I did not why you used this syntax? Any reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason it is written like this is because the "currentmodule" of sphinx is sklearn.pipeline
, so I think we need to use the full path to refer to sklearn.preprocessing.OneHotEncoder
. What this then does is shorten that in the display to preprocessing.OneHotEncoder
(the functionality to link to its docstring page of course stays the same).
Whether this complexity is worth it, that's another question :)
But to be consistent with how the rest of the text is, it can be replaced here with :class:`~sklearn.preprocessing.OneHotEncode`
(which will just display OneHotEncoder
)
Two small changes for PEP8 and removing unnecessary parenthesis. Co-Authored-By: maikia <maja_ka@hotmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @maikia
Thanks, @maikia!!! |
…ransformer examples (scikit-learn#13212)
…ransformer examples (scikit-learn#13212)
… ColumnTransformer examples (scikit-learn#13212)" This reverts commit 1db3bbb.
… ColumnTransformer examples (scikit-learn#13212)" This reverts commit 1db3bbb.
…ransformer examples (scikit-learn#13212)
What does this implement/fix? Explain your changes.
Replaced the
CountVectorizer(analyzer=lambda x: [x])
(workaround to do one-hot encoding using the CountVectorizer) withOneHotEncoder()
, now this supports string features andget_feature_names
.