DOC: change CountVectorizer(...lambda..) to OneHotEncoder() in ColumnTransformer examples #13212

maikia · 2019-02-21T13:31:05Z

What does this implement/fix? Explain your changes.

Replaced CountVectorizer(analyzer=lambda x: [x]) (a workaround for doing one-hot encoding with CountVectorizer) with OneHotEncoder(), now that OneHotEncoder supports string features and get_feature_names.
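
As a rough illustration of why the swap should be behavior-preserving, the sketch below compares the two encodings on a small made-up column. The toy data and variable names are assumptions for illustration and are not taken from compose.rst; it also assumes pandas and scikit-learn are installed.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; not the exact frame used in doc/modules/compose.rst.
X = pd.DataFrame({"city": ["London", "London", "Paris", "Sallisaw"]})

# Old workaround: the analyzer returns each whole string as a single token,
# so every row ends up with exactly one non-zero count.
# CountVectorizer expects 1D input, hence X["city"].
old = CountVectorizer(analyzer=lambda x: [x]).fit_transform(X["city"]).toarray()

# Replacement: OneHotEncoder expects 2D input, hence X[["city"]].
new = OneHotEncoder(dtype="int").fit_transform(X[["city"]]).toarray()

# Both order their columns by sorted category, so the matrices should match.
same_encoding = np.array_equal(old, new)
print(same_encoding)
```

Feature-name retrieval is left out on purpose: this PR uses get_feature_names, while later scikit-learn releases provide get_feature_names_out instead, so the right call depends on the installed version.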

jorisvandenbossche · 2019-02-21T13:33:06Z

doc/modules/compose.rst

  ...      ('title_bow', CountVectorizer(), 'title')],
  ...     remainder='drop')

+column_trans.fit(X) 
+column_trans.get_feature_names()
+column_trans.transform(X).toarray()


I suppose this is a leftover that can be removed.

jorisvandenbossche

Looks good to me!

glemaitre · 2019-02-21T18:10:27Z

doc/modules/compose.rst

  >>> column_trans = ColumnTransformer(
-  ...     [('city_category', CountVectorizer(analyzer=lambda x: [x]), 'city'),
+  ...     [('city_category', OneHotEncoder(dtype='int'),(['city'])),


Suggested change

-  ...     [('city_category', OneHotEncoder(dtype='int'),(['city'])),
+  ...     [('city_category', OneHotEncoder(dtype='int'), ['city']),

@maikia Could you accept this suggestion as well? I forgot to add it to the previous one.
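
For context, a small sketch of what the two spellings mean. Parentheses around a single expression do not create a tuple, so (['city']) and ['city'] are the same list; the distinction that matters for column selection is string versus list. With a pandas DataFrame, a string selects a 1D column and a list selects a 2D frame, which mirrors how ColumnTransformer is documented to hand data to 1D-input transformers (such as CountVectorizer) versus 2D-input ones (such as OneHotEncoder). The toy frame below is an assumption for illustration.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
X = pd.DataFrame({
    "city": ["London", "Paris"],
    "title": ["His Last Bow", "A Moveable Feast"],
})

# Parentheses without a trailing comma are a no-op: this is still a list.
redundant = (['city'])
plain = ['city']
parens_are_noop = redundant == plain and isinstance(redundant, list)

# A string selects one column as a 1D Series;
# a list selects it as a 2D DataFrame with a single column.
one_d = X['city']
two_d = X[['city']]
print(parens_are_noop, one_d.ndim, two_d.ndim)
```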

glemaitre

Two small changes for PEP8 and removing unnecessary parentheses.

jnothman · 2019-02-22T01:02:32Z

doc/modules/compose.rst

-input and therefore the columns were specified as a string (``'city'``).
-However, other transformers generally expect 2D data, and in that case you need
+input and therefore the columns were specified as a string (``'title'``).
+However, :class:`preprocessing.OneHotEncoder <sklearn.preprocessing.OneHotEncoder>`


You don't need the stuff in the angle brackets

It was there originally. @jorisvandenbossche, reading those lines yesterday I did not understand why you used this syntax. Any reasons?

The reason it is written like this is because the "currentmodule" of sphinx is sklearn.pipeline, so I think we need to use the full path to refer to sklearn.preprocessing.OneHotEncoder. What this then does is shorten that in the display to preprocessing.OneHotEncoder (the functionality to link to its docstring page of course stays the same).

Whether this complexity is worth it, that's another question :)

But to be consistent with the rest of the text, it can be replaced here with :class:`~sklearn.preprocessing.OneHotEncoder` (which will just display OneHotEncoder).

qinhanmin2014

LGTM, thanks @maikia

glemaitre · 2019-02-22T15:19:01Z

Thanks, @maikia!!!

changed CountVectorizer(...lambda..) to OneHotEncoder() in few examples

6588899

jorisvandenbossche reviewed Feb 21, 2019

removed some leftover useless lines

5d2ea49

jorisvandenbossche changed the title ~~changed CountVectorizer(...lambda..) to OneHotEncoder() in few examples of ColumnTransformer~~ DOC: change CountVectorizer(...lambda..) to OneHotEncoder() in ColumnTransformer examples Feb 21, 2019

some white spaces removed, lines cut shorter

aa7e916

jorisvandenbossche approved these changes Feb 21, 2019

glemaitre reviewed Feb 21, 2019

glemaitre requested changes Feb 21, 2019

jnothman reviewed Feb 22, 2019

glemaitre and others added 2 commits February 22, 2019 11:31

Update doc/modules/compose.rst

77d676d

Two small changes for PEP8 and removing unnecessary parenthesis. Co-Authored-By: maikia <maja_ka@hotmail.com>

unnecessary parantheses removed

0f00c1e

glemaitre approved these changes Feb 22, 2019

qinhanmin2014 approved these changes Feb 22, 2019

qinhanmin2014 merged commit 2480368 into scikit-learn:master Feb 22, 2019

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Feb 24, 2019

DOC Change CountVectorizer(...lambda..) to OneHotEncoder() in ColumnTransformer examples (scikit-learn#13212)

6cd75a5

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

DOC Change CountVectorizer(...lambda..) to OneHotEncoder() in ColumnTransformer examples (scikit-learn#13212)

1db3bbb

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "DOC Change CountVectorizer(...lambda..) to OneHotEncoder() in ColumnTransformer examples (scikit-learn#13212)" This reverts commit 1db3bbb.

bc80da3

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "DOC Change CountVectorizer(...lambda..) to OneHotEncoder() in ColumnTransformer examples (scikit-learn#13212)" This reverts commit 1db3bbb.

51c615b

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

DOC Change CountVectorizer(...lambda..) to OneHotEncoder() in ColumnTransformer examples (scikit-learn#13212)

ec0f65e
