[MRG] Add experimental.ColumnTransformer #9012
Merged
jnothman merged 90 commits into scikit-learn:master from jorisvandenbossche:amueller/heterogeneous_feature_union on May 29, 2018
1937d56
add heterogeneous ColumnTransformer
amueller 95bf6cb
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 914ba53
Get tests/examples working with current sklearn
jorisvandenbossche 2333e61
Add support for numpy arrays and positional columns in dataframes as …
jorisvandenbossche 464f7e6
add support for selecting multiple columns
jorisvandenbossche 7777e2a
doc corrections
jorisvandenbossche 42ce18c
Change to tuples instead of dict
jorisvandenbossche 4a55b9b
Reimplement as subclass of FeatureUnion
jorisvandenbossche 55a5372
Fix-ups and move tests
jorisvandenbossche 74d0639
update docs
jorisvandenbossche b6883b9
Support selecting multiple columns from dict + ensure passed subset i…
jorisvandenbossche 1c4f09b
Also support slices for positional subsets
jorisvandenbossche 7cef7df
Fix 2d dict items case
jorisvandenbossche 6ceed19
Refactor column selection based on discussion
jorisvandenbossche e19e3c1
clean-up + add more tests
jorisvandenbossche 0116ac9
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche c7ea079
Nuke swiss army knife (no dict/recarray support)
jorisvandenbossche acff9dd
Add catch/reraise error with custom message
jorisvandenbossche 4db243c
update docs
jorisvandenbossche 6ab49a8
undo changes to utils
jorisvandenbossche 2dda954
Move to experimental module
jorisvandenbossche 0d0107f
fixup move to experimental
jorisvandenbossche 267ca85
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 0c7b0d7
Move docs
jorisvandenbossche c711b55
add support for boolean masks
jorisvandenbossche 0cb9770
Add make_column_transformer factory function
jorisvandenbossche 9d24bb1
doc fixups
jorisvandenbossche 11a5c0c
feedback
jorisvandenbossche a8efeeb
skip feature_extraction docs if pandas not installed
jorisvandenbossche 20976b1
fix doctests + pep8
jorisvandenbossche e71a390
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 406b2a9
add to sklearn/setup.py
jorisvandenbossche ae12bbc
feedback
jorisvandenbossche 70ed541
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 16bfae5
possible fix for get_params / set_params
jorisvandenbossche 7ff02a4
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche a753833
updates for feedback
jorisvandenbossche bb4d721
Don't subclass FeatureUnion + clone passed transformers
jorisvandenbossche 493116f
add named_transformers_ attribute
jorisvandenbossche a33ad8c
add test that confirms that transformers now actually get cloned
jorisvandenbossche 18b814d
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 6cedbd7
added some more tests
jorisvandenbossche 0229e5b
doc feedback guillaume
jorisvandenbossche f9d95eb
Merge remote-tracking branch 'origin/master' into amueller/heterogene…
glemaitre ca1647e
Solve the issue introduce by git during merging
glemaitre 0707319
Addess Joel comments
glemaitre 88ac893
remove validation from init
glemaitre 91a5312
correct comment in example
glemaitre deb3b78
Do not modify transformer in init
glemaitre a6d7b77
Factorize _fit_* functions
glemaitre d287420
minor updates based on feedback
jorisvandenbossche 2920912
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 7b1ce95
refactor try except block to single helper function
jorisvandenbossche db9b2de
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche e6d81af
move whatsnew + fix bad merge
jorisvandenbossche 733b111
add passthrough kwarg
jorisvandenbossche 6d639f0
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 8d142fd
fixup basic passthrough implementation and tests
jorisvandenbossche af257e0
fix doctest
jorisvandenbossche 6705233
use pytest setup to skip docs if no pandas
jorisvandenbossche 2b591e4
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 00aef88
move doc fixture to common conftest.py for docs
jorisvandenbossche 9c2df9c
poc of passthrough=True
jorisvandenbossche 4463fa7
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 8d6e034
Update make_column_transformer to accept tuples instead of dict
jorisvandenbossche 82a5697
some clean-up
jorisvandenbossche 04cf4ff
more thoroughly test + fix passthrough
jorisvandenbossche db2eabd
add test to cover check of transformers
jorisvandenbossche c402fb2
feedback Joel
jorisvandenbossche 9ae7753
add note on None transformer and 'remainder'
jorisvandenbossche c222101
small update to the tests
jorisvandenbossche 26bf288
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 8386fae
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 28840ad
flake8
jorisvandenbossche 14c7b1e
Merge remote-tracking branch 'upstream/master' into amueller/heteroge…
jorisvandenbossche 608ba9a
Move ColumnTransformer from experimental to compose
jorisvandenbossche 22c499c
fix sklearn/__init__.py
jorisvandenbossche 333f878
fixup remaining usage of experimental
jorisvandenbossche c3f8733
fix doctest example
jorisvandenbossche 4804cd8
switch transformers/columns order in make_column_transformer
jorisvandenbossche 3d3e772
Add special-cased 'drop' and 'passthrough'
jorisvandenbossche 3346268
Implement 'drop'/'passthrough' for remainder instead of passthrough k…
jorisvandenbossche 7ded77a
remainder -> unspecified
jorisvandenbossche 4835c29
fix doctests + remaining feedback Joel
jorisvandenbossche 04bcb1e
pep8
jorisvandenbossche 3d2a9bc
unspecified -> remainder
jorisvandenbossche afb7384
update for feedback
jorisvandenbossche d298fc3
switch default from 'drop' to 'passthrough' + add transformer ouput v…
jorisvandenbossche 4098928
Add NotImplementedError for get_feature_names if columns are passed t…
jorisvandenbossche 9ab27fb
move docs from feature_extraction.rst -> compose.rst
jorisvandenbossche
As we are going to be in the "column" namespace, where we support pandas dataframes, should we use a pandas dataframe in this example, rather than an object array?
In this example I didn't use pandas, as it seems like a bit of overhead (it would just be for temporarily putting the two columns in a frame to pass it on to the next step). But we certainly need another example with a pandas dataframe (e.g. with adults).
But I can change it here as well if needed.
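For reference, a minimal sketch of what a dataframe-based example could look like. It is not the example from this PR: the column names (`subject`, `length`) and the toy values are made up for illustration, and it assumes the final `sklearn.compose` location of `ColumnTransformer`. `remainder` is set explicitly because its default changed during review.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Toy data; the column names and values are hypothetical.
df = pd.DataFrame({
    "subject": ["hello world", "hello again", "goodbye world"],
    "length": [10.0, 20.0, 30.0],
})

ct = ColumnTransformer(
    transformers=[
        # A scalar column name passes a 1D column, which is what
        # text vectorizers expect.
        ("bow", CountVectorizer(), "subject"),
        # A list of column names passes a 2D frame, which is what
        # scalers expect.
        ("scale", StandardScaler(), ["length"]),
    ],
    remainder="drop",      # set explicitly rather than relying on the default
    sparse_threshold=0,    # always return a dense array, for easy inspection
)

X = ct.fit_transform(df)
print(X.shape)
```

The outputs are stacked in the order the transformers are listed, so the bag-of-words columns come first and the scaled `length` column is last.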