-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
[MRG+2] chore(make_union): add n_jobs to make_union through kwargs #8031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+2] chore(make_union): add n_jobs to make_union through kwargs #8031
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise, this is looking good. thanks for the neat test.
@@ -792,6 +790,7 @@ def make_union(*transformers): | |||
Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should really have a Parameters section here in the docstring.
pca = PCA(svd_solver='full') | ||
mock = Transf() | ||
fu = make_union(pca, mock, n_jobs=3) | ||
assert_equal(3, fu.n_jobs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might, for completeness, check that fu.transformers == make_union(pca, mock).transformers
but this is fine as is.
@@ -787,6 +787,23 @@ def make_union(*transformers, **kwargs): | |||
and does not permit, naming the transformers. Instead, they will be given | |||
names automatically based on their types. It also does not allow weighting. | |||
|
|||
Parameters | |||
---------- | |||
transformer_list : list of (string, transformer) tuples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct. You want something like. *transformers : list of estimators
n_jobs : int, optional | ||
Number of jobs to run in parallel (default 1). | ||
|
||
transformer_weights : dict, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably leave this out, given that transformer names are automatically assigned, so it only really adds confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks!
@@ -418,6 +418,14 @@ def test_make_union(): | |||
assert_equal(transformers, (pca, mock)) | |||
|
|||
|
|||
def test_make_union_kwargs(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a test that if an invalid **kwarg
is given, there is an error? Otherwise LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, added an extra assertion that raises a TypeError. Thanks!
fu = make_union(pca, mock, n_jobs=3) | ||
assert_equal(fu.transformer_list, make_union(pca, mock).transformer_list) | ||
assert_equal(3, fu.n_jobs) | ||
assert_raises(TypeError, make_union,pca, mock, invalidFakeKwarg=42) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use assert_raise_message
to check if the error message is helpful but it's also fine as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good thinking, the error message is helpful so I will go that route instead.
# invalid keyword parameters should raise an error message | ||
assert_raise_message( | ||
TypeError, | ||
"__init__() got an unexpected keyword argument 'invalidFakeKwarg'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks :)
LGTM! |
""" | ||
return FeatureUnion(_name_estimators(transformers)) | ||
return FeatureUnion(_name_estimators(transformers), **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should check, until we decide, perhaps, to implement some kind of transformer_weights
, that only n_jobs
is passed in. Sometimes it's better to be conservative. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So error here if something is given that would be accepted by FeatureUnion
but could be garbled? I'm not entirely sure what the risks are, but I'm happy to play it safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of being conservative, @jnothman
Maybe something like the following (but perhaps with a KeyError instead)?
def make_union(*transformers, **kwargs):
"""Construct a FeatureUnion from the given transformers.
This is a shorthand for the FeatureUnion constructor; it does not require,
and does not permit, naming the transformers. Instead, they will be given
names automatically based on their types. It also does not allow weighting.
Parameters
----------
*transformers : list of estimators
n_jobs : int, optional
Number of jobs to run in parallel (default 1).
Returns
-------
f : FeatureUnion
Examples
--------
>>> from sklearn.decomposition import PCA, TruncatedSVD
>>> from sklearn.pipeline import make_union
>>> make_union(PCA(), TruncatedSVD()) # doctest: +NORMALIZE_WHITESPACE
FeatureUnion(n_jobs=1,
transformer_list=[('pca',
PCA(copy=True, iterated_power='auto',
n_components=None, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)),
('truncatedsvd',
TruncatedSVD(algorithm='randomized',
n_components=2, n_iter=5,
random_state=None, tol=0.0))],
transformer_weights=None)
"""
if 'n_jobs' in kwargs:
return FeatureUnion(_name_estimators(transformers), **kwargs)
elif kwargs:
raise NotImplementedError
else:
return FeatureUnion(_name_estimators(transformers))
This gives us the following output:
In [9]: make_union(PCA(), TruncatedSVD())
Out[9]:
FeatureUnion(n_jobs=1,
transformer_list=[('pca', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)), ('truncatedsvd', TruncatedSVD(algorithm='randomized', n_components=2, n_iter=5,
random_state=None, tol=0.0))],
transformer_weights=None)
In [10]: make_union(PCA(), TruncatedSVD(), n_jobs=4)
Out[10]:
FeatureUnion(n_jobs=4,
transformer_list=[('pca', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)), ('truncatedsvd', TruncatedSVD(algorithm='randomized', n_components=2, n_iter=5,
random_state=None, tol=0.0))],
transformer_weights=None)
In [11]: make_union(PCA(), TruncatedSVD(), anyOtherArg=3)
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-11-87bdeace7aa2> in <module>()
----> 1 make_union(PCA(), TruncatedSVD(), anyOtherArg=3)
/Users/acb/Documents/DataSci/contrib/scikit/scikit-learn/sklearn/pipeline.py in make_union(*transformers, **kwargs)
818 return FeatureUnion(_name_estimators(transformers), **kwargs)
819 elif kwargs:
--> 820 raise NotImplementedError
821 else:
822 return FeatureUnion(_name_estimators(transformers))
NotImplementedError:
Thank you both for all your input on this 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want something more like
n_jobs = kwargs.pop('n_jobs', None)
if kwargs:
# We do not currently support `transformer_weights` as we may want to change its type spec in make_union
raise TypeError('Unknown keyword arguments: {}'.format(kwargs.keys()))
FeatureUnion(_name_estimators(transformers), n_jobs=n_jobs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sounds good. Would we want the default n_jobs to be 1
instead of None
to keep it consistent?
yes, 1
…On 15 December 2016 at 12:22, alexandercbooth ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/pipeline.py
<#8031>:
> """
- return FeatureUnion(_name_estimators(transformers))
+ return FeatureUnion(_name_estimators(transformers), **kwargs)
OK, sounds good. Would we want the default n_jobs to be 1 instead of None
to keep it consistent?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8031>, or mute the
thread
<https://github.com/notifications/unsubscribe-auth/AAEz6xzyQrdggBDIEvEkZK-FLVVCowhkks5rIJY6gaJpZM4LJkCc>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also include transformer_weights in the test. Otherwise, this LGTM.
That's a good point as the most likely kwarg people might try there is |
Please add an Enhancements entry into whats_new.rst. Thanks!! |
Thanks! |
In the future, please use "Fixes #xxxx" in the PR description so the issue closes automatically when PR is merged. |
Will do, sincere thanks for all you help, @jnothman! |
Addresses #8028 by adding kwargs to make_union.