
[MRG+2] chore(make_union): add n_jobs to make_union through kwargs #8031


Merged (14 commits) on Dec 15, 2016


alexandercbooth
Contributor

Addresses #8028 by adding **kwargs to make_union, so that n_jobs can be passed through to FeatureUnion.
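To make the pass-through concrete, here is a minimal sketch that does not depend on scikit-learn. FeatureUnionStub, name_estimators and the dummy PCA/TruncatedSVD classes are hypothetical stand-ins; the stub's constructor is assumed to accept n_jobs and transformer_weights, as FeatureUnion's did at the time.

```python
class FeatureUnionStub:
    """Hypothetical stand-in for FeatureUnion: it only stores its parameters."""

    def __init__(self, transformer_list, n_jobs=1, transformer_weights=None):
        self.transformer_list = transformer_list
        self.n_jobs = n_jobs
        self.transformer_weights = transformer_weights


def name_estimators(estimators):
    """Hypothetical simplified naming: lowercased class name, no duplicate handling."""
    return [(type(est).__name__.lower(), est) for est in estimators]


def make_union(*transformers, **kwargs):
    # Forward every keyword argument, unchanged, to the constructor.
    return FeatureUnionStub(name_estimators(transformers), **kwargs)


class PCA:  # hypothetical dummy transformer
    pass


class TruncatedSVD:  # hypothetical dummy transformer
    pass


fu = make_union(PCA(), TruncatedSVD(), n_jobs=3)
print(fu.n_jobs)                                   # 3
print([name for name, _ in fu.transformer_list])   # ['pca', 'truncatedsvd']
```

With plain forwarding, an unknown keyword is rejected by the constructor with a TypeError, but any keyword the constructor does accept (such as transformer_weights) slips through unchecked, which is the concern raised later in the review.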

Member

@jnothman left a comment

Otherwise, this is looking good. Thanks for the neat test.

@@ -792,6 +790,7 @@ def make_union(*transformers):
Examples
Member

We should really have a Parameters section here in the docstring.

pca = PCA(svd_solver='full')
mock = Transf()
fu = make_union(pca, mock, n_jobs=3)
assert_equal(3, fu.n_jobs)
Member

You might, for completeness, check that fu.transformer_list == make_union(pca, mock).transformer_list, but this is fine as is.

@@ -787,6 +787,23 @@ def make_union(*transformers, **kwargs):
and does not permit, naming the transformers. Instead, they will be given
names automatically based on their types. It also does not allow weighting.

Parameters
----------
transformer_list : list of (string, transformer) tuples
Member

This is not correct. You want something like: *transformers : list of estimators

n_jobs : int, optional
Number of jobs to run in parallel (default 1).

transformer_weights : dict, optional
Member

Should probably leave this out, given that transformer names are automatically assigned, so it only really adds confusion.
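To illustrate why transformer_weights adds confusion here: weights are keyed by transformer name, so a make_union caller would have to predict the generated names. The sketch is a hypothetical simplification of automatic naming (lowercased class name, with "-1", "-2", ... suffixes for duplicates); it is assumed, not verified, to match scikit-learn's private _name_estimators helper.

```python
from collections import Counter


def name_estimators(estimators):
    """Hypothetical simplified automatic naming.

    Each estimator is named after its lowercased class name; when a name
    occurs more than once, every occurrence gets a "-1", "-2", ... suffix.
    """
    names = [type(est).__name__.lower() for est in estimators]
    totals = Counter(names)
    seen = Counter()
    named = []
    for name, est in zip(names, estimators):
        if totals[name] > 1:
            seen[name] += 1
            name = "%s-%d" % (name, seen[name])
        named.append((name, est))
    return named


class PCA:  # hypothetical dummy transformer
    pass


class SelectKBest:  # hypothetical dummy transformer
    pass


named = name_estimators([PCA(), PCA(), SelectKBest()])
print([name for name, _ in named])  # ['pca-1', 'pca-2', 'selectkbest']

# To weight only the second PCA, a caller would have to guess this generated key.
transformer_weights = {"pca-2": 0.5}
```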

Contributor Author

Good point, thanks!

@@ -418,6 +418,14 @@ def test_make_union():
assert_equal(transformers, (pca, mock))


def test_make_union_kwargs():
Member

Maybe add a test that an error is raised if an invalid **kwarg is given? Otherwise LGTM.

Contributor Author

Great, I added an extra assertion that checks a TypeError is raised. Thanks!

fu = make_union(pca, mock, n_jobs=3)
assert_equal(fu.transformer_list, make_union(pca, mock).transformer_list)
assert_equal(3, fu.n_jobs)
assert_raises(TypeError, make_union, pca, mock, invalidFakeKwarg=42)
Member

You can use assert_raise_message to check that the error message is helpful, but it's also fine as-is.
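For reference, a helper like assert_raise_message checks both the exception type and that an expected substring appears in the exception's message. The following is a minimal pure-Python sketch; check_raise_message and the Union class are hypothetical, and the helper only approximates scikit-learn's testing utility.

```python
def check_raise_message(exception_type, message, func, *args, **kwargs):
    """Assert that func(*args, **kwargs) raises exception_type and that
    the exception's string form contains message."""
    try:
        func(*args, **kwargs)
    except exception_type as exc:
        if message not in str(exc):
            raise AssertionError(
                "Error message does not include the expected string: %r. "
                "Observed error message: %r" % (message, str(exc)))
    else:
        raise AssertionError(
            "%s not raised by %s" % (exception_type.__name__, func.__name__))


class Union:  # hypothetical class whose constructor only accepts n_jobs
    def __init__(self, n_jobs=1):
        self.n_jobs = n_jobs


# Passes: Python's own TypeError names the unexpected keyword.
check_raise_message(
    TypeError, "unexpected keyword argument 'invalidFakeKwarg'",
    Union, invalidFakeKwarg=42)
```

Matching a substring rather than the full message keeps the check robust across Python versions; newer releases prefix the class name (e.g. "Union.__init__() got an unexpected keyword argument ...").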

Contributor Author

Good thinking; the error message is helpful, so I will go that route instead.

@amueller changed the title from "[MRG] chore(make_union): add n_jobs to make_union through kwargs" to "[MRG + 1] chore(make_union): add n_jobs to make_union through kwargs" on Dec 12, 2016
# invalid keyword parameters should raise an error message
assert_raise_message(
TypeError,
"__init__() got an unexpected keyword argument 'invalidFakeKwarg'",
make_union, pca, mock, invalidFakeKwarg=42)
Member

Thanks :)

@amueller
Member

LGTM!

"""
- return FeatureUnion(_name_estimators(transformers))
+ return FeatureUnion(_name_estimators(transformers), **kwargs)
Member

Until we decide whether to implement some kind of transformer_weights support, perhaps we should check that only n_jobs is passed in. Sometimes it's better to be conservative. WDYT?

Member

So we would raise an error here if something is given that FeatureUnion would accept but that could be misinterpreted? I'm not entirely sure what the risks are, but I'm happy to play it safe.

Contributor Author

I like the idea of being conservative, @jnothman

Maybe something like the following (but perhaps with a KeyError instead)?

def make_union(*transformers, **kwargs):
    """Construct a FeatureUnion from the given transformers.

    This is a shorthand for the FeatureUnion constructor; it does not require,
    and does not permit, naming the transformers. Instead, they will be given
    names automatically based on their types. It also does not allow weighting.

    Parameters
    ----------
    *transformers : list of estimators

    n_jobs : int, optional
        Number of jobs to run in parallel (default 1).

    Returns
    -------
    f : FeatureUnion

    Examples
    --------
    >>> from sklearn.decomposition import PCA, TruncatedSVD
    >>> from sklearn.pipeline import make_union
    >>> make_union(PCA(), TruncatedSVD())    # doctest: +NORMALIZE_WHITESPACE
    FeatureUnion(n_jobs=1,
           transformer_list=[('pca',
                              PCA(copy=True, iterated_power='auto',
                                  n_components=None, random_state=None,
                                  svd_solver='auto', tol=0.0, whiten=False)),
                             ('truncatedsvd',
                              TruncatedSVD(algorithm='randomized',
                              n_components=2, n_iter=5,
                              random_state=None, tol=0.0))],
           transformer_weights=None)
    """
    if 'n_jobs' in kwargs:
        return FeatureUnion(_name_estimators(transformers), **kwargs)
    elif kwargs:
        raise NotImplementedError
    else:
        return FeatureUnion(_name_estimators(transformers))

This gives us the following output:

In [9]: make_union(PCA(), TruncatedSVD())
Out[9]: 
FeatureUnion(n_jobs=1,
       transformer_list=[('pca', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)), ('truncatedsvd', TruncatedSVD(algorithm='randomized', n_components=2, n_iter=5,
       random_state=None, tol=0.0))],
       transformer_weights=None)

In [10]: make_union(PCA(), TruncatedSVD(), n_jobs=4)
Out[10]: 
FeatureUnion(n_jobs=4,
       transformer_list=[('pca', PCA(copy=True, iterated_power='auto', n_components=None, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)), ('truncatedsvd', TruncatedSVD(algorithm='randomized', n_components=2, n_iter=5,
       random_state=None, tol=0.0))],
       transformer_weights=None)

In [11]: make_union(PCA(), TruncatedSVD(), anyOtherArg=3)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-11-87bdeace7aa2> in <module>()
----> 1 make_union(PCA(), TruncatedSVD(), anyOtherArg=3)

/Users/acb/Documents/DataSci/contrib/scikit/scikit-learn/sklearn/pipeline.py in make_union(*transformers, **kwargs)
    818         return FeatureUnion(_name_estimators(transformers), **kwargs)
    819     elif kwargs:
--> 820         raise NotImplementedError
    821     else:
    822         return FeatureUnion(_name_estimators(transformers))

NotImplementedError: 

Thank you both for all your input on this 👍

Member

I think you want something more like:

n_jobs = kwargs.pop('n_jobs', None)
if kwargs:
    # We do not currently support `transformer_weights` as we may want to
    # change its type spec in make_union
    raise TypeError('Unknown keyword arguments: {}'.format(list(kwargs)))
return FeatureUnion(_name_estimators(transformers), n_jobs=n_jobs)
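The conservative approach can be exercised end to end with a runnable sketch. FeatureUnionStub and the dummy PCA class are hypothetical stand-ins, and the default of 1 is an assumption taken from the n_jobs=1 shown in the FeatureUnion repr earlier in the thread; the merged code may differ.

```python
class FeatureUnionStub:
    """Hypothetical stand-in for FeatureUnion: it only stores its parameters."""

    def __init__(self, transformer_list, n_jobs=1, transformer_weights=None):
        self.transformer_list = transformer_list
        self.n_jobs = n_jobs
        self.transformer_weights = transformer_weights


def make_union(*transformers, **kwargs):
    # Accept only n_jobs; the default (1) mirrors the stub's default.
    n_jobs = kwargs.pop("n_jobs", 1)
    if kwargs:
        # Reject everything else, including keywords the constructor would
        # accept (e.g. transformer_weights).
        raise TypeError("Unknown keyword arguments: %s" % sorted(kwargs))
    named = [(type(t).__name__.lower(), t) for t in transformers]
    return FeatureUnionStub(named, n_jobs=n_jobs)


class PCA:  # hypothetical dummy transformer
    pass


print(make_union(PCA()).n_jobs)            # 1
print(make_union(PCA(), n_jobs=4).n_jobs)  # 4
try:
    make_union(PCA(), transformer_weights={"pca": 2.0})
except TypeError as exc:
    print(exc)  # Unknown keyword arguments: ['transformer_weights']
```

Whitelisting n_jobs means transformer_weights is rejected even though the constructor would accept it, which keeps the option open to give it a different type spec in make_union later.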

Contributor Author

OK, sounds good. Would we want the default n_jobs to be 1 instead of None, to keep it consistent with FeatureUnion's default?

@jnothman
Member

jnothman commented Dec 15, 2016 via email

Member

@jnothman left a comment

We could also include transformer_weights in the test. Otherwise, this LGTM.

@alexandercbooth
Contributor Author

That's a good point, as the most likely kwarg people would try there is transformer_weights. It makes more sense to test that than the fake kwarg I had before. Thanks!
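A test along the lines discussed (n_jobs is set, transformer_list is unchanged, transformer_weights is rejected) might look like this sketch. It uses the standard library's unittest and assertRaisesRegex instead of scikit-learn's testing helpers, and make_union and _Union here are hypothetical stand-ins rather than the real implementation.

```python
import unittest


class _Union:  # hypothetical stand-in for FeatureUnion
    def __init__(self, transformer_list, n_jobs=1):
        self.transformer_list = transformer_list
        self.n_jobs = n_jobs


def make_union(*transformers, **kwargs):
    # Hypothetical conservative version: only n_jobs is accepted.
    n_jobs = kwargs.pop("n_jobs", 1)
    if kwargs:
        raise TypeError('Unknown keyword arguments: "%s"' % sorted(kwargs)[0])
    named = [(type(t).__name__.lower(), t) for t in transformers]
    return _Union(named, n_jobs=n_jobs)


class Dummy:  # hypothetical transformer
    pass


class TestMakeUnionKwargs(unittest.TestCase):
    def test_n_jobs_and_transformer_weights(self):
        a, b = Dummy(), object()
        fu = make_union(a, b, n_jobs=3)
        self.assertEqual(fu.n_jobs, 3)
        # Passing n_jobs must not change the named transformers.
        self.assertEqual(fu.transformer_list,
                         make_union(a, b).transformer_list)
        # The keyword users are most likely to try should be rejected.
        with self.assertRaisesRegex(TypeError, "transformer_weights"):
            make_union(a, b, transformer_weights={"dummy": 2.0})
```

Saved to a file, this can be run with python -m unittest.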

@jnothman
Member

Please add an Enhancements entry to whats_new.rst. Thanks!!

@jnothman changed the title from "[MRG + 1] chore(make_union): add n_jobs to make_union through kwargs" to "[MRG+2] chore(make_union): add n_jobs to make_union through kwargs" on Dec 15, 2016
@alexandercbooth
Contributor Author

Thanks!

@jnothman merged commit 8056d63 into scikit-learn:master on Dec 15, 2016
@jnothman
Member

In the future, please use "Fixes #xxxx" in the PR description so the issue closes automatically when the PR is merged.

@alexandercbooth
Contributor Author

Will do. Sincere thanks for all your help, @jnothman!

sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
@Przemo10 mentioned this pull request Mar 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

3 participants