Skip to content

ENH Add Deprecating Position Arguments Helper #13311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Oct 7, 2019

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Feb 27, 2019

Reference Issues/PRs

Addresses #12805

What does this implement/fix? Explain your changes.

Adds a function decorator, warn_args , that can wraps a function and issues a warning when any of the positional arguments passed after * will issue a warning.

For functions:

@_deprecate_positional_args
def dbscan(X, eps=0.5, *, min_samples=4, metric='minkowski'):
    pass

Calling dbscan(X, 0.1, 5) will result with a DeprecationWarning:

DeprecationWarning: Should use keyword args: min_samples=5

For classes

class LogisticRegression:
    
    @_deprecate_positional_args
    def __init__(self, penalty='l2', *, dual=False):

        self.penalty = penalty
        self.dual = dual

Calling LogisticRegression('l2', True) will result with a DeprecationWarning:

Should use keyword args: dual=True

Any other comments?

_deprecate_positional_args uses the fact that the first element of a class method is traditionally named self to determine if the function is called in a instance method like __init__.

@jnothman
Copy link
Member

jnothman commented Feb 28, 2019 via email

@GaelVaroquaux
Copy link
Member

Is the idea that after two releases, we remove the "*" that gets added in functions? (I find that it really looks ugly. I could see it confusing beginners.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure that this warning raises in our test suite and doc building.

We should also document this is what's new and perhaps more clearly somewhere in the user guide??

The only problem I see at the moment is that the * doesn't show in the API reference documentation. I don't think it's a big problem, since almost all args can be passed as kwargs.

Do we need to test support for non-__init__ methods?

@@ -1788,3 +1788,10 @@ def test_penalty_none(solver):
"LogisticRegressionCV",
lr.fit, X, y
)


def test_logistic_warns_with_args():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this. it's not practical or helpful to have this for everything.

args_msg = ['{}={}'.format(name, arg)
for name, arg in zip(kwonlyargs[:extra_args],
err_args[-extra_args:])]
warnings.warn("Should use keyword args: "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something more forceful. "Pass xxxx as keyword args (e.g. xxxx=...). From version 0.24 passing these as positional arguments will result in an error."

Then we need to obligate ourselves to finish all the decorating by 0.22.

@@ -936,3 +937,38 @@ def check_non_negative(X, whom):

if X_min < 0:
raise ValueError("Negative values in data passed to %s" % whom)


def warn_args(f):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call it deprecate_positional_args?

def inner_f(*args, **kwargs):
extra_args = len(args) - len(orig_spec)
if extra_args > 0:
# ignore first 'self' argument for class methods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class -> instance

orig_spec = argspec.args

# Assumes class method has 'self' as first argument
is_class_method = orig_spec and 'self' == orig_spec[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is inspect.ismethod applicable here? Maybe it's not bound yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_class_method -> is_method

@jnothman
Copy link
Member

jnothman commented Mar 1, 2019

@GaelVaroquaux

Is the idea that after two releases, we remove the "*" that gets added in functions? (I find that it really looks ugly. I could see it confusing beginners.

No, the idea is that that * is what makes this a hard constraint rather than a cultural convention, after which we are:

  • assured users are passing all but a few key params by name
  • able to reorder parameters as we see fit and hence introduce params
  • able to document params in any order (ideally under multiple section headings if we could get that introduced in numpydoc) so that we can order them either alphabetically or by their information / clustering (though this is hard) rather than ordering them roughly by date of addition!

@codecov
Copy link

codecov bot commented May 31, 2019

Codecov Report

Merging #13311 into master will decrease coverage by 4.31%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #13311      +/-   ##
==========================================
- Coverage    96.8%   92.49%   -4.32%     
==========================================
  Files         393      420      +27     
  Lines       71859    74830    +2971     
  Branches     7866        0    -7866     
==========================================
- Hits        69564    69211     -353     
- Misses       2272     5619    +3347     
+ Partials       23        0      -23
Impacted Files Coverage Δ
sklearn/utils/tests/test_validation.py 98.55% <100%> (+0.02%) ⬆️
sklearn/utils/validation.py 99.64% <100%> (+0.01%) ⬆️
sklearn/linear_model/logistic.py 98.92% <100%> (+0.01%) ⬆️
sklearn/ensemble/partial_dependence.py 32.91% <0%> (-62.77%) ⬇️
sklearn/ensemble/tests/test_partial_dependence.py 61.85% <0%> (-38.15%) ⬇️
sklearn/tree/export.py 79.51% <0%> (-15.37%) ⬇️
sklearn/tree/tests/test_export.py 94.16% <0%> (-5.84%) ⬇️
sklearn/cluster/mean_shift_.py 95.14% <0%> (-3.89%) ⬇️
sklearn/tests/test_site_joblib.py 96.66% <0%> (-3.34%) ⬇️
sklearn/covariance/elliptic_envelope.py 97.22% <0%> (-2.78%) ⬇️
... and 293 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e0d1a1...2796868. Read the comment docs.

@amueller
Copy link
Member

Does this need a slep/discussion/vote?

@rth
Copy link
Member

rth commented Jul 12, 2019

I wonder what happens with auto-completion e.g. in Jupyter etc. Would it still work even with the decorator?

@adrinjalali
Copy link
Member

@amueller

Does this need a slep/discussion/vote?

Here's the SLEP: scikit-learn/enhancement_proposals#19

@rth

I wonder what happens with auto-completion e.g. in Jupyter etc. Would it still work even with the decorator?

I put the output for ipython and VSCode in the SLEP

@adrinjalali
Copy link
Member

@thomasjpfan as I pasted in the SLEP, the order of items in the hint shown by ipython seem different with the decorator, any chance we could fix/change that?

@thomasjpfan
Copy link
Member Author

@adrinjalali Are you using the deprecate_positional_args version of this PR? In iPython and jupyter lab, the auto complete seems to be working for me. (ipython=7.6.1, jupyterlab=0.35.5)

@adrinjalali
Copy link
Member

Yeah I took the decorator from this PR. I haven't tried jupyterlab though.

So you're saying you see something else than the one I pasted in the SLEP?

@thomasjpfan
Copy link
Member Author

So you're saying you see something else than the one I pasted in the SLEP?

Yup. I get the same auto complete results for the wrapped and unwrapped version of the function. My implementation calls decorates the wrapped function with @wrap(f) which places the original function in f.__wrapped__. autocomplete should be using this for the original function signature.

@amueller
Copy link
Member

@adrinjalali
Copy link
Member

adrinjalali commented Jul 17, 2019

I quite like their idea of rename_parameter, delete_parameter, deprecate_parameter, etc, as decorators. It's so neat!

@amueller
Copy link
Member

@adrinjalali I once started writing a library to do that:
https://github.com/amueller/futurepast
but I never had time to go through with it.

@thomasjpfan thomasjpfan added the API label Aug 6, 2019
@jnothman
Copy link
Member

SLEP009 is all but accepted. Please resolve conflicts and update for review.

@thomasjpfan
Copy link
Member Author

Okay lets see how this looks like with #15005 (comment)

noise_scale = 0.01

# Create S-curve dataset
X, y = datasets.samples_generator.make_s_curve(n_samples, random_state=0)

# Compute isomap embedding
iso = manifold.Isomap(n_components, 2)
iso = manifold.Isomap(n_neighbors, n_components=2)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really makes me want to do: Isomap(*, n_neighbors=5, n_components=2,...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with that

@jnothman
Copy link
Member

Do you have a way of systematically considering all the signatures?

1 similar comment
@jnothman
Copy link
Member

Do you have a way of systematically considering all the signatures?

@thomasjpfan
Copy link
Member Author

For working on this PR, I had a code snippet that checked for keyword only arguments based on order:

from inspect import signature, Parameter
from sklearn.utils.testing import all_estimators
from collections import defaultdict

whitelist = set([
    'n_components', 'estimator', 'base_estimator', 'n_clusters', 'n_neighbors', 'steps',
    'regressor', 'transformers', 'n_estimators', 'transformer_list', 'score_func', 'func',
    'degree', 'estimators'])

public_estimators = [tup for tup in all_estimators() if not tup[0].startswith("_")]
estimators_by_module = defaultdict(list)
for name, est in public_estimators:
    module = est.__module__
    estimators_by_module[module].append((name, est))
    
for module, estimators in estimators_by_module.items():
    print(module)
    for name, est in estimators:
        sig = signature(est)
        if sig.parameters:
            params = list(sig.parameters.items())
            first_name, first_param = params[0]
            if first_name in whitelist:
                if first_param.kind == Parameter.KEYWORD_ONLY:
                    print("*", name, first_param, "INCORRECT KEYWORD_ONLY")
                params = params[1:]
            # remove parameters without defaults
            params = [p for p in params if p[1].default != Parameter.empty]
            # the rest must be keywords only
            if not all(p[1].kind in [Parameter.KEYWORD_ONLY]
                       for p in params):
                print("*", name, "THE REST SHOULD BE KEYWORD_ONLY")

I added a test, test_estimators_keyword_only, to the whitelist and that the rest of the parameters are keyword only.

"name, Estimator",
[(name, Estimator)
for name, Estimator in all_estimators() if not name.startswith("_")])
def test_estimators_keyword_only(name, Estimator):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test makes sure that the whitelist is POSITION_OR_KEYWORD and that the rest of the parameters (without defaults) is KEYWORD_ONLY.

@NicolasHug
Copy link
Member

NicolasHug commented Sep 25, 2019

I agree with #13311 (comment), can we first merge a simpler PR where only the decorator is introduced?

@qinhanmin2014
Copy link
Member

qinhanmin2014 commented Oct 4, 2019

Here is the current whitelist:

Honestly I'm still uncomfortable about the long whitelist (e.g., PCA(2))). I think we introduce keyword only agrs because we want to force users to sprcify the name of the parameter, so the whitelist should be as short as possible. It's true that we'll break user's code, but we have a deprecation cycle so I think that's acceptable.

@thomasjpfan
Copy link
Member Author

I agree with #13311 (comment), can we first merge a simpler PR where only the decorator is introduced?

After we merge this decorator, and we go through each module group by group. We would need to wait till we go through every module before we can release. Otherwise there will be some estimators that have this warning and others that do not.

Honestly I'm still uncomfortable about the long whitelist

@qinhanmin2014 What do you think the whitelist should be?

@qinhanmin2014
Copy link
Member

@qinhanmin2014 What do you think the whitelist should be?

Not sure, but shorter than current version, e.g., perhaps n_components, n_clusters and n_neighbors should not be in the list.

But as I've said in gitter, if this is the final decision, I won't oppose.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now. We need probably another PR to explain keyword only args and how we use them in sklearn I think. (Another example of where a blog would be nice)

@thomasjpfan
Copy link
Member Author

This PR was stripped down to only include the decorator. It doesn’t use it anywhere.

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, given that this only add the helper function. Thanks @thomasjpfan !

@rth rth changed the title [MRG] Adds Deprecating Position Arguments Helper ENH Add Deprecating Position Arguments Helper Oct 7, 2019
@rth rth merged commit f450173 into scikit-learn:master Oct 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants