Skip to content

[MRG+1] Accept keyword parameters to hyperparameter search fit methods #8278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

stephen-hoover
Copy link
Contributor

Reference Issue

Closes #2879

What does this implement/fix? Explain your changes.

This PR allows the hyperparameter search classes in sklearn.model_selection._search (i.e. GridSearchCV and RandomizedSearchCV) to accept keyword parameters and pass them to the wrapped Estimator. This brings their fit methods into closer compliance with the Estimator API. This PR does not address issues #8158 or #4632 which deal with passing parameters to the scoring function.

To ease testing of the new parameter, I added an extra argument to the sklearn.utils.mocking.CheckingClassifier which allows it to enforce the presence of keyword fit arguments and verify that they have the expected length.

Any other comments?

The ultimate fate of fit parameters on these methods will depend on the resolution to #4497, but I think the change in this PR conforms to present usage without establishing any new conventions. Accepting keyword parameters in these fit methods is something that I (and I think other users) would expect.

One limitation of this PR is that the groups parameter is still reserved by the fit method and not passed to the wrapped Estimator. Perhaps a solution to that would wait on #4497, but I think it doesn't need to be solved in this PR.

I also had to choose what to do if users try to provide fit parameters through both the constructor and the fit method. I chose to emit a warning and ignore parameters passed through the constructor (since that behavior is deprecated), but I don't feel strongly about that decision.

I searched through existing documentation and tests, but I was unable to find any uses of the fit_params constructor argument for either GridSearchCV or RandomizedSearchCV.

Deprecate ``fit_params`` as a constructor argument to the hyperparameter search classes and instead accept keyword parameters to the ``fit`` methods. This makes the ``fit`` methods of these functions conform to the Estimator API and allows the use of hyperparameter search functions in other CV utility functions such as ``cross_val_predict``.
@stephen-hoover
Copy link
Contributor Author

@jnothman , this is my minimal fix to #2879 .

@jnothman
Copy link
Member

jnothman commented Feb 4, 2017 via email

@@ -45,19 +45,30 @@ class CheckingClassifier(BaseEstimator, ClassifierMixin):
changed the input.
"""
def __init__(self, check_y=None,
check_X=None, foo_param=0):
check_X=None, foo_param=0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cosmetic: put more args on the first line as long as they fit under 80 columns

assert_true(len(X) == len(y))
if self.check_X is not None:
assert_true(self.check_X(X))
if self.check_y is not None:
assert_true(self.check_y(y))
self.classes_ = np.unique(check_array(y, ensure_2d=False,
allow_nd=True))
if self.expected_fit_params:
missing = (set(np.atleast_1d(self.expected_fit_params)) -
Copy link
Member

@ogrisel ogrisel Feb 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why converting to an numpy array with np.atleast_1d before converting to a set? Converting the list to a set is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The np.atleast_1d call converts a string to a one-element array, so that users could give a string input to expected_fit_params instead of a one-element list.

In this case, without the array casting there would be an obvious and loud error if someone forgot to make that input a list, and there's not much benefit to saving two characters for a test helper. I'll remove the np.atleast_1d.

@ogrisel
Copy link
Member

ogrisel commented Feb 8, 2017

Please add a new test to check that the deprecation warning is raised as expected.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also remove the fit_params parameter from the class docstrings and do a review of the full code base with git grep fit_params= to check that fit_params are no longer passed as input to the parameter search class constructors and update the code to pass those to the fit method instead.

grid_search = GridSearchCV(clf, {'foo_param': [1, 2, 3]})

# The CheckingClassifer generates an assertion error if
# a parameter is missing or has length != len(X).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of putting a comment to explain what CheckingClassifier should raise in that case, please add an assertion with assert_raises.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider assert_raises_regex to check the error message on top of checking the exception type.

random_search = RandomizedSearchCV(clf, {'foo_param': [0]}, n_iter=1)

# The CheckingClassifer generates an assertion error if
# a parameter is missing or has length != len(X).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add an assertion with assert_raises here.

@ogrisel
Copy link
Member

ogrisel commented Feb 8, 2017

LGTM besides the requested changes.

@codecov
Copy link

codecov bot commented Feb 8, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@53f8082). Click here to learn what that means.

@@            Coverage Diff            @@
##             master    #8278   +/-   ##
=========================================
  Coverage          ?   94.74%           
=========================================
  Files             ?      342           
  Lines             ?    60711           
  Branches          ?        0           
=========================================
  Hits              ?    57519           
  Misses            ?     3192           
  Partials          ?        0
Impacted Files Coverage Δ
sklearn/linear_model/tests/test_ridge.py 100% <100%> (ø)
sklearn/model_selection/tests/test_search.py 99.28% <100%> (ø)
sklearn/utils/mocking.py 100% <100%> (ø)
sklearn/linear_model/ridge.py 93.88% <100%> (ø)
sklearn/model_selection/_search.py 97.85% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 53f8082...9604c1d. Read the comment docs.

@stephen-hoover
Copy link
Contributor Author

@ogrisel , I've made the changes that you and @lesteve requested.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much, LGTM.

@ogrisel ogrisel changed the title [MRG] Accept keyword parameters to hyperparameter search fit methods [MRG+1] Accept keyword parameters to hyperparameter search fit methods Feb 8, 2017
@GaelVaroquaux
Copy link
Member

I am +1 to merge. This will be modified by the sample_props, but is it indeed going in the right direction in terms of API.

@GaelVaroquaux GaelVaroquaux merged commit 3a0ea19 into scikit-learn:master Feb 9, 2017
sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
scikit-learn#8278)

* ENH Accept keyword parameters to hyperparameter search fit methods

Deprecate ``fit_params`` as a constructor argument to the hyperparameter search classes and instead accept keyword parameters to the ``fit`` methods. This makes the ``fit`` methods of these functions conform to the Estimator API and allows the use of hyperparameter search functions in other CV utility functions such as ``cross_val_predict``.

* CR: Expanded tests, remove deprecated use in Ridge

* Make tests consistent in Python 2 and 3
@ecampana
Copy link

ecampana commented Mar 11, 2017

Does GridSearchCV allow using a Pipeline estimator with fit_params, e.g. fit_params={'sample_weight': sample_weight}, such that sample_weight is passed from the grid.fit() method to the pipe.fit() method which is then ultimately passed to a classifier fit() method? I posted this question to #2630 as well. I am involved in the HEP community, and we are very interested in finding out if this type of procedure is at all possible as our final goal is to re-implement our analysis purely with scikit-learn. Thank you for your time.

@stephen-hoover
Copy link
Contributor Author

@ecampana , yes, Pipelines accept fit parameters and pass them to their components. I commented on your question in #2630 , since you had examples there. But use fit parameters in fit, rather than in the GridSearchCV's constructor. You do still need to specify which step of the pipeline should get the fit parameter, so you can't use sample_weight= by itself. I see in the examples you gave that you do have the right parameters names to give to the Pipeline, though.

@ecampana
Copy link

ecampana commented Mar 16, 2017

@stephen-hoover, thank you for your reply and confirming that GridSearchCV and Pipeline in conjunction accomplish what I was hoping for but for which I was not quite uncertain. I have made further comments to my question in #2630 if you would by any chance have the time for it. Thanks again!

@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
scikit-learn#8278)

* ENH Accept keyword parameters to hyperparameter search fit methods

Deprecate ``fit_params`` as a constructor argument to the hyperparameter search classes and instead accept keyword parameters to the ``fit`` methods. This makes the ``fit`` methods of these functions conform to the Estimator API and allows the use of hyperparameter search functions in other CV utility functions such as ``cross_val_predict``.

* CR: Expanded tests, remove deprecated use in Ridge

* Make tests consistent in Python 2 and 3
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
scikit-learn#8278)

* ENH Accept keyword parameters to hyperparameter search fit methods

Deprecate ``fit_params`` as a constructor argument to the hyperparameter search classes and instead accept keyword parameters to the ``fit`` methods. This makes the ``fit`` methods of these functions conform to the Estimator API and allows the use of hyperparameter search functions in other CV utility functions such as ``cross_val_predict``.

* CR: Expanded tests, remove deprecated use in Ridge

* Make tests consistent in Python 2 and 3
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
scikit-learn#8278)

* ENH Accept keyword parameters to hyperparameter search fit methods

Deprecate ``fit_params`` as a constructor argument to the hyperparameter search classes and instead accept keyword parameters to the ``fit`` methods. This makes the ``fit`` methods of these functions conform to the Estimator API and allows the use of hyperparameter search functions in other CV utility functions such as ``cross_val_predict``.

* CR: Expanded tests, remove deprecated use in Ridge

* Make tests consistent in Python 2 and 3
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
scikit-learn#8278)

* ENH Accept keyword parameters to hyperparameter search fit methods

Deprecate ``fit_params`` as a constructor argument to the hyperparameter search classes and instead accept keyword parameters to the ``fit`` methods. This makes the ``fit`` methods of these functions conform to the Estimator API and allows the use of hyperparameter search functions in other CV utility functions such as ``cross_val_predict``.

* CR: Expanded tests, remove deprecated use in Ridge

* Make tests consistent in Python 2 and 3
lemonlaug pushed a commit to lemonlaug/scikit-learn that referenced this pull request Jan 6, 2021
scikit-learn#8278)

* ENH Accept keyword parameters to hyperparameter search fit methods

Deprecate ``fit_params`` as a constructor argument to the hyperparameter search classes and instead accept keyword parameters to the ``fit`` methods. This makes the ``fit`` methods of these functions conform to the Estimator API and allows the use of hyperparameter search functions in other CV utility functions such as ``cross_val_predict``.

* CR: Expanded tests, remove deprecated use in Ridge

* Make tests consistent in Python 2 and 3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sample_weights array can't be used with GridSearchCV
6 participants