Skip to content

MAINT Param validation: decorate all estimators with _fit_context #26473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 14, 2023

Conversation

jeremiedbb
Copy link
Member

@jeremiedbb jeremiedbb commented May 31, 2023

Follow-up of #25815
This PR actually applies the decorator to all estimators.

There are situations where we don't want to skip inner validation (prefer_skip_nested_validation=False):

  • the user passes an estimator instance
  • the user passes a callable and the args for the callable as a dict (e.g. metric and metric_params). Note that if the user only passes a callable we want to skip inner validation because the args passed to the callable come from us and not from the user.

Estimators receiving cv objects can skip inner validation because cv objects are not validated yet. When we decide to validate cv objects we'll need to revisit this.

Something that I missed in #25815 is that for partial_fit we don't want to validate if fit or partial_fit has already been called (which was not done properly for all estimators). I added such a mechanism in the _fit_context decorator.

@jeremiedbb jeremiedbb added this to the 1.3 milestone May 31, 2023
@glemaitre glemaitre self-requested a review June 1, 2023 07:48
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only nitpcks. I check that we did not forget any file and looked at the _validate_params and if it makes sense to skip or not.

@jeremiedbb jeremiedbb added the Validation related to input validation label Jun 7, 2023
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't prefer_skip_nested_validation=False/True here redundant now that we've added them to the parameter validation definitions themselves?

@@ -318,6 +319,7 @@ def _get_estimator(self):

return estimator

@_fit_context(prefer_skip_nested_validation=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'd need a comment in the code whenever these are False.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jeremiedbb
Copy link
Member Author

Aren't prefer_skip_nested_validation=False/True here redundant now that we've added them to the parameter validation definitions themselves?

Sorry @adrinjalali, I don't understand your question. Redundant with what ?

@adrinjalali
Copy link
Member

What I mean is, in functions, @validate_params has the prefer_skip_nested_validation argument. Here' we're setting it again. So I'm wondering how they'd interact, it's quite confusing.

@jeremiedbb
Copy link
Member Author

Here we're setting it for estimators. Every estimator validates the parameters in the _fit_context decorator, i.e. just before calling fit. Every public function validates its parameters in the validate_params decorator.

prefer_skip_nested_validation tells the estimator (resp. function) whether or not to skip the validation of all scikit-learn objects called within the estimator (resp. function). We have public functions calling estimator's fit and estimators calling public functions. In the decorators we check the global config and validate or not accordingly. Then we set the config appropriately for inner calls depending on prefer_skip_nested_validation.

@adrinjalali adrinjalali merged commit 9c266cf into scikit-learn:main Jun 14, 2023
avm19 pushed a commit to avm19/scikit-learn that referenced this pull request Jun 15, 2023
Comment on lines +1135 to +1140
# we don't want to validate again for each call to partial_fit
partial_fit_and_fitted = (
fit_method.__name__ == "partial_fit" and _is_fitted(estimator)
)

if not global_skip_validation and not partial_fit_and_fitted:
Copy link
Contributor

@avm19 avm19 Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about computing partial_fit_and_fitted inside if not global_skip_validation:? I don't know if _is_fitted(estimator) is always fast, because it may check all parameter names of the estimator or call estimator.__sklearn_is_fitted__() if present. It might be a good idea to avoid it when not necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
No Changelog Needed Validation related to input validation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants