Skip to content

FIX remove n_estimators checks from validate_estimator #24224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 1, 2022

Conversation

naoise-h
Copy link
Contributor

@naoise-h naoise-h commented Aug 22, 2022

What does this implement/fix? Explain your changes.

BaseEnsemble._validate_estimator currently checks the type and value of the class's n_estimators parameter, despite it already being checked as part of validate_parameter_constraints. This PR removes the redundant check in _validate_estimator.

Any other comments?

Cf. #7457 and #7146 (old issue/PR)

@cmarmo
Copy link
Contributor

cmarmo commented Aug 25, 2022

Hi @naoise-h, thank you for your pull request.
Indeed it seems to me that TypeError is the right exception to be thrown.
I am puzzled by the fact that the related test

def test_base_not_int_n_estimators():
# Check that instantiating a BaseEnsemble with a string as n_estimators
# raises a ValueError demanding n_estimators to be supplied as an integer.
string_ensemble = BaggingClassifier(base_estimator=Perceptron(), n_estimators="3")
iris = load_iris()
with pytest.raises(ValueError):
string_ensemble.fit(iris.data, iris.target)
float_ensemble = BaggingClassifier(base_estimator=Perceptron(), n_estimators=3.0)
with pytest.raises(ValueError):
float_ensemble.fit(iris.data, iris.target)

is not failing in your PR.... do you mind investigating?
Also, anyway the test should be modified accordingly with the new type of exception. Thanks!

@naoise-h
Copy link
Contributor Author

Thanks for pointing that out @cmarmo. It appears there is duplication in checking n_estimators with the validate_parameter_constraints method. As a result, the check in _validate_estimator is redundant.

raise ValueError(
f"The {param_name!r} parameter of {caller_name} must be"
f" {constraints_str}. Got {param_val!r} instead."
)

That leaves two questions:

  1. Is it worth refactoring validate_parameter_constraints to prioritise a type check, possibly by adding a is_correct_type method to constraints? Or even splitting the check between value and type, to throw more informative error messages?

  2. In the meantime, is it worth updating this PR to remove the redundant check on n_estimators from _validate_estimator?

@cmarmo cmarmo added the Validation related to input validation label Aug 26, 2022
@cmarmo
Copy link
Contributor

cmarmo commented Aug 26, 2022

Thanks for digging in! I'm afraid I'm not the right person to ask for an answer... I'm going to reference here #23462 as it is clearly related to parameter validation and label your PR accordingly. Hope this will help to give you a quick review.

@Micky774
Copy link
Contributor

Hey there @naoise-h, thanks for bringing this up, and good catch @cmarmo on the related test.

All base estimators which inherit form BaseEnsemble:

BaseBagging
BaseForest
BaseGradientBoosting
BaseWeightBoosting

currently have a parameter constraint for n_estimators, therefore I think it would be better to remove the n_estimators validation from BaseEnsemble._validate_estimator. This includes both of the following:

if not isinstance(self.n_estimators, numbers.Integral):
raise TypeError(
"n_estimators must be an integer, got {0}.".format(
type(self.n_estimators)
)
)
if self.n_estimators <= 0:
raise ValueError(
"n_estimators must be greater than zero, got {0}.".format(
self.n_estimators
)
)

@naoise-h naoise-h changed the title FIX error raised in BaseEnsemble._validate_estimator FIX remove n_estimators checks from validate_estimator Sep 1, 2022
@naoise-h
Copy link
Contributor Author

naoise-h commented Sep 1, 2022

Thanks @Micky774 and @cmarmo . I have updated the PR to remove the checks on n_estimators in validate_estimator.

Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
@Micky774 Micky774 added Waiting for Reviewer Quick Review For PRs that are quick to review labels Sep 1, 2022
@Micky774
Copy link
Contributor

Micky774 commented Sep 1, 2022

@thomasjpfan You may be interested in reviewing this

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @naoise-h. This redundant validation can indeed be removed now. I also removed the tests that @cmarmo mentioned since it's now covered by the common test.

LGTM

@jeremiedbb jeremiedbb merged commit 4114161 into scikit-learn:main Sep 1, 2022
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 12, 2022
…mator (scikit-learn#24224)

Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants