Skip to content

MAINT validate parameters in NaiveBayes estimators #23674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 28, 2022

Conversation

Diadochokinetic
Copy link
Contributor

Reference Issues/PRs

Make all estimators use _validate_params #23462

What does this implement/fix? Explain your changes.

Add parameter_constraints to the classes in sklearn.naive_bayes.py, add validate_params in fit function, remove redundant parameter tests and correct documentation.

@Diadochokinetic
Copy link
Contributor Author

The original docstring states, that alpha=0 will result in no smoothing. But the Estimators using alpha as a parameter are based on _BaseDiscreteNB, which calls _check_alpha in the fit method. _check_alpha is raising a warning, if np.min(self.alpha) < _ALPHA_MIN and afterwards setting np.maximum(self.alpha, _ALPHA_MIN). Therefore alpha=0 doesn't have any effect and the parameter_constraint can be set to _ALPHA_MIN.

@Diadochokinetic
Copy link
Contributor Author

I don't understand this mypy error: https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=43423&view=logs&j=32e2e1bb-a28f-5b18-6cfc-3f01273f5609&t=f370d024-dfa6-536d-2b2d-766ea2a2900c&l=14

When I run mypy sklearn/naive_bayes.py on my local machine, I get Success: no issues found in 1 source file. Can some explain to me, what the issue is?

@Micky774
Copy link
Contributor

Micky774 commented Jun 22, 2022

The original docstring states, that alpha=0 will result in no smoothing. But the Estimators using alpha as a parameter are based on _BaseDiscreteNB, which calls _check_alpha in the fit method. _check_alpha is raising a warning, if np.min(self.alpha) < _ALPHA_MIN and afterwards setting np.maximum(self.alpha, _ALPHA_MIN). Therefore alpha=0 doesn't have any effect and the parameter_constraint can be set to _ALPHA_MIN.

This behavior is also addressed in #22269 and is expected to change soon(ish). I think, for this PR, it would probably be best for you to preserve existing behavior, i.e. allow a user to enter alpha < _ALPHA_MIN and then raise the current warning.

@Diadochokinetic
Copy link
Contributor Author

The original docstring states, that alpha=0 will result in no smoothing. But the Estimators using alpha as a parameter are based on _BaseDiscreteNB, which calls _check_alpha in the fit method. _check_alpha is raising a warning, if np.min(self.alpha) < _ALPHA_MIN and afterwards setting np.maximum(self.alpha, _ALPHA_MIN). Therefore alpha=0 doesn't have any effect and the parameter_constraint can be set to _ALPHA_MIN.

This behavior is also addressed in #22269 and is expected to change soon(ish). I think, for this PR, it would probably be best for you to preserve existing behavior, i.e. allow a user to enter alpha < _ALPHA_MIN and then raise the current warning.

Alright, I am going to reverse my changes regarded _ALPHA_MIN. I'll probably get to it tomorrow :)

@Diadochokinetic Diadochokinetic changed the title [WIP] towards #23462 Add _validate_params to sklearn.naive_bayes.py [MRG] towards #23462 Add _validate_params to sklearn.naive_bayes.py Jun 27, 2022
@glemaitre glemaitre self-requested a review June 27, 2022 12:05
@glemaitre glemaitre changed the title [MRG] towards #23462 Add _validate_params to sklearn.naive_bayes.py MAINT validate parameters in NaiveBayes estimators Jun 27, 2022
@glemaitre
Copy link
Member

I revert GaussianNB because an earlier PR already solve the issue: #23583 and it is already approved.

@Diadochokinetic
Copy link
Contributor Author

I revert GaussianNB because an earlier PR already solve the issue: #23583 and it is already approved.

Oh, I missed that. I checked the PR section, but I filtered for the ticket number, since the issue told us to put the ticket number in the title. At least we had the same solution :D

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I push a small refactoring for alpha and made a proper inheritance for the constraints that are more intuitive.

@glemaitre glemaitre self-requested a review June 27, 2022 13:14
@Diadochokinetic
Copy link
Contributor Author

I push a small refactoring for alpha and made a proper inheritance for the constraints that are more intuitive.

Ah, that's a good idea. If the constraints of the child class equal the parent class, just use the parent class.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I just pushed a small change to only validate at the first call to partial_fit

@jeremiedbb jeremiedbb merged commit 56b0417 into scikit-learn:main Jun 28, 2022
@jeremiedbb
Copy link
Member

Thanks @Diadochokinetic

@Diadochokinetic Diadochokinetic deleted the naive_valid_params branch June 29, 2022 17:26
ogrisel pushed a commit to ogrisel/scikit-learn that referenced this pull request Jul 11, 2022
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants