[MRG+1] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) #9131
Conversation
…into fix_NB_5814
Thanks. Please wrap those fit calls in assert_warns in tests to ensure the warning is triggered.
We should also have a small test for the ValueError case, using assert_raise_message or similar.
Thanks for taking this on.
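A minimal sketch of what such tests could look like, assuming the older sklearn.utils.testing helpers assert_warns and assert_raise_message and small toy X, y arrays (the exact fixtures and warning/message text are assumptions, not the PR's final tests):

import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.utils.testing import assert_warns, assert_raise_message

X = np.array([[1, 0], [1, 1]])
y = np.array([0, 1])

# alpha=0 should be clipped with a warning rather than fail silently
nb = BernoulliNB(alpha=0.)
assert_warns(UserWarning, nb.fit, X, y)

# a negative alpha should raise an informative ValueError
m_nb = MultinomialNB(alpha=-0.1)
assert_raise_message(ValueError, 'Smoothing parameter alpha', m_nb.fit, X, y)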
Thanks for reviewing! Tests are done.
Thanks for the quick work
sklearn/tests/test_naive_bayes.py
Outdated
# Test sparse X
X = scipy.sparse.csr_matrix(X)
nb = BernoulliNB(alpha=0.)
nb.fit(X, y)
This will also raise the warning, which we would rather not see in tests. Either assert it or ignore it.
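For example, relying on the same hedged assert_warns helper as above, the sparse call could assert the warning explicitly so it does not leak into the test output:

X_sparse = scipy.sparse.csr_matrix(X)
nb = BernoulliNB(alpha=0.)
# assert the clipping warning instead of letting it print during the test run
assert_warns(UserWarning, nb.fit, X_sparse, y)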
sklearn/tests/test_naive_bayes.py
Outdated
y = np.array([0, 1])
b_nb = BernoulliNB(alpha=-0.1)
m_nb = MultinomialNB(alpha=-0.1)
assert_raises(ValueError, b_nb.fit, X, y)
Better if this tests the error message with the assert_raise_* variants
Please add an entry to the change log at
sklearn/naive_bayes.py
Outdated
@@ -460,6 +463,16 @@ def _update_class_log_prior(self, class_prior=None):
        else:
            self.class_log_prior_ = np.zeros(n_classes) - np.log(n_classes)

    def _check_alpha(self):
        if self.alpha < 0:
            raise ValueError('Smoothing parameter alpha = %e. '
We can now see that %e is clearly a bad pick. %.1e would be better and just as informative.
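For illustration, the two format specifiers on a value like -0.1:

>>> '%e' % -0.1
'-1.000000e-01'
>>> '%.1e' % -0.1
'-1.0e-01'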
Very good point! Done.
Otherwise, and assuming tests pass, LGTM
sklearn/naive_bayes.py
Outdated
    def _check_alpha(self):
        if self.alpha < 0:
            raise ValueError('Smoothing parameter alpha = %.1e. '
                             'alpha must be >= 0!' % self.alpha)
Just noticed this says >=, which is a little awkward since we warn when it's 0 and change the value.
Perhaps alternate language is "alpha should be > 0", as opposed to 'must be'?
Thanks, it's already fixed.
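For context, a rough sketch of the warn-and-clip behaviour being discussed; the threshold value, warning text and exact structure are assumptions, not the PR's final code:

import warnings

_ALPHA_MIN = 1e-10  # assumed lower bound used when clipping alpha

def _check_alpha(self):
    if self.alpha < 0:
        raise ValueError('Smoothing parameter alpha = %.1e. '
                         'alpha should be > 0.' % self.alpha)
    if self.alpha < _ALPHA_MIN:
        warnings.warn('alpha too small will result in numeric errors, '
                      'setting alpha = %.1e' % _ALPHA_MIN)
        return _ALPHA_MIN
    return self.alpha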
@@ -460,6 +463,16 @@ def _update_class_log_prior(self, class_prior=None):
        else:
            self.class_log_prior_ = np.zeros(n_classes) - np.log(n_classes)

    def _check_alpha(self):
Maybe you went through this before, but why isn't alpha just initially set to an appropriate value and then used as before, instead of changing the code a lot as below?
Modifying attributes didn't play nicely with our set_params API.
It wouldn't be modifying attributes, it would be checking alpha when passed into init and setting it appropriately initially.
Not sure I understand your point, but if we move the checking into __init__, the following will crash:

clf = BernoulliNB(alpha=0.1)
clf.fit(X, y)
params = {'alpha': 0.0}
clf.set_params(**params)
clf.fit(X, y)  # Will crash
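(A hedged illustration, not the PR's code: with the check deferred to fit, the same sequence stays valid.)

clf = BernoulliNB(alpha=0.1)
clf.fit(X, y)
clf.set_params(alpha=0.0)
clf.fit(X, y)  # checked at fit time: warns and clips alpha instead of crashing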
This looks good to me. Thanks for the contribution!
…cikit-learn#9131)
* Fix scikit-learn#5814
* Fix pep8 in naive_bayes.py:716
* Fix sparse matrix incompatibility
* Fix python 2.7 problem in test_naive_bayes
* Make sure the values are probabilities before log transform
* Improve docstring of `_safe_logprob`
* Clip alpha solution
* Clip alpha solution
* Clip alpha in fit and partial_fit
* Add what's new entry
* Add test
* Remove .project
* Replace assert method
* Update what's new
* Format float into %.1e
* Update ValueError msg
Reference Issue
Fixes #5814, continuation of #7477
What does this implement/fix? Explain your changes.
Move alpha setting into fit and partial_fit.
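A hedged usage sketch of the resulting behaviour (the warning and the exact clipping value are assumptions based on the discussion above):

import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.random.randint(5, size=(6, 10))
y = np.array([1, 2, 3, 4, 5, 6])

MultinomialNB(alpha=0.0).fit(X, y)   # fits, but warns that alpha was clipped to a tiny value
MultinomialNB(alpha=-1.0).fit(X, y)  # raises ValueError: alpha should be > 0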