
[MRG+1] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) #9131


Merged
merged 17 commits into scikit-learn:master from herilalaina:fix_NB_5814
Jun 19, 2017

Conversation

herilalaina
Contributor

Reference Issue

Fixes #5814, continuation of #7477

What does this implement/fix? Explain your changes.

Move alpha setting into fit and partial_fit

@herilalaina herilalaina changed the title [WIP] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) [MRG] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) Jun 15, 2017
Member

@jnothman jnothman left a comment


Thanks. Please wrap those `fit` calls in `assert_warns` in the tests to ensure the warning is triggered.

We should also have a small test for the ValueError case, using `assert_raise_message` or similar.

Thanks for taking this on.
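As a rough illustration of the pattern being requested here, the sketch below uses a stub `fit` and the stdlib `warnings` module in place of scikit-learn's own `assert_warns` test helper; the stub's message and clipping constant are approximations, not the merged code.

```python
import warnings

def fit(alpha):
    # Stand-in for BernoulliNB.fit with alpha == 0: the fix under review
    # clips alpha to a tiny positive value and emits a UserWarning.
    if alpha < 1e-10:
        warnings.warn('alpha too small, setting alpha = 1.0e-10', UserWarning)
        return 1e-10
    return alpha

# assert_warns-style check: the call must actually trigger the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    clipped = fit(0.0)

assert clipped == 1e-10
assert any(issubclass(w.category, UserWarning) for w in caught)
```

The point of asserting (rather than just calling `fit`) is that the test fails loudly if a refactor ever stops emitting the warning.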

@herilalaina
Contributor Author

Thanks for reviewing! Tests are done.

Member

@jnothman jnothman left a comment


Thanks for the quick work

# Test sparse X
X = scipy.sparse.csr_matrix(X)
nb = BernoulliNB(alpha=0.)
nb.fit(X, y)
Member


This will also raise the warning, which we would rather not see in tests. Either assert the warning or ignore it.

y = np.array([0, 1])
b_nb = BernoulliNB(alpha=-0.1)
m_nb = MultinomialNB(alpha=-0.1)
assert_raises(ValueError, b_nb.fit, X, y)
Member


Better if this tests the error message with one of the `assert_raise_*` variants.
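What the `assert_raise_*` suggestion looks like in practice can be sketched as a self-contained stand-in; scikit-learn ships a similar helper in its test utilities, but the function and stub below are illustrative, not the library's code.

```python
def assert_raise_message(exception, message, func, *args, **kwargs):
    # Minimal stand-in for an assert_raise_message helper: the call must
    # raise `exception` and its text must contain `message`.
    try:
        func(*args, **kwargs)
    except exception as exc:
        assert message in str(exc), str(exc)
    else:
        raise AssertionError('%s was not raised' % exception.__name__)

def fit(alpha):
    # Stand-in for BernoulliNB(alpha=-0.1).fit(X, y).
    if alpha < 0:
        raise ValueError('Smoothing parameter alpha = %.1e. '
                         'alpha should be > 0.' % alpha)

assert_raise_message(ValueError, 'alpha should be > 0', fit, -0.1)
```

Checking the message (not just the exception type) guards against the error being raised from some unrelated code path.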

@jnothman
Member

Please add an entry to the change log at doc/whats_new.rst.

@@ -460,6 +463,16 @@ def _update_class_log_prior(self, class_prior=None):
else:
self.class_log_prior_ = np.zeros(n_classes) - np.log(n_classes)

def _check_alpha(self):
if self.alpha < 0:
raise ValueError('Smoothing parameter alpha = %e. '
Member


We can now see that `%e` is clearly a bad pick; `%.1e` would be better and just as informative.

Contributor Author


Very good point! Done.

Member

@jnothman jnothman left a comment


Otherwise, and assuming tests pass, LGTM

def _check_alpha(self):
if self.alpha < 0:
raise ValueError('Smoothing parameter alpha = %.1e. '
'alpha must be >= 0!' % self.alpha)
Member


Just noticed this says >=, which is a little awkward since we warn when it's 0 and change the value.

Member


Perhaps alternate language is "alpha should be > 0", as opposed to 'must be'?

Contributor Author


Thanks, it's already fixed.

@@ -460,6 +463,16 @@ def _update_class_log_prior(self, class_prior=None):
else:
self.class_log_prior_ = np.zeros(n_classes) - np.log(n_classes)

def _check_alpha(self):
Member


Maybe you went through this before, but why isn't alpha just initially set to an appropriate value and then used as before, instead of changing the code a lot as below?

@jnothman jnothman changed the title [MRG] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) [MRG+1] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) Jun 16, 2017
@jnothman
Member

jnothman commented Jun 16, 2017 via email

@jmschrei
Member

It wouldn't be modifying attributes, it would be checking alpha when passed into init and setting it appropriately initially.

@herilalaina
Contributor Author

Not sure I understand your point, but if we move the check into `__init__`, the following script won't work:

clf = BernoulliNB(alpha=0.1)
clf.fit(X, y)
params = {'alpha': 0.0}
clf.set_params(**params)
clf.fit(X, y)  # Will crash
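The convention behind this answer can be sketched with a toy estimator: scikit-learn's `__init__` only stores hyperparameters, and validation is deferred to `fit`, so values changed later via `set_params` are still checked. The class and its minimal `set_params` below are illustrative, not the actual estimator.

```python
class ToyNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha          # stored verbatim, never validated here

    def set_params(self, **params):
        # Simplified version of the scikit-learn API: just set attributes.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        if self.alpha < 0:          # validated on every fit call
            raise ValueError('alpha should be > 0')
        return self

clf = ToyNB(alpha=0.1).fit([[0]], [0])
clf.set_params(alpha=-0.1)
try:
    clf.fit([[0]], [0])             # re-validation catches the bad value
except ValueError:
    pass
```

Validating only in `__init__` would let the second `fit` run with an invalid `alpha`, which is exactly the failure mode the script above demonstrates.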

@jnothman jnothman added this to the 0.19 milestone Jun 17, 2017
@jmschrei
Member

This looks good to me. Thanks for the contribution!

@jmschrei jmschrei merged commit b4b5de8 into scikit-learn:master Jun 19, 2017
@herilalaina herilalaina deleted the fix_NB_5814 branch June 19, 2017 19:36
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Successfully merging this pull request may close these issues.

Multinomial Bayes issue
4 participants