
[MRG] Fix MultinomialNB and BernoulliNB alpha=0 bug #7477


Closed
wants to merge 8 commits

Conversation

@yl565 (Contributor) commented Sep 23, 2016

PR to #5814



@yl565 yl565 changed the title Fix MultinomialNB and BernoulliNB alpha=0 bug [MRG] Fix MultinomialNB and BernoulliNB alpha=0 bug Sep 24, 2016
```python
th = 1e30
p = np.asarray(p)
if (p > 1).any() or (p < 0).any():
    raise ValueError('Input `p` must be within [0, 1] range!')
```
agramfort (Member):

If this happens for numerical reasons, the error does not tell the user how to make it work, other than by changing the data.

Why not clip systematically?

yl565 (Contributor, author):

@agramfort Thanks for your comment. This happens when some elements contain inf (or are close enough to overflow into inf): np.dot or np.ndarray.sum may then return nan. Some simple examples:

```python
In [3]: np.inf - np.inf
Out[3]: nan

In [7]: 1e+308 + 1e+308
Out[7]: inf

In [8]: (1e+308 + 1e+308) - (1e+308 + 1e+308)
Out[8]: nan
```

Since we are calculating log probabilities, I think it is reasonable to replace log(0) = -inf with a large negative number such as -1e30 when the probability is 0.

> why not clipping systematically?

I'm not sure I understand but do you mean creating wrappers for numpy to clip all values?
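The -1e30 floor described above could be sketched as follows; `safe_log` and `_LOG_FLOOR` are illustrative names, not code from this PR:

```python
import numpy as np

_LOG_FLOOR = -1e30  # stands in for log(0) = -inf; name is hypothetical

def safe_log(p):
    """Elementwise log that maps p == 0 to a large negative constant
    instead of -inf, so later sums and dot products stay nan-free."""
    p = np.asarray(p, dtype=float)
    out = np.full_like(p, _LOG_FLOOR)
    np.log(p, where=p > 0, out=out)  # only write where log is finite
    return out

print(safe_log([0.0, 0.5, 1.0]))
```

Because the floor is finite, expressions like `np.dot(X, safe_log(p))` can no longer produce `inf - inf = nan`.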

Member:

I think he means clipping p to be between 0 and 1
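What that suggestion might look like (a minimal sketch, not the PR's code):

```python
import numpy as np

# Probabilities that drifted slightly outside [0, 1] due to rounding.
p = np.array([-1e-12, 0.3, 1.0 + 1e-12])

# Clip systematically instead of raising a ValueError.
p = np.clip(p, 0.0, 1.0)
print(p)  # every entry now lies in [0, 1]
```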

@amueller (Member):
something went wrong with your rebase...

@yl565 (Contributor, author) commented Oct 24, 2016

@amueller I fixed the rebase problem. I have also adopted a simple solution: clip alpha so it is never smaller than _ALPHA_MIN = 1e-10. This avoids the log(prob) = -inf problem when prob = 0 and seems to work well. Alternatively, maybe we could treat alpha=0 as a special case and use a different numerical solution.
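Why the alpha floor avoids the -inf: with alpha = 0, a feature that never occurs in a class has a smoothed count of 0 and log(0) = -inf. A small sketch of the idea (`feature_log_prob` is an illustrative name, not the PR's implementation):

```python
import numpy as np

_ALPHA_MIN = 1e-10  # the floor mentioned in the comment above

def feature_log_prob(counts, alpha):
    """Smoothed per-feature log probabilities, multinomial-NB style."""
    smoothed = counts + alpha
    return np.log(smoothed) - np.log(smoothed.sum())

counts = np.array([3.0, 0.0, 5.0])  # one feature never observed

with np.errstate(divide="ignore"):
    print(feature_log_prob(counts, 0.0))    # contains -inf
print(feature_log_prob(counts, _ALPHA_MIN))  # all finite
```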

```diff
@@ -680,10 +684,21 @@ class MultinomialNB(BaseDiscreteNB):
     """

     def __init__(self, alpha=1.0, fit_prior=True, class_prior=None):
         self.alpha = alpha
+        self._alpha = alpha
```
Member:

This violates the scikit-learn API. Please handle the setting of alpha in fit. Also, only clip it when it is used, and don't change the value of self.alpha.
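The convention the reviewer is pointing to: `__init__` stores its arguments verbatim, and any clipping happens only where the value is used. A hypothetical sketch (`SketchNB` and `_effective_alpha` are illustrative names, not scikit-learn code):

```python
_ALPHA_MIN = 1e-10

class SketchNB:
    def __init__(self, alpha=1.0):
        # scikit-learn API: store the constructor argument unchanged,
        # so get_params / set_params round-trip exactly.
        self.alpha = alpha

    def _effective_alpha(self):
        # Clip only at the point of use; self.alpha is never mutated.
        return max(self.alpha, _ALPHA_MIN)

est = SketchNB(alpha=0.0)
print(est.alpha)               # unchanged public parameter
print(est._effective_alpha())  # clipped value actually used in fit
```

Keeping the public attribute untouched means cloning and grid search see exactly the alpha the user passed in.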

@dalmia (Contributor) commented Feb 18, 2017

@yl565 Are you still working on this?

@yl565 (Contributor, author) commented Feb 18, 2017

Go ahead if you want to work on it

@jnothman (Member):
I think we need another contributor, or can you make the minor fixes and add a what's new entry, @yl565?

@jmschrei (Member):
Fixed via #9131

@jmschrei jmschrei closed this Jun 19, 2017
Labels: Easy (Well-defined and straightforward way to resolve), Waiting for Reviewer
Projects: None yet
6 participants