[MRG] Fix MultinomialNB and BernoulliNB alpha=0 bug #7477
Conversation
```python
th = 1e30
p = np.asarray(p)
if (p > 1).any() or (p < 0).any():
    raise ValueError('Input `p` must be within [0, 1] range!')
```
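For context, the check above could read as a self-contained helper like this (the wrapping function is my own sketch, not the PR's exact code):

```python
import numpy as np

def validate_probabilities(p):
    # Same check as in the diff: reject any value outside [0, 1].
    p = np.asarray(p)
    if (p > 1).any() or (p < 0).any():
        raise ValueError('Input `p` must be within [0, 1] range!')
    return p
```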
If this happens for numerical reasons, the error message does not explain how to make it work, besides changing the data.

Why not clip systematically?
@agramfort Thanks for your comment. It happens when some elements contain `inf` (or are very close to overflow, which leads to `inf`); `np.dot` or `ndarray.sum` may then result in `nan`. Some simple examples:

```python
In [3]: np.inf - np.inf
Out[3]: nan

In [7]: 1e+308 + 1e+308
Out[7]: inf

In [8]: (1e+308 + 1e+308) - (1e+308 + 1e+308)
Out[8]: nan
```
Since we are calculating log probabilities, I think it is reasonable to use a large negative number like `-1e30` to replace `log(0) = -inf` when the probability is 0.
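A minimal sketch of that idea (the helper name `safe_log` and its exact handling are my own illustration, not the PR's code):

```python
import numpy as np

th = 1e30  # threshold from the snippet above

def safe_log(p, threshold=th):
    # Return log(p), substituting -threshold for -inf where p == 0,
    # so downstream sums/dot products cannot turn into nan.
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore"):
        logp = np.log(p)
    return np.where(np.isneginf(logp), -threshold, logp)

print(safe_log([0.0, 0.5, 1.0]))  # [-1.0e+30, -0.6931..., 0.0]
```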
> Why not clip systematically?

I'm not sure I understand; do you mean creating wrappers for numpy to clip all values?
I think he means clipping `p` to be between 0 and 1.

Something went wrong with your rebase...
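For illustration, clipping `p` systematically into [0, 1] instead of raising could look like this (a sketch, not the merged behavior):

```python
import numpy as np

p = np.asarray([-0.1, 0.3, 1.2])
p = np.clip(p, 0.0, 1.0)  # array([0. , 0.3, 1. ]) -- no ValueError raised
```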
@amueller I fixed the rebase problem. Also, I now adopted a simple solution: clip `alpha` so that it is never smaller than a small positive minimum.
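In sketch form, that solution amounts to flooring the smoothing parameter before it is used (the constant below is illustrative, not necessarily the PR's value):

```python
ALPHA_MIN = 1e-10  # assumed floor

def clipped_alpha(alpha):
    # Prevent alpha == 0, which would produce log(0) = -inf during smoothing.
    return max(alpha, ALPHA_MIN)
```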
```diff
@@ -680,10 +684,21 @@ class MultinomialNB(BaseDiscreteNB):
     """

     def __init__(self, alpha=1.0, fit_prior=True, class_prior=None):
-        self.alpha = alpha
+        self._alpha = alpha
```
This violates the scikit-learn API. Please check the setting of `alpha` in `fit`. Also, only clip it when it is used, and don't change the value of `self.alpha`.
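A sketch of the API-compliant pattern being requested here (the class and method names are hypothetical, chosen for illustration):

```python
class DiscreteNBSketch:
    """Store alpha unmodified; clip only at the point of use."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # scikit-learn API: __init__ only stores params

    def _check_alpha(self):
        # Clip at use time; self.alpha itself is never rewritten.
        return max(self.alpha, 1e-10)

    def fit(self, X, y):
        alpha = self._check_alpha()
        # ... use the clipped alpha for smoothing; self.alpha stays as set ...
        return self
```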
@yl565 Are you still working on this?

Go ahead if you want to work on it.

I think we need another contributor, or can you make the minor fixes and add a what's new entry, @yl565?

Fixed via #9131
PR to fix #5814