Skip to content

Multinomial Bayes issue #5814

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mladenk42 opened this issue Nov 13, 2015 · 8 comments · Fixed by #9131
Closed

Multinomial Bayes issue #5814

mladenk42 opened this issue Nov 13, 2015 · 8 comments · Fixed by #9131
Labels
Milestone

Comments

@mladenk42
Copy link

Given the digits dataset (available in sklearn.datasets), we split it into train and test set.
We fit a MultinomialNB classifier on the train set, and generate predictions on that same train
set. When this is done without smoothing, the classification performance is rather low.
This is contrary to expectations, as the classifier has already seen all possible feature values.
So not including smoothing shouldn't really make such a big difference.

It seems the BernoulliNB class from sklearn also has this problem.

My inefficient but straightforward implementation performs expectedly well on the training set.
Code to reproduce the issue with some more tests is available at http://pastebin.com/2hsrA8xL
I hope the issue is not caused by some trivial implementation detail I've overlooked.

@agramfort
Copy link
Member

agramfort commented Nov 14, 2015 via email

@mladenk42
Copy link
Author

Might be a silly question but where do i send it :)? The scikit-learn-issues mailing list?
Thanks.

@agramfort
Copy link
Member

agramfort commented Nov 14, 2015 via email

@mladenk42
Copy link
Author

Since it's my first time submitting an error report I'm completely lost :(

It says here that PR-s are used to tell others about changes I pushed to a repository. I didn't push anything O_o, nor do I want to. The demo script I submitted (via pastebin link) doesn't fix the issue, or change anything in sklearn code. It's just there to reproduce the issue, so someone who knows the sklearn code can debug it more easily. Perhaps I'm misunderstanding something.

@absolutelyNoWarranty
Copy link
Contributor

It seems when alpha=0.0 and the data has features which never change, feature_log_prob_ has -inf which causes the calculations down the line to become all nan.

Given
X = np.array([[1, 0],[1, 1]])
y = np.array([0, 1])

Compare

nb = BernoulliNB(alpha=0.)
nb.fit(X, y)

nb.predict_proba(X)
Out[135]:
array([[ nan,  nan],
       [ nan,  nan]])

nb.feature_log_prob_
Out[139]:
array([[  0., -inf],
       [  0.,   0.]])

and

nb = BernoulliNB(alpha=1e-15)
nb.fit(X, y)
nb.predict_proba(X)

Out[136]:
array([[  1.0000e+00,   8.8818e-16],
       [  1.0000e-15,   1.0000e+00]])

@amueller amueller added the Bug label Sep 14, 2016
@amueller
Copy link
Member

We should probably throw an error when that happens if that creates issues.

@amueller amueller added this to the 0.19 milestone Sep 14, 2016
yl565 added a commit to yl565/scikit-learn that referenced this issue Sep 23, 2016
@yl565
Copy link
Contributor

yl565 commented Sep 24, 2016

@amueller Please see my PR to this issue.

yl565 added a commit to yl565/scikit-learn that referenced this issue Oct 17, 2016
herilalaina pushed a commit to herilalaina/scikit-learn that referenced this issue Jun 15, 2017
@jmschrei
Copy link
Member

Fixed via #9131

jmschrei pushed a commit that referenced this issue Jun 19, 2017
…9131)

* Fix #5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
dmohns pushed a commit to dmohns/scikit-learn that referenced this issue Aug 7, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
dmohns pushed a commit to dmohns/scikit-learn that referenced this issue Aug 7, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
NelleV pushed a commit to NelleV/scikit-learn that referenced this issue Aug 11, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
paulha pushed a commit to paulha/scikit-learn that referenced this issue Aug 19, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this issue Aug 29, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this issue Nov 15, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this issue Dec 18, 2017
…cikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of  `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants