-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Multinomial Bayes issue #5814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
please send a PR so the diff is readable
|
Might be a silly question but where do i send it :)? The scikit-learn-issues mailing list? |
I mean open a pull request (PR) on github.
|
Since it's my first time submitting an error report I'm completely lost :( It says here that PR-s are used to tell others about changes I pushed to a repository. I didn't push anything O_o, nor do I want to. The demo script I submitted (via pastebin link) doesn't fix the issue, or change anything in sklearn code. It's just there to reproduce the issue, so someone who knows the sklearn code can debug it more easily. Perhaps I'm misunderstanding something. |
It seems when Given Compare
and
|
We should probably throw an error when that happens if that creates issues. |
@amueller Please see my PR to this issue. |
Fixed via #9131 |
…9131) * Fix #5814 * Fix pep8 in naive_bayes.py:716 * Fix sparse matrix incompatibility * Fix python 2.7 problem in test_naive_bayes * Make sure the values are probabilities before log transform * Improve docstring of `_safe_logprob` * Clip alpha solution * Clip alpha solution * Clip alpha in fit and partial_fit * Add what's new entry * Add test * Remove .project * Replace assert method * Update what's new * Format float into %.1e * Update ValueError msg
…cikit-learn#9131) * Fix scikit-learn#5814 * Fix pep8 in naive_bayes.py:716 * Fix sparse matrix incompatibility * Fix python 2.7 problem in test_naive_bayes * Make sure the values are probabilities before log transform * Improve docstring of `_safe_logprob` * Clip alpha solution * Clip alpha solution * Clip alpha in fit and partial_fit * Add what's new entry * Add test * Remove .project * Replace assert method * Update what's new * Format float into %.1e * Update ValueError msg
…cikit-learn#9131) * Fix scikit-learn#5814 * Fix pep8 in naive_bayes.py:716 * Fix sparse matrix incompatibility * Fix python 2.7 problem in test_naive_bayes * Make sure the values are probabilities before log transform * Improve docstring of `_safe_logprob` * Clip alpha solution * Clip alpha solution * Clip alpha in fit and partial_fit * Add what's new entry * Add test * Remove .project * Replace assert method * Update what's new * Format float into %.1e * Update ValueError msg
…cikit-learn#9131) * Fix scikit-learn#5814 * Fix pep8 in naive_bayes.py:716 * Fix sparse matrix incompatibility * Fix python 2.7 problem in test_naive_bayes * Make sure the values are probabilities before log transform * Improve docstring of `_safe_logprob` * Clip alpha solution * Clip alpha solution * Clip alpha in fit and partial_fit * Add what's new entry * Add test * Remove .project * Replace assert method * Update what's new * Format float into %.1e * Update ValueError msg
…cikit-learn#9131) * Fix scikit-learn#5814 * Fix pep8 in naive_bayes.py:716 * Fix sparse matrix incompatibility * Fix python 2.7 problem in test_naive_bayes * Make sure the values are probabilities before log transform * Improve docstring of `_safe_logprob` * Clip alpha solution * Clip alpha solution * Clip alpha in fit and partial_fit * Add what's new entry * Add test * Remove .project * Replace assert method * Update what's new * Format float into %.1e * Update ValueError msg
…cikit-learn#9131) * Fix scikit-learn#5814 * Fix pep8 in naive_bayes.py:716 * Fix sparse matrix incompatibility * Fix python 2.7 problem in test_naive_bayes * Make sure the values are probabilities before log transform * Improve docstring of `_safe_logprob` * Clip alpha solution * Clip alpha solution * Clip alpha in fit and partial_fit * Add what's new entry * Add test * Remove .project * Replace assert method * Update what's new * Format float into %.1e * Update ValueError msg
…cikit-learn#9131) * Fix scikit-learn#5814 * Fix pep8 in naive_bayes.py:716 * Fix sparse matrix incompatibility * Fix python 2.7 problem in test_naive_bayes * Make sure the values are probabilities before log transform * Improve docstring of `_safe_logprob` * Clip alpha solution * Clip alpha solution * Clip alpha in fit and partial_fit * Add what's new entry * Add test * Remove .project * Replace assert method * Update what's new * Format float into %.1e * Update ValueError msg
…cikit-learn#9131) * Fix scikit-learn#5814 * Fix pep8 in naive_bayes.py:716 * Fix sparse matrix incompatibility * Fix python 2.7 problem in test_naive_bayes * Make sure the values are probabilities before log transform * Improve docstring of `_safe_logprob` * Clip alpha solution * Clip alpha solution * Clip alpha in fit and partial_fit * Add what's new entry * Add test * Remove .project * Replace assert method * Update what's new * Format float into %.1e * Update ValueError msg
Given the digits dataset (available in sklearn.datasets), we split it into train and test set.
We fit a MultinomialNB classifier on the train set, and generate predictions on that same train
set. When this is done without smoothing, the classification performance is rather low.
This is contrary to expectations, as the classifier has already seen all possible feature values.
So not including smoothing shouldn't really make such a big difference.
It seems the BernoulliNB class from sklearn also has this problem.
My inefficient but straightforward implementation performs expectedly well on the training set.
Code to reproduce the issue with some more tests is available at http://pastebin.com/2hsrA8xL
I hope the issue is not caused by some trivial implementation detail I've overlooked.
The text was updated successfully, but these errors were encountered: