Description
Given the digits dataset (available in sklearn.datasets), we split it into a train and a test set.
We fit a MultinomialNB classifier on the train set and generate predictions on that same train
set. When this is done without smoothing (alpha=0), the classification performance is rather low.
This is contrary to expectations: the classifier has already seen every feature value it is asked
to predict on, so leaving out smoothing should not make such a big difference.
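A minimal sketch of the setup described above (the split parameters and alpha=0 are my assumptions here; the full script with additional tests is in the pastebin link below):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# alpha=0 disables additive (Laplace/Lidstone) smoothing; some sklearn
# versions clip very small alphas to a tiny positive value and warn instead.
clf = MultinomialNB(alpha=0)
clf.fit(X_train, y_train)

# Scoring on the same data the classifier was fitted on: expected to be
# high, but observed to be low without smoothing.
print("train accuracy:", clf.score(X_train, y_train))
```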
It seems the BernoulliNB class from sklearn also has this problem.
My own inefficient but straightforward implementation performs well on the training set, as expected.
Code to reproduce the issue, together with some more tests, is available at http://pastebin.com/2hsrA8xL.
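For reference, here is a rough sketch of what such a straightforward unsmoothed implementation could look like (illustrative only, not the actual pastebin code): it simply skips zero-count features, so 0 * log(0) is treated as 0.

```python
import numpy as np

class NaiveUnsmoothedMNB:
    """Illustrative unsmoothed multinomial naive Bayes (not the pastebin code)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_ = np.unique(y)
        self.class_log_prior_ = np.log(
            np.array([np.mean(y == c) for c in self.classes_]))
        # Per-class feature totals, with no additive smoothing at all.
        counts = np.array([X[y == c].sum(axis=0) for c in self.classes_])
        with np.errstate(divide="ignore"):  # log(0) = -inf is intentional
            self.feature_log_prob_ = np.log(
                counts / counts.sum(axis=1, keepdims=True))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = np.empty(X.shape[0], dtype=self.classes_.dtype)
        for i, x in enumerate(X):
            nz = x > 0
            # Only features that actually occur in the sample contribute,
            # so 0 * log(0) is treated as 0 instead of producing nan.
            jll = self.class_log_prior_ + self.feature_log_prob_[:, nz] @ x[nz]
            preds[i] = self.classes_[np.argmax(jll)]
        return preds
```

The point of the 0 * log(0) = 0 convention is that a training sample's true class always gets a finite score (every feature occurring in the sample has a non-zero count for that class), which is why training accuracy should stay high even without smoothing.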
I hope the issue is not caused by some trivial implementation detail I've overlooked.