Description
Given the digits dataset (available in sklearn.datasets), we split it into a train and a test set.
We fit a MultinomialNB classifier on the train set and generate predictions on that same train
set. When this is done without smoothing (alpha=0), the classification performance is rather low.
This is contrary to expectations: the classifier has already seen every feature value it is asked
to predict on, so leaving out smoothing should not make such a big difference.
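A minimal sketch of the setup described above (the split parameters and alpha=0 are my assumptions here; the full script with additional tests is in the pastebin link below):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# alpha=0 disables additive (Laplace/Lidstone) smoothing; some sklearn
# versions clip very small alphas to a tiny positive value and warn instead.
clf = MultinomialNB(alpha=0)
clf.fit(X_train, y_train)

# Scoring on the same data the classifier was fitted on: expected to be
# high, but observed to be low without smoothing.
print("train accuracy:", clf.score(X_train, y_train))
```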
It seems the BernoulliNB class from sklearn also has this problem.
My own inefficient but straightforward implementation performs well on the training set, as expected.
Code to reproduce the issue, together with some more tests, is available at http://pastebin.com/2hsrA8xL.
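For reference, here is a rough sketch of what such a straightforward unsmoothed implementation could look like (illustrative only, not the actual pastebin code): it simply skips zero-count features, so 0 * log(0) is treated as 0.

```python
import numpy as np

class NaiveUnsmoothedMNB:
    """Illustrative unsmoothed multinomial naive Bayes (not the pastebin code)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_ = np.unique(y)
        self.class_log_prior_ = np.log(
            np.array([np.mean(y == c) for c in self.classes_]))
        # Per-class feature totals, with no additive smoothing at all.
        counts = np.array([X[y == c].sum(axis=0) for c in self.classes_])
        with np.errstate(divide="ignore"):  # log(0) = -inf is intentional
            self.feature_log_prob_ = np.log(
                counts / counts.sum(axis=1, keepdims=True))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = np.empty(X.shape[0], dtype=self.classes_.dtype)
        for i, x in enumerate(X):
            nz = x > 0
            # Only features that actually occur in the sample contribute,
            # so 0 * log(0) is treated as 0 instead of producing nan.
            jll = self.class_log_prior_ + self.feature_log_prob_[:, nz] @ x[nz]
            preds[i] = self.classes_[np.argmax(jll)]
        return preds
```

The point of the 0 * log(0) = 0 convention is that a training sample's true class always gets a finite score (every feature occurring in the sample has a non-zero count for that class), which is why training accuracy should stay high even without smoothing.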
I hope the issue is not caused by some trivial implementation detail I've overlooked.