[MRG] Fix exception when MultinomialNB is given data with only one class #19078

Closed
wants to merge 3 commits into from


@khazit khazit commented Dec 29, 2020

Reference Issues/PRs

Fixes #17926

What does this implement/fix? Explain your changes.

The following snippet raises an exception:

import sklearn
from sklearn.naive_bayes import MultinomialNB

X = [(0.072777, 0.334995),
     (0.857577, 0.977991),
     (0.310364, 0.230206),
     (0.75821 , 0.600593),
     (0.883202, 0.066408)]

y = [0, 0, 0, 0, 0]

clf = MultinomialNB(fit_prior=False)
clf.fit(X, y)
clf.predict(X)

IndexError: index 1 is out of bounds for axis 0 with size 1

This happens because _joint_log_likelihood() outputs an (n_examples, 2) matrix even though there is only one class.
The correct shape should be (n_examples, 1) with +inf values.
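To make the failure mode concrete, here is a minimal NumPy sketch of the indexing step. It assumes that predict() picks labels as classes_[argmax(jll, axis=1)]; the jll values below are made up for illustration and are not real model output.

```python
import numpy as np

# Stands in for clf.classes_: only one class was seen during fit.
classes = np.array([0])

# Hypothetical joint log-likelihood with a spurious second column.
# The values are invented so that the extra column wins for the first sample.
jll = np.zeros((5, 2))
jll[0, 1] = 1.0

# Column index of the best-scoring class for every sample.
idx = np.argmax(jll, axis=1)

message = None
try:
    classes[idx]  # index 1 does not exist in a length-1 array
except IndexError as exc:
    message = str(exc)

print(idx)
print(message)
```

As soon as the extra column wins for a single sample, the lookup into the length-1 class array fails, which is consistent with the IndexError reported above.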

Any other comments?

MultinomialNB behaves inconsistently when it is given data with only one class, depending on the fit_prior argument.
Example:

X = [(0.072777, 0.334995),
     (0.857577, 0.977991),
     (0.310364, 0.230206),
     (0.75821 , 0.600593),
     (0.883202, 0.066408)]

y = [2, 2, 2, 2, 2]

clf = MultinomialNB(fit_prior=False)
clf.fit(X, y)

print(clf.class_count_)
print(clf.class_log_prior_)
print(clf.classes_)

Outputs:

[5. 0.]
[-0.]
[2]

Whereas with fit_prior=True:

clf = MultinomialNB(fit_prior=True)
clf.fit(X, y)
print(clf.class_count_)
print(clf.class_log_prior_)
print(clf.classes_)

Outputs:

[5. 0.]
[  0. -inf]
[2]
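The difference between the two priors plausibly explains why only fit_prior=False raises. The sketch below rebuilds the scores in plain NumPy under stated assumptions: additive smoothing with alpha = 1.0, a phantom second class with zero counts (as class_count_ suggests), and the two class_log_prior_ vectors printed above. It is an approximation of the computation, not the library code; predicted_columns is a hypothetical helper.

```python
import numpy as np

X = np.array([(0.072777, 0.334995),
              (0.857577, 0.977991),
              (0.310364, 0.230206),
              (0.75821, 0.600593),
              (0.883202, 0.066408)])

alpha = 1.0  # assumed additive smoothing

# Row 0: feature totals of the observed class; row 1: phantom class, no samples.
feature_count = np.vstack([X.sum(axis=0), np.zeros(X.shape[1])])
smoothed = feature_count + alpha
feature_log_prob = np.log(smoothed) - np.log(smoothed.sum(axis=1, keepdims=True))


def predicted_columns(class_log_prior):
    """Return the argmax column per sample for a given log prior (hypothetical helper)."""
    jll = X @ feature_log_prob.T + class_log_prior  # prior broadcasts over rows
    return np.argmax(jll, axis=1)


# fit_prior=True: log of empirical frequencies [5/5, 0/5] gives [0., -inf].
with np.errstate(divide="ignore"):
    prior_fit_true = np.log(np.array([5.0, 0.0]) / 5.0)

# fit_prior=False: the length-1 vector printed above, broadcast to both columns.
prior_fit_false = np.array([-0.0])

cols_true = predicted_columns(prior_fit_true)
cols_false = predicted_columns(prior_fit_false)

print(cols_true)
print(cols_false)
```

With the [0, -inf] prior the phantom column can never win, so every sample maps to column 0. With the length-1 prior the phantom column competes on equal footing and can win for some samples, producing an index that classes_ does not have.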

khazit and others added 3 commits December 29, 2020 12:52
Co-authored-by: Joran Marie <joran.mr@icloud.com>
Co-authored-by: Zoe Abecassis <zoe.abecassis@insa-rouen.fr>
Co-authored-by: Joran Marie <joran.mr@icloud.com>
Co-authored-by: zabecassis <zoe.abecassis@insa-rouen.fr>
@khazit khazit changed the title Fix exception when MultinomialNB is given data with only one class [MGR] Fix exception when MultinomialNB is given data with only one class Dec 29, 2020
@khazit khazit changed the title [MGR] Fix exception when MultinomialNB is given data with only one class [MRG] Fix exception when MultinomialNB is given data with only one class Dec 29, 2020
Base automatically changed from master to main January 22, 2021 10:53
@thomasjpfan
Member

This has been fixed in #18925 and will be included in the next major release.


Successfully merging this pull request may close these issues.

MultinomialNB cannot handle single class without fitting prior