FIX discrete Naive Bayes model fitting for degenerate single-class case #18925

dpoznik · 2020-11-27T03:06:14Z

When training multinomial naive Bayes models as part of a larger pipeline, it is possible for a degenerate case to arise, wherein there is just one class label. Prior to this change, it was possible to fit a single-class MultinomialNB model. One would expect that using such a model for prediction would deterministically yield the one class label. However, sometimes an IndexError would arise.

Ultimately, this traced to the fact that LabelBinarizer.transform returns an array of shape (n_samples, 1) when there are one or two classes, but naive_bayes._BaseDiscreteNB.fit and naive_bayes._BaseDiscreteNB.partial_fit (reasonably) assumed that if the return value had one column, that meant there were exactly two classes.
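For reference, the shape ambiguity described above can be reproduced directly. This is a minimal sketch using only the public `LabelBinarizer` API; the helper name `binarized_shape` is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer


def binarized_shape(y):
    """Return the shape of LabelBinarizer's dense output for the labels in y."""
    return LabelBinarizer().fit_transform(np.asarray(y)).shape


# One class and two classes both collapse to a single column,
# so the column count alone cannot tell the two cases apart.
one_class = binarized_shape(["a", "a", "a"])
two_class = binarized_shape(["a", "b", "a"])
three_class = binarized_shape(["a", "b", "c"])
print(one_class, two_class, three_class)
```

Only the three-class case yields one column per class, which is why the number of fitted classes has to be consulted as well.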

With this fix, the single-class case is handled differently from the two-class case, thereby ensuring the expected behavior and eliminating the source of the IndexError.
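A rough sketch of the distinction, not necessarily the exact code in the diff: the helper `expand_indicator` and its signature are hypothetical, and it assumes `Y` is the dense output of `LabelBinarizer.transform`.

```python
import numpy as np


def expand_indicator(Y, n_classes):
    """Turn a binarized label matrix into one column per class.

    Hypothetical helper for illustration only.
    """
    Y = np.asarray(Y)
    if Y.shape[1] == 1:
        if n_classes == 2:
            # Binary case: column 0 is "not the positive class", column 1 is Y.
            return np.concatenate((1 - Y, Y), axis=1)
        # Degenerate case: every sample belongs to the one and only class.
        return np.ones_like(Y)
    # Three or more classes: already one column per class.
    return Y


two_class = expand_indicator([[0], [1]], n_classes=2)
one_class = expand_indicator([[0], [0]], n_classes=1)
```

The point is that the single column is expanded to two columns only when there really are two classes; otherwise the result keeps one column, so downstream count arrays have one row.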

On this branch, all tests run via these three commands pass:

pytest sklearn/naive_bayes.py
pytest doc/modules/naive_bayes.rst
pytest sklearn/tests/test_naive_bayes.py

The last one includes a new regression test that fails on master.

dpoznik · 2020-11-27T23:30:09Z

sklearn/naive_bayes.py

@@ -552,7 +552,7 @@ def partial_fit(self, X, y, classes=None, sample_weight=None):
        if _check_partial_fit_first_call(self, classes):
            # This is the first call to partial_fit:
            # initialize various cumulative counters
-            n_effective_classes = len(classes) if len(classes) > 1 else 2


There is a small behavior change here when len(classes) == 1.
However, the docstring implies that len(classes) should be 2 for binary problems:

classes : array-like of shape (n_classes,), default=None
    List of all the classes that can possibly appear in the y vector.

Assuming this is true, then I believe the only behavior change is for the degenerate single-class case, as intended.
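To make the claim concrete, here is a small comparison of the removed expression against a plain `len(classes)`. The replacement line is not shown in the hunk above, so `len(classes)` is an assumption, and both function names are invented for illustration.

```python
def n_effective_classes_before(classes):
    """Removed expression: treats a single class as if there were two."""
    return len(classes) if len(classes) > 1 else 2


def n_classes_after(classes):
    """Assumed replacement: use the number of classes as given."""
    return len(classes)


for classes in ([0], [0, 1], [0, 1, 2]):
    print(classes, n_effective_classes_before(classes), n_classes_after(classes))
```

The two expressions agree whenever `classes` has two or more entries, so only the single-class case changes.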

sklearn/tests/test_naive_bayes.py

glemaitre

A first pass. We will need to add an entry to the what's new file (doc/whats_new/v1.0.rst) as a bug fix.

sklearn/tests/test_naive_bayes.py


dpoznik · 2020-12-18T01:16:32Z

We will need to add an entry to the what's new file (doc/whats_new/v1.0.rst) as a bug fix.

Cool, I've drafted this in 6e51074.

glemaitre

Apart from some style changes, LGTM.

doc/whats_new/v1.0.rst

sklearn/tests/test_naive_bayes.py

.gitignore


sklearn/tests/test_naive_bayes.py

glemaitre · 2021-01-18T09:36:36Z

@thomasjpfan @lorentzenchr Would you like to have a second look?


lorentzenchr

@dpoznik Thanks for fixing this. You need to resolve merge conflicts. Beware that we've changed the default branch from master to main. Let me/us know if you need help.

sklearn/naive_bayes.py

sklearn/tests/test_naive_bayes.py


dpoznik · 2021-01-23T17:27:23Z

Thanks for fixing this.

No problem; it was fun :)
Thanks for reviewing!

You need to resolve merge conflicts.

This should be all set with 9bbfb22.

Beware that we've changed the default branch from master to main.

Cool; thanks for the heads-up!

lorentzenchr

LGTM

lorentzenchr · 2021-01-23T20:11:39Z

@glemaitre Before merging, could you briefly skim over the latest changes, just to be sure? :smirk:

glemaitre · 2021-01-24T11:30:29Z

Thanks, @dpoznik. Everything looks good. Merging.

dpoznik added 2 commits November 26, 2020 19:34

Handle degenerate single-class case correctly in naive_bayes.py

6b4b5c7

Add regression test for degenerate single-class naive Bayes

95adfb2

github-actions bot added the module:naive_bayes label Nov 27, 2020

Remove partial_fit edits

b3d62fa

dpoznik force-pushed the handle-degenerate-single-class-nb branch from d3793a0 to b3d62fa Compare November 27, 2020 04:28

Test three remaining _BaseDiscreteNB subclasses

ae43019

dpoznik changed the title ~~[MRG] Fix degenerate single-class Naive Bayes model fitting~~ [WIP] Fix degenerate single-class Naive Bayes model fitting Nov 27, 2020

dpoznik added 3 commits November 27, 2020 17:28

Simplify handling of single-class case

abcd680

Fix degenerate single-class case in _BaseDiscreteNB.partial_fit

b1affb3

Update discrete Naive Bayes tests to cover all four types

3b1b0e1

dpoznik commented Nov 27, 2020

sklearn/tests/test_naive_bayes.py

dpoznik changed the title ~~[WIP] Fix degenerate single-class Naive Bayes model fitting~~ [MRG] Fix naive Bayes model fitting for degenerate single-class case Nov 27, 2020

dpoznik force-pushed the handle-degenerate-single-class-nb branch 2 times, most recently from 131c483 to 8f3c089 Compare November 28, 2020 14:16

Extend test to cover more array attributes

1d10e36

dpoznik force-pushed the handle-degenerate-single-class-nb branch from 8f3c089 to 1d10e36 Compare November 28, 2020 14:22

Fix minor docstring typos in shapes of one-axis arrays

5907ed0

dpoznik changed the title ~~[MRG] Fix naive Bayes model fitting for degenerate single-class case~~ [MRG] Fix discrete Naive Bayes model fitting for degenerate single-class case Dec 6, 2020

dpoznik mentioned this pull request Dec 6, 2020

Discrete Naive Bayes classifiers crash unnecessarily on degenerate one-class data #18974

Closed

glemaitre self-requested a review December 17, 2020 22:17

glemaitre reviewed Dec 17, 2020


dpoznik added 2 commits December 17, 2020 19:04

Merge remote-tracking branch 'upstream/master' into handle-degenerate…

c053a4e

…-single-class-nb

Rename extant tests and update whats_new doc

6e51074

Parametrize test_discretenb_degenerate_one_class_case

313d74f

glemaitre approved these changes Dec 18, 2020


dpoznik and others added 2 commits December 18, 2020 11:06

Apply suggestions from code review

60f3a98

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Tweak some variable names and comments

4c5f8d2

glemaitre changed the title ~~[MRG] Fix discrete Naive Bayes model fitting for degenerate single-class case~~ FIX discrete Naive Bayes model fitting for degenerate single-class case Dec 18, 2020

dpoznik added 4 commits December 18, 2020 11:38

Tweak wording of comment and summary

013c505

Simplify test parametrization

74770b3

Change parameter name in test function

b07c09c

Merge remote-tracking branch 'upstream/master' into handle-degenerate…

3313337

…-single-class-nb

cmarmo added the Waiting for Reviewer label Jan 4, 2021

Merge branch 'master' into handle-degenerate-single-class-nb

c8db70c

dpoznik commented Jan 15, 2021

sklearn/tests/test_naive_bayes.py

dpoznik added 2 commits January 14, 2021 18:45

Add class lists to simplify test parametrization

2c54815

Replace "cls" with "DiscreteNaiveBayes" or "NaiveBayes"

4db6f10

dpoznik and others added 2 commits January 18, 2021 07:53

Merge pull request #1 from dpoznik/handle-degenerate-single-class-nb-…

1535364

…list-classes Centralize lists of naive Bayes classes to streamline test parameterization

Merge remote-tracking branch 'upstream/master' into handle-degenerate…

61e6cb3

…-single-class-nb

Base automatically changed from master to main January 22, 2021 10:53

lorentzenchr reviewed Jan 23, 2021

sklearn/naive_bayes.py

sklearn/tests/test_naive_bayes.py

dpoznik added 4 commits January 23, 2021 08:56

Merge remote-tracking branch 'upstream/main' into handle-degenerate-s…

9bbfb22

…ingle-class-nb

Rename variable: n_effective_classes -> n_classes

945c632

Remove test_naive_bayes_pickle

a43f8e5

As per @lorentzenchr in code review: "Pickability is tested in `check_estimators_pickle` (see `utils/estimator_checks.py`) in `tests/test_common.py`. I verified it by running `pytest sklearn/tests/test_common.py -v -k BernoulliNB` and so on."

Remove unused variables

a789b95

lorentzenchr approved these changes Jan 23, 2021


glemaitre merged commit 6f32544 into scikit-learn:main Jan 24, 2021

dpoznik deleted the handle-degenerate-single-class-nb branch January 24, 2021 18:02

This was referenced Feb 22, 2021

MultinomialNB cannot handle single class without fitting prior #17926

Closed

[MRG] Fix exception when MultinomialNB is given data with only one class #19078

Closed

glemaitre mentioned this pull request Apr 22, 2021

Release 0.24.2 #19954

Merged

12 tasks


dpoznik commented Nov 27, 2020 • edited by glemaitre