
[MRG] Adding variable force_alpha to classes in naive_bayes.py #18805


Closed. Wants to merge 39 commits.

Commits (39)
282a7dd
Adding variable alphaCorrection to classes in naive_bayes.py.
arka204 Mar 22, 2020
d78e17b
Splitting few lines of code.
arka204 Mar 22, 2020
3b79637
Merge pull request #1 from scikit-learn/master
arka204 Apr 11, 2020
464dc37
Merge pull request #2 from scikit-learn/master
arka204 May 10, 2020
a4429bf
Fixing problems and adding tests.
arka204 May 21, 2020
cf35eb1
Updating naive_bayes.py.
arka204 May 21, 2020
15a658f
Merge pull request #3 from scikit-learn/master
arka204 May 22, 2020
ec786b3
Merge branch 'alpha-close-or-equal-0-update' into alpha-1
arka204 May 22, 2020
4606d85
Merge pull request #5 from arka204/alpha-1
arka204 May 22, 2020
f0debb6
Merge pull request #6 from arka204/alpha-close-or-equal-0-update
arka204 May 22, 2020
dcce4a8
Checking warnings in tests.
arka204 May 31, 2020
81d5f32
Merge branch 'alpha-close-or-equal-0-update' of https://github.com/ar…
arka204 May 31, 2020
43dfda5
Merge pull request #8 from arka204/alpha-close-or-equal-0-update
arka204 Jun 8, 2020
be0ebfe
Update v0.24.rst
arka204 Jun 8, 2020
2968400
Merge pull request #10 from scikit-learn/master
arka204 Jun 8, 2020
7e1b649
Merge branch 'Proposition-for-BernoulliNB-and-MultinomialNB-when-alph…
arka204 Jun 8, 2020
5782e6f
Merge pull request #11 from arka204/master-copy
arka204 Jun 8, 2020
a7337d2
Merge remote-tracking branch 'upstream/master' into Proposition-for-B…
Nov 10, 2020
2d16091
Fix merge
Nov 10, 2020
728b842
Merge remote-tracking branch 'upstream/master' into 10772-force-alpha
hongshaoyang Dec 20, 2020
a8209e0
Move whatsnew
hongshaoyang Dec 20, 2020
44c50fd
Merge remote-tracking branch 'upstream/master' into 10772-force-alpha
hongshaoyang Dec 21, 2020
ded1f6e
Merge remote-tracking branch 'upstream/main' into 10772-force-alpha
hongshaoyang Jan 30, 2021
2a2d8f3
Apply suggestions from code review
hongshaoyang Feb 8, 2021
d8f784e
Remove extra line
hongshaoyang Feb 9, 2021
a3897f7
Flake8
hongshaoyang Feb 9, 2021
23c68dd
Apply suggestions from code review
hongshaoyang Feb 9, 2021
b1151d7
Merge remote-tracking branch 'upstream/main' into 10772-force-alpha
hongshaoyang May 29, 2021
1d01c6c
Fix merge
hongshaoyang May 29, 2021
aa1d8de
use assert_warns_message
hongshaoyang May 29, 2021
203af9e
Apply suggestions from code review
hongshaoyang Jun 9, 2021
cc4fda7
Merge remote-tracking branch 'upstream/main' into 10772-force-alpha
hongshaoyang Jun 9, 2021
91127bc
Fix wrong variable name
hongshaoyang Jun 9, 2021
c4d0736
Fix test to use "with pytest.warns" instead of assert_warns_message
hongshaoyang Jun 9, 2021
8964a16
Merge commit '0e7761cdc4f244adb4803f1a97f0a9fe4b365a99' into 10772-fo…
hongshaoyang Jun 23, 2021
e7a5f37
MAINT Adds target_version to black config (#20293)
thomasjpfan Jun 17, 2021
98c0c12
Black formatting
hongshaoyang Jun 23, 2021
2d9ab41
Merge remote-tracking branch 'upstream/main' into 10772-force-alpha
hongshaoyang Jun 23, 2021
16af708
Apply suggestions from code review
hongshaoyang Jun 23, 2021
5 changes: 5 additions & 0 deletions doc/whats_new/v1.0.rst
@@ -462,6 +462,11 @@ Changelog

:mod:`sklearn.naive_bayes`
..........................
- |Fix| A new parameter `force_alpha` was added to :class:`BernoulliNB` and
Review comment (Member): Because v1.0 is already released, this entry should now be moved to v1.1.rst.

class:`MultinomialNB`, allowing user to set parameter alpha to a very
Review comment (Member), suggested change:
- class:`MultinomialNB`, allowing user to set parameter alpha to a very
+ :class:`MultinomialNB`, allowing user to set parameter alpha to a very

Review comment (Member): Also need to mention ComplementNB and CategoricalNB.

small number, greater or equal 0, which was earlier automatically changed
to `_ALPHA_MIN` instead.
Review comment (Member): I would change `_ALPHA_MIN` into its value to be more informative.

:pr:`16747`, :pr:`18805` by :user:`arka204` and :user:`hongshaoyang`.
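
For readers of this changelog entry, a minimal usage sketch of the behavior it describes (an illustration only: it assumes the API exactly as introduced in this PR, and the toy `X`/`y` are made up):

import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[2, 1], [1, 3]])  # toy count data, made up for illustration
y = np.array([0, 1])

# Default (force_alpha=False): alpha below 1e-10 is clipped to 1e-10,
# with a UserWarning.
MultinomialNB(alpha=0).fit(X, y)

# With force_alpha=True: alpha is kept exactly as given (here: no smoothing),
# again with a warning about potential numeric errors.
MultinomialNB(alpha=0, force_alpha=True).fit(X, y)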

- |Fix| The `fit` and `partial_fit` methods of the discrete naive Bayes
classifiers (:class:`naive_bayes.BernoulliNB`,
90 changes: 77 additions & 13 deletions sklearn/naive_bayes.py
@@ -537,11 +537,18 @@ def _check_alpha(self):
"with shape [n_features]"
)
if np.min(self.alpha) < _ALPHA_MIN:
warnings.warn(
"alpha too small will result in numeric errors, "
"setting alpha = %.1e" % _ALPHA_MIN
)
return np.maximum(self.alpha, _ALPHA_MIN)
if self.force_alpha:
warnings.warn(
Review comment (Member): Is this really useful to warn when the user specifically set the parameter force_alpha=True? I would remove the warning and improve the docstring to mention potential numerical errors.

"alpha too small will result in numeric errors. "
"Proceeding with alpha = %.1e, as "
"force_alpha was set to True." % self.alpha
)
else:
warnings.warn(
"alpha too small will result in numeric errors, "
Review comment (Member): I would add something like "use force_alpha=True to keep alpha unchanged.".

"setting alpha = %.1e" % _ALPHA_MIN
)
return np.maximum(self.alpha, _ALPHA_MIN)
return self.alpha

def partial_fit(self, X, y, classes=None, sample_weight=None):
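
To make the branching above concrete, a small behavior sketch (not part of the patch; it mirrors the `test_check_alpha` test added at the bottom of this PR, with `_ALPHA_MIN` equal to 1e-10):

import warnings
from sklearn.naive_bayes import BernoulliNB

with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)  # both branches emit a warning
    # force_alpha=True: alpha is returned unchanged, even below _ALPHA_MIN.
    assert BernoulliNB(alpha=0, force_alpha=True)._check_alpha() == 0
    # force_alpha=False (the default): alpha is clipped up to _ALPHA_MIN.
    assert BernoulliNB(alpha=0)._check_alpha() == 1e-10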
@@ -735,7 +742,14 @@ class MultinomialNB(_BaseDiscreteNB):
----------
alpha : float, default=1.0
Additive (Laplace/Lidstone) smoothing parameter
(0 for no smoothing).
(set alpha=0 and force_alpha=True, for no smoothing).

force_alpha : bool, default=False
If False and alpha is too close to 0, it will set alpha to
`_ALPHA_MIN`. If True, warn user about potential numeric errors
Review comment (Member): I would change `_ALPHA_MIN` into its value to be more informative.

and proceed with alpha unchanged.

.. versionadded:: 1.0
Review comment (Member), suggested change:
- .. versionadded:: 1.0
+ .. versionadded:: 1.1


fit_prior : bool, default=True
Whether to learn class prior probabilities or not.
Expand Down Expand Up @@ -820,8 +834,11 @@ class MultinomialNB(_BaseDiscreteNB):
https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
"""

def __init__(self, *, alpha=1.0, fit_prior=True, class_prior=None):
def __init__(
self, *, alpha=1.0, force_alpha=False, fit_prior=True, class_prior=None
):
self.alpha = alpha
self.force_alpha = force_alpha
self.fit_prior = fit_prior
self.class_prior = class_prior
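
As background for the "numeric errors" wording in the docstring above: MultinomialNB works with log-probabilities of Lidstone-smoothed counts, roughly theta_ci = (N_ci + alpha) / (N_c + alpha * n_features), so with alpha = 0 any feature count of zero yields log(0) = -inf. A quick illustration (the standard smoothing formula, not code from this patch; the counts are made up):

import numpy as np

N_ci, N_c, n_features = 0, 5, 2  # a feature never seen in class c
for alpha in (1.0, 0.0):
    theta = (N_ci + alpha) / (N_c + alpha * n_features)
    with np.errstate(divide="ignore"):
        print(alpha, np.log(theta))  # alpha=1.0 -> finite; alpha=0.0 -> -inf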

@@ -862,7 +879,15 @@ class ComplementNB(_BaseDiscreteNB):
Parameters
----------
alpha : float, default=1.0
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
Additive (Laplace/Lidstone) smoothing parameter
(set alpha=0 and force_alpha=True, for no smoothing).

force_alpha : bool, default=False
If False and alpha is too close to 0, it will set alpha to
`_ALPHA_MIN`. If True, warn user about potential numeric errors
and proceed with alpha unchanged.

.. versionadded:: 1.0
Review comment (Member), suggested change:
- .. versionadded:: 1.0
+ .. versionadded:: 1.1


fit_prior : bool, default=True
Only used in edge case with a single class in the training set.
Expand Down Expand Up @@ -949,8 +974,17 @@ class ComplementNB(_BaseDiscreteNB):
https://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf
"""

def __init__(self, *, alpha=1.0, fit_prior=True, class_prior=None, norm=False):
def __init__(
self,
*,
alpha=1.0,
force_alpha=False,
fit_prior=True,
class_prior=None,
norm=False,
):
self.alpha = alpha
self.force_alpha = force_alpha
self.fit_prior = fit_prior
self.class_prior = class_prior
self.norm = norm
@@ -998,7 +1032,14 @@ class BernoulliNB(_BaseDiscreteNB):
----------
alpha : float, default=1.0
Additive (Laplace/Lidstone) smoothing parameter
(0 for no smoothing).
(set alpha=0 and force_alpha=True, for no smoothing).

force_alpha : bool, default=False
If False and alpha is too close to 0, it will set alpha to
`_ALPHA_MIN`. If True, warn user about potential numeric errors
and proceed with alpha unchanged.

.. versionadded:: 1.0
Review comment (Member), suggested change:
- .. versionadded:: 1.0
+ .. versionadded:: 1.1


binarize : float or None, default=0.0
Threshold for binarizing (mapping to booleans) of sample features.
Expand Down Expand Up @@ -1079,8 +1120,17 @@ class BernoulliNB(_BaseDiscreteNB):
naive Bayes -- Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).
"""

def __init__(self, *, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None):
def __init__(
self,
*,
alpha=1.0,
force_alpha=False,
binarize=0.0,
fit_prior=True,
class_prior=None,
):
self.alpha = alpha
self.force_alpha = force_alpha
self.binarize = binarize
self.fit_prior = fit_prior
self.class_prior = class_prior
@@ -1144,7 +1194,14 @@ class CategoricalNB(_BaseDiscreteNB):
----------
alpha : float, default=1.0
Additive (Laplace/Lidstone) smoothing parameter
(0 for no smoothing).
(set alpha=0 and force_alpha=True, for no smoothing).

force_alpha : bool, default=False
If False and alpha is too close to 0, it will set alpha to
`_ALPHA_MIN`. If True, warn user about potential numeric errors
and proceed with alpha unchanged.

.. versionadded:: 1.0
Review comment (Member), suggested change:
- .. versionadded:: 1.0
+ .. versionadded:: 1.1


fit_prior : bool, default=True
Whether to learn class prior probabilities or not.
Expand Down Expand Up @@ -1221,9 +1278,16 @@ class CategoricalNB(_BaseDiscreteNB):
"""

def __init__(
self, *, alpha=1.0, fit_prior=True, class_prior=None, min_categories=None
self,
*,
alpha=1.0,
force_alpha=False,
fit_prior=True,
class_prior=None,
min_categories=None,
):
self.alpha = alpha
self.force_alpha = force_alpha
self.fit_prior = fit_prior
self.class_prior = class_prior
self.min_categories = min_categories
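
Since the same parameter is threaded through all four discrete naive Bayes classes above, a quick constructor smoke check (a sketch only; it assumes this branch is installed):

from sklearn.naive_bayes import (
    BernoulliNB,
    CategoricalNB,
    ComplementNB,
    MultinomialNB,
)

for cls in (BernoulliNB, CategoricalNB, ComplementNB, MultinomialNB):
    est = cls(alpha=1.0, force_alpha=True)
    assert est.force_alpha is True  # stored verbatim, per scikit-learn convention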
25 changes: 25 additions & 0 deletions sklearn/tests/test_naive_bayes.py
@@ -897,6 +897,31 @@ def test_alpha():
m_nb.partial_fit(X, y, classes=[0, 1])


def test_check_alpha():
# Non-regression test for:
# https://github.com/scikit-learn/scikit-learn/issues/10772
# Test force_alpha if alpha < _ALPHA_MIN
_ALPHA_MIN = 1e-10 # const
msg1 = (
"alpha too small will result in numeric errors. "
"Proceeding with alpha = .+, as "
"force_alpha was set to True."
)
msg2 = (
"alpha too small will result in numeric errors, "
"setting alpha = %.1e" % _ALPHA_MIN
)
b = BernoulliNB(alpha=0, force_alpha=True)
with pytest.warns(UserWarning, match=msg1):
assert b._check_alpha() == 0
b = BernoulliNB(alpha=0, force_alpha=False)
with pytest.warns(UserWarning, match=msg2):
assert b._check_alpha() == _ALPHA_MIN
b = BernoulliNB(alpha=0)
with pytest.warns(UserWarning, match=msg2):
assert b._check_alpha() == _ALPHA_MIN
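
One detail worth noting in this test: the `match` argument of `pytest.warns` is treated as a regular expression (checked with `re.search`), which is why `msg1` uses `.+` as a wildcard for the formatted alpha value. A self-contained check of that pattern ("%.1e" % 0 formats to "0.0e+00"):

import re

msg1 = (
    "alpha too small will result in numeric errors. "
    "Proceeding with alpha = .+, as "
    "force_alpha was set to True."
)
warning_text = (
    "alpha too small will result in numeric errors. "
    "Proceeding with alpha = 0.0e+00, as "
    "force_alpha was set to True."
)
assert re.search(msg1, warning_text) is not None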


def test_alpha_vector():
X = np.array([[1, 0], [1, 1]])
y = np.array([0, 1])