Commit 4e9f97d

timbicker authored and jnothman committed
FEA Add Categorical Naive Bayes (#12569)
1 parent f78ce00 commit 4e9f97d

7 files changed, +420 -49 lines changed

doc/modules/classes.rst

Lines changed: 2 additions & 1 deletion
@@ -1214,9 +1214,10 @@ Model validation
12141214
:template: class.rst
12151215

12161216
naive_bayes.BernoulliNB
1217+
naive_bayes.CategoricalNB
1218+
naive_bayes.ComplementNB
12171219
naive_bayes.GaussianNB
12181220
naive_bayes.MultinomialNB
1219-
naive_bayes.ComplementNB
12201221

12211222

12221223
.. _neighbors_ref:

doc/modules/naive_bayes.rst

Lines changed: 34 additions & 0 deletions
@@ -224,6 +224,40 @@ It is advisable to evaluate both models, if time permits.
224224
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.5542>`_
225225
3rd Conf. on Email and Anti-Spam (CEAS).
226226

227+
.. _categorical_naive_bayes:
228+
229+
Categorical Naive Bayes
230+
-----------------------
231+
232+
:class:`CategoricalNB` implements the categorical naive Bayes
233+
algorithm for categorically distributed data. It assumes that each feature,
234+
which is described by the index :math:`i`, has its own categorical
235+
distribution.
236+
237+
For each feature :math:`i` in the training set :math:`X`,
238+
:class:`CategoricalNB` estimates a categorical distribution
239+
conditioned on the class :math:`y`. The index set of the samples is defined as
240+
:math:`J = \{ 1, \dots, m \}`, with :math:`m` as the number of samples.
241+
242+
The probability of category :math:`t` in feature :math:`i` given class
243+
:math:`c` is estimated as:
244+
245+
.. math::
246+
247+
P(x_i = t \mid y = c \: ;\, \alpha) = \frac{ N_{tic} + \alpha}{N_{c} +
248+
\alpha n_i},
249+
250+
where :math:`N_{tic} = |\{j \in J \mid x_{ij} = t, y_j = c\}|` is the number
251+
of times category :math:`t` appears in the samples :math:`x_{i}`, which belong
252+
to class :math:`c`, :math:`N_{c} = |\{ j \in J\mid y_j = c\}|` is the number
253+
of samples with class :math:`c`, :math:`\alpha` is a smoothing parameter and
254+
:math:`n_i` is the number of available categories of feature :math:`i`.
255+
256+
:class:`CategoricalNB` assumes that the sample matrix :math:`X` is encoded
257+
(for instance with the help of :class:`OrdinalEncoder`) such that all
258+
categories for each feature :math:`i` are represented with numbers
259+
:math:`0, ..., n_i - 1` where :math:`n_i` is the number of available categories
260+
of feature :math:`i`.
227261

228262
Out-of-core naive Bayes model fitting
229263
-------------------------------------
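
The smoothed estimate described in the added ``Categorical Naive Bayes`` section can be illustrated with a small pure-Python sketch. This is not the code added by the commit: the helper name ``fit_categorical_nb`` and the toy data are hypothetical, and the sketch assumes the features are already encoded as integers ``0, ..., n_i - 1``, as the documentation requires.

```python
from collections import Counter


def fit_categorical_nb(X, y, n_categories, alpha=1.0):
    """Hypothetical helper: estimate P(x_i = t | y = c) with smoothing.

    X            -- list of samples, each a list of integer category codes
    y            -- list of class labels, one per sample
    n_categories -- list holding n_i, the number of categories of feature i
    alpha        -- additive smoothing parameter
    """
    class_count = Counter(y)  # N_c for every class c
    probs = {}  # probs[c][i][t] = (N_tic + alpha) / (N_c + alpha * n_i)
    for c, n_c in class_count.items():
        rows = [x for x, label in zip(X, y) if label == c]
        probs[c] = []
        for i, n_i in enumerate(n_categories):
            counts = Counter(row[i] for row in rows)  # N_tic for every t
            probs[c].append(
                [(counts[t] + alpha) / (n_c + alpha * n_i) for t in range(n_i)]
            )
    return probs


# Toy data (assumed): feature 0 has 3 categories, feature 1 has 2.
X = [[0, 1], [1, 1], [2, 0], [0, 0], [0, 1]]
y = [0, 0, 0, 1, 1]
probs = fit_categorical_nb(X, y, n_categories=[3, 2], alpha=1.0)
```

Because the denominator adds ``alpha * n_i``, the estimates for each feature and class sum to one, and a category never seen with a class (for example category 1 of feature 0 in class 1) still receives a nonzero probability.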

doc/whats_new/v0.22.rst

Lines changed: 8 additions & 0 deletions
@@ -434,6 +434,14 @@ Changelog
434434
- |Fix| :class:`multioutput.MultiOutputClassifier` now has attribute
435435
``classes_``. :pr:`14629` by :user:`Agamemnon Krasoulis <agamemnonc>`.
436436

437+
:mod:`sklearn.naive_bayes`
438+
...............................
439+
440+
- |MajorFeature| Added :class:`naive_bayes.CategoricalNB` that implements the
441+
Categorical Naive Bayes classifier.
442+
:pr:`12569` by :user:`Tim Bicker <timbicker>` and
443+
:user:`Florian Wilhelm <FlorianWilhelm>`.
444+
437445
:mod:`sklearn.neighbors`
438446
........................
439447
