Closed
93 commits
7019abe
Added general naive bayes
remykarem Jan 14, 2020
4a15a06
Update GeneralNB
remykarem Jan 19, 2020
ac80c3c
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Jan 19, 2020
b610fc8
Merge branch 'master' into general-naive-bayes
Jan 28, 2020
65f9ba9
Update docstring
Jan 28, 2020
a7e9b1e
Added tests
Jan 28, 2020
fd9c34b
Update tests
Jan 28, 2020
368a1ae
Added docs stub
Jan 29, 2020
5019c27
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
Jan 29, 2020
b633847
Update doc
Jan 29, 2020
34339fe
Update docs
Jan 29, 2020
5c2c65d
Merge branch 'general-naive-bayes' of https://github.com/remykarem/sc…
Jan 29, 2020
09d119e
Update function names
Jan 29, 2020
e01984c
Fixed formatting
Jan 29, 2020
48016e9
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Jan 31, 2020
1c975c1
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 3, 2020
6737aa4
Merge branch 'general-naive-bayes' of https://github.com/remykarem/sc…
remykarem Feb 3, 2020
61b8a9f
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 4, 2020
80595b7
[WIP] ColumnTransformer-like API
remykarem Feb 4, 2020
de36f45
Update generalnb
remykarem Feb 5, 2020
0d1ca54
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 6, 2020
63ae994
Fixed bug
remykarem Feb 6, 2020
d710735
Minor fixes
remykarem Feb 6, 2020
c809120
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 8, 2020
adc68a0
Added support for pandas df
remykarem Feb 9, 2020
7d9814b
Update docs
remykarem Feb 9, 2020
3bacdfa
Updated docstring
remykarem Feb 9, 2020
b660df3
Renamed variables
remykarem Feb 9, 2020
531fea8
Removed getter and setter
remykarem Feb 9, 2020
fa72685
Refactored variable names
remykarem Feb 9, 2020
045b389
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 11, 2020
360cfda
Renamed variable names
remykarem Feb 11, 2020
7c11ea0
Moved _validate_callables into _validate_models
remykarem Feb 11, 2020
b067e07
Added temporary docstring for callables
remykarem Feb 11, 2020
6d510b0
Removed _validate_column_callables
remykarem Feb 11, 2020
8e9cc34
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 11, 2020
42c602a
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 13, 2020
d4fe6ab
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 16, 2020
129f435
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 16, 2020
48f3bad
Added docs [WIP]
remykarem Feb 16, 2020
be66c63
Minor fixes
remykarem Feb 17, 2020
f0e356b
[WIP] added tests
remykarem Feb 17, 2020
a5631fb
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 18, 2020
f9b1dfb
Updated docs
remykarem Feb 19, 2020
a64329f
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 19, 2020
eeaaf16
Added GeneralNB module to docs
remykarem Feb 19, 2020
496c069
Added code snippets to docs
remykarem Feb 19, 2020
b2ab2b5
Added more GeneralNB tests
remykarem Feb 19, 2020
78c656a
Update docstrings and raising ValueError
remykarem Feb 19, 2020
04eb2b4
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 23, 2020
a910ed6
Updated docs
remykarem Feb 23, 2020
b554357
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 23, 2020
ffc8d60
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 24, 2020
f60a8c2
Update docs
remykarem Feb 24, 2020
024724f
Minor housekeeping
remykarem Feb 24, 2020
c8254fe
Update tests
remykarem Feb 24, 2020
d5308a6
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 25, 2020
d7817cb
Perform _check_X before calculating likelihood
remykarem Feb 25, 2020
1d991c3
Cast to DataFrame
remykarem Feb 25, 2020
13dc706
Fix formatting
remykarem Feb 25, 2020
f174bcf
Update test
remykarem Feb 25, 2020
cc09b3f
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Feb 27, 2020
6fa37b1
Added FIXMEs
remykarem Mar 7, 2020
e84cc4a
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Mar 7, 2020
cd9f8ca
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Mar 18, 2020
0300f3d
Trigger CI/CD
remykarem Mar 18, 2020
58b8e31
Trigger CI/CD
remykarem Mar 18, 2020
12cfc20
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Mar 19, 2020
1ba9a81
No need to ensure model is internal
remykarem Mar 19, 2020
2428f36
Fix typo
remykarem Mar 19, 2020
d58505a
Update docs
remykarem Mar 19, 2020
3d38d71
Mention that models_ attribute is fitted
remykarem Mar 19, 2020
c3b68a0
Remove restriction to use sklearn's naive bayes
remykarem Mar 20, 2020
303b43b
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Mar 21, 2020
7d1b27e
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Mar 28, 2020
2cffb65
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem Mar 29, 2020
6e1b105
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
remykarem Apr 16, 2020
85040ff
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem May 4, 2020
b854b05
Remove attributes initialised as None
remykarem May 4, 2020
2c1cf1b
Add comment for `callable` check
remykarem May 4, 2020
927d401
Move prior checking to fit
remykarem May 4, 2020
1196267
Add class_prior and fit_prior to constructor
remykarem May 4, 2020
b249382
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem May 9, 2020
15b0876
Remove class_prior and fit_prior in _validate_models
remykarem May 10, 2020
3551542
Remove unused variables
remykarem May 10, 2020
e695c29
Remove checking of all_log_priors
remykarem May 10, 2020
d250bed
Add comments in fit() method
remykarem May 10, 2020
c6cfeff
Remove custom NotFittedError exception
remykarem May 10, 2020
47f8207
Refactor _validate_models and add comments
remykarem May 10, 2020
7f82f0f
Merge remote-tracking branch 'upstream/master' into general-naive-bayes
remykarem May 11, 2020
5c717ee
Remove self._is_fitted
remykarem May 11, 2020
fc2ff64
Reposition marker for positional args
remykarem May 11, 2020
23d8cdf
Add `remainder` parameter and docstring
remykarem May 11, 2020
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -1275,6 +1275,7 @@ Model validation
naive_bayes.ComplementNB
naive_bayes.GaussianNB
naive_bayes.MultinomialNB
naive_bayes.GeneralNB


.. _neighbors_ref:
110 changes: 106 additions & 4 deletions doc/modules/naive_bayes.rst
@@ -229,10 +229,10 @@ It is advisable to evaluate both models, if time permits.
Categorical Naive Bayes
-----------------------

:class:`CategoricalNB` implements the categorical naive Bayes
algorithm for categorically distributed data. It assumes that each feature,
which is described by the index :math:`i`, has its own categorical
distribution.

For each feature :math:`i` in the training set :math:`X`,
:class:`CategoricalNB` estimates a categorical distribution for each feature i
@@ -259,6 +259,108 @@ categories for each feature :math:`i` are represented with numbers
:math:`0, ..., n_i - 1` where :math:`n_i` is the number of available categories
of feature :math:`i`.

.. _general_naive_bayes:

General Naive Bayes
-------------------

:class:`GeneralNB` combines several naive Bayes models, each of which assumes
a different distribution for its own subset of features, while keeping the
assumption of conditional independence between every pair of features given
the value of the class variable.

This meta-estimator is useful for data that mixes numerical and categorical
features. For example, suppose the data has five features, of which the first
three are numerical and the last two are categorical. We can then assume that
each numerical feature follows a Gaussian distribution and each categorical
feature follows a categorical distribution, i.e.,

.. math::

    X_1 \mid y \sim \text{Normal}(\mu_1,\sigma_1^2) \\
    X_2 \mid y \sim \text{Normal}(\mu_2,\sigma_2^2) \\
    X_3 \mid y \sim \text{Normal}(\mu_3,\sigma_3^2) \\
    X_4 \mid y \sim \text{Categorical}(\textbf{p}_4) \\
    X_5 \mid y \sim \text{Categorical}(\textbf{p}_5)
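
Under these assumptions, the naive Bayes decision rule factorizes into one term per feature. The expression below is a sketch of that rule for the five-feature example. The class subscript :math:`y` on the parameters is made explicit here as an assumption (the distributions above leave it implicit), and :math:`p_{i,y}(x_i)` denotes the probability that the categorical distribution of feature :math:`i` assigns to the observed category :math:`x_i` in class :math:`y`.

```latex
\hat{y} = \arg\max_{y} \Big[ \log P(y)
    + \sum_{i=1}^{3} \log \mathcal{N}\big(x_i \mid \mu_{i,y}, \sigma_{i,y}^2\big)
    + \sum_{i=4}^{5} \log p_{i,y}(x_i) \Big]
```

The class prior :math:`P(y)` appears only once, no matter how many per-distribution models contribute likelihood terms.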

Let's see how `GeneralNB` is used with this toy dataset. We first import
the libraries and prepare the data:

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.naive_bayes import GeneralNB, GaussianNB, CategoricalNB
>>>
>>> X = np.array([[1.5, 2.3, 5.7, 0, 1],
...               [2.7, 3.8, 2.3, 1, 0],
...               [1.7, 0.1, 4.5, 1, 0]])
>>> y = np.array([1, 0, 0])
>>> X_test = np.array([[1.5, 2.3, 5.7, 0, 1]])

In the `GeneralNB` constructor, specify for every naive Bayes model
a name (for easy access to the fitted estimator later), the estimator itself,
and the columns it is applied to.
Below we define two tuples, one for the `GaussianNB()` model and
one for the `CategoricalNB()` model.
This manner of specification is similar to that of *transformers* in
:class:`ColumnTransformer <sklearn.compose.ColumnTransformer>`.

>>> clf = GeneralNB([
...     ("gaussian", GaussianNB(), [0, 1, 2]),
...     ("categorical", CategoricalNB(), [3, 4])
... ])
>>> clf.fit(X, y)
GeneralNB(models=[('gaussian', GaussianNB(...), [0, 1, 2]),
                  ('categorical', CategoricalNB(...), [3, 4])])
>>> print(clf.predict(X_test))
[1]
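
To make the combination concrete, the sketch below reproduces the same prediction using only the existing `GaussianNB` and `CategoricalNB` estimators. It is an illustration of the underlying arithmetic, not necessarily how `GeneralNB` is implemented in this PR. The variable names (`num_cols`, `cat_cols`, `scores`) are chosen here for illustration, and the sketch assumes both sub-models use the same empirical class prior, which is their default behaviour.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB, GaussianNB

# Toy data from the example above: columns 0-2 numerical, 3-4 categorical.
X = np.array([[1.5, 2.3, 5.7, 0, 1],
              [2.7, 3.8, 2.3, 1, 0],
              [1.7, 0.1, 4.5, 1, 0]])
y = np.array([1, 0, 0])
X_test = np.array([[1.5, 2.3, 5.7, 0, 1]])

num_cols, cat_cols = [0, 1, 2], [3, 4]

# Fit one existing estimator per feature group.
gnb = GaussianNB().fit(X[:, num_cols], y)
cnb = CategoricalNB().fit(X[:, cat_cols].astype(int), y)

# Each predict_log_proba result already contains the log class prior once
# (plus a per-sample normalising constant). Adding the two results counts
# the prior twice, so one copy is subtracted again.
log_prior = np.log(gnb.class_prior_)
scores = (gnb.predict_log_proba(X_test[:, num_cols])
          + cnb.predict_log_proba(X_test[:, cat_cols].astype(int))
          - log_prior)

# The per-sample constants do not depend on the class, so argmax is unaffected.
pred = gnb.classes_[np.argmax(scores, axis=1)]
print(pred)  # [1]
```

The values in `scores` are unnormalised log posteriors. They are sufficient for prediction, but would need to be renormalised (for example with a log-sum-exp over classes) before being read as probabilities.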


Besides specifying a list of integers, you can also indicate column
names explicitly if the `X` and `y` data are pandas DataFrames:

>>> X = pd.DataFrame(X)
>>> X.columns = ["a", "b", "c", "d", "e"]
>>> y = pd.DataFrame(y)
>>>
>>> clf = GeneralNB([
...     ("gaussian", GaussianNB(), ["a", "b", "c"]),
...     ("categorical", CategoricalNB(), ["d", "e"])
... ])
>>> clf.fit(X, y)
GeneralNB(models=[('gaussian', GaussianNB(...), ['a', 'b', 'c']),
                  ('categorical', CategoricalNB(...), ['d', 'e'])])

Alternatively, you can select DataFrame columns using
:func:`sklearn.compose.make_column_selector` as follows. Note that
`X` and `y` must be DataFrames.

>>> from sklearn.compose import make_column_selector
>>> clf = GeneralNB([
...     ("gaussian", GaussianNB(),
...      make_column_selector(pattern=r"[abc]")),
...     ("categorical", CategoricalNB(),
...      make_column_selector(pattern=r"[de]"))
... ])
>>> clf.fit(X, y)
GeneralNB(models=[('gaussian', GaussianNB(...), ...),
                  ('categorical', CategoricalNB(...), ...)])
>>> print(clf.predict(X.iloc[:1]))
[1]
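
To check which columns a selector picks before passing it to an estimator, the selector can be called directly on the DataFrame. The short sketch below rebuilds the toy DataFrame and uses only the existing `make_column_selector` API; the variable names `gaussian_cols` and `categorical_cols` are illustrative.

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Same toy data as above, with named columns.
X = pd.DataFrame([[1.5, 2.3, 5.7, 0, 1],
                  [2.7, 3.8, 2.3, 1, 0],
                  [1.7, 0.1, 4.5, 1, 0]],
                 columns=["a", "b", "c", "d", "e"])

# A selector is a callable: given a DataFrame, it returns the list of
# column names whose name contains a match for the regular expression.
gaussian_cols = make_column_selector(pattern=r"[abc]")(X)
categorical_cols = make_column_selector(pattern=r"[de]")(X)

print(gaussian_cols)     # ['a', 'b', 'c']
print(categorical_cols)  # ['d', 'e']
```

Because the pattern is matched anywhere in the column name, a pattern such as `[abc]` would also select a column called `"label"`. Anchoring the pattern (for example `^[abc]$`) avoids accidental matches on longer names.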

Finally, you can access the attributes of the fitted estimators through
the :meth:`named_models_ <sklearn.naive_bayes.GeneralNB.named_models_>`
property, using the names defined earlier.
Below we obtain the `class_count_` attribute of the fitted
categorical model, where `"categorical"` is the name given in the
`models` parameter of the constructor.

>>> clf.named_models_.categorical.class_count_
array([2., 1.])

Besides these two models, you can combine any of the other naive Bayes
models described on this page to fit your dataset.

Out-of-core naive Bayes model fitting
-------------------------------------
