[MRG+1] FEA Add Categorical Naive Bayes #12569
Merged
87 commits
f69741f
initial implementation of categorical NB
timbicker 4acee46
edit docstrings
timbicker 5cc266a
fix np.unique return_counts for older np versions
timbicker ebb97b0
added attribute n_features_
timbicker 83390f8
fix tests for py 2.7
timbicker e9987d8
changed docstrings and adjusted to pep8
timbicker efc3701
added setuptools parse_version
timbicker 1c30b21
use dtype int64 for old numpy version and add handling of unseen cate…
timbicker 4c6d8ba
fix dtype error
timbicker 8b0f49c
fix doctest error and PEP
timbicker ce65046
improve documentation and user feedback
timbicker f7fec8a
include CategoricalNB to general naive bayes tests
timbicker 81d2782
refactor tests
timbicker c6e14d8
Merge branch 'master' into categorical_NB
timbicker 2163afa
fix pep 8
timbicker 3e40f83
Merge branch 'categorical_NB' of github.com:timbicker/scikit-learn in…
timbicker 3bb605d
fix docstring
timbicker dedffd2
change error message
timbicker 248a048
add tests for alpha and unseen categories
timbicker 14ff237
improve handle_unknown error handling
timbicker 5034636
implement remarks
timbicker 29bc282
add documentation
timbicker c1f25cf
improve documentation
timbicker 1a462a8
removed all nditer and float conversions
timbicker 8eff782
update documentation
timbicker ddf132e
merge master
timbicker a51aaef
add comments
timbicker faca471
fix flake 8 and py 3.6 string formatting
timbicker 9af9b2a
remove _count input check for catNB
timbicker 32b200b
add defaultdict
timbicker 53187b3
add seperate input checks
timbicker 2bd2628
fix flake 8
timbicker 17713a2
refactor argument order and function names
timbicker dd932dc
trigger tests
timbicker bd9311a
remove comment
timbicker f896491
Merge master
timbicker 652f613
cleanups to make it more concise
timbicker 7abd2f7
removed old_numpy and cleaned up
timbicker 5c15366
Merge remote-tracking branch 'upstream/master' into categorical_NB
timbicker b61334d
fix flake 8
timbicker 5fbd3ee
small improvements in nb_test for catnb
timbicker bca6358
improve partial fit tests
timbicker e01dd15
add sample_weight functionality
timbicker 9dfb630
add sample weight tests for catnb
timbicker ca16165
Apply suggestions from code review
timbicker 44c8b0c
add remarks
timbicker f3ff651
Merge branch 'categorical_NB' of github.com:timbicker/scikit-learn in…
timbicker 3e48575
add remarks
timbicker 1c91a63
Merge branch 'master' into categorical_NB
timbicker edf2d05
use old tests from master
timbicker 353f306
Merge branch 'master' into categorical_NB
timbicker ea8abe1
fix doctest
timbicker 2951f7c
merge tests from master
timbicker 8444c88
add remarks
timbicker 3f7b41d
fix flake8
timbicker da41ba9
speed up for loop in joint log likelihood
timbicker 34c89b2
add remarks
timbicker bf08c68
Merge branch 'master' into categorical_NB
timbicker cd0610f
add version log
timbicker fd0e44e
Merge branch 'master' into categorical_NB
timbicker 17d6843
merge master
timbicker bb6acd7
use OrdinalEncoder
timbicker 83c35c2
Merge branch 'master' into categorical_NB
timbicker b55cb4f
fix NotImplementedError and check_X force_all_finite
timbicker 45c9264
fix flake 8
timbicker 796dad7
assume OrdinalEncoder is applied in preprocessing
timbicker 50282bd
reimplement partial_fit
timbicker 6accf63
partial fit updates categories
timbicker dc5e5b5
use bincount instead of unique
timbicker 31d84dc
Merge branch 'master' into categorical_NB
timbicker 0a47dbb
fix docs
timbicker 7fc523c
fix remarks
timbicker b3e8c65
Merge branch 'master' into categorical_NB
timbicker 23e4929
Merge remote-tracking branch 'upstream/master' into test
qinhanmin2014 1c0455a
merge master
timbicker 695f79f
add remarks
timbicker f054786
Merge branch 'categorical_NB' of github.com:timbicker/scikit-learn in…
timbicker b36fe8c
fix deprecated
timbicker cf867f1
update random in docstring
timbicker b5a435b
Merge branch 'master' into categorical_NB
timbicker 51922cf
merge master
timbicker 283e96a
add check_X comments
timbicker 9ccb85c
Merge branch 'master' into categorical_NB
timbicker 73a5a0b
add test sample_weight for scale invariance
timbicker cc37c5a
test for non-negativity
timbicker e226b45
fix non negative check, update interface of _update_count_dims
timbicker 7d4b652
inline check_nonnegative
timbicker
@@ -224,6 +224,40 @@ It is advisable to evaluate both models, if time permits.
<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.5542>`_
3rd Conf. on Email and Anti-Spam (CEAS).

.. _categorical_naive_bayes:

Categorical Naive Bayes
-----------------------

:class:`CategoricalNB` implements the categorical naive Bayes
algorithm for categorically distributed data. It assumes that each feature,
which is described by the index :math:`i`, has its own categorical
distribution.

Review comment: Is there a reference for this?
Reply: I didn't find any. CategoricalNB can be seen as a generalization of Bernoulli naive Bayes if that is of any help.

For each feature :math:`i` in the training set :math:`X`,
:class:`CategoricalNB` estimates a categorical distribution conditioned on
the class :math:`y`. The index set of the samples is defined as
:math:`J = \{ 1, \dots, m \}`, where :math:`m` is the number of samples.

The probability of category :math:`t` in feature :math:`i` given class
:math:`c` is estimated as:

.. math::

    P(x_i = t \mid y = c \: ;\, \alpha) = \frac{ N_{tic} + \alpha}{N_{c} +
                                           \alpha n_i},

where :math:`N_{tic} = |\{j \in J \mid x_{ij} = t, y_j = c\}|` is the number
of times category :math:`t` appears in feature :math:`i` among the samples
that belong to class :math:`c`, :math:`N_{c} = |\{ j \in J\mid y_j = c\}|` is
the number of samples with class :math:`c`, :math:`\alpha` is a smoothing
parameter and :math:`n_i` is the number of available categories of feature
:math:`i`.

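The smoothed estimate can be checked by hand on a small example. The following is a minimal NumPy sketch of the formula only, not the code from this pull request; the helper name ``smoothed_category_probs`` and the toy data are assumptions made for illustration.

```python
import numpy as np


def smoothed_category_probs(x_i, y, c, n_i, alpha=1.0):
    """Estimate P(x_i = t | y = c; alpha) for t = 0, ..., n_i - 1.

    Hypothetical helper: x_i holds the integer-encoded values of a
    single feature, y holds the class label of each sample.
    """
    x_i = np.asarray(x_i)
    y = np.asarray(y)
    in_class = y == c
    # N_tic: how often each category t occurs among samples of class c
    n_tic = np.bincount(x_i[in_class], minlength=n_i)
    # N_c: number of samples of class c
    n_c = in_class.sum()
    return (n_tic + alpha) / (n_c + alpha * n_i)


# Toy data: one feature with n_i = 3 categories, two classes
x_i = np.array([0, 1, 1, 2, 0, 1])
y = np.array([0, 0, 0, 1, 1, 1])
probs = smoothed_category_probs(x_i, y, c=0, n_i=3, alpha=1.0)
```

With :math:`\alpha > 0`, a category that never occurs in class :math:`c` (category 2 for class 0 here) still receives a non-zero probability, and the estimates over all :math:`n_i` categories sum to one.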
:class:`CategoricalNB` assumes that the sample matrix :math:`X` is encoded
(for instance with the help of :class:`OrdinalEncoder`) such that all
categories for each feature :math:`i` are represented with numbers
:math:`0, ..., n_i - 1` where :math:`n_i` is the number of available categories
of feature :math:`i`.

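As a rough illustration of the encoding requirement, the sketch below encodes string-valued features with ``OrdinalEncoder`` before fitting ``CategoricalNB``. It assumes a scikit-learn release that includes ``CategoricalNB``; the toy data and variable names are made up for this example.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Toy data (hypothetical): two categorical features given as strings
X_raw = np.array([
    ["red", "small"],
    ["green", "large"],
    ["red", "large"],
    ["blue", "small"],
])
y = np.array([0, 1, 0, 1])

# Map the categories of each feature i to the integers 0, ..., n_i - 1
encoder = OrdinalEncoder(dtype=np.int64)
X = encoder.fit_transform(X_raw)

clf = CategoricalNB(alpha=1.0)
clf.fit(X, y)

# New samples must go through the same fitted encoder
pred = clf.predict(encoder.transform([["red", "small"]]))
```

The encoder is fitted once and reused at prediction time so that each category keeps the integer code it had during training.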
Out-of-core naive Bayes model fitting | ||
------------------------------------- | ||