-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] FEA Add Categorical Naive Bayes #12569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There might be a problem, when the input array |
@timbicker After having thought a bit about it I think that setting the probability to 1 for an unseen category would make more sense than 0. Given a particular feature, an unseen category would then yield 1 for all classes and thus the feature would basically be ignored since other features could still determine the right class. If all categories of all feature are unseen the decision would be made automatically by the class probabilities. |
Yes, I agree. This makes more sense in my opinion. For the other case, that the category is unseen for a subset of the classes, we have the smoothing parameter. |
…to categorical_NB
If you uncomment, you will see that no error is raised. Instead, |
thanks, I'm unable to reproduce because I use a list. |
@qinhanmin2014 Thanks for your review. If you have no further comments, should @timbicker then change this PR to |
see the previous comment, please avoid calling check_array twice |
I am waiting for the other PR #14872 to be merged. Because otherwise, the tests of this PR would fail. Or should I fix it already? |
which tests? |
The output is as follows:
We assume that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to test other invariances: invariance under sample permutation, invariance under class label permutation up to ties, and maybe a test for how tie breaking is done to avoid regressions.
Still to review main code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM. 👀
@jnothman and @qinhanmin2014 thanks for your remarks |
Thanks and congratulations @timbicker and @FlorianWilhelm! |
@timbicker, great job! Thanks to everyone involved. |
can you please clarify , what you mean here and in under feature distributions per : discrete features that are categorically distributed. The categories of each feature are drawn from a categorical distribution. seems to be you mean that distributions are estimated from data? as mentioned in you never use some know pdf to fit data for each feature? do you have example specific for categorical features data with real categorical data with several important features and several not important
thanks a lot in advance |
@@ -49,6 +51,12 @@ def _joint_log_likelihood(self, X): | |||
predict_proba and predict_log_proba. | |||
""" | |||
|
|||
@abstractmethod | |||
def _check_X(self, X): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having this abstract method added is breaking downstream code.
Which is OK, but being the change hidden in this huge PR made it a bit more difficult than necessary to decipher/dig up if there is any relevant comments about the change, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we are under aware of impact on downstream projects. See #15992
I also think we neglect to consider people inheriting from our objects. Can you submit a PR to remove the abstractmethod for 0.22.1 perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #15996
Reference Issues/PRs
Impelements Categorical NB from #10856
What does this implement/fix? Explain your changes.
This implementation adds the NB classifier for categorical features.
Any other comments?