ENH An alternative to ColumnwiseNB (aka GeneralNB) #29601

@msamsami

Describe the workflow you want to enable

There is an ongoing discussion on #22574 about introducing a new estimator named ColumnwiseNB, which handles different types of features by applying a different Naive Bayes model to each group of columns. This approach is promising for datasets that mix categorical, binary, and continuous variables, each of which may call for a different likelihood model.

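from sklearn.naive_bayes import BernoulliNB, GaussianNB, CategoricalNB

# ColumnwiseNB is the estimator proposed in #22574; it is not part of a
# released scikit-learn, so it has no import line here.
# Each tuple is (name, sub-estimator, column indices it is fitted on).
clf = ColumnwiseNB(nb_estimators=[('gnb1', GaussianNB(), [0, 1]),
                                  ('bnb2', BernoulliNB(), [2]),
                                  ('cnb1', CategoricalNB(), [3, 4])])
clf.fit(X_train, y_train)
clf.predict(X_test)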
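
For readers who want the reasoning behind the column-wise composition above, here is the textbook naive Bayes identity it rests on. This is a sketch under two assumptions that are mine, not taken from #22574: the columns are split into K disjoint groups S_1, ..., S_K (K = 3 in the example), and every sub-estimator uses the same class prior P(y).

```latex
\begin{align}
\log P(y \mid x) &= \log P(y) + \sum_{k=1}^{K} \log P(x_{S_k} \mid y) - \log P(x) \\
\log P_k(y \mid x_{S_k}) &= \log P(y) + \log P(x_{S_k} \mid y) - \log P(x_{S_k}) \\
\log P(y \mid x) &= \sum_{k=1}^{K} \log P_k(y \mid x_{S_k}) - (K-1)\,\log P(y) + c(x),
\qquad c(x) = \sum_{k=1}^{K} \log P(x_{S_k}) - \log P(x)
\end{align}
```

Since c(x) does not depend on y, the predicted class is the argmax of the sum of the sub-estimators' log-posteriors minus (K-1) copies of the log-prior. Whether the proposed estimator computes it exactly this way is not something this issue states.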

Describe your proposed solution

While scikit-learn is considering ColumnwiseNB as a potential addition, I have been developing a similar estimator, GeneralNB, in the wnb Python package. This class also supports a different distribution for each feature, which gives flexibility in handling a variety of data types within a Naive Bayes framework. I would like to introduce the community to this already-implemented solution and gather feedback, comments, and suggestions. Understanding whether GeneralNB could serve as an alternative or a complement to ColumnwiseNB could benefit both scikit-learn developers and users looking for more advanced Naive Bayes functionality.

from wnb import GeneralNB, Distribution as D

# One distribution per feature column, listed in column order.
gnb = GeneralNB(
    distributions=[D.NORMAL, D.NORMAL, D.BERNOULLI, D.CATEGORICAL, D.CATEGORICAL])
gnb.fit(X_train, y_train)
gnb.predict(X_test)
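
To make the "one distribution per feature" idea concrete, here is a minimal standard-library sketch. `TinyMixedNB`, its `kinds` argument, and the toy data are hypothetical names introduced only for illustration; this is not how wnb or scikit-learn implement it, and it covers only a normal and a Bernoulli likelihood.

```python
import math
from collections import defaultdict


def gaussian_logpdf(x, mean, var):
    # Log-density of a normal distribution with the given mean and variance.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def bernoulli_logpmf(x, p):
    # Log-probability of a 0/1 outcome with success probability p.
    return math.log(p) if x == 1 else math.log(1.0 - p)


class TinyMixedNB:
    """Hypothetical toy classifier: one likelihood family per feature."""

    def __init__(self, kinds):
        self.kinds = kinds  # e.g. ["normal", "bernoulli"], in column order

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.classes_ = sorted(rows_by_class)
        self.log_prior_ = {}
        self.params_ = {}
        for c, rows in rows_by_class.items():
            self.log_prior_[c] = math.log(len(rows) / len(X))
            params = []
            for j, kind in enumerate(self.kinds):
                col = [r[j] for r in rows]
                if kind == "normal":
                    mean = sum(col) / len(col)
                    # Small constant keeps the variance strictly positive.
                    var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                    params.append((mean, var))
                elif kind == "bernoulli":
                    # Laplace smoothing avoids log(0) for unseen outcomes.
                    params.append((sum(col) + 1) / (len(col) + 2))
                else:
                    raise ValueError(f"unsupported distribution: {kind}")
            self.params_[c] = params
        return self

    def joint_log_likelihood(self, row):
        # log P(y) + sum over features of log P(x_j | y), for each class y.
        scores = {}
        for c in self.classes_:
            total = self.log_prior_[c]
            for x, kind, p in zip(row, self.kinds, self.params_[c]):
                if kind == "normal":
                    total += gaussian_logpdf(x, *p)
                else:
                    total += bernoulli_logpmf(x, p)
            scores[c] = total
        return scores

    def predict(self, X):
        return [max(s, key=s.get) for s in map(self.joint_log_likelihood, X)]


# Toy data: column 0 is continuous, column 1 is binary.
X_toy = [[1.0, 0], [1.2, 0], [0.8, 1], [3.0, 1], [3.2, 1], [2.8, 0]]
y_toy = [0, 0, 0, 1, 1, 1]
model = TinyMixedNB(["normal", "bernoulli"]).fit(X_toy, y_toy)
preds = model.predict([[1.1, 0], [3.1, 1]])
```

The only design point the sketch is meant to show is that each feature contributes its own log-likelihood term, so adding a new distribution amounts to adding one fitting rule and one log-density.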

This solution fully adheres to scikit-learn's API and, at the time of writing, supports the following continuous and discrete distributions:

  • Normal
  • Lognormal
  • Exponential
  • Uniform
  • Pareto
  • Gamma
  • Beta
  • Chi-squared
  • T
  • Rayleigh
  • Bernoulli
  • Categorical
  • Geometric
  • Poisson

I encourage community feedback on this implementation and am open to collaborating to integrate similar functionality into scikit-learn if deemed beneficial.

Describe alternatives you've considered, if relevant

No response

Additional context

No response
