
[WIP] EHN: Implementation of BalancedRandomForestClassifier #459


Merged
merged 39 commits, Sep 6, 2018
Commits (39)
b3fdd85
MAINT remove _named_check
glemaitre Dec 1, 2017
a87cb7f
udpate py.test to pytest
glemaitre Dec 1, 2017
c309a1c
MAINT remove nose occurences
glemaitre Dec 1, 2017
2d00939
iter
glemaitre Aug 26, 2018
642e706
added file
glemaitre Aug 26, 2018
4590120
FIX documentation and sampling
glemaitre Aug 26, 2018
a63b417
Merge remote-tracking branch 'origin/master' into is/456
glemaitre Aug 29, 2018
d3f4b3f
Merge remote-tracking branch 'origin/master' into is/371
glemaitre Sep 3, 2018
3266717
TST: add balanced cascade test
glemaitre Sep 3, 2018
3f410ab
iter
glemaitre Sep 3, 2018
7b4336b
TST: parametrize common test
glemaitre Sep 3, 2018
39f159a
TST: rework test check_estimator
glemaitre Sep 3, 2018
d03992c
PEP8
glemaitre Sep 3, 2018
a400646
FIX: solve issue conda install appveyor
glemaitre Sep 3, 2018
0b7dda5
MAINT: use rc from pypi
glemaitre Sep 3, 2018
9ad823a
iter
glemaitre Sep 3, 2018
8bca9dd
Merge remote-tracking branch 'glemaitre/is/371' into is/456
glemaitre Sep 3, 2018
008c73d
upgrade numpy and scipy on appveyor
glemaitre Sep 4, 2018
2f3e7c9
MAINT upgrade python
glemaitre Sep 4, 2018
ad69651
iter
glemaitre Sep 4, 2018
ced9db9
FIX: docstring
glemaitre Sep 4, 2018
3f55200
Merge remote-tracking branch 'glemaitre/is/371' into is/456
glemaitre Sep 4, 2018
e712709
solve issue powershell
glemaitre Sep 4, 2018
0d951b9
iter
glemaitre Sep 4, 2018
fabcca1
tensorflow not available in python 3.6
glemaitre Sep 4, 2018
5ba8b8e
MAINT update requirement to 0.20
glemaitre Sep 4, 2018
da9b397
Merge remote-tracking branch 'glemaitre/is/371' into is/456
glemaitre Sep 4, 2018
b19b982
iter
glemaitre Sep 4, 2018
e4b1c2b
Merge remote-tracking branch 'origin/master' into is/456
glemaitre Sep 4, 2018
615edcb
iter
glemaitre Sep 5, 2018
ff26448
MAINT: cleanup deprecation warning in tests and source code (#466)
glemaitre Sep 5, 2018
54c2baf
Merge remote-tracking branch 'origin/master' into is/456
glemaitre Sep 5, 2018
c619eb5
TST: add test for forest
glemaitre Sep 5, 2018
11a2e1d
iter
glemaitre Sep 5, 2018
308e82f
Merge remote-tracking branch 'origin/master' into is/456
glemaitre Sep 5, 2018
d65127c
TST: add test for the BalancedRandomForestClassifier
glemaitre Sep 6, 2018
97672a1
FIX: put back the deprecated parameters
glemaitre Sep 6, 2018
4362550
DOC: update the example for the different ensemble methods
glemaitre Sep 6, 2018
a21418b
PEP8
glemaitre Sep 6, 2018
6 changes: 5 additions & 1 deletion README.rst
@@ -159,9 +159,11 @@ Below is a list of the methods currently implemented in this module.
1. SMOTE + Tomek links [12]_
2. SMOTE + ENN [11]_

* Ensemble sampling
* Ensemble classifier using samplers internally
1. EasyEnsemble [13]_
2. BalanceCascade [13]_
3. Balanced Random Forest [16]_
4. Balanced Bagging

The different algorithms are presented in the sphinx-gallery_.

@@ -200,3 +202,5 @@ References:
.. [14] : I. Tomek, “An experiment with the edited nearest-neighbor rule,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6(6), pp. 448-452, 1976. [`bib <references.bib#L158>`_]

.. [15] : H. He, Y. Bai, E. A. Garcia, S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” In Proceedings of the 5th IEEE International Joint Conference on Neural Networks, pp. 1322-1328, 2008. [`pdf <https://pdfs.semanticscholar.org/4823/4756b7cf798bfeb47328f7c5d597fd4838c2.pdf>`_] [`bib <references.bib#L62>`_]

.. [16] : C. Chen, A. Liaw, L. Breiman, "Using random forest to learn imbalanced data," University of California, Berkeley, Tech. Rep. 110, pp. 1-12, 2004.
1 change: 1 addition & 0 deletions doc/api.rst
@@ -109,6 +109,7 @@ Prototype selection

ensemble.BalanceCascade
ensemble.BalancedBaggingClassifier
ensemble.BalancedRandomForestClassifier
ensemble.EasyEnsemble
ensemble.EasyEnsembleClassifier

33 changes: 18 additions & 15 deletions doc/ensemble.rst
@@ -116,24 +116,24 @@ random under-sampler::
[ 0, 55, 4],
[ 42, 46, 1091]])

It is also possible to turn a balanced bagging classifier into a balanced
random forest by using a decision tree classifier and setting the parameter
``max_features='auto'``, which randomly selects a subset of features for
each tree::

>>> brf = BalancedBaggingClassifier(
... base_estimator=DecisionTreeClassifier(max_features='auto'),
... random_state=0)
:class:`BalancedRandomForestClassifier` is another ensemble method in which
each tree of the forest is provided a balanced bootstrap sample. This class
provides all the functionality of
:class:`sklearn.ensemble.RandomForestClassifier`, notably the
`feature_importances_` attribute::


>>> from imblearn.ensemble import BalancedRandomForestClassifier
>>> brf = BalancedRandomForestClassifier(n_estimators=10, random_state=0)
>>> brf.fit(X_train, y_train) # doctest: +ELLIPSIS
BalancedBaggingClassifier(...)
BalancedRandomForestClassifier(...)
>>> y_pred = brf.predict(X_test)
>>> confusion_matrix(y_test, y_pred)
array([[ 9, 1, 2],
[ 0, 54, 5],
[ 31, 34, 1114]])

See
:ref:`sphx_glr_auto_examples_ensemble_plot_comparison_bagging_classifier.py`.
[ 3, 54, 2],
[ 113, 47, 1019]])
>>> brf.feature_importances_
array([ 0.63501243, 0.36498757])
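
The balanced-bootstrap idea behind this classifier can be sketched in a few
lines of plain scikit-learn and NumPy. This is a simplified illustration of
the mechanism, not the library's actual implementation; the loop count and
variable names are chosen for the example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# An imbalanced two-class problem (90% / 10%).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)

rng = np.random.RandomState(0)
classes, counts = np.unique(y, return_counts=True)
n_min = counts.min()  # size of the minority class

trees = []
for _ in range(10):
    # Balanced bootstrap: draw n_min samples with replacement from
    # *each* class, so every tree sees a balanced training set.
    idx = np.hstack([rng.choice(np.flatnonzero(y == c), n_min, replace=True)
                     for c in classes])
    tree = DecisionTreeClassifier(max_features='sqrt',
                                  random_state=rng.randint(2 ** 31))
    trees.append(tree.fit(X[idx], y[idx]))

# Aggregate by majority vote across the trees.
votes = np.mean([t.predict(X) for t in trees], axis=0)
y_pred = (votes >= 0.5).astype(int)
```

Because each tree is trained on as many majority-class as minority-class
samples, no single tree is dominated by the majority class, while the
randomness of the per-class bootstrap preserves the diversity that makes
the forest work.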

A specific method which uses ``AdaBoost`` as learners in the bagging
classifier is called EasyEnsemble. The :class:`EasyEnsembleClassifier` allows
@@ -149,4 +149,7 @@ the ensemble as::
>>> confusion_matrix(y_test, y_pred)
array([[ 9, 1, 2],
[ 5, 52, 2],
[252, 45, 882]])
[252, 45, 882]])

See
:ref:`sphx_glr_auto_examples_ensemble_plot_comparison_ensemble_classifier.py`.
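
The EasyEnsemble scheme described above can likewise be sketched by hand:
train several AdaBoost learners, each on a balanced subset obtained by
randomly under-sampling the majority class, and vote. Again, this is an
illustrative sketch under simplified assumptions, not the
:class:`EasyEnsembleClassifier` implementation itself:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)

rng = np.random.RandomState(0)
classes, counts = np.unique(y, return_counts=True)
n_min = counts.min()

learners = []
for _ in range(5):
    # Random under-sampling: keep all minority samples and an equally
    # sized random subset of the majority class (no replacement).
    idx = np.hstack([rng.choice(np.flatnonzero(y == c), n_min, replace=False)
                     for c in classes])
    ada = AdaBoostClassifier(n_estimators=10,
                             random_state=rng.randint(2 ** 31))
    learners.append(ada.fit(X[idx], y[idx]))

# Majority vote over the AdaBoost learners.
y_pred = (np.mean([l.predict(X) for l in learners], axis=0) >= 0.5).astype(int)
```

Each under-sampled subset discards different majority-class examples, so
bagging the boosted learners recovers information that a single
under-sampled model would lose.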
4 changes: 4 additions & 0 deletions doc/whats_new/v0.0.4.rst
@@ -33,6 +33,10 @@ New features
AdaBoost classifier trained on balanced bootstrap samples.
:issue:`455` by :user:`Guillaume Lemaitre <glemaitre>`.

- Add :class:`imblearn.ensemble.BalancedRandomForestClassifier` which balances
each bootstrap sample provided to each tree of the forest.
:issue:`459` by :user:`Guillaume Lemaitre <glemaitre>`.

Enhancement
...........

124 changes: 0 additions & 124 deletions examples/ensemble/plot_comparison_bagging_classifier.py

This file was deleted.
