Skip to content

DOC add comments regarding to make a balanced random forest from a BalancedBaggingClassifier #372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
glemaitre opened this issue Nov 22, 2017 · 14 comments · Fixed by #373
Closed

Comments

@glemaitre
Copy link
Member

I think that we could add note mentioning that we can achieve a balanced random forest classifier by setting max_features='auto' of the decision tree. I don't think that we should implement a new estimator since scikit-learn is going to do it.

@glemaitre
Copy link
Member Author

@chkoar WDYT?

@chkoar
Copy link
Member

chkoar commented Nov 22, 2017

I don't have a strong opinion on this. I believe that a robust predictors/ensemble methods module could bring traction to the package. For that reason I would may implement BRF even as a shortcut.

@glemaitre
Copy link
Member Author

glemaitre commented Nov 22, 2017 via email

@chkoar
Copy link
Member

chkoar commented Nov 27, 2017

@glemaitre is a PR stalled in scikit-learn where we could contribute to be finished?

@glemaitre
Copy link
Member Author

it is already merged in #373

@chkoar
Copy link
Member

chkoar commented Nov 27, 2017

I asked because you said .

In that regards this why scikit learn has a PR there.

@glemaitre
Copy link
Member Author

glemaitre commented Nov 27, 2017 via email

@chkoar
Copy link
Member

chkoar commented Nov 27, 2017

which one?

@glemaitre
Copy link
Member Author

@chkoar
Copy link
Member

chkoar commented Dec 26, 2017

We wanted also to balanced at each node instead of tree to see the difference.

@glemaitre in each tree? Does this task need modification in the Cython level?

@glemaitre
Copy link
Member Author

glemaitre commented Dec 26, 2017 via email

@potash
Copy link

potash commented Jan 29, 2018

@chkoar, @glemaitre I'm not familiar with imblearn but in sklearn you can balance each tree simply by changing the sample_indices that get passed. This is how I implemented it in scikit-learn/scikit-learn#8732.

So does BalancedBaggingClassifier(max_features="auto") balance at each tree or does it just balance the data once? If the latter, I think it is confusing to call that a "balanced random forest" in the documentation because in the Breiman paper that refers to balancing each tree.

@glemaitre
Copy link
Member Author

The implementation is a pipeline of a random under sampler with an estimator. So if you pass an estimator which is a tree, it will balance each subset and then fit a tree on each subset. Therefore, BalancedBaggingClassifier become a BalancedRandomForest with max_feature='auto' as mentioned in the documentation.

@potash
Copy link

potash commented Jan 29, 2018

@glemaitre OK thanks for the clarification

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants