DOC add comments on making a balanced random forest from a BalancedBaggingClassifier #372
@chkoar WDYT?
I don't have a strong opinion on this. I believe that a robust predictors/ensemble-methods module could bring traction to the package. For that reason, I might implement BRF, even as a shortcut.
Yep, but it needs to follow the random forest API, not the bagging classifier one. In that regard, this is why scikit-learn has a PR there. We could actually give a hand with that one. It might go faster together :-)
@glemaitre is there a stalled PR in scikit-learn that we could help finish?
It is already merged in #373.
I asked because you said …
Oh yes, there is one PR that needs love. We also wanted to balance at each node instead of at each tree, to see the difference. So there are plenty of things to do.
Which one?
@glemaitre in each tree? Does this task need modifications at the Cython level?
Yep, you need to do that in Cython.
@chkoar, @glemaitre I'm not familiar with imblearn, but in sklearn you can balance each tree simply by changing the sample_indices that get passed to it. This is how I implemented it in scikit-learn/scikit-learn#8732. So does …
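A minimal numpy sketch of the per-tree balancing idea described above: instead of a plain bootstrap, each tree receives indices drawn so that every class contributes the same number of samples. The helper name balanced_sample_indices is hypothetical and this is not the code from scikit-learn/scikit-learn#8732; it assumes the common scheme of resampling every class, with replacement, to the size of the smallest class.

```python
import numpy as np


def balanced_sample_indices(y, random_state=None):
    """Hypothetical helper: draw bootstrap indices balanced across classes.

    Every class is sampled with replacement down (or up) to the size of the
    smallest class, so a tree fitted on X[indices], y[indices] sees a
    balanced training set. Calling it with a different random_state per tree
    gives each tree its own balanced bootstrap.
    """
    rng = np.random.default_rng(random_state)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_per_class = counts.min()
    # One block of indices per class, all blocks of equal size.
    per_class = [
        rng.choice(np.flatnonzero(y == cls), size=n_per_class, replace=True)
        for cls in classes
    ]
    return np.concatenate(per_class)


# 90 samples of class 0, 10 samples of class 1.
y = np.array([0] * 90 + [1] * 10)
idx = balanced_sample_indices(y, random_state=0)
print(np.bincount(y[idx]))  # → [10 10]
```

Because only the indices change, the tree-fitting code itself stays untouched, which is why this works per tree without going down to Cython.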
The implementation is a pipeline of a random under-sampler followed by an estimator. So if you pass a tree as the estimator, it will balance each subset and then fit a tree on each subset. Therefore, BalancedBaggingClassifier becomes a BalancedRandomForest with …
@glemaitre OK, thanks for the clarification.
I think we could add a note mentioning that a balanced random forest classifier can be obtained by setting max_features='auto' on the decision tree. I don't think we should implement a new estimator, since scikit-learn is going to do it.