Open
Labels
Enhancement, Moderate, help wanted, module:tree
Description
I am planning to submit several PRs to merge #5041 incrementally, with the ultimate goal of a clean implementation of multithreaded decision tree building so that Gradient Boosting can be faster. With one of the main concepts merged (#5203), here is a list of separate PRs which I'd like to merge in the near future.
- Reorganize _tree.pyx into several files (see PR #5230, "[MRG+1] split tree module into several packages") [merged]
- Add proxy impurity improvement methods to both Gini and entropy (see PR #5233, "[MRG] Proxy improvement methods added to entropy/gini") [closed]
- Reevaluate constant feature caching [closed]
- Support sparse data for gradient boosting (see PR #5252, "[MRG+1] Merge PresortBestSplitter and BestSplitter")
- Add caching of computation between different split levels to avoid recomputation
- Ensure feature importances converge in ensembles (see PR #5261, "[MRG+1] Stronger tests for variable importances")
- Add tests to ensure the correctness of impurity values with respect to hand-computed values on toy data
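To make the last item concrete, here is a minimal sketch of what a hand-computed check on toy data could look like. It is plain Python and does not touch the Cython criteria; the helper names (`gini`, `entropy`, `weighted_child_impurity`) and the toy split are assumptions made up for illustration, not existing functions in the tree module.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_child_impurity(criterion, left, right):
    """Impurity after a split: children weighted by their share of samples."""
    n = len(left) + len(right)
    return (len(left) * criterion(left) + len(right) * criterion(right)) / n


# Toy node: 4 samples of class 0 and 4 samples of class 1.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical split: 3 zeros + 1 one go left, 1 zero + 3 ones go right.
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

# Hand-computed expectations:
#   gini(parent)    = 1 - (0.5^2 + 0.5^2)   = 0.5
#   gini(left)      = 1 - (0.75^2 + 0.25^2) = 0.375
#   entropy(parent) = 1 bit
assert abs(gini(parent) - 0.5) < 1e-12
assert abs(gini(left) - 0.375) < 1e-12
assert abs(entropy(parent) - 1.0) < 1e-12
assert abs(weighted_child_impurity(gini, left, right) - 0.375) < 1e-12
```

An actual test would presumably compare such reference values against the `tree_.impurity` array of a tree fitted on the same toy data, rather than against a pure-Python reimplementation.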
Longer-range goals which I'd like to work towards (but have no clear plan for as of right now) are the following:
- Add an approximate splitter
- Add multithreading support for single decision trees
- Add a partial fit method for tree building
- Support categorical variables
- Support missing values
Once these are done, it should be clearer to me what specific changes to Splitter, Criteria, and TreeBuilder are needed to make multithreading a possibility. @glouppe @arjoly @GaelVaroquaux @pprett if you have any comments, I'd love to hear them.
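For the approximate splitter among the longer-range goals, one possible reading is to evaluate only a handful of binned candidate thresholds per feature instead of every distinct value. The sketch below illustrates that idea in plain Python for a single feature with the Gini criterion; it is an assumption about what "approximate" could mean here, and `candidate_thresholds` and `best_approximate_split` are hypothetical names, not part of the existing Splitter API.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def candidate_thresholds(values, n_bins):
    """Up to n_bins - 1 thresholds taken at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    return sorted({ordered[(i * n) // n_bins] for i in range(1, n_bins)})


def best_approximate_split(values, labels, n_bins=4):
    """Return (threshold, weighted Gini) of the best binned candidate.

    Samples with value < threshold go left, the rest go right.
    """
    best = (None, float("inf"))
    n = len(values)
    for t in candidate_thresholds(values, n_bins):
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        if not left or not right:
            continue  # skip degenerate splits with an empty child
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
# Only 3 candidate thresholds (3, 5, 7) are scored instead of all 7 midpoints.
threshold, score = best_approximate_split(values, labels, n_bins=4)
```

The trade-off is that the best exact threshold may fall between candidates, so the split quality depends on `n_bins`.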