-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FEA Categorical split support for DecisionTree*
, ExtraTree*
, RandomForest*
and `ExtraTrees*
#29437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
❌ Linting issuesThis PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling You can see the details of the linting issues under the
|
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
…into partition
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Reference Issues/PRs
Supersedes #12866 #4899 and #7068 , #3346
What does this implement/fix? Explain your changes.
Implements splitting rules when categorical data is given in a decision tree classifier/regressor.
ParentInfo
,SplitRecord
,Node
) ->_utils.pxd
Open Questions / TODO
breiman_shortcut
be part ofSplitter
? It is completely unused byRandomSplitter
, so it seems weird to pass in an argument that is unnecessary. Otoh, if we specialize thebreiman_shortcut
only forBestSplitter
, then we need to also specialize howbreiman_shortcut
is passed inBaseDecisionTree
.Any other comments?
May be related to #24967
Would be nice to merge in #29458
Notes:
n_categories
, so we can trim down the number of categories. I think this can be implemented last and then benchmarked to determine if this optimization is useful or not. I suspect it is since for categorical splits, there is time involved to "count the unique categories", which seems unnecessary. This is conceptually similar to tracking constant features