-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG+1] Cleaning for fast partial dependence computation #13738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'll take a look from Monday ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I find it much more readable than the previous version.
Is it actually straightforward to expand it for random forest. I would have thought that combining the weight of each tree would be different than in the gradient boosting since this is not a sequential algorithm. What do you have in mind to do so? |
I'm thinking of introducing a The In any case, a few other changes will be required, in particular in order to handle classifiers: for now, the fast PDP helper is only relevant for regression trees. I'll need to think about it a little more, but I want to go incremental on this one, and maybe focus on regressors first. |
OK this seems a good plan. I think going incremental would be great. We always have the non-recursive for the default case. |
@ogrisel Would you have time to take a a look at this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't immediately see substantial differences between the implementations, so I think this is good.
sklearn/tree/_tree.pyx
Outdated
@@ -1123,3 +1123,110 @@ cdef class Tree: | |||
Py_INCREF(self) | |||
arr.base = <PyObject*> self | |||
return arr | |||
|
|||
|
|||
def _partial_dependence(self, DTYPE_t[:, ::1] X, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a public method of a private class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I don't know what the convention is here.
Some methods of the Tree class are public like predict
or apply
, some of them are private like _add_node
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is _add_node ever called from outside tree building?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh OK that's the convention. I'll make it public then
Yes the code is pretty much the same, I mostly avoided the use of some weird patterns like |
Partial dependence computation can be optimized for trees.
Currently, only GradientBoostingClassifier and GradientBoostingRegressor support the optimized method. There's no reason for this, and I'm planning to expand fast PD computation to the other tree estimators.
This is a preliminary PR that:
_partial_dependence_tree
helper from the GBDT code to a method of theTree
class.Ping @thomasjpfan @glemaitre in case you'd be interested in reviewing this ;)