Skip to content

[MRG+1] Cleaning for fast partial dependence computation #13738

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 2, 2019

Conversation

NicolasHug
Copy link
Member

Partial dependence computation can be optimized for trees.
Currently, only GradientBoostingClassifier and GradientBoostingRegressor support the optimized method. There's no reason for this, and I'm planning to expand fast PD computation to the other tree estimators.

This is a preliminary PR that:

  • moves the _partial_dependence_tree helper from the GBDT code to a method of the Tree class.
  • clean the fast PD computation code to make it simpler and more maintainable.

Ping @thomasjpfan @glemaitre in case you'd be interested in reviewing this ;)

@glemaitre
Copy link
Member

I'll take a look from Monday ;)

@glemaitre glemaitre self-requested a review April 29, 2019 11:53
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I find it much more readable than the previous version.

@glemaitre
Copy link
Member

Is it actually straightforward to expand it for random forest. I would have thought that combining the weight of each tree would be different than in the gradient boosting since this is not a sequential algorithm.

What do you have in mind to do so?

@NicolasHug
Copy link
Member Author

I'm thinking of introducing a _compute_partial_dependence method for any tree-based estimator.

The _compute_partial_dependence of GBDTs would be the current _partial_dependence_recursion() function, and random forest would have a different behaviour.

In any case, a few other changes will be required, in particular in order to handle classifiers: for now, the fast PDP helper is only relevant for regression trees.

I'll need to think about it a little more, but I want to go incremental on this one, and maybe focus on regressors first.

@glemaitre
Copy link
Member

OK this seems a good plan. I think going incremental would be great. We always have the non-recursive for the default case.

@glemaitre glemaitre changed the title [MRG] Cleaning for fast partial dependence computation [MRG+1] Cleaning for fast partial dependence computation Apr 29, 2019
@glemaitre
Copy link
Member

@ogrisel Would you have time to take a a look at this one.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't immediately see substantial differences between the implementations, so I think this is good.

@@ -1123,3 +1123,110 @@ cdef class Tree:
Py_INCREF(self)
arr.base = <PyObject*> self
return arr


def _partial_dependence(self, DTYPE_t[:, ::1] X,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a public method of a private class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I don't know what the convention is here.
Some methods of the Tree class are public like predict or apply, some of them are private like _add_node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is _add_node ever called from outside tree building?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh OK that's the convention. I'll make it public then

@NicolasHug
Copy link
Member Author

Yes the code is pretty much the same, I mostly avoided the use of some weird patterns like value[current_node - root_node]

@jnothman jnothman merged commit 4de404d into scikit-learn:master May 2, 2019
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request May 6, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants