[MRG] Fast PDP for DecisionTreeRegressor #15848


Closed

Conversation

NicolasHug
Member

@NicolasHug commented Dec 9, 2019

This PR implements the fast 'recursion' partial dependence method for DecisionTreeRegressor.

We're only exposing the method that already exists for the gradient boosting estimators.

@NicolasHug changed the title [WIP] Fast PDP for DecisionTreeRegressor → [MRG] Fast PDP for DecisionTreeRegressor Dec 9, 2019
@NicolasHug
Member Author

CC @thomasjpfan and @glemaitre maybe

Copy link
Member

@ogrisel left a comment


A first pass with a few comments/questions:

```python
pdp_gbdt = _partial_dependence_brute(gbdt, grid, features, X,
                                     response_method='auto')
pdp_tree = _partial_dependence_brute(tree, grid, features, X,
                                     response_method='auto')
assert np.allclose(pdp_gbdt, pdp_tree)
```
Member


Using np.testing.assert_allclose yields more informative error messages in case of failure.
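The difference in diagnostics is easy to see with a toy example:

```python
import numpy as np
from numpy.testing import assert_allclose

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9  # tiny perturbation, well within default tolerances

# np.allclose collapses everything into a single boolean, so a failing
# `assert np.allclose(...)` reports nothing beyond the bare assertion.
assert np.allclose(a, b)

# assert_allclose raises an AssertionError whose message prints the
# mismatched entries and the max absolute/relative differences, which
# is far more useful when a CI run fails.
assert_allclose(a, b)
```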

```diff
@@ -206,6 +212,48 @@ def test_partial_dependence_helpers(est, method, target_feature):
     assert np.allclose(pdp, mean_predictions, rtol=rtol)
```


```python
@skip_if_32bit
```
Member


Do you have any idea why this is necessary? The differences seem very large. Maybe you should try with a non-random dataset (e.g. make_regression) so that the tree and the GBRT model have a more numerically stable structure?

Member Author


I suspect it comes from a split with equal gain, but I'm not sure; I was really confused. Now that I'm properly testing the recursion method, maybe we can remove this.


```python
pdp_gbdt = _partial_dependence_brute(gbdt, grid, features, X,
                                     response_method='auto')
pdp_tree = _partial_dependence_brute(tree, grid, features, X,
                                     response_method='auto')
```
Member


Why do you test with the brute force method? Why not the recursion method or the public partial_dependence function with auto mode?

Member Author


It should definitely be the recursion method, my bad.

@NicolasHug
Member Author

OK, so the test is still failing on 32-bit.

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=11194

I'm not sure why, since the prediction sanity check passes.

Note that using make_regression makes the problem worse because many splits are equivalent with these datasets.

@NicolasHug
Member Author

NicolasHug commented Dec 11, 2019

@ogrisel , I added a check to make sure that both trees are exactly equal in
397ce88

As you can see, this fails (only) on 32-bit builds, so naturally the PDPs can't be equal either.

Since this failure isn't related to the current PR, I would propose adding something like

```python
if not trees_are_equal:
    assert is_32bits  # 32 bits doesn't grow the same tree for some reason
    return

# check pdp here for all other platforms
```

would that be OK?

EDIT: did just that in c09565a
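The structural comparison alluded to above can be sketched like this (`trees_are_equal` is a hypothetical helper for illustration, not the PR's exact code; it compares the `tree_` attribute node by node):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.tree import DecisionTreeRegressor

def trees_are_equal(est_a, est_b):
    """Node-by-node comparison of two fitted tree structures."""
    a, b = est_a.tree_, est_b.tree_
    return (
        a.node_count == b.node_count
        and np.array_equal(a.children_left, b.children_left)
        and np.array_equal(a.children_right, b.children_right)
        and np.array_equal(a.feature, b.feature)
        and np.allclose(a.threshold, b.threshold)
    )

X, y = make_friedman1(n_samples=100, random_state=0)
est_a = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
est_b = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Same data, same seed, same platform: the structures should match.
print(trees_are_equal(est_a, est_b))
```

On a platform where tied split gains are resolved differently (the suspected 32-bit issue), a check like this is what would fail even though each tree's own predictions remain self-consistent.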

@NicolasHug
Member Author

I'll close this in favor of #15864, which also adds RandomForestRegressor support (for very few additional lines).

@NicolasHug closed this Dec 11, 2019