[MRG] Fast PDP for DecisionTreeRegressor #15848
Conversation
CC @thomasjpfan and @glemaitre maybe
A first pass with a few comments/questions:
        response_method='auto')
    pdp_tree = _partial_dependence_brute(tree, grid, features, X,
                                         response_method='auto')
    assert np.allclose(pdp_gbdt, pdp_tree)
Using np.testing.assert_allclose yields more informative error messages in case of failure.
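For illustration, a minimal sketch of the difference (the array values here are made up, not from the test):

    import numpy as np

    pdp_gbdt = np.array([0.1, 0.2, 0.3])
    pdp_tree = np.array([0.1, 0.2, 0.3001])

    # A bare `assert np.allclose(pdp_gbdt, pdp_tree)` fails with an empty
    # AssertionError, while assert_allclose raises with a detailed report:
    # mismatch count, max absolute/relative differences, and both arrays.
    np.testing.assert_allclose(pdp_gbdt, pdp_tree)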
    @@ -206,6 +212,48 @@ def test_partial_dependence_helpers(est, method, target_feature):
        assert np.allclose(pdp, mean_predictions, rtol=rtol)

    @skip_if_32bit
Do you have any idea why this is necessary? The differences seem very large. Maybe you should try with a non-random dataset (e.g. make_regression) so that the tree and the GBRT model have a more numerically stable structure?
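A sketch of that suggestion (the sample sizes and seed are arbitrary, not from the PR):

    from sklearn.datasets import make_regression

    # A deterministic, linearly-generated dataset tends to give both the
    # single tree and the GBRT a more stable structure than pure noise.
    X, y = make_regression(n_samples=100, n_features=5, random_state=0)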
I suspect it comes from a split with equal gain, but I'm not sure; I was really confused. Now I'm properly testing the recursion method, so maybe we can remove this.
    pdp_gbdt = _partial_dependence_brute(gbdt, grid, features, X,
                                         response_method='auto')
    pdp_tree = _partial_dependence_brute(tree, grid, features, X,
Why do you test with the brute force method? Why not the recursion method, or the public partial_dependence function with auto mode?
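For reference, a sketch of how that comparison could go through the public API, assuming the recursion support this PR adds, the tuple return of scikit-learn 0.22, and that tree and X are the fitted estimator and data from the test:

    import numpy as np
    from sklearn.inspection import partial_dependence

    # 'recursion' traverses the fitted tree structure; 'brute' predicts on
    # modified copies of X for every grid point. For a plain
    # DecisionTreeRegressor the two should agree.
    pdp_rec, _ = partial_dependence(tree, X, features=[0],
                                    method='recursion')
    pdp_brute, _ = partial_dependence(tree, X, features=[0],
                                      method='brute')
    np.testing.assert_allclose(pdp_rec, pdp_brute)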
It should definitely be the recursion method, my bad.
Ok so the test is still failing with 32 bits: https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=11194. I'm not sure why, since the prediction sanity check passes. Note that using
@ogrisel, I added a check to make sure that both trees are exactly equal. As you can see, this fails (only) for 32 bits. So naturally the PDP can't be equal either. Since this failure isn't related to the current PR, I would propose to add something like:

    if not trees_are_equal:
        assert is_32bits  # 32 bits doesn't grow the same tree for some reason
        return
    # check pdp here for all other platforms

Would that be OK? EDIT: did just that in c09565a
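The trees_are_equal check itself isn't shown in the thread; a hypothetical version (my guess, not necessarily what the PR uses) could compare the fitted Tree arrays directly:

    import numpy as np

    def trees_are_equal(a, b):
        """Hypothetical structural comparison of two fitted trees.

        a and b are sklearn Tree objects (estimator.tree_).
        """
        return (np.array_equal(a.children_left, b.children_left)
                and np.array_equal(a.children_right, b.children_right)
                and np.array_equal(a.feature, b.feature)
                and np.array_equal(a.threshold, b.threshold)
                and np.array_equal(a.value, b.value))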
I'll close in favor of #15864, which also adds the RandomForestRegressor support (for very few additional lines).
This PR implements the fast 'recursion' method for DecisionTreeRegressor. We're only exposing the method that already exists for the gradient boosting estimators.
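A minimal end-to-end sketch of what the PR enables (dataset and parameters are illustrative; the tuple return matches the scikit-learn 0.22-era API):

    from sklearn.datasets import make_regression
    from sklearn.inspection import partial_dependence
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=200, n_features=4, random_state=0)
    tree = DecisionTreeRegressor(random_state=0).fit(X, y)

    # With this PR, method='recursion' (the fast tree traversal previously
    # reserved for gradient boosting) also works for a plain decision tree.
    averaged_predictions, values = partial_dependence(
        tree, X, features=[0], method='recursion')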