Cloning decision tree estimators breaks criterion objects #6420

Closed
panisson opened this issue Feb 22, 2016 · 9 comments · Fixed by #7680
Labels
Bug · Moderate (Anything that requires some knowledge of conventions and best practices)

Comments

@panisson

I'm trying to implement different criteria for decision trees.
I've found that decision trees can accept a Criterion object as the criterion parameter:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py#L335
The easiest way to implement other criteria would be to subclass the tree._criterion.Criterion class.

The normal way to pass a criterion to a decision tree is by using its string name, and it works fine:

from sklearn import tree, model_selection, metrics, datasets
import numpy as np

X, y = datasets.make_classification(n_samples=1000, random_state=42)
cv = model_selection.KFold(n_splits=10, shuffle=True, random_state=43)

dtc = tree.DecisionTreeClassifier(criterion='gini', random_state=42)
print(np.mean(model_selection.cross_val_score(dtc, X, y, cv=cv)))

The mean score is 0.866.

However, if I use a Criterion object, it does not work anymore:

gini = tree._criterion.Gini(n_outputs=1, n_classes=np.array([2]))
dtc = tree.DecisionTreeClassifier(criterion=gini, random_state=42)
print(np.mean(model_selection.cross_val_score(dtc, X, y, cv=cv)))

The mean score is now 0.476.
It seems that cloning the decision tree breaks the criterion object in some way, because this code also fails:

from sklearn.base import clone

gini = tree._criterion.Gini(n_outputs=1, n_classes=np.array([2]))

dtc = tree.DecisionTreeClassifier(criterion=gini, random_state=42)
scores = []
for train_idx, test_idx in cv.split(X, y):
    estimator = clone(dtc)
    estimator.fit(X[train_idx], y[train_idx])
    scores.append(metrics.accuracy_score(y[test_idx], estimator.predict(X[test_idx])))
print(np.mean(scores))

but if I reset the criterion object of the estimator by, e.g.,

estimator.criterion = dtc.criterion

then the score values are back to normal.

I could not find where the cloning breaks the criterion object; any help would be welcome.

Thanks all for your effort on this project, sklearn is really great!

regards
André

@amueller amueller added the Bug label Oct 7, 2016
@amueller
Member

amueller commented Oct 7, 2016

ping @jmschrei @glouppe @arjoly ?

@jnothman
Member

jnothman commented Oct 8, 2016

>>> gini = tree._criterion.Gini(n_outputs=1, n_classes=np.array([2]))
>>> dtc = tree.DecisionTreeClassifier(criterion=gini, random_state=42)
>>> gini.__reduce__()
(sklearn.tree._criterion.ClassificationCriterion, (1, array([2])), {})
>>> base.clone(dtc).criterion.__reduce__()
(sklearn.tree._criterion.ClassificationCriterion, (1, array([5764616311966949241])), {})

clone relies on copy.deepcopy:

>>> copy.deepcopy(gini).__reduce__()
(sklearn.tree._criterion.ClassificationCriterion, (1, array([4314681712])), {})
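For context, here is a minimal pure-Python sketch (not scikit-learn's actual implementation; TinyEstimator and clone_sketch are hypothetical names) of why clone routes the criterion through copy.deepcopy: clone reads the constructor parameters via get_params and deep-copies each one before rebuilding the estimator, so any object passed as criterion must survive copy.deepcopy intact.

```python
# Hypothetical sketch of clone's parameter handling: every constructor
# parameter is deep-copied, then the estimator is rebuilt from the copies.
import copy

class TinyEstimator:
    def __init__(self, criterion):
        self.criterion = criterion

    def get_params(self):
        return {"criterion": self.criterion}

def clone_sketch(estimator):
    # Deep-copy each parameter so the clone shares no state with the original.
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    return type(estimator)(**params)

original = TinyEstimator(criterion=[1, 2, 3])
cloned = clone_sketch(original)
```

If the parameter's deepcopy is broken (as it was for the Cython Criterion classes), the clone ends up holding a corrupted object even though the original is fine.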

@amueller
Member

amueller commented Oct 8, 2016

So deepcopy of criteria is broken?

@jnothman
Member

jnothman commented Oct 8, 2016

Yes, though I've not worked out why, despite spending some time tracing deepcopy.

@jnothman jnothman added Moderate Anything that requires some knowledge of conventions and best practices Need Contributor labels Oct 13, 2016
@jnothman
Member

Need Debugger label?

@amueller
Member

is need debugger the same as need contributor or as need reviewer or something else? lol

@olologin
Contributor

Looks like I found the problem: the np.array returned from __reduce__ doesn't own its underlying memory, so that memory gets freed once the ClassificationCriterion is destroyed.

copy.deepcopy(gini).__reduce__()

Here we get the __reduce__ result from the copy; the copy object is then destroyed, after which the __reduce__ result is filled with garbage.

Also, there are some other funny problems, like:

gini = tree._criterion.Gini(n_outputs=1, n_classes=np.array([2]))
gini_copy = copy.deepcopy(gini)
gini_copy.__class__
Out[1]: sklearn.tree._criterion.ClassificationCriterion

I'll try to fix this and make a PR.
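The fix pattern can be sketched in pure Python (the real Criterion classes are Cython, and a plain list stands in here for the NumPy n_classes buffer; the class names below are hypothetical): __reduce__ should return the instance's own class, so subclasses like Gini survive deepcopy, and an owning copy of the buffer, so the reconstruction arguments stay valid after the instance is destroyed.

```python
# Hypothetical sketch of the __reduce__ fix pattern for a picklable
# object that holds a buffer it owns.
import copy

class ClassificationCriterionSketch:
    def __init__(self, n_outputs, n_classes):
        self.n_outputs = n_outputs
        self.n_classes = list(n_classes)  # own our copy of the data

    def __reduce__(self):
        # 1) Return self.__class__, not a hard-coded base class, so
        #    deepcopy/pickle preserves subclasses such as Gini.
        # 2) Return an owning copy of n_classes, so the reconstruction
        #    args remain valid even after this instance is destroyed.
        return (self.__class__, (self.n_outputs, list(self.n_classes)))

class GiniSketch(ClassificationCriterionSketch):
    pass

gini = GiniSketch(n_outputs=1, n_classes=[2])
gini_copy = copy.deepcopy(gini)
```

This addresses both symptoms seen above: the copy keeps its subclass instead of degrading to the base class, and its state no longer points at freed memory.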

@olologin
Contributor

@panisson, could you check your original problem on the branch in #7680?

@panisson
Author

This branch solves my original problem. Thanks!

olologin added a commit to olologin/scikit-learn that referenced this issue Oct 19, 2016
olologin added a commit to olologin/scikit-learn that referenced this issue Oct 19, 2016
amueller added a commit to amueller/scikit-learn that referenced this issue Oct 25, 2016
amueller pushed a commit to amueller/scikit-learn that referenced this issue Nov 9, 2016
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this issue Jun 14, 2017
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Jul 27, 2017
* tag '0.18.1': (144 commits)
  skip tree-test on 32bit
  do the warning test as we do it in other places.
  Replase assert_equal by assert_almost_equal in cosine test
  version bump 0.18.1
  fix merge conflict mess in whatsnew
  add the python2.6 warning to 0.18.1
  fix learning_curve test that I messed up in cherry-picking the "reentrant cv" PR.
  sync whatsnew with master
  [MRG] TST Ensure __dict__ is unmodified by predict, transform, etc (scikit-learn#7553)
  FIX scikit-learn#6420: Cloning decision tree estimators breaks criterion objects (scikit-learn#7680)
  Add whats new entry for scikit-learn#6282 (scikit-learn#7629)
  [MGR + 2] fix selectFdr bug (scikit-learn#7490)
  fixed whatsnew cherry-pick mess (somewhat)
  [MRG + 2] FIX LogisticRegressionCV to correctly handle string labels (scikit-learn#5874)
  [MRG + 2] Fixed parameter setting in SelectFromModel (scikit-learn#7764)
  [MRG+2] DOC adding separate `fit()` methods (and docstrings) for DecisionTreeClassifier and DecisionTreeRegressor (scikit-learn#7824)
  Fix docstring typo (scikit-learn#7844) n_features --> n_components
  [MRG + 1] DOC adding :user: role to whats_new (scikit-learn#7818)
  [MRG+1] label binarizer not used consistently in CalibratedClassifierCV (scikit-learn#7799)
  DOC : fix docstring of AIC/BIC in GMM
  ...
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Jul 27, 2017
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Jul 27, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this issue Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this issue Nov 15, 2017
4 participants