Cloning decision tree estimators breaks criterion objects #6420
```python
>>> import copy
>>> import numpy as np
>>> from sklearn import base, tree
>>> gini = tree._criterion.Gini(n_outputs=1, n_classes=np.array([2]))
>>> dtc = tree.DecisionTreeClassifier(criterion=gini, random_state=42)
>>> gini.__reduce__()
(sklearn.tree._criterion.ClassificationCriterion, (1, array([2])), {})
>>> base.clone(dtc).criterion.__reduce__()
(sklearn.tree._criterion.ClassificationCriterion, (1, array([5764616311966949241])), {})
>>> copy.deepcopy(gini).__reduce__()
(sklearn.tree._criterion.ClassificationCriterion, (1, array([4314681712])), {})
```
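For context, `sklearn.base.clone` rebuilds an estimator from `get_params(deep=False)` and clones each parameter with `safe=False`. A parameter that has no `get_params`, such as a `Criterion` instance, is simply handed to `copy.deepcopy`, which is why `clone` and `deepcopy` show the same symptom above. The sketch below uses a hypothetical pure-Python `ReduceTracer` class (not part of scikit-learn) to show that path: `copy.deepcopy` asks the object for `__reduce__()`, deep-copies the returned constructor arguments, and calls the class with them. The copy can therefore only be as correct as what `__reduce__` returns.

```python
import copy


class ReduceTracer:
    """Hypothetical stand-in for a criterion object; counts __reduce__ calls."""

    reduce_calls = 0

    def __init__(self, n_outputs, n_classes):
        self.n_outputs = n_outputs
        self.n_classes = list(n_classes)

    def __reduce__(self):
        # copy.deepcopy (via object.__reduce_ex__) ends up here. The copy is
        # rebuilt as type(self)(*args), so it is only as good as these args.
        type(self).reduce_calls += 1
        return (type(self), (self.n_outputs, self.n_classes))


original = ReduceTracer(1, [2])
duplicate = copy.deepcopy(original)
print(ReduceTracer.reduce_calls, duplicate.n_classes)  # prints: 1 [2]
```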
So deepcopy of criteria is broken?
Yes, though I haven't worked out why, despite spending some time tracing it.
Need Debugger label?
Is "Need Debugger" the same as "Need Contributor" or "Need Reviewer", or something else? lol
Looks like I found the problem: the np.array object returned from `__reduce__` doesn't own its underlying memory, so that memory is freed once the ClassificationCriterion is destroyed.
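The ownership issue can be sketched with NumPy alone. Below, a hypothetical `BufferBackedCriterion` (not a scikit-learn class) keeps `n_classes` in a private buffer, standing in for the C array a Cython criterion allocates, and exposes it through `np.frombuffer`, which wraps memory without owning it (`flags.owndata` is `False`). In the Cython case nothing keeps that buffer alive, so such a view dangles once the criterion is freed; that part cannot be reproduced safely in pure Python, because here the view holds a reference to its buffer. Returning `.copy()` from `__reduce__` gives the array its own memory, which is one way to address the problem; this is an assumption about the approach, not necessarily the exact change that was merged.

```python
import copy

import numpy as np


class BufferBackedCriterion:
    """Hypothetical criterion whose n_classes lives in a private byte buffer."""

    def __init__(self, n_outputs, n_classes):
        self.n_outputs = n_outputs
        # Stand-in for a malloc'ed SIZE_t* owned by the criterion object.
        self._buf = bytearray(np.asarray(n_classes, dtype=np.intp).tobytes())

    def n_classes_view(self):
        # Wraps self._buf without copying: the array does not own its data.
        return np.frombuffer(self._buf, dtype=np.intp)

    def __reduce__(self):
        # .copy() allocates fresh memory owned by the returned array, so the
        # constructor arguments stay valid independently of this object.
        return (type(self), (self.n_outputs, self.n_classes_view().copy()))


crit = BufferBackedCriterion(1, [2])
print(crit.n_classes_view().flags.owndata)            # False: a view
print(crit.__reduce__()[1][1].flags.owndata)          # True: an owning copy
print(copy.deepcopy(crit).n_classes_view().tolist())  # [2]
```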
Also, there are some other funny problems, like:
I'll try to fix this and open a PR.
This branch solves my original problem. Thanks!
Fixed by scikit-learn#7680 ("FIX scikit-learn#6420: Cloning decision tree estimators breaks criterion objects"), which is included in the 0.18.1 tag.
I'm trying to implement different criteria for decision trees.
I've found that decision trees can accept a `Criterion` object as the `criterion` parameter:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py#L335
So the easiest way to implement other criteria would be to subclass `tree._criterion.Criterion`.
The normal way to pass a criterion to a decision tree is by using its string name, and it works fine:
The mean score is 0.866.
However, if I use a Criterion object, it does not work anymore:
Now the mean score is 0.476.
It seems that cloning the decision tree breaks the criterion object in some way, because this code doesn't work either:
but if I reset the criterion object of the estimator by, e.g.,
then the score values are back to normal.
I couldn't find where cloning breaks the criterion object; any help would be welcome.
Thanks, all, for your effort on this project; sklearn is really great!
Regards,
André