-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG + 2] EHN additional test for trees regarding fitting behaviour with constant features #8580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG + 2] EHN additional test for trees regarding fitting behaviour with constant features #8580
Conversation
This seems like a good unit test, but can you explain what the issue is a bit more? What do you mean by "the other splitter"? |
I am referring to a tree growth using a depth-first approach. Each node
at a given depth will be given to a splitter object. If for the first
node, the splitter find a constant feature, it should mark this feature
as constant for subsequent deeper (i.e. next depth-level)
nodes. However, the other nodes at the same depth should not be affected
by this assignation.
I did this mistake in the other implementation by invalidating the
feature for all nodes (deeper and the current depth level) and the test
did not fail. With the following data, you should have the following
behaviour:
* a first split without any issue;
* at depth=1, one of the node will face
a constant feature while the second one will not face this case;
* at depth=2, only one of the node will be split.
With the bad implementation, you will stop at depth=1. Just to make it
clear, there is nothing wrong with the current implementation. This is
just an additional test to avoid a future mis-implementation.
Hope that this is more clear.
|
Got it, this looks good to me. Thanks for the contribution. |
sklearn/tree/tests/test_tree.py
Outdated
assert_equal(est.tree_.max_depth, 2) | ||
assert_equal(est.tree_.node_count, 5) | ||
|
||
for name, TreeEstimator in REG_TREES.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 2 separate loops?
Oopsy, I was not looking at the right graph. It make more sense now ;) Not sure it makes sense to test this feature, for the ExtraTree. |
225995f
to
e42b504
Compare
Thx. +1 for merge from me as well... Will merge when appveyor gives a go.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Too slow. |
…ith constant features (scikit-learn#8580) * TST add test checking the behaviour of constant/no-constant features * FIX/TST factorize test * TST Add additional constant features * FIX/TST remove ExtraTree from test
…ith constant features (scikit-learn#8580) * TST add test checking the behaviour of constant/no-constant features * FIX/TST factorize test * TST Add additional constant features * FIX/TST remove ExtraTree from test
…ith constant features (scikit-learn#8580) * TST add test checking the behaviour of constant/no-constant features * FIX/TST factorize test * TST Add additional constant features * FIX/TST remove ExtraTree from test
…ith constant features (scikit-learn#8580) * TST add test checking the behaviour of constant/no-constant features * FIX/TST factorize test * TST Add additional constant features * FIX/TST remove ExtraTree from test
…ith constant features (scikit-learn#8580) * TST add test checking the behaviour of constant/no-constant features * FIX/TST factorize test * TST Add additional constant features * FIX/TST remove ExtraTree from test
…ith constant features (scikit-learn#8580) * TST add test checking the behaviour of constant/no-constant features * FIX/TST factorize test * TST Add additional constant features * FIX/TST remove ExtraTree from test
…ith constant features (scikit-learn#8580) * TST add test checking the behaviour of constant/no-constant features * FIX/TST factorize test * TST Add additional constant features * FIX/TST remove ExtraTree from test
Reference Issue
Related to #8231
What does this implement/fix? Explain your changes.
In #8231, the following error in the implementation did not trigger any failing in the tests of the trees:
When a feature was found to be constant at a given splitter, this feature was declared constant for the remaining of the fitting. However, this is a wrong behaviour since this feature is potentially non-constant for the other splitter.
This additional test, check that the number of nodes and the tree depth are correct with a feature which would be declared constant at the first split as described previously.
Any other comments?