[MRG + 1] Add test for __dict__ #7553
Conversation
We certainly change the
Got it. Now I'm checking that
You should check any of
(Before deprecation of the feature selection transformation, there certainly should have been models that have all of these, such as logistic regression)
okay, what I've done by now:
Can you please tell if it's closer to the desired result or not? Thanks!
I've added some special cases for
error. Any suggestions?
    def check_dict(name, Estimator):
        rnd = np.random.RandomState(0)
        if name in ['SpectralCoclustering']:
            return 0
just "return", no 0.
The issues with SpectralCoclustering and SpectralBiclustering may be resolved in #6141.
@@ -397,6 +398,39 @@ def check_dtype_object(name, Estimator):

    @ignore_warnings
    def check_dict(name, Estimator):
Maybe check_dict_unchanged
    if name in ['RANSACRegressor']:
        X = 3 * rnd.uniform(size=(20, 3))
    else:
        X = 2 * rnd.uniform(size=(20, 3))
We can't use 3 * in all cases?
If I use 3 * for all, I get ValueError: _BinaryGaussianProcessClassifierLaplace supports only binary classification. y contains classes [0 1 2]. Which sounds fair enough. With 2 * for all, on the other hand, I get ValueError: No inliers found, possible cause is setting residual_threshold (None) too low. for RANSACRegressor :(
    set_testing_parameters(estimator)

    if hasattr(estimator, "n_components"):
        estimator.n_components = 3
why 3? other tests with this parameter setting use 1.
well yes, but for some reason with 1 I catch ValueError: n_best cannot be larger than n_components, but 3 > 1 from SpectralBiclustering
Hm, maybe SpectralBiclustering needs n_best=1?
thanks, that helped!
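A minimal sketch of the parameter special-casing discussed above. FakeBiclustering and shrink_for_testing are illustrative stand-ins, not the actual scikit-learn test code; the point is only that n_best must be capped alongside n_components.

```python
class FakeBiclustering:
    """Stand-in estimator exposing the two parameters under discussion."""
    def __init__(self):
        self.n_components = 6
        self.n_best = 3

def shrink_for_testing(estimator):
    # Keep the decomposition small so checks run fast...
    if hasattr(estimator, "n_components"):
        estimator.n_components = 3
    # ...but n_best must never exceed n_components, hence n_best=1.
    if hasattr(estimator, "n_best"):
        estimator.n_best = 1
    return estimator

est = shrink_for_testing(FakeBiclustering())
print(est.n_components, est.n_best)  # 3 1
```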
    estimator = Estimator()
    set_testing_parameters(estimator)

    if hasattr(estimator, "n_components"):
Hmm... I wonder if these should be in set_testing_parameters
Can't say anything here; I've seen the same part in several places, for example here: https://github.com/kiote/scikit-learn/blob/feature-test-__dict__/sklearn/utils/estimator_checks.py#L443
So it looks like it could be moved there. Do you think it should?
yeah feel free to move it there.
    for method in ["predict", "transform", "decision_function",
                   "predict_proba"]:
        if hasattr(estimator, method):
            dict_before = estimator.__dict__
You need to perform .copy() at the end of this. Otherwise, in almost all cases, testing equivalence doesn't test that __dict__ is unchanged, as you're actually comparing an object to itself.
haha good point, after the real check I have to skip two more estimators: 'NMF' and 'ProjectedGradientNMF', since they actually change __dict__ on the given steps.
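A small sketch of the aliasing pitfall described above: estimator.__dict__ returns the live attribute dict, so without .copy() the "before" snapshot and the current state are the same object and the comparison always passes. FakeEstimator is illustrative, not a real scikit-learn class.

```python
class FakeEstimator:
    def predict(self, X):
        self.n_iter_ = 5  # state mutation the check should catch
        return X

est = FakeEstimator()

alias = est.__dict__            # same object, not a snapshot
snapshot = est.__dict__.copy()  # real snapshot

est.predict([1, 2, 3])

print(alias == est.__dict__)     # True even though state changed
print(snapshot == est.__dict__)  # False: the mutation is detected
```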
        if hasattr(estimator, method):
            dict_before = estimator.__dict__
            getattr(estimator, method)(X)
            assert_equal(estimator.__dict__, dict_before)
use assert_dict_equal
Thanks for working on this, though I suspect you will find many more failures when you correctly perform .copy().
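The check being assembled in this thread can be sketched as follows. This is a simplified stand-in, assuming a plain assert where the real code uses sklearn's assert_dict_equal, and DummyModel is illustrative.

```python
def check_dict_unchanged(estimator, X):
    # Snapshot __dict__ with .copy(), call each prediction-style
    # method, and assert that no attribute was set or changed.
    for method in ["predict", "transform", "decision_function",
                   "predict_proba"]:
        if hasattr(estimator, method):
            dict_before = estimator.__dict__.copy()  # real snapshot
            getattr(estimator, method)(X)
            assert estimator.__dict__ == dict_before, (
                "Estimator changes __dict__ during %s" % method)

class DummyModel:
    def __init__(self):
        self.coef_ = [1.0, 2.0]
    def predict(self, X):
        return [sum(x) for x in X]  # no state mutation

check_dict_unchanged(DummyModel(), [[1, 2], [3, 4]])
print("passed")
```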
    rnd = np.random.RandomState(0)
    if name in ['SpectralCoclustering']:
        return 0
    if name in ['SpectralCoclustering', 'NMF', 'ProjectedGradientNMF']:
How do they change the dict? Please leave them in so we can see the error and fix them.
Both NMF and ProjectedGradientNMF change the n_iter value. With SpectralCoclustering the situation is different; I just can't make it work. It fails with the error:
ValueError: Found array with 0 feature(s) (shape=(23, 0)) while a minimum of 1 is required.
    @ignore_warnings
    def test_predict_does_not_change_estimator_state():
I think you should add this test to the generator that generates all tests in estimator_checks.py
and not add a test here. Though for working on it this might be easier.
got it, I'll remove this from here afterwards then!
looks good to me once you move this.
|
||
set_random_state(estimator, 1) | ||
|
||
if name in ['SpectralBiclustering']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SpectralBiclustering doesn't take y? I guess that's fixed in #6141.
yep! those changes helped (I merged with the maniteja123:issue6126 branch to check), so I suppose I could remove this after the #6141 merge, right?
Not really sure what I should do with the failing tests for
It would be acceptable to skip them for now (by checking for their name) and then creating a new issue for this to be solved.
Getting there!
@@ -312,6 +314,14 @@ def set_testing_parameters(estimator):
    if not isinstance(estimator, ProjectedGradientNMF):
        estimator.set_params(solver='cd')

    if hasattr(estimator, "n_components"):
It seems setting these parameters here is a bad idea. It severely affects other tests.
Okay, I'll move that back to the concrete test.
How did you check that, btw?
Click "details" next to "continuous-integration/travis-ci/pr" under "Some changes were not successful"
removed
oh I see, thanks!
I added some comments right in the code to explain some things; also I'm going to remove the new code from
This looks okay... could you change the title from WIP to MRG and put the code into mergeable form?
    def check_dict_unchanged(name, Estimator):
        # these two estimators change the state of __dict__
        # and need to be fixed
        if name in ['NMF', 'ProjectedGradientNMF']:
Actually, it might be simpler to remove this line instead, that should get rid of the issue.
done!
can you please rebase? Otherwise LGTM.
okay! now it's rebased from master
Can you check the changed files as shown by github? I think something went wrong during the rebase.
hmm, could you please tell more, what exactly do you mean? Can't see what's wrong there 😕
@kiote never mind, I didn't have enough coffee...
@jnothman wanna have another look?
Can you please add a test to test_check_estimator?
done, please see here
Otherwise LGTM
@@ -1106,7 +1106,6 @@ def transform(self, X):
    nls_max_iter=self.nls_max_iter, sparseness=self.sparseness,
    beta=self.beta, eta=self.eta)

    self.n_iter_ = n_iter_
Very awkwardly, there is an Attributes section in the docstring above (and similar ones in the docs for fit and fit_transform!). Please remove it. I suppose this change should also be documented ("Fixed bug where NMF's n_iter_ attribute was set by calls to transform"), though it's a very strange feature.
I removed all self.n_iter_ = something in transform, but I'm not really sure where this change should be documented. Could you please point me to the right place for that?
Our changelog is in doc/whats_new.rst
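The fix discussed in this thread can be sketched as below. TinyNMF and _solve are simplified stand-ins for sklearn's NMF, not the real implementation; the essential point is that the iteration count becomes an attribute only at fit time.

```python
class TinyNMF:
    def fit_transform(self, X):
        W, n_iter_ = self._solve(X)
        # fit time: recording the iteration count as an attribute is fine
        self.n_iter_ = n_iter_
        return W

    def transform(self, X):
        W, n_iter_ = self._solve(X)
        # bug fix: keep n_iter_ local here; transform must not
        # mutate estimator state
        return W

    def _solve(self, X):
        # trivial stand-in for the NMF solver
        return [x * 0.5 for x in X], 10

model = TinyNMF()
model.fit_transform([2.0, 4.0])
snapshot = model.__dict__.copy()
model.transform([2.0, 4.0])
print(model.__dict__ == snapshot)  # True: transform left state unchanged
```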
            getattr(estimator, method)(X)
            assert_dict_equal(estimator.__dict__, dict_before,
                              ('Estimator changes __dict__ during '
                               'predict, transform, decision_function '
Could we just specify the one that's being tested?
Or would we rather list all to make it clear what the API violation is?
The idea was to list them all, even if only one violates the new rule, but it might be clearer to point directly to the one that's being tested.
Changed here: 4bd07c7
@@ -988,6 +988,7 @@ def __init__(self, n_components=None, init=None, solver='cd',
    self.l1_ratio = l1_ratio
    self.verbose = verbose
    self.shuffle = shuffle
    self.n_iter_ = 1
without having this attribute here I got

    File "/home/travis/sklearn_build_latest/scikit-learn/sklearn/utils/estimator_checks.py", line 1600, in check_transformer_n_iter
      assert_greater_equal(estimator.n_iter_, 1)
    AttributeError: 'ProjectedGradientNMF' object has no attribute 'n_iter_'

from tests
You can remove this now.
Sorry for the misunderstandings.
@@ -1021,9 +1021,6 @@ def fit_transform(self, X, y=None, W=None, H=None):
    components_ : array-like, shape (n_components, n_features)
        Factorization matrix, sometimes called 'dictionary'.

    n_iter_ : int
I was commenting that there should be no Attributes section at all in a method docstring. It belongs in the class docstring.
@@ -1049,7 +1046,6 @@ def fit_transform(self, X, y=None, W=None, H=None):

    self.n_components_ = H.shape[0]
    self.components_ = H
    self.n_iter_ = n_iter_
You should not have removed this. We need it in fit_transform, not in transform.
woops! placed back
:)
okay, now I put back some lines I removed accidentally. I wonder, should I add doc about the new check as well?
Otherwise LGTM.
@@ -1066,9 +1059,6 @@ def fit(self, X, y=None, **params):
    components_ : array-like, shape (n_components, n_features)
Please drop all attributes here too
@@ -94,6 +94,11 @@ Bug fixes
    (`#7680 <https://github.com/scikit-learn/scikit-learn/pull/7680>`_).
    By `Ibraim Ganiev`_.

    - Remove params changing inside of `transform` method of :class:`decomposition.NMF`
The following would suffice:

    - Fixed a bug where :class:`decomposition.NMF` sets its `n_iters_`
      attribute in `transform()`. :issue:`7553` by `Ekaterina Krivich`_.

And under Enhancements, something like: "check_estimator now attempts to ensure that methods transform, predict, etc. do not set attributes on the estimator."
@@ -988,6 +988,7 @@ def __init__(self, n_components=None, init=None, solver='cd',
    self.l1_ratio = l1_ratio
    self.verbose = verbose
    self.shuffle = shuffle
    self.n_iter_ = 1
You can remove this now.
@@ -69,6 +69,11 @@ Enhancements
    (`#7723 <https://github.com/scikit-learn/scikit-learn/pull/7723>`_)
    by `Mikhail Korobov`_.

    - ``check_estimator`` now attempts to ensure that methods transform, predict, etc.
      do not set attributes on the estimator.
      (`#7553 <https://github.com/scikit-learn/scikit-learn/pull/7553>`_)
again, please use :issue:`7553`
okay; anyway, neither my version nor this one is clickable for me (looks like https://github.com/scikit-learn/scikit-learn/blob/master/doc/whats_new.rst#id17). Not in the scope of this issue, obviously. I'll fix that!
    (`#7553 <https://github.com/scikit-learn/scikit-learn/pull/7553>`_). Since it violates
    the newly introduced rule "estimator state does not change at transform/predict/predict_proba time".
    By `Ekaterina Krivich`_.
    - Fixed a bug where :class:`decomposition.NMF` sets its `n_iters_`
To format n_iters_ correctly, you need double-backticks: ``n_iters_``
Sigh. Can you update your master, rebase or merge, and fix conflicts in whats_new?
* Add test for __dict__ for estimator checks: check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator (see #7297)
* Add a test to test_check_estimator that shows that check_estimator fails on an estimator that violates this
oh, it's done
Thanks!
…cikit-learn#7553)
* Add test for __dict__ for estimator checks: check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator (see scikit-learn#7297)
* Add a test to test_check_estimator that shows that check_estimator fails on an estimator that violates this
* Fixed bug where NMF's n_iter_ attribute was set by calls to transform
* tag '0.18.1': (144 commits)
  - skip tree-test on 32bit
  - do the warning test as we do it in other places.
  - Replase assert_equal by assert_almost_equal in cosine test
  - version bump 0.18.1
  - fix merge conflict mess in whatsnew
  - add the python2.6 warning to 0.18.1
  - fix learning_curve test that I messed up in cherry-picking the "reentrant cv" PR.
  - sync whatsnew with master
  - [MRG] TST Ensure __dict__ is unmodified by predict, transform, etc (scikit-learn#7553)
  - FIX scikit-learn#6420: Cloning decision tree estimators breaks criterion objects (scikit-learn#7680)
  - Add whats new entry for scikit-learn#6282 (scikit-learn#7629)
  - [MGR + 2] fix selectFdr bug (scikit-learn#7490)
  - fixed whatsnew cherry-pick mess (somewhat)
  - [MRG + 2] FIX LogisticRegressionCV to correctly handle string labels (scikit-learn#5874)
  - [MRG + 2] Fixed parameter setting in SelectFromModel (scikit-learn#7764)
  - [MRG+2] DOC adding separate `fit()` methods (and docstrings) for DecisionTreeClassifier and DecisionTreeRegressor (scikit-learn#7824)
  - Fix docstring typo (scikit-learn#7844) n_features --> n_components
  - [MRG + 1] DOC adding :user: role to whats_new (scikit-learn#7818)
  - [MRG+1] label binarizer not used consistently in CalibratedClassifierCV (scikit-learn#7799)
  - DOC : fix docstring of AIC/BIC in GMM
  - ...
Reference Issue
see #7297
What does this implement/fix? Explain your changes.
Check that the estimator does not change the state of __dict__ during the fit phase.
Any other comments?
I added "WIP = work in progress" for this PR, since I'm not sure it does what it should do. So please give me some feedback first :)
After this, I'll add checking for the transform stage as well.