
[MRG + 1] Add test for __dict__ #7553


Merged: 3 commits merged on Nov 5, 2016
9 changes: 8 additions & 1 deletion doc/whats_new.rst
@@ -65,10 +65,14 @@ Enhancements
- Added ``sample_weight`` parameter to :meth:`pipeline.Pipeline.score`.
:issue:`7723` by `Mikhail Korobov`_.

- ``check_estimator`` now attempts to ensure that methods transform, predict, etc.
do not set attributes on the estimator.
:issue:`7533` by `Ekaterina Krivich`_.

Bug fixes
.........

- Fix a bug where :class:`sklearn.feature_selection.SelectFdr` did not
exactly implement the Benjamini-Hochberg procedure. It formerly may have
selected fewer features than it should.
:issue:`7490` by `Peng Meng`_.
@@ -88,6 +92,9 @@ Bug fixes
- Tree splitting criterion classes' cloning/pickling is now memory safe
:issue:`7680` by `Ibraim Ganiev`_.

- Fixed a bug where :class:`decomposition.NMF` set its ``n_iter_``
attribute in ``transform()``. :issue:`7553` by `Ekaterina Krivich`_.

.. _changes_0_18_1:

Version 0.18.1
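
For context, a minimal sketch (not part of this PR; ``LeakyPredictor`` is a hypothetical name used only for illustration) of the kind of violation the new check is meant to catch: an estimator whose predict mutates its __dict__ after fitting.

import numpy as np
from sklearn.base import BaseEstimator
from sklearn.utils.estimator_checks import check_estimator


class LeakyPredictor(BaseEstimator):
    """Hypothetical estimator used only to illustrate the new check."""

    def fit(self, X, y=None):
        self.mean_ = np.asarray(X, dtype=float).mean()
        return self

    def predict(self, X):
        # Bug: writing to self during predict changes __dict__ after fit,
        # which the new check_dict_unchanged check reports.
        self.last_n_samples_ = len(X)
        return np.full(len(X), self.mean_)


# Expected to fail check_estimator with an AssertionError along the lines of
# "Estimator changes __dict__ during predict" (it may trip other checks too).
# check_estimator(LeakyPredictor)
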
22 changes: 0 additions & 22 deletions sklearn/decomposition/nmf.py
@@ -1016,14 +1016,6 @@ def fit_transform(self, X, y=None, W=None, H=None):
H : array-like, shape (n_components, n_features)
If init='custom', it is used as initial guess for the solution.

Attributes
----------
components_ : array-like, shape (n_components, n_features)
Factorization matrix, sometimes called 'dictionary'.

n_iter_ : int
Actual number of iterations for the transform.

Returns
-------
W: array, shape (n_samples, n_components)
@@ -1061,14 +1053,6 @@ def fit(self, X, y=None, **params):
X: {array-like, sparse matrix}, shape (n_samples, n_features)
Data matrix to be decomposed

Attributes
----------
components_ : array-like, shape (n_components, n_features)
Factorization matrix, sometimes called 'dictionary'.

n_iter_ : int
Actual number of iterations for the transform.

Returns
-------
self
@@ -1084,11 +1068,6 @@ def transform(self, X):
X: {array-like, sparse matrix}, shape (n_samples, n_features)
Data matrix to be transformed by the model

Attributes
----------
n_iter_ : int
Actual number of iterations for the transform.

Returns
-------
W: array, shape (n_samples, n_components)
@@ -1106,7 +1085,6 @@ def transform(self, X):
nls_max_iter=self.nls_max_iter, sparseness=self.sparseness,
beta=self.beta, eta=self.eta)

self.n_iter_ = n_iter_
return W

def inverse_transform(self, W):
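
The fix stops transform() from writing ``n_iter_``. As a point of reference, a simplified sketch (a hypothetical ``ToyFactorizer``, not the real NMF code) of the convention the change restores: fitted attributes with a trailing underscore are assigned in fit/fit_transform only, and transform only reads estimator state.

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ToyFactorizer(BaseEstimator, TransformerMixin):
    """Hypothetical illustration of the fit/transform attribute convention."""

    def fit_transform(self, X, y=None):
        X = np.asarray(X, dtype=float)
        W, n_iter = X[:, :1].copy(), 10          # stand-ins for a real solver
        self.components_ = np.ones((1, X.shape[1]))
        self.n_iter_ = n_iter                    # fitted state is set here ...
        return W

    def fit(self, X, y=None):
        self.fit_transform(X)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # ... but never here: transform must not modify self.__dict__
        return X[:, :1].copy()
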
45 changes: 45 additions & 0 deletions sklearn/utils/estimator_checks.py
@@ -30,6 +30,7 @@
from sklearn.utils.testing import SkipTest
from sklearn.utils.testing import ignore_warnings
from sklearn.utils.testing import assert_warns
from sklearn.utils.testing import assert_dict_equal


from sklearn.base import (clone, ClassifierMixin, RegressorMixin,
@@ -230,6 +231,7 @@ def _yield_all_checks(name, Estimator):
yield check_fit1d_1feature
yield check_fit1d_1sample
yield check_get_params_invariance
yield check_dict_unchanged


def check_estimator(Estimator):
@@ -409,6 +411,49 @@ def check_dtype_object(name, Estimator):


@ignore_warnings
def check_dict_unchanged(name, Estimator):
# this estimator raises
# ValueError: Found array with 0 feature(s) (shape=(23, 0))
# while a minimum of 1 is required.
# error
if name in ['SpectralCoclustering']:
return
rnd = np.random.RandomState(0)
if name in ['RANSACRegressor']:
X = 3 * rnd.uniform(size=(20, 3))
else:
X = 2 * rnd.uniform(size=(20, 3))
Member:
We can't use 3 * in all cases?

Contributor Author:
If I use 3 * for all, I get ValueError: _BinaryGaussianProcessClassifierLaplace supports only binary classification. y contains classes [0 1 2]. Which sounds fair enough.

With 2 * for all, on the other hand, I get ValueError: No inliers found, possible cause is setting residual_threshold (None) too low. for RANSACRegressor :(
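
To make the scaling point concrete, a small illustrative snippet (not part of the PR): y is built as X[:, 0].astype(int), so the multiplier controls which integer classes can appear.

import numpy as np

rnd = np.random.RandomState(0)
X2 = 2 * rnd.uniform(size=(20, 3))  # first column in [0, 2)
X3 = 3 * rnd.uniform(size=(20, 3))  # first column in [0, 3)
print(np.unique(X2[:, 0].astype(int)))  # typically [0 1]
print(np.unique(X3[:, 0].astype(int)))  # typically [0 1 2], which binary-only estimators reject
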


y = X[:, 0].astype(np.int)
y = multioutput_estimator_convert_y_2d(name, y)
estimator = Estimator()
set_testing_parameters(estimator)
if hasattr(estimator, "n_components"):
Member:
Hmm... I wonder if these should be in set_testing_parameters.

Contributor Author:
Can't say anything here; I've seen the same pattern in several places, for example here: https://github.com/kiote/scikit-learn/blob/feature-test-__dict__/sklearn/utils/estimator_checks.py#L443

So it looks like it could be moved there; do you think it should be?

Member:
yeah feel free to move it there.

estimator.n_components = 1

if hasattr(estimator, "n_clusters"):
estimator.n_clusters = 1

if hasattr(estimator, "n_best"):
estimator.n_best = 1

set_random_state(estimator, 1)

# should be just `estimator.fit(X, y)`
# after merging #6141
if name in ['SpectralBiclustering']:
Member:
SpectralBiclustering doesn't take y? I guess that's fixed in #6141.

Contributor Author (@kiote, Oct 18, 2016):

yep! those changes helped (I merged with maniteja123:issue6126 branch to check), so I suppose I could remove this after #6141 merge, right?

estimator.fit(X)
else:
estimator.fit(X, y)
for method in ["predict", "transform", "decision_function",
"predict_proba"]:
if hasattr(estimator, method):
dict_before = estimator.__dict__.copy()
getattr(estimator, method)(X)
assert_dict_equal(estimator.__dict__, dict_before,
'Estimator changes __dict__ during %s' % method)


def check_fit2d_predict1d(name, Estimator):
# check by fitting a 2d array and predicting with a 1d array
rnd = np.random.RandomState(0)
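
Following up on the review thread above about set_testing_parameters, a hypothetical sketch (not part of this PR) of what moving the parameter defaults there could look like; the real helper contains further adjustments that are elided here.

# Hypothetical follow-up: fold the defaults used by check_dict_unchanged
# into set_testing_parameters so every check shares them.
def set_testing_parameters(estimator):
    # ... the existing speed/deprecation-related adjustments stay here ...
    if hasattr(estimator, "n_components"):
        estimator.n_components = 1
    if hasattr(estimator, "n_clusters"):
        estimator.n_clusters = 1
    if hasattr(estimator, "n_best"):
        estimator.n_best = 1
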
19 changes: 19 additions & 0 deletions sklearn/utils/tests/test_estimator_checks.py
@@ -28,6 +28,20 @@ def predict(self, X):
return np.ones(X.shape[0])


class ChangesDict(BaseEstimator):
def __init__(self):
self.key = 0

def fit(self, X, y=None):
X, y = check_X_y(X, y)
return self

def predict(self, X):
X = check_array(X)
self.key = 1000
return np.ones(X.shape[0])


class NoCheckinPredict(BaseBadClassifier):
def fit(self, X, y):
X, y = check_X_y(X, y)
@@ -75,6 +89,11 @@ def test_check_estimator():
# check that predict does input validation (doesn't accept dicts in input)
msg = "Estimator doesn't check for NaN and inf in predict"
assert_raises_regex(AssertionError, msg, check_estimator, NoCheckinPredict)
# check that estimator state does not change
# at transform/predict/predict_proba time
msg = 'Estimator changes __dict__ during predict'
assert_raises_regex(AssertionError, msg, check_estimator, ChangesDict)

# check for sparse matrix input handling
name = NoSparseClassifier.__name__
msg = "Estimator " + name + " doesn't seem to fail gracefully on sparse data"