[MRG+1] Instance level common tests #9019
Conversation
ping @jnothman @GaelVaroquaux @agramfort @vene ;)
@@ -1608,16 +1634,16 @@ def param_filter(p):
        assert_equal(param_value, init_param.default)

-def multioutput_estimator_convert_y_2d(name, y):
+def multioutput_estimator_convert_y_2d(estimator, y):
this one is to make the other patch simpler...
with the tags PR I will simplify test_common a bit and basically just call
(note to self: got up to line ~650)
@@ -204,3 +204,4 @@ def __init__(self):
         check_no_fit_attributes_set_in_init,
         'estimator_name',
         NonConformantEstimator)
+    check_estimators_unfitted("estimator", CorrectNotFittedErrorClassifier())
merge error: line 190 duplicated here
sklearn/utils/estimator_checks.py
Outdated
    errmsg = "Input contains NaN, infinity or a value too large for " \
             "dtype('float64')."
    try:
-       Estimator().fit(X, y)
+       estimator.fit(X, y)
    except ValueError as e:
        if str(e) != errmsg:
            raise ValueError("Estimator {0} raised warning as expected, but "
(not this PR's fault, but) ambiguous message here: raised warning or error?
sklearn/tests/test_common.py
Outdated
-       yield _named_check(check, name), name, Estimator
+       estimator = Estimator()
+       # check this on class
+       yield check_no_fit_attributes_set_in_init, name, Estimator
is this not tested for meta estimators?
Right now the tests for meta-estimators are not very strong. They will be much stronger with the estimator tags branch merged. I tried to keep this PR small, it doesn't increase test coverage much, but it doesn't decrease it anywhere afaik.
coverage is a good sanity check for this indeed. Sounds good.
We should still be using _named_check, no?
If we just reframe check_no_fit_attributes_set_in_init as check_no_fit_attributes_set_upon_clone, perhaps we can get away with running it on an instance too.
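A rough sketch of that reframing (hypothetical, not part of this PR), reusing the trailing-underscore logic of the existing class-based check but running it on a clone of an instance:

    from sklearn.base import clone

    def check_no_fit_attributes_set_upon_clone(name, estimator_orig):
        # cloning goes through get_params()/__init__, so any trailing-underscore
        # attribute found on the clone must have been set in __init__
        estimator = clone(estimator_orig)
        for attr in vars(estimator):
            assert not (attr.endswith('_') and not attr.startswith('__')), (
                "Estimator %s should not set fit attribute %s in __init__"
                % (name, attr))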
sklearn/utils/estimator_checks.py
Outdated
@@ -244,11 +253,20 @@ def check_estimator(Estimator):
        Class to check. Estimator is a class object (not an instance).
Could be an instance now :)
sklearn/utils/estimator_checks.py
Outdated
@@ -314,7 +332,7 @@ def set_testing_parameters(estimator):
        # of components of the random matrix projection will be probably
        # greater than the number of features.
        # So we impose a smaller number (avoid "auto" mode)
-       estimator.set_params(n_components=1)
+       estimator.set_params(n_components=8)
necessary now that the dictlearning bug is fixed?
Although 1 is very edge-casey here so I'm not opposed to this change even if not strictly necessary. Maybe 2 or so to shave off a few ms
sklearn/utils/estimator_checks.py
Outdated
    rng = np.random.RandomState(0)
    X = rng.rand(40, 10)
    X[X < .8] = 0
    X_csr = sparse.csr_matrix(X)
    y = (4 * rng.rand(40)).astype(np.int)
+   y = multioutput_estimator_convert_y_2d(estimator, y)
was the absence of this line a bug?
I think so. Hm also why do we need the scaler special cases down there... huh...
Scaling/standardizing with with_mean=True raises an exception on sparse input rather than densifying the data. I think the decision was that users would ignore warnings and blow up their memory...
Really, the tags for this estimator will be tricky too: technically, with the default args it does not support sparse inputs.
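For reference, a small illustration of that behavior (current StandardScaler semantics, not code from this PR):

    import numpy as np
    from scipy import sparse
    from sklearn.preprocessing import StandardScaler

    X = sparse.csr_matrix(np.eye(3))
    StandardScaler(with_mean=False).fit(X)     # works: only scales, stays sparse
    try:
        StandardScaler(with_mean=True).fit(X)  # centering would densify the data
    except ValueError as exc:
        print(exc)                             # explains sparse input cannot be centered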
yeah I thought we deprecated that behavior when we introduced MaxAbsScaler but that was as an alternative for MinMaxScaler, not StandardScaler, my bad
        # to not check deprecated classes
        return
    rnd = np.random.RandomState(0)
    X = 3 * rnd.uniform(size=(20, 3))
    y = X[:, 0].astype(np.int)
-   y = multioutput_estimator_convert_y_2d(name, y)
-   estimator = Estimator()
not cloned here
fixed
I don't have the time to review this now, but in terms of testing rigour, I'm strongly in support of this concept.
    if args[0] == "self":
        # if_delegate_has_method makes methods into functions
        # with an explicit "self", so need to shift arguments
        args = args[1:]
Wow, that was pretty subtle! For reference I made this gist.
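A minimal illustration of why the explicit "self" shows up (this is not sklearn's if_delegate_has_method implementation, just the underlying Python behavior):

    import inspect

    class Delegate:
        def predict(self, X):
            return X

    # looking the method up on the class yields a plain function whose
    # signature still lists "self" as the first parameter
    args = list(inspect.signature(Delegate.predict).parameters)
    print(args)   # ['self', 'X']

    if args[0] == "self":
        args = args[1:]   # shift it off, as the check above does
    print(args)   # ['X']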
What if some clever ass names the first argument something else than self?
It's bad style, and we would probably complain reviewing that code, so I don't think that we should cater for it.
Since this only happens for methods that are passed through @if_delegate_has_method, maybe we could make that decorator raise an error unless its first argument is called "self", just to avoid subtle breakage in the future?
Open issue?
Most of this PR consists of changes of the form sketched below: checks that used to instantiate Estimator() internally now clone a passed-in instance, and the ignore_warnings context managers move up to decorate the whole check.
I checked and we cannot remove the ignore_warnings outright, because cloning a deprecated estimator re-raises the deprecation warning. However, this has the consequence of possibly ignoring more deprecations, because the coverage is wider. This way we might miss it if, in the future, we deprecate e.g. StandardScaler or something that is used in the common tests.
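Roughly, the before/after pattern (a paraphrase with a placeholder check name, not the exact snippets) looks like:

    from sklearn.base import clone
    from sklearn.utils.testing import ignore_warnings

    # before: each check received the class and instantiated it itself
    def check_something_old(name, Estimator):
        with ignore_warnings(category=DeprecationWarning):
            estimator = Estimator()
            ...

    # after: each check receives an instance and works on a clone of it,
    # with ignore_warnings moved up to decorate the whole check
    @ignore_warnings(category=DeprecationWarning)
    def check_something_new(name, estimator_orig):
        estimator = clone(estimator_orig)
        ...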
    set_testing_parameters(classifier)
    # try to fit
    try:
        classifier.fit(X_train, y)
    except ValueError as e:
        if 'class' not in repr(e):
-           print(error_string_fit, Classifier, e)
+           print(error_string_fit, classifier, e)
Should this print classifier.__class__ or similar, or is this ok?
I think that's fine, but I can also change it if you like.
it's fine, printout might be more useful the new way.
@@ -1635,13 +1662,13 @@ def check_non_transformer_estimators_n_iter(name, Estimator):

    # LassoLars stops early for the default alpha=1.0 the iris dataset.
    if name == 'LassoLars':
        estimator = Estimator(alpha=0.)
(not the fault of this PR but) Just curious why can't we check if n_iter_ is at least 0 instead. LassoLars is doing the right thing: on some datasets there is no work to be done.
feel free to open an issue ;)
        return

    else:
        e = estimator()
I assume this logic will move to the caller of this code instead. Nice!
    # Check whether an estimator having both decision_function and
    # predict_proba methods has outputs with perfect rank correlation.

    centers = [(2, 2), (4, 4)]
    X, y = make_blobs(n_samples=100, random_state=0, n_features=4,
                      centers=centers, cluster_std=1.0, shuffle=True)
    X_test = np.random.randn(20, 2) + 4
    estimator = Estimator()
estimator is not cloned here
fixed
We haven't been consistent with how we ignore warnings in the common tests, and some common tests outright ignore all warnings. I think we should not worry about deprecations in the common tests for now.
re: deprecations, sounds good, and we shouldn't let it slow this PR down. Other than the couple of missing
    rng = np.random.RandomState(0)
    X = rng.rand(40, 10)
    X[X < .8] = 0
    X_csr = sparse.csr_matrix(X)
    y = (4 * rng.rand(40)).astype(np.int)
    y = multioutput_estimator_convert_y_2d(estimator, y)
    for sparse_format in ['csr', 'csc', 'dok', 'lil', 'coo', 'dia', 'bsr']:
        X = X_csr.asformat(sparse_format)
        # catch deprecation warnings
        with ignore_warnings(category=DeprecationWarning):
also here you didn't move the ignore_warnings to the top
yeah, but should be fine, right?
yep just triggered my mindless pattern-matching review process :P
I think I addressed everything. I renamed the check parameters to
@jnothman this should actually be "easy" to review ;)
I'm happy now.
Is there any difference in runtime due to all the cloning?
You need to add tests to ensure that check_estimator still works on both class and instance.
You should also add a test to check that the object passed into check_estimator is completely unchanged afterwards. This will help to ensure that all current and any future checks include a clone.
I'm sorry if you were hoping for me to give you a green light!
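For illustration, a rough sketch of the kind of tests being asked for here (hypothetical code with an arbitrary estimator, not the tests that ended up in the PR):

    from sklearn.linear_model import LogisticRegression
    from sklearn.utils.estimator_checks import check_estimator

    def test_check_estimator_accepts_class_and_instance():
        check_estimator(LogisticRegression)         # class, as before
        check_estimator(LogisticRegression(C=0.5))  # instance, new behavior

    def test_check_estimator_does_not_modify_instance():
        est = LogisticRegression(C=0.5)
        params_before = est.get_params()
        check_estimator(est)
        # the checks should only ever fit/modify clones, never the original
        assert est.get_params() == params_before
        assert not hasattr(est, "coef_")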
doc/whats_new.rst
Outdated
@@ -319,6 +319,11 @@ API changes summary
      now only have ``self.estimators_`` available after ``fit``.
      :issue:`7464` by `Lars Buitinck`_ and `Loic Esteve`_.

+   - All checks in ``utils.estimator_checks``, in particular
+     :func:`utils.estimator_checks.check_estimator` now accept estimator
+     instances. Checks other than ``check_estimator`` do not accept
This is currently untrue for check_no_fit_attributes_set_in_init.
    name = Estimator.__name__
    check_parameters_default_constructible(name, Estimator)
    for check in _yield_all_checks(name, Estimator):
    if isinstance(Estimator, type):
I'm not entirely certain this is the right check, but my attempts to break it have failed so far.
https://stackoverflow.com/questions/395735/how-to-check-whether-a-variable-is-a-class-or-not sorry, should have looked at the date...
According to my knowledge and to a quick Google, this is the right way to do it.
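For example (with an arbitrary estimator):

    from sklearn.linear_model import LogisticRegression

    isinstance(LogisticRegression, type)    # True:  a class was passed
    isinstance(LogisticRegression(), type)  # False: an instance was passed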
sklearn/utils/estimator_checks.py
Outdated
def _yield_non_meta_checks(name, Estimator):
def assert_almost_equal_dense_sparse(x, y, decimal=6, err_msg=''):
    if sparse.issparse(x):
        assert_array_almost_equal(x.data, y.data,
(fwiw, assert_allclose may be preferable...)
why?
name's shorter? (i.e. not obscenely verbose) :D
More seriously, because it allows control over both absolute and relative tolerance, which lets you compare very large numbers, very small numbers, etc., whereas a fixed number of decimals is unrelated to the internal number format.
TBH, I've seen it said somewhere. And I can't recall the arguments put forth, but I like the brief name, and I like that we're able to use rtol when appropriate, and it's good practice to be thinking about whether absolute or relative tolerance is the issue for a particular case.
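A toy illustration of the relative-vs-absolute point (not code from this PR):

    import numpy as np
    from numpy.testing import assert_allclose, assert_array_almost_equal

    a = np.array([1e12 + 1.0])
    b = np.array([1e12])

    # relative tolerance scales with magnitude: a and b agree to ~1e-12
    assert_allclose(a, b, rtol=1e-7)

    # a fixed number of decimals compares absolute differences, so the same
    # nearly-equal pair is rejected
    try:
        assert_array_almost_equal(a, b, decimal=6)
    except AssertionError:
        print("decimal-based comparison fails for large, nearly-equal numbers")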
@@ -314,7 +336,7 @@ def set_testing_parameters(estimator):
        # of components of the random matrix projection will be probably
        # greater than the number of features.
        # So we impose a smaller number (avoid "auto" mode)
-       estimator.set_params(n_components=1)
+       estimator.set_params(n_components=2)
set_testing_params becomes problematic now. We will potentially hide bugs by overriding explicit tests of parameter settings. We should only override these if they are currently at their default values.
Thanks for incidentally changing something in this function so that I was reminded that it was an issue!
why does it become problematic now, wasn't it problematic to begin with?
No, because before we were changing defaults from what was most useful for a user to what is most practical in testing. Now if a developer explicitly passes a non-default instantiation, we're ignoring their wishes.
Do you have a suggestion how to resolve that? I hadn't thought about that but agree
great point. only override a param if it is not different from the default?
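A sketch of that idea (a hypothetical helper, not what this PR implements): only touch a parameter if the instance still carries the constructor default.

    import inspect

    def set_param_if_default(estimator, param, testing_value):
        # look up the constructor default for this parameter
        default = inspect.signature(
            type(estimator).__init__).parameters[param].default
        # only override it if the caller has not set it explicitly
        if estimator.get_params()[param] == default:
            estimator.set_params(**{param: testing_value})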
Yes, putting this in test_classifiers or whatever might make sense. Hopefully we don't break too many downstream libraries using check_estimator this way.
or we use an ugly hack, like checking if the estimator comes from sklearn ;) but that wouldn't allow us to specify parameters safely.
that requires adding a use_fast_parameters to all of the checks, right?
Alternatively, we could get rid of the function, and have each estimator declare its testing parameters.
no wait, I got it...
What is the conclusion of this discussion (https://github.com/scikit-learn/scikit-learn/pull/9019/files#r120591375)?
sklearn/utils/estimator_checks.py
Outdated
@@ -337,20 +359,24 @@ def _is_32bit():
    return struct.calcsize('P') * 8 == 32

-def check_estimator_sparse_data(name, Estimator):
+def check_estimator_sparse_data(name, estimator_org):
As someone who has worked too much on Named Entity Recognition, I can't read "org" as anything but "organisation". Can we use estimator_orig or base_estimator?
base_estimator means something different to me; orig is fine for me.
+1 for orig
sklearn/utils/estimator_checks.py
Outdated
-def check_estimators_partial_fit_n_features(name, Alg):
+@ignore_warnings(category=DeprecationWarning)
+def check_estimators_partial_fit_n_features(name, alg_org):
rename alg to estimator?
sklearn/utils/estimator_checks.py
Outdated
def check_no_fit_attributes_set_in_init(name, Estimator):
    """Check that Estimator.__init__ doesn't set trailing-_ attributes."""
    # STILL ON CLASSES
Once the world of scikit-learn <= 0.18 is forgotten, this comment will look pretty obscure. How about "Takes a class rather than an estimator instance as input"
Makes sense
also it's nice to have fewer comments that yell at the reader 😛
sklearn/utils/estimator_checks.py
Outdated
@@ -1547,6 +1584,7 @@ def check_estimators_data_not_an_array(name, Estimator, X, y):

 def check_parameters_default_constructible(name, Estimator):
+    # THIS ONE IS STILL ON CLASSES
Should we be doing type(estimator)() instead?
Well, but I want to support non-default constructible estimators in check_estimator, like GridSearchCV.
Okay, I guess. At some point in the future, I think we will want a way to specify testing parameters for non-default-constructables.
@jnothman I was hoping for you to nag as much as possible ;) there are some tests on instances and some on classes, but I should make that more explicit.
should be good now.
LGTM. +1 for merge. Does anybody else want to comment, or should we merge?
I should have either tested locally on old scipy, or you should set up CI for PRs onto your fork 😛 In my defence, I did check with modern scipy.
    e = Estimator()
    set_testing_parameters(e)
def check_estimators_empty_data_messages(name, estimator_orig):
    e = clone(estimator_orig)
e -> estimator?
meh, I just saw that there is also est all over the place. Nevermind.
Genuine Travis failure on py27 Ubuntu; RandomForest fails the pickle test, it seems.
fix flake8 error and shorten
can't reproduce :-/
…rn into instance_level_tests
LGTM
@@ -870,23 +883,22 @@ def check_estimators_nan_inf(name, Estimator):
    for X_train in [X_train_nan, X_train_inf]:
        # catch deprecation warnings
        with ignore_warnings(category=DeprecationWarning):
This is now caught at the check function level.
Congrats, @amueller. I suppose we should start testing some metaestimators etc. We maybe should update the advice to contrib devs too.
…On 10 Jun 2017 12:34 am, "Andreas Mueller" wrote:
Merged #9019.
This is pulling out another part of #8022: the instance level common tests.
This doesn't give a huge boon by itself, but allows better testing of meta-estimators, because now check_estimators is much more flexible!
Mainly, #8022 is still insanely big and I thought it might be a good idea to split it up.