[WIP] Consistent and informative error message for partial_fit when n_features changes #12465

jeremiedbb · 2018-10-26T13:14:08Z

This is a work in progress toward that goal, extending #12431.

Currently, when n_features changes between calls to partial_fit, some estimators raise an error with an informative message, while others raise an error whose message is not very informative (a broadcasting error from numpy, a bad-shape error from pairwise, ...).
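To illustrate the uninformative case, here is a minimal sketch, assuming an estimator whose partial_fit updates a per-feature statistic with plain numpy arithmetic (the update rule is illustrative, not scikit-learn code). A batch with a different number of columns fails deep inside numpy, and the resulting message talks about broadcasting rather than about n_features.

```python
import numpy as np

# Illustrative state update of the kind an incremental estimator might do
# inside partial_fit (not actual scikit-learn code).
mean_ = np.zeros(3)        # state learned from a first batch with 3 features
X_new = np.ones((5, 4))    # a later batch that has 4 features

raw_message = None
try:
    # (3,) + (4,) cannot be broadcast, so numpy raises a ValueError here.
    mean_ = 0.5 * mean_ + 0.5 * X_new.mean(axis=0)
except ValueError as exc:
    raw_message = str(exc)

print(raw_message)
```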

In addition, for estimators which give an informative error message, the message differs between estimators.

Finally, there is a common test which checks that an estimator raises an error in that case. However, it is only run for classifiers, regressors and clusterers.

This PR fixes these 3 aspects:

- Add a helper validation function that checks whether partial_fit is called on input data with the appropriate number of features.
- Modify the common tests to match the error message, ensuring consistency across estimators.
- Modify the common tests to check partial_fit for all estimators.
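The first point could look roughly like the sketch below. The helper name `check_partial_fit_n_features`, its signature and the message wording are all hypothetical and only illustrate the idea of comparing `X` against the number of features recorded in a fitted component; they are not the exact code of this PR.

```python
import numpy as np


def check_partial_fit_n_features(X, n_features_expected, estimator_name):
    """Raise an informative ValueError if X has an unexpected number of columns.

    Hypothetical helper: name, signature and wording are illustrative only.
    """
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError(
            f"Expected a 2D array, got an array with {X.ndim} dimension(s)."
        )
    n_features = X.shape[1]
    if n_features != n_features_expected:
        raise ValueError(
            f"Number of features of the input ({n_features}) does not match "
            f"the number of features seen in previous calls to "
            f"{estimator_name}.partial_fit ({n_features_expected})."
        )
    return X


# Usage: the expected count comes from a fitted attribute, here a stand-in
# component matrix of shape (n_components, n_features).
components_ = np.zeros((2, 3))
X_ok = np.ones((4, 3))
X_bad = np.ones((4, 5))

check_partial_fit_n_features(X_ok, components_.shape[1], "MyEstimator")

error_message = None
try:
    check_partial_fit_n_features(X_bad, components_.shape[1], "MyEstimator")
except ValueError as exc:
    error_message = str(exc)
```

Centralizing the check in one function is what makes a single, consistent message possible across estimators.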

jeremiedbb · 2018-10-26T13:18:42Z

There's one place I didn't manage to add the helper function: in MLP.
I can't figure out the fitted component to test X against... Help welcome :)

sklearn-lgtm · 2018-10-26T13:44:37Z

This pull request introduces 1 alert when merging 1b9e6a0 into 9ec5a15 - view on LGTM.com

new alerts:

1 for Unused import

Comment posted by LGTM.com

amueller · 2018-10-26T15:24:22Z

@jeremiedbb .coef_[0]? But if it already has a good error message, maybe just leave it?

jeremiedbb · 2018-10-29T09:25:58Z

> But if it already has a good error message, maybe just leave it?

The error message comes from numpy.dot due to bad shape alignment :/

Thanks for the hint. It turns out it's coefs_[0].T.
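A small sketch of why the transpose is needed, assuming (as the `.T` suggests) that the helper compares `X.shape[1]` with the second dimension of the fitted component. In `MLPClassifier`/`MLPRegressor`, `coefs_[0]` is the weight matrix of the first layer with shape `(n_features, n_hidden)`; the arrays below are stand-ins, not a fitted model. The last lines reproduce the kind of `numpy.dot` alignment error mentioned above.

```python
import numpy as np

# Stand-in for a fitted MLP: coefs_[i] holds the weights between layer i and
# layer i + 1, so coefs_[0] has shape (n_features, n_hidden).
n_features, n_hidden = 3, 4
coefs_ = [np.zeros((n_features, n_hidden)), np.zeros((n_hidden, 1))]

# A helper comparing X.shape[1] with component.shape[1] needs the transpose,
# whose second dimension is n_features.
first_layer = coefs_[0].T
assert first_layer.shape[1] == n_features

# Without the check, a dot product like the one in the forward pass fails
# with a shape-alignment message that does not mention features.
X_bad = np.ones((2, 5))
dot_message = None
try:
    np.dot(X_bad, coefs_[0])
except ValueError as exc:
    dot_message = str(exc)
```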

jeremiedbb · 2018-10-29T09:28:01Z

sklearn/decomposition/tests/test_online_lda.py

```diff
-def test_lda_partial_fit_dim_mismatch():
-    # test `n_features` mismatch in `partial_fit`
-    rng = np.random.RandomState(0)
-    n_components = rng.randint(3, 6)
```


I removed this test because it does exactly what the common test check_estimators_partial_fit_n_features does.
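For readers unfamiliar with the common test, the sketch below shows the general shape of such a check: fit on one batch, call partial_fit again with a different number of features, and require a `ValueError` whose message matches a pattern. The function `check_partial_fit_n_features_mismatch` and the toy `RunningMean` estimator are hypothetical simplifications, not the code of `check_estimators_partial_fit_n_features`.

```python
import re

import numpy as np


class RunningMean:
    """Toy incremental estimator used only to exercise the check below."""

    def partial_fit(self, X):
        X = np.asarray(X, dtype=float)
        if hasattr(self, "mean_") and X.shape[1] != self.mean_.shape[0]:
            raise ValueError(
                f"Number of features of the input ({X.shape[1]}) does not "
                f"match the number of features seen in previous calls to "
                f"partial_fit ({self.mean_.shape[0]})."
            )
        n = X.shape[0]
        if not hasattr(self, "mean_"):
            self.mean_ = X.mean(axis=0)
            self.n_samples_seen_ = n
        else:
            # Weighted update of the running mean with the new batch.
            total = self.n_samples_seen_ + n
            self.mean_ = (self.n_samples_seen_ * self.mean_ + X.sum(axis=0)) / total
            self.n_samples_seen_ = total
        return self


def check_partial_fit_n_features_mismatch(estimator, pattern):
    """Hypothetical common check: a second batch with a different number of
    features must raise a ValueError whose message matches ``pattern``."""
    rng = np.random.RandomState(0)
    estimator.partial_fit(rng.rand(10, 3))
    try:
        estimator.partial_fit(rng.rand(10, 4))
    except ValueError as exc:
        if not re.search(pattern, str(exc)):
            raise AssertionError(f"Uninformative error message: {exc}") from exc
    else:
        raise AssertionError("partial_fit accepted a changed number of features")
    return True
```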

jeremiedbb · 2018-10-29T09:30:28Z

sklearn/decomposition/tests/test_online_lda.py

```diff
@@ -202,7 +187,7 @@ def test_lda_transform_mismatch():
                                     random_state=rng)
     lda.partial_fit(X)
     assert_raises_regexp(ValueError, r"^The provided data has",
-                         lda.partial_fit, X_2)
+                         lda.transform, X_2)
```


I changed that because, as written, it duplicated test_lda_partial_fit_dim_mismatch, and I don't think it did what its comment says it should: "# test n_features mismatch in partial_fit and transform".

jeremiedbb · 2018-10-29T16:43:23Z

Why does estimator_checks need to avoid depending on pytest?

jnothman · 2018-10-30T07:50:51Z

> Why does estimator_checks need to avoid depending on pytest?

We want estimator_checks to be largely backwards compatible for estimator developers. We haven't yet forced them to migrate from nose.

We could consider adding a warning if pytest is not importable in estimator_checks.py and adding the dependency after a period.
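The transition suggested here could be sketched as a guarded import at the top of the checks module. The flag name `PYTEST_AVAILABLE`, the warning text and the choice of `FutureWarning` are assumptions for illustration, not something decided in this thread.

```python
import warnings

# Guarded import: keep working without pytest, but warn that it may become
# a hard dependency later (flag name and wording are hypothetical).
try:
    import pytest  # noqa: F401
    PYTEST_AVAILABLE = True
except ImportError:
    PYTEST_AVAILABLE = False
    warnings.warn(
        "pytest is not installed; a future version of estimator_checks "
        "may require it.",
        FutureWarning,
    )
```

`FutureWarning` is used in this sketch because, unlike `DeprecationWarning`, it is shown to end users by default.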

jeremiedbb · 2018-10-30T13:59:26Z

OK, I removed the pytest dependency.
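Without pytest, the error message can still be matched with the standard library alone. A minimal sketch, assuming only `unittest`: a throwaway `TestCase` instance lends its `assertRaisesRegex` method, which searches the exception message with a regular expression. The `bad_partial_fit` function is a hypothetical stand-in for an estimator call.

```python
import unittest

# TestCase needs the name of an existing method; "__init__" works, and the
# instance is only used to borrow its assertion helpers outside a test run.
_dummy = unittest.TestCase("__init__")
assert_raises_regex = _dummy.assertRaisesRegex


def bad_partial_fit():
    # Hypothetical stand-in for a partial_fit call with the wrong n_features.
    raise ValueError(
        "Number of features of the input (4) does not match the number of "
        "features seen in previous calls to partial_fit (3)."
    )


# Passes silently: a ValueError is raised and its message matches.
assert_raises_regex(ValueError, r"does not match", bad_partial_fit)
```

If the regex does not match, `assertRaisesRegex` raises an `AssertionError`, so it behaves like a failing test assertion.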

Before I change the status to [MRG], we should discuss whether this is really the way we want to do it, because enforcing an error-message match in the common tests will break a lot of sklearn-compatible estimators, right?

jnothman · 2018-10-31T09:34:28Z

Yes, we could deprecate the lenient behaviour instead. Alternatively: elsewhere we have allowed any message that matches one of a few expressions. Or we could invert the condition and prohibit certain messages, such as "cannot be broadcast", rather than requiring a specific message.
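The two alternatives can be contrasted in a short sketch. The pattern lists below are assumed examples (the deny-list includes numpy-style broadcasting and alignment wording), not lists agreed on in this discussion.

```python
import re

# Option 1 (allow-list): accept any message matching one of a few patterns.
ALLOWED_PATTERNS = [
    r"number of features",
    r"n_features",
]

# Option 2 (deny-list): reject messages known to be uninformative, such as
# raw numpy broadcasting or alignment errors.
FORBIDDEN_PATTERNS = [
    r"cannot be broadcast",
    r"could not be broadcast",
    r"not aligned",
]


def is_allowed(message):
    """Allow-list policy: at least one known-good pattern must match."""
    return any(re.search(p, message, flags=re.IGNORECASE) for p in ALLOWED_PATTERNS)


def is_not_forbidden(message):
    """Deny-list policy: no known-bad pattern may match."""
    return not any(re.search(p, message) for p in FORBIDDEN_PATTERNS)
```

The deny-list is more lenient toward third-party estimators, since any custom wording passes as long as it is not a raw numpy error.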

jeremiedbb · 2018-11-08T13:17:55Z

@amueller do you have an opinion on that ?

amueller · 2019-08-05T19:51:45Z

my opinion is #13969.

jeremiedbb · 2020-07-20T15:02:51Z

Closing; superseded by n_features_in.
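The superseding approach records the number of features on the estimator itself instead of inferring it from a fitted component. A minimal sketch, assuming a toy estimator that stores `n_features_in_` on its first partial_fit call; the class `IncrementalMaxAbs` and the message wording are illustrative and may differ from the library's.

```python
import numpy as np


class IncrementalMaxAbs:
    """Toy estimator recording n_features_in_ on the first partial_fit call."""

    def partial_fit(self, X):
        X = np.asarray(X, dtype=float)
        if not hasattr(self, "n_features_in_"):
            # First call: remember how many columns the estimator saw.
            self.n_features_in_ = X.shape[1]
            self.max_abs_ = np.abs(X).max(axis=0)
        elif X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} features, but {type(self).__name__} "
                f"is expecting {self.n_features_in_} features as input."
            )
        else:
            self.max_abs_ = np.maximum(self.max_abs_, np.abs(X).max(axis=0))
        return self
```

Because the attribute is uniform across estimators, a single common check can compare against it without per-estimator knowledge such as `coefs_[0].T`.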


jeremiedbb added 6 commits October 30, 2018 11:01

- add check_partial_fit and change in common tests accordingly (352b38c)
- add check_partial_fit... to every partial_fit method (ef0b872)
- add mlp (0806287)
- remove duplicate test + fix transform test (f1bfca3)
- incremental_pca un-remove component mismatch err msg (6b6b6ed)
- undo pytest dependency, use assert_raises_regex (627d2ee)

jeremiedbb force-pushed the partial_fit_msg branch from 7862836 to 627d2ee on October 30, 2018 10:02

amueller added the Waiting for Reviewer label Aug 5, 2019

jeremiedbb closed this Jul 20, 2020

jeremiedbb mentioned this pull request Jul 22, 2020 in:
[MRG] Better error message for MiniBatchKMeans partial_fit when number of features changes #12431 (Closed)
