
[WIP] Consistent and informative error message for partial_fit when n_features changes #12465


Closed
wants to merge 6 commits

Conversation

jeremiedbb
Member

This is a WIP towards that goal, extending #12431.

Currently, when n_features changes between calls to partial_fit, some estimators raise an error with an informative message, while others raise an error whose message is not very informative (a broadcasting error from NumPy, a bad-shape error from pairwise, ...).

In addition, among the estimators that do give an informative error message, the message differs from one estimator to another.

Finally, there is a common test which checks that an estimator raises an error in that case, but it only runs for classifiers, regressors and clusterers.

This PR fixes these 3 aspects:

  • add a helper validation function that checks whether partial_fit is called on input data with the appropriate number of features.
  • modify the common tests to match the error message, ensuring consistency across estimators.
  • modify the common tests to check partial_fit for all estimators.

@jeremiedbb
Member Author

There's one place I didn't manage to add the helper function: in MLP.
I can't figure out the fitted component to test X against... Help welcome :)

@sklearn-lgtm

This pull request introduces 1 alert when merging 1b9e6a0 into 9ec5a15 - view on LGTM.com

new alerts:

  • 1 for Unused import

Comment posted by LGTM.com

@amueller
Member

@jeremiedbb .coef_[0]? But if it already has a good error message, maybe just leave it?

@jeremiedbb
Member Author

jeremiedbb commented Oct 29, 2018

But if it already has a good error message, maybe just leave it?

The error message comes from numpy.dot due to bad shape alignment :/

Thanks for the hint. It turns out it's coefs_[0].T.
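For illustration, a hedged sketch of why coefs_[0] gives the fitted feature count (shapes and the check below are illustrative, not MLP internals): the first-layer weight matrix has shape (n_features, n_hidden), so the number of features seen during fit is coefs_[0].shape[0].

```python
import numpy as np

# Pretend first-layer weights of a fitted MLP: 4 input features, 10 hidden units.
coefs_first_layer = np.zeros((4, 10))

def check_mlp_n_features(X):
    # The fitted n_features is the number of rows of coefs_[0]
    # (equivalently, the number of columns of coefs_[0].T).
    n_features_fitted = coefs_first_layer.shape[0]
    if X.shape[1] != n_features_fitted:
        raise ValueError("X has %d features per sample; expecting %d"
                         % (X.shape[1], n_features_fitted))

check_mlp_n_features(np.ones((5, 4)))   # matching number of features: no error
```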

def test_lda_partial_fit_dim_mismatch():
    # test `n_features` mismatch in `partial_fit`
    rng = np.random.RandomState(0)
    n_components = rng.randint(3, 6)
@jeremiedbb
Member Author
I removed this test because it does exactly what the common test check_estimators_partial_fit_n_features does.

@@ -202,7 +187,7 @@ def test_lda_transform_mismatch():
                          random_state=rng)
     lda.partial_fit(X)
     assert_raises_regexp(ValueError, r"^The provided data has",
-                         lda.partial_fit, X_2)
+                         lda.transform, X_2)
@jeremiedbb
Member Author

jeremiedbb commented Oct 29, 2018
I changed that because, as is, it was a duplicate of test_lda_partial_fit_dim_mismatch, and I think it did not do what it was expected to do, i.e. "# test n_features mismatch in partial_fit and transform".

@jeremiedbb
Member Author

Why does estimator_checks need to not have a dependency on pytest?

@jnothman
Member

Why does estimator_checks need to not have a dependency on pytest?

We want estimator_checks to be largely backwards compatible for estimator developers. We haven't yet forced them to migrate from nose.

We could consider adding a warning if pytest is not importable in estimator_checks.py and adding the dependency after a period.
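A hedged sketch of that suggestion (the function name and warning text are illustrative, not the actual estimator_checks code): treat pytest as a soft dependency by attempting the import and warning on failure instead of raising ImportError.

```python
import warnings

def soft_import_pytest():
    """Return the pytest module if available, otherwise warn and return None,
    so that nose-based estimator developers are not broken outright."""
    try:
        import pytest
        return pytest
    except ImportError:
        warnings.warn("pytest is not installed; some estimator checks "
                      "will be skipped.", ImportWarning)
        return None
```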

@jeremiedbb
Member Author

OK, I removed the pytest dependency.

Before I change the status to [MRG], we can discuss whether this is really the way we want to do it, because enforcing the error message match in the common tests will break a lot of sklearn-compatible estimators, right?

@jnothman
Member

jnothman commented Oct 31, 2018 via email

@jeremiedbb
Member Author

@amueller do you have an opinion on that ?

@amueller
Member

amueller commented Aug 5, 2019

My opinion is #13969.

@jeremiedbb
Member Author

Closing. Superseded by n_features_in_.
