ENH Adds n_features_in_ checks to linear and svm modules #18578


Merged: 9 commits into scikit-learn:master on Jan 2, 2021

Conversation

thomasjpfan (Member)

Reference Issues/PRs

Continues #18514

@@ -478,7 +478,7 @@ def predict(self, X):
Returns predicted values.
"""
check_is_fitted(self)

X = self._validate_data(X, accept_sparse='csr', reset=False)
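For context, the `reset=False` argument in the line above is what triggers the `n_features_in_` consistency check at predict time. A minimal sketch of that logic (illustrative names only, not scikit-learn's actual private implementation):

```python
# Sketch of the n_features_in_ check performed when reset=False (predict time)
# versus reset=True (fit time). Names are illustrative.
def check_n_features(estimator, X, reset):
    n_features = len(X[0])
    if reset:
        # fit time: record how many input features were seen
        estimator.n_features_in_ = n_features
        return
    if not hasattr(estimator, "n_features_in_"):
        # nothing was recorded (e.g. a stateless estimator): skip the check
        return
    if n_features != estimator.n_features_in_:
        raise ValueError(
            f"X has {n_features} features, but "
            f"{type(estimator).__name__} is expecting "
            f"{estimator.n_features_in_} features as input."
        )

class DemoRegressor:
    """Bare-bones hypothetical stand-in for an estimator."""

est = DemoRegressor()
check_n_features(est, [[1.0, 2.0]], reset=True)   # fit: records 2 features
check_n_features(est, [[3.0, 4.0]], reset=False)  # predict: consistent, OK
```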
thomasjpfan (Member Author):

Another approach is to skip validation here when self.estimator_.n_features_in_ is defined and delegate the check to self.estimator_.
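A rough sketch of this delegation idea (the class names are hypothetical; in scikit-learn the meta-estimator would also clone the wrapped estimator): the meta-estimator forwards X without validating it, so the error, if any, is raised and named by the inner estimator.

```python
# Hypothetical sketch: a meta-estimator delegates the feature-count check
# to the fitted inner estimator instead of validating X itself.
class InnerEstimator:
    def fit(self, X, y):
        self.n_features_in_ = len(X[0])
        return self

    def predict(self, X):
        if len(X[0]) != self.n_features_in_:
            raise ValueError(
                f"X has {len(X[0])} features, but InnerEstimator is "
                f"expecting {self.n_features_in_} features as input."
            )
        return [0.0 for _ in X]

class MetaEstimator:
    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator_ = self.estimator  # real code would clone here
        self.estimator_.fit(X, y)
        return self

    def predict(self, X):
        # no n_features_in_ check here: the error (if any) comes from
        # self.estimator_ and therefore names the inner estimator
        return self.estimator_.predict(X)
```

This is exactly why, later in the thread, the common-test error message ends up naming Ridge rather than RANSACRegressor.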

Member:

Indeed, I think I am in favor of delegating the check to the underlying estimator.

thomasjpfan (Member Author):

If we are delegating, I think it would be good to adjust the error message (#18585) so that it does not include the estimator name.

ogrisel (Member) left a comment:

After reviewing this, I am in favor of not checking n_features_in_ when it is missing for any reason (not just for stateless estimators). It cuts some useless code complexity and is likely to be friendlier to third-party libraries.
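Under this proposal, the guard could reduce to a simple early return when the attribute is absent. A sketch under that assumption (names are illustrative):

```python
# Sketch of the proposed behavior: if n_features_in_ is missing for any
# reason, skip the check rather than fail. Illustrative, not sklearn code.
def check_consistent_n_features(estimator, X):
    expected = getattr(estimator, "n_features_in_", None)
    if expected is None:
        return  # attribute missing (stateless or third-party): no check
    if len(X[0]) != expected:
        raise ValueError(
            f"X has {len(X[0])} features, but {type(estimator).__name__} "
            f"is expecting {expected} features as input."
        )

class ThirdPartyTransformer:
    """Hypothetical third-party estimator that never sets n_features_in_."""
    def fit(self, X, y=None):
        return self

# passes silently even though n_features_in_ was never set
check_consistent_n_features(ThirdPartyTransformer().fit([[1.0]]), [[1.0, 2.0]])
```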



ogrisel commented Oct 14, 2020

```
E   AssertionError: The error message should contain one of the following patterns:
E   X has 1 features, but RANSACRegressor is expecting 2 features as input
E   Got X has 1 features, but Ridge is expecting 2 features as input.
```

Now that the base_estimator is in charge of doing the check, the common test has to be adapted to reflect this logic.
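One way the adapted common test could accept any estimator name in the message is a pattern with a wildcard for the class name. This is a sketch of the idea, not the actual test code from #18585:

```python
import re

# Illustrative pattern: match the error whether it names the meta-estimator
# (RANSACRegressor) or the inner estimator it delegates to (Ridge).
pattern = r"X has \d+ features?, but \w+ is expecting \d+ features? as input"

msg_meta = "X has 1 features, but RANSACRegressor is expecting 2 features as input"
msg_inner = "X has 1 features, but Ridge is expecting 2 features as input."

assert re.search(pattern, msg_meta)
assert re.search(pattern, msg_inner)
```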

thomasjpfan (Member Author):

WDYT of #18585 where the check expects any name?


ogrisel commented Oct 14, 2020

OK, I merged #18585. Let me resolve the conflict.

ogrisel (Member) left a comment:

LGTM when (hopefully) green.


ogrisel commented Oct 14, 2020

@NicolasHug this one is also ready for quick merge :)

NicolasHug (Member) left a comment:

Minor comment, but LGTM.

@@ -489,10 +490,6 @@ def _validate_for_predict(self, X):
raise ValueError("X.shape[1] = %d should be equal to %d, "
"the number of samples at training time" %
(X.shape[1], self.shape_fit_[0]))
elif not callable(self.kernel) and X.shape[1] != self.shape_fit_[1]:
Member:

do we still need shape_fit_ then?

thomasjpfan (Member Author):

shape_fit_[0] is still used in several places in the codebase.

amueller (Member):

Two approvals and a conflict; do you want to update?

thomasjpfan (Member Author):

Synced up PR with master.


cmarmo commented Dec 14, 2020

@ogrisel is this PR a good candidate for the final 0.24 release? I believe it just needs to be merged...

@lorentzenchr lorentzenchr merged commit 5946f8b into scikit-learn:master Jan 2, 2021
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
6 participants