ENH Adds n_features_in_ checks to linear and svm modules #18578
Conversation
sklearn/linear_model/_ransac.py
Outdated
@@ -478,7 +478,7 @@ def predict(self, X):
            Returns predicted values.
        """
        check_is_fitted(self)

        X = self._validate_data(X, accept_sparse='csr', reset=False)
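The key change in this hunk is passing reset=False to _validate_data. As a rough sketch of what that buys (a hypothetical minimal estimator, not scikit-learn's actual implementation): the number of features is recorded at fit time and only compared, never overwritten, at predict time.

import numpy as np

class EstimatorSketch:
    """Hypothetical estimator illustrating the fit/predict split of the check."""

    def fit(self, X, y=None):
        X = np.asarray(X)
        # reset=True behaviour: record the number of features seen during fit.
        self.n_features_in_ = X.shape[1]
        return self

    def predict(self, X):
        X = np.asarray(X)
        # reset=False behaviour: compare against the recorded value, never overwrite it.
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} features, but this estimator was fitted "
                f"with {self.n_features_in_} features."
            )
        return np.zeros(X.shape[0])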
Another approach is to not validate if self.estimator_.n_features_in_ is defined and delegate the check to self.estimator_. A sketch of that idea follows below.
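A rough sketch of the delegation idea (the class name and structure are illustrative, not the merged code): the meta-estimator skips its own reset=False check and lets the fitted sub-estimator, which already carries n_features_in_, raise on a mismatch.

from sklearn.linear_model import LinearRegression
from sklearn.utils.validation import check_is_fitted

class DelegatingMetaSketch:
    """Hypothetical meta-estimator that delegates the n_features_in_ check."""

    def fit(self, X, y):
        # The fitted sub-estimator records n_features_in_ itself.
        self.estimator_ = LinearRegression().fit(X, y)
        return self

    def predict(self, X):
        check_is_fitted(self)
        # No self._validate_data(X, reset=False) here: if X has the wrong
        # number of columns, self.estimator_.predict raises its own error.
        return self.estimator_.predict(X)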
Indeed, I think I am in favor of delegating the check to the underlying estimator.
If we are delegating, I think it would be good to adjust the error message in #18585 to not include the estimator name.
After reviewing this, I am in favor of not checking n_features_in_ when it's missing for any reason (not just for stateless estimators). It cuts some useless code complexity and is likely to be nicer to third-party libraries. A sketch of that behaviour is shown below.
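A minimal sketch of "do not check when n_features_in_ is missing" (the helper name is hypothetical, not scikit-learn API): the comparison is simply skipped if the fitted estimator never set the attribute, e.g. stateless or older third-party estimators.

def check_consistent_n_features(estimator, X):
    n_features_in = getattr(estimator, "n_features_in_", None)
    if n_features_in is None:
        # Attribute absent: silently skip instead of raising, which keeps
        # estimators that predate n_features_in_ working unchanged.
        return
    if X.shape[1] != n_features_in:
        raise ValueError(
            f"X has {X.shape[1]} features, but the estimator was fitted "
            f"with {n_features_in} features."
        )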
Now that the …
WDYT of #18585 where the check expects any name?
OK, I merged #18585. Let me resolve the conflict.
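Since the point above is that the check should accept any estimator name, here is a hypothetical illustration (the message wording and the regular expression are assumed, not the exact strings used in scikit-learn or in #18585): the assertion matches on the structure of the error rather than on a hard-coded class name, so a check delegated to an inner estimator still passes.

import re

msg = "X has 3 features, but LinearRegression is expecting 5 features as input."
# Name-agnostic pattern: any word in the estimator-name position is accepted.
pattern = r"X has \d+ features?, but \w+ is expecting \d+ features? as input"
assert re.search(pattern, msg) is not None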
LGTM when (hopefully) green.
@NicolasHug this one is also ready for quick merge :)
minor comment but LGTM
@@ -489,10 +490,6 @@ def _validate_for_predict(self, X):
                raise ValueError("X.shape[1] = %d should be equal to %d, "
                                 "the number of samples at training time" %
                                 (X.shape[1], self.shape_fit_[0]))
        elif not callable(self.kernel) and X.shape[1] != self.shape_fit_[1]:
Do we still need shape_fit_ then?
shape_fit_[0] is still being used in some places in the codebase; see the example below.
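A small runnable illustration (public scikit-learn API only; the internal checks may differ) of why shape_fit_ is still useful: with kernel='precomputed' the predict-time input is a Gram matrix of shape (n_samples_test, n_samples_train), and shape_fit_[0] records that training sample count, which is exactly what the error message in the hunk above reports.

import numpy as np
from sklearn.svm import SVC

X_train = np.random.RandomState(0).randn(20, 4)
y_train = np.array([0, 1] * 10)
X_test = np.random.RandomState(1).randn(5, 4)

K_train = X_train @ X_train.T      # (20, 20) Gram matrix seen at fit time
K_test = X_test @ X_train.T        # (5, 20): columns == n_samples at fit time

clf = SVC(kernel='precomputed').fit(K_train, y_train)
print(clf.shape_fit_)              # (20, 20): dimensions of the fit-time input
print(clf.predict(K_test))         # column count is checked against shape_fit_[0]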
Two approvals and a conflict, do you want to update?
Synced up the PR with master.
@ogrisel is this PR a good candidate for the final 0.24 release? I believe it just needs to be merged...
Reference Issues/PRs
Continues #18514