TST check error consistency when calling get_feature_names_out on unfitted estimator #25223
Conversation
I think this is a good start. I would now create a whitelist so that we can correct the failing estimators one by one (or more than one at a time when the fix is shared through inheritance). The patch to do so would be:
diff --git a/sklearn/tests/test_common.py b/sklearn/tests/test_common.py
index b61e2e2a93..d9352f97fe 100644
--- a/sklearn/tests/test_common.py
+++ b/sklearn/tests/test_common.py
@@ -461,16 +461,60 @@ def test_transformers_get_feature_names_out(transformer):
 ESTIMATORS_WITH_GET_FEATURE_NAMES_OUT = [
     est for est in _tested_estimators() if hasattr(est, "get_feature_names_out")
 ]
+WHITELISTED_FAILING_ESTIMATORS = [
+    "AdditiveChi2Sampler",
+    "Binarizer",
+    "DictVectorizer",
+    "GaussianRandomProjection",
+    "GenericUnivariateSelect",
+    "IterativeImputer",
+    "IsotonicRegression",
+    "KBinsDiscretizer",
+    "KNNImputer",
+    "MaxAbsScaler",
+    "MinMaxScaler",
+    "MissingIndicator",
+    "Normalizer",
+    "OrdinalEncoder",
+    "PowerTransformer",
+    "QuantileTransformer",
+    "RFE",
+    "RFECV",
+    "RobustScaler",
+    "SelectFdr",
+    "SelectFpr",
+    "SelectFromModel",
+    "SelectFwe",
+    "SelectKBest",
+    "SelectPercentile",
+    "SequentialFeatureSelector",
+    "SimpleImputer",
+    "SparseRandomProjection",
+    "SplineTransformer",
+    "StackingClassifier",
+    "StackingRegressor",
+    "StandardScaler",
+    "TfidfTransformer",
+    "VarianceThreshold",
+    "VotingClassifier",
+    "VotingRegressor",
+]
 @pytest.mark.parametrize(
     "estimator", ESTIMATORS_WITH_GET_FEATURE_NAMES_OUT, ids=_get_check_estimator_ids
 )
 def test_estimators_get_feature_names_out_error(estimator):
+    estimator_name = estimator.__class__.__name__
+    if estimator_name in WHITELISTED_FAILING_ESTIMATORS:
+        return pytest.xfail(
+            reason=f"{estimator_name} is not failing with a consistent NotFittedError"
+        )
+
     _set_checking_parameters(estimator)
     with ignore_warnings(category=(FutureWarning)):
-        check_get_feature_names_out_error(estimator.__class__.__name__, estimator)
+        check_get_feature_names_out_error(estimator_name, estimator)
 @pytest.mark.parametrize(
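For context, the patch above relies on calling pytest.xfail() imperatively inside the test body: unlike the pytest.mark.xfail decorator, the imperative call stops the test right away and reports it as xfailed instead of failed. Below is a minimal, self-contained sketch of that mechanism, with made-up names (KNOWN_FAILING, test_example) standing in for the real whitelist and test:

import pytest

# Hypothetical stand-in for WHITELISTED_FAILING_ESTIMATORS in the patch above.
KNOWN_FAILING = {"SomeEstimator"}

@pytest.mark.parametrize("name", ["SomeEstimator", "OtherEstimator"])
def test_example(name):
    if name in KNOWN_FAILING:
        # Stops the test here and reports it as "xfailed" rather than "failed".
        pytest.xfail(reason=f"{name} does not yet raise a consistent NotFittedError")
    assert name == "OtherEstimator"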
@glemaitre, I just included the patch in the most recent commit.
Thanks. LGTM on my side.
Thanks @glemaitre, should we wait for the PR to be merged, or can others begin working on the failing estimators?
You can always start to work by branching from this branch. You will need to merge
LGTM. Thanks @jpangas
Thanks @jeremiedbb for being so quick ;)
Thanks @glemaitre and @jeremiedbb for approving this quickly. I will now mention the estimators that need to be worked on in issue #24916 so that other contributors can join in.
TST check error consistency when calling get_feature_names_out on unfitted estimator (#25223)
Reference Issues/PRs
In issue #24916, we want to make error messages uniform when calling get_feature_names_out before fit. To adhere to this uniformity, it was agreed that all estimators should raise a NotFittedError if they are unfitted.
What does this implement/fix? Explain your changes.
To solve the issue, we first needed to identify the estimators that don't raise a NotFittedError. Therefore, this PR proposes tests that check whether a NotFittedError is raised by estimators that have get_feature_names_out.
Any other comments?
For a particular estimator, the test passes if get_feature_names_out raises a NotFittedError and fails if any other type of error or exception is raised. When the test fails, it reports the estimator, the error, and the parts of the code that led to the error being raised.
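As an illustration of the behaviour the test enforces, here is a minimal sketch of a transformer whose get_feature_names_out calls check_is_fitted so that the unfitted case raises NotFittedError. TinyTransformer is a made-up class for illustration, not part of scikit-learn, and this is not the actual common check:

import numpy as np
import pytest
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

class TinyTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        return X

    def get_feature_names_out(self, input_features=None):
        # Raises NotFittedError when fit has not been called, which is the
        # behaviour the common test checks for.
        check_is_fitted(self)
        return np.asarray([f"x{i}" for i in range(self.n_features_in_)], dtype=object)

# The unfitted estimator raises NotFittedError, so a check like the one in this
# PR would pass for it; any other exception type would make the test fail.
with pytest.raises(NotFittedError):
    TinyTransformer().get_feature_names_out()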
The command below can be used to run, for all estimators, the tests that check the error raised when get_feature_names_out is called before fit:
pytest -vsl sklearn/tests/test_common.py -k estimators_get_feature_names_out_error