TST check error consistency when calling get_feature_names_out on unfitted estimator #25223
Conversation
I think this is a good start. I would now create a whitelist so that we can correct the failing estimators one by one (or more than one at a time when the fix is shared through inheritance). The patch to do so would be:
diff --git a/sklearn/tests/test_common.py b/sklearn/tests/test_common.py
index b61e2e2a93..d9352f97fe 100644
--- a/sklearn/tests/test_common.py
+++ b/sklearn/tests/test_common.py
@@ -461,16 +461,60 @@ def test_transformers_get_feature_names_out(transformer):
 ESTIMATORS_WITH_GET_FEATURE_NAMES_OUT = [
     est for est in _tested_estimators() if hasattr(est, "get_feature_names_out")
 ]
+WHITELISTED_FAILING_ESTIMATORS = [
+    "AdditiveChi2Sampler",
+    "Binarizer",
+    "DictVectorizer",
+    "GaussianRandomProjection",
+    "GenericUnivariateSelect",
+    "IterativeImputer",
+    "IsotonicRegression",
+    "KBinsDiscretizer",
+    "KNNImputer",
+    "MaxAbsScaler",
+    "MinMaxScaler",
+    "MissingIndicator",
+    "Normalizer",
+    "OrdinalEncoder",
+    "PowerTransformer",
+    "QuantileTransformer",
+    "RFE",
+    "RFECV",
+    "RobustScaler",
+    "SelectFdr",
+    "SelectFpr",
+    "SelectFromModel",
+    "SelectFwe",
+    "SelectKBest",
+    "SelectPercentile",
+    "SequentialFeatureSelector",
+    "SimpleImputer",
+    "SparseRandomProjection",
+    "SplineTransformer",
+    "StackingClassifier",
+    "StackingRegressor",
+    "StandardScaler",
+    "TfidfTransformer",
+    "VarianceThreshold",
+    "VotingClassifier",
+    "VotingRegressor",
+]
 @pytest.mark.parametrize(
     "estimator", ESTIMATORS_WITH_GET_FEATURE_NAMES_OUT, ids=_get_check_estimator_ids
 )
 def test_estimators_get_feature_names_out_error(estimator):
+    estimator_name = estimator.__class__.__name__
+    if estimator_name in WHITELISTED_FAILING_ESTIMATORS:
+        return pytest.xfail(
+            reason=f"{estimator_name} is not failing with a consistent NotFittedError"
+        )
+
     _set_checking_parameters(estimator)
     with ignore_warnings(category=(FutureWarning)):
-        check_get_feature_names_out_error(estimator.__class__.__name__, estimator)
+        check_get_feature_names_out_error(estimator_name, estimator)
 @pytest.mark.parametrize(
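For context, the patch above relies on calling pytest.xfail() imperatively inside the test body: unlike the pytest.mark.xfail decorator, the imperative call stops the test right away and reports it as xfailed instead of failed. Below is a minimal, self-contained sketch of that mechanism, with made-up names (KNOWN_FAILING, test_example) standing in for the real whitelist and test:

import pytest

# Hypothetical stand-in for WHITELISTED_FAILING_ESTIMATORS in the patch above.
KNOWN_FAILING = {"SomeEstimator"}

@pytest.mark.parametrize("name", ["SomeEstimator", "OtherEstimator"])
def test_example(name):
    if name in KNOWN_FAILING:
        # Stops the test here and reports it as "xfailed" rather than "failed".
        pytest.xfail(reason=f"{name} does not yet raise a consistent NotFittedError")
    assert name == "OtherEstimator"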
@glemaitre, I just included the patch in the most recent commit.
Thanks. LGTM on my side.
Thanks @glemaitre, should we wait for the PR to be merged, or can others begin working on the failing estimators?
You can always start to work by branching from this branch. You will need to merge
LGTM. Thanks @jpangas
Thanks @jeremiedbb for being so quick ;)
Thanks @glemaitre and @jeremiedbb for approving this quickly. I will now mention the estimators that need to be worked on in issue #24916 so that other contributors can join in.
TST check error consistency when calling get_feature_names_out on unfitted estimator (#25223)
Reference Issues/PRs
In issue #24916, we want to make error messages uniform when calling get_feature_names_out before fit. To adhere to this uniformity, it was agreed that all estimators should raise a NotFittedError if they are unfitted.
What does this implement/fix? Explain your changes.
To solve the issue, we first needed to identify the estimators that don't raise a NotFittedError. Therefore, this PR proposes tests that check whether a NotFittedError is raised by estimators that have get_feature_names_out.
Any other comments?
For a particular estimator, the test passes if get_feature_names_out raises a NotFittedError and fails if any other type of error or exception is raised. When the test fails, it reports the estimator, the error, and the parts of the code that led to the error being raised.
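As an illustration of the behaviour the test enforces, here is a minimal sketch of a transformer whose get_feature_names_out calls check_is_fitted so that the unfitted case raises NotFittedError. TinyTransformer is a made-up class for illustration, not part of scikit-learn, and this is not the actual common check:

import numpy as np
import pytest
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

class TinyTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        return X

    def get_feature_names_out(self, input_features=None):
        # Raises NotFittedError when fit has not been called, which is the
        # behaviour the common test checks for.
        check_is_fitted(self)
        return np.asarray([f"x{i}" for i in range(self.n_features_in_)], dtype=object)

# The unfitted estimator raises NotFittedError, so a check like the one in this
# PR would pass for it; any other exception type would make the test fail.
with pytest.raises(NotFittedError):
    TinyTransformer().get_feature_names_out()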
The command below can be used to run, for all estimators, the tests that check the error raised when get_feature_names_out is called before fit:
pytest -vsl sklearn/tests/test_common.py -k estimators_get_feature_names_out_error