-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC Adds feature_names_in_ to docstrings #20787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC Adds feature_names_in_ to docstrings #20787
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have locally extended check_dataframe_column_names_consistency
to also check the docstring of the estimator when feature_names_in_
and found the following classes that miss it:
dev ❯ pytest sklearn/tests/test_common.py -v -k test_pandas_column_name_consistency | grep FAILED
sklearn/tests/test_common.py::test_pandas_column_name_consistency[CategoricalNB()] FAILED [ 7%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[EmpiricalCovariance()] FAILED [ 14%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[MaxAbsScaler()] FAILED [ 49%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[Perceptron()] FAILED [ 71%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[QuantileRegressor()] FAILED [ 74%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[SGDOneClassSVM()] FAILED [ 83%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[SelectFpr()] FAILED [ 87%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[SplineTransformer()] FAILED [ 96%]
dev ❯ git diff
diff --git a/sklearn/utils/estimator_checks.py b/sklearn/utils/estimator_checks.py
index 8a51d8768..a7f135213 100644
--- a/sklearn/utils/estimator_checks.py
+++ b/sklearn/utils/estimator_checks.py
@@ -3702,6 +3702,12 @@ def check_dataframe_column_names_consistency(name, estimator_orig):
)
assert_array_equal(estimator.feature_names_in_, names)
+ if "feature_names_in_" not in estimator.__doc__:
+ raise ValueError(
+ "Estimator does not document its feature_names_in_ "
+ "attribute"
+ )
+
check_methods = []
for method in (
"predict",
I am not sure if we should commit this extension to check the docstring as part of check_dataframe_column_names_consistency
.
I agree it would be good to have a docstring check. I updated this PR with a check that only does the docstring test for sklearn modules: # Only check sklearn estimators for feature_names_in_ in docstring
if estimator_orig.__module__.startswith("sklearn.") and (
"feature_names_in_" not in estimator.__doc__
):
raise ValueError("Estimator does not document its feature_names_in_ attribute") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks very much @thomasjpfan.
I just pushed 2 commits to allow test estimators without docstrings and tell codecov that it's expected that we don't raise the exception when testing scikit-learn.
@lorentzenchr I think it would be great to have this merged to benefit from the extended estimator check when working on the remaining modules in other PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Since I merged some related PRs in |
\o/ |
…_new add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787)
…_new add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787) uncompromisingly reformat plot_oneclass_vs_svdd with black
…_new add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787) uncompromisingly reformat plot_oneclass_vs_svdd with black
…_new add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787) uncompromisingly reformat plot_oneclass_vs_svdd with black
…_new add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787) uncompromisingly reformat plot_oneclass_vs_svdd with black
…_new add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787) uncompromisingly reformat plot_oneclass_vs_svdd with black
Reference Issues/PRs
Follow up to #18010
What does this implement/fix? Explain your changes.
This PR adds
feature_names_in_
to the docstrings of estimators that already have them. (I place the new entry undern_features_in_
.Any other comments?