Skip to content

DOC Adds feature_names_in_ to docstrings #20787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Aug 26, 2021

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Follow up to #18010

What does this implement/fix? Explain your changes.

This PR adds feature_names_in_ to the docstrings of estimators that already have them. (I place the new entry under n_features_in_.

Any other comments?

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have locally extended check_dataframe_column_names_consistency to also check the docstring of the estimator when feature_names_in_ and found the following classes that miss it:

devpytest sklearn/tests/test_common.py -v -k test_pandas_column_name_consistency | grep FAILED
sklearn/tests/test_common.py::test_pandas_column_name_consistency[CategoricalNB()] FAILED [  7%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[EmpiricalCovariance()] FAILED [ 14%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[MaxAbsScaler()] FAILED [ 49%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[Perceptron()] FAILED [ 71%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[QuantileRegressor()] FAILED [ 74%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[SGDOneClassSVM()] FAILED [ 83%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[SelectFpr()] FAILED [ 87%]
sklearn/tests/test_common.py::test_pandas_column_name_consistency[SplineTransformer()] FAILED [ 96%]

devgit diff
diff --git a/sklearn/utils/estimator_checks.py b/sklearn/utils/estimator_checks.py
index 8a51d8768..a7f135213 100644
--- a/sklearn/utils/estimator_checks.py
+++ b/sklearn/utils/estimator_checks.py
@@ -3702,6 +3702,12 @@ def check_dataframe_column_names_consistency(name, estimator_orig):
         )
     assert_array_equal(estimator.feature_names_in_, names)
 
+    if "feature_names_in_" not in estimator.__doc__:
+        raise ValueError(
+            "Estimator does not document its feature_names_in_ "
+            "attribute"
+        )
+
     check_methods = []
     for method in (
         "predict",

I am not sure if we should commit this extension to check the docstring as part of check_dataframe_column_names_consistency.

@thomasjpfan
Copy link
Member Author

I am not sure if we should commit this extension to check the docstring as part of check_dataframe_column_names_consistency.

I agree it would be good to have a docstring check. I updated this PR with a check that only does the docstring test for sklearn modules:

    # Only check sklearn estimators for feature_names_in_ in docstring
    if estimator_orig.__module__.startswith("sklearn.") and (
        "feature_names_in_" not in estimator.__doc__
    ):
        raise ValueError("Estimator does not document its feature_names_in_ attribute")

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks very much @thomasjpfan.

I just pushed 2 commits to allow test estimators without docstrings and tell codecov that it's expected that we don't raise the exception when testing scikit-learn.

@ogrisel
Copy link
Member

ogrisel commented Aug 25, 2021

@lorentzenchr I think it would be great to have this merged to benefit from the extended estimator check when working on the remaining modules in other PRs.

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ogrisel
Copy link
Member

ogrisel commented Aug 26, 2021

Since I merged some related PRs in main we have new failures. Working on it.

@lorentzenchr lorentzenchr merged commit 786c4e5 into scikit-learn:main Aug 26, 2021
@ogrisel
Copy link
Member

ogrisel commented Aug 26, 2021

\o/

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Nov 10, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Nov 10, 2021
…_new

add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787)
uncompromisingly reformat plot_oneclass_vs_svdd with black
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this pull request May 15, 2022
…_new

add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787)
uncompromisingly reformat plot_oneclass_vs_svdd with black
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jun 14, 2022
…_new

add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787)
uncompromisingly reformat plot_oneclass_vs_svdd with black
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Aug 29, 2022
…_new

add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787)
uncompromisingly reformat plot_oneclass_vs_svdd with black
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Sep 5, 2022
…_new

add backticks (scikit-learn#20914), deprecate **params in fit (scikit-learn#20843), add feature_names_in_ (scikit-learn#20787)
uncompromisingly reformat plot_oneclass_vs_svdd with black
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants