Skip to content

Is it intention to drop check_estimator_sparse_data in 1.5? #28966

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
koaning opened this issue May 7, 2024 · 6 comments · Fixed by #28968
Closed

Is it intention to drop check_estimator_sparse_data in 1.5? #28966

koaning opened this issue May 7, 2024 · 6 comments · Fixed by #28968
Labels
Developer API Third party developer API related Documentation

Comments

@koaning
Copy link

koaning commented May 7, 2024

Describe the bug

While running the 1.5.0 release candidate, my collaborator @FBruzzesi on scikit-lego ran our testing suite and noticed something breaking. Here's the error message:

ImportError while loading conftest '/home/runner/work/scikit-lego/scikit-lego/tests/conftest.py'.
tests/conftest.py:43: in <module>
    estimator_checks.check_estimator_sparse_data,
E   AttributeError: module 'sklearn.utils.estimator_checks' has no attribute 'check_estimator_sparse_data'. Did you mean: 'check_estimator_sparse_array'?

Figured I'd ping and check, did a function get renamed? If so, it feels breaking and I can't recall a warning (but I may be mistaken, please tell me if I missed that). This is mostly just a ping issue so folks are aware, feel free to close if this feels like a false alarm.

Steps/Code to Reproduce

from sklearn.utils import estimator_checks

estimator_checks.check_estimator_sparse_array

Expected Results

No error is thrown.

Actual Results

AttributeError: module 'sklearn.utils.estimator_checks' has no attribute 'check_estimator_sparse_data'. Did you mean: 'check_estimator_sparse_array'?

Versions

This is the v1.5rc01
@jeremiedbb
Copy link
Member

Thanks for the feedback on the RC @koaning !

This check was split into 2 different checks, one for sparse matrices and one for sparse arrays, see #27576. It's kind of a grey area because all these checks are not really public so we may change them without deprecation. I think it can't hurt to at least mention it in the changelog. Note that there's also ongoing work to refactor these checks to expose a third party developer API for which these considerations will be better defined.

@koaning
Copy link
Author

koaning commented May 7, 2024

When is something in the public API? I assumed it was public enough because it publicly accessible (no underscore prefix in the function name). I think we'll be fine with making the switch on the sklego side but there may certainly be other 3rd party packages that use this function because you kind of need to use these testing functions if you want to guarantee that you are compatible to the sklearn API.

@betatim
Copy link
Member

betatim commented May 7, 2024

Agree with Jeremie about it being a gray area.

I think we should follow the normal deprecation policy if we change something related to testing estimators that is directly mentioned in the documentation https://scikit-learn.org/dev/developers/develop.html#rolling-your-own-estimator (Maybe another thing to remember is when we change the checks so that estimators that used to pass suddenly fail)

However the individual checks, what they are called and what they do is probably an "implementation detail".

So I think we should make sure that we steer people towards using things that we want them to use, and away from the other stuff.

@jeremiedbb
Copy link
Member

to me, the official public API is what is officially documented, i.e. in classes.rst. check_estimator is, but individual checks are not. I'd like however to have a third category, developer API, in between the current public/private API. One of its goal is to better define what checks are required to make an estimator scikit-learn compatible, because we are aware that the current check_estimator is too strict (see #13969, #16241).

@koaning
Copy link
Author

koaning commented May 7, 2024

Fair enough, the tests in scikit-lego are pretty old (>6 years) so it could be perfectly fine/normal for us to revamp it. The only small warning is that there may be a fair amount of other plugin packages who are going to notice, but if this is mentioned in the release notes that should suffice.

@adrinjalali
Copy link
Member

We certainly need to have a public set of tests for third party developers. It's along the lines of what @glemaitre has been working on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Developer API Third party developer API related Documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants