-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
check_estimator is stricter than what is stated in the Estimator API doc #16241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think that setting private attributes in |
Arguably it's hard to write an automated tool that would check what I described above in #16241 (comment) but maybe it's fine if |
So I suppose what we really want to assert is that:
vars(Est(param=val).fit(X)) == vars(Est().set_params(param=val).fit(X))
and
vars(Est(param=val).fit(X))
== vars(Est(param=alternative_val).set_params(param=val).fit(X))
We don't currently have a way to determine a reasonable value (different
from the default) to use in a generalised test.
|
While we figure out a solution to this, should we remove |
Well we have talked about a strict mode for a while. Maybe here is the
place to implement it.
|
Hello guys! Is there any progress on this? Will it be possible to resolve this issue by next release? |
Thank you for 0.22+ support preparing :) |
A pull request implementing a "strict" tags is welcome, which would enforce
adherence to standards we apply within the core library, but which are not
strictly necessary elsewhere.
|
+1 to prioritizing this. Fixing unintentional breakage of downstream packages should rank pretty high on the list of todos, IMHO. |
@h-vetinari As mentioned above, a PR for the stict option for check_estimator would be very welcome. I would be happy to review it if someone makes it. There was some related work in #14929 that is now stalled. |
@rth I do not know the code-base, so I don't know where to start or what would be necessary. However, I'd be willing to try with some pointers of what needs to be done. |
Can you explain how this is breaking workflows for downstream users? In my understanding the issue is that running
The general idea is to add a |
Ahh, I see the discussion is from conda-forge/lightgbm-feedstock#21 . You can just skip the failing test IMO. |
I'm just a middleman here (or rather end-user, trying to minimize the risk of silent breakages in my projects). From what I can make out of this comment from @StrikerRUS in microsoft/LightGBM#2628, I think it's that setting private attributes in I can't judge the extent of the breakage, but it seems the lightgbm-folks feel strongly enough about it that they felt the need to preemptively state in their docs:
I'm just trying to highlight this issue again. |
PS. Thanks for taking a look at it again, it's not taken for granted. I cannot promise anything regarding a PR, but I'll try to look into it. |
We have tried to be accommodating to the capacity we have, regarding that specific lightgbm issue. Things take time and hopefully soon they'll be fixed. But as you say, it's an open source community with people volunteering their time to work on it. If people really need to use sklearn AND lightgbm, they can use the previous version. And if people really think such regressions "should" be considered high priority, they or their employer could support its development in one way or another, other than "why aren't you fixing it". I'd really appreciate if you could avoid such a tone here. |
No it's fine @adrinjalali , there have been indeed been some miscommunication between this issue and microsoft/LightGBM#2628 which made it sound like scikit-learn 0.22 made a regression which prevents it from being used with lightgbm. There was a regression but it was fixed, this issue is benign as far as I can tell and lightgbm should just ignore the failing common test as I explained in microsoft/LightGBM#2628 (comment) . (As a side note I finally understand why so many users wanted this obscure and non critical change applied). Thanks for pinging us on this @h-vetinari! |
Tone unfortunately doesn't transmit well in a written medium. I'm sorry if it came across as demanding, I was in fact trying to negate this with the caveat (i.e. just calmly saying that the perceived priority should IMO be higher, regardless of how quickly that translates into available resources). And I did offer to look into it myself, although I'd be starting from scratch (aside from the kind pointers by @rth). Hope that clears the air a bit, so to speak. |
Moving milestone to the next major version (which is currently named 1.0). |
Moving to 1.1, but happy to have this in if @glemaitre 's PR gets in fast. |
very unlikely to be part of 1.1, moving to 1.2 |
A lot of the discussion in this issue is around setting private attributes in So maybe we need an update regarding the problem that needs solving. |
This issue was raised in a discussion regarding the LightGBM scikit-learn compatible estimators: microsoft/LightGBM#2628 (comment)
The problem is that check_estimator complains about private attributes set in the
__init__
of a scikit-learn estimator while our documentation just state the following (while not explicitly prohibiting setting private attributes in__init__
):from: https://scikit-learn.org/0.22/developers/develop.html#instantiation
The strict check is defined in
sklearn.utils.estimator_checks.check_no_attributes_set_in_init
.The text was updated successfully, but these errors were encountered: