Corrects the forgotten bits of PR #267 #269
Conversation
Thanks, LGTM. Before merging we should fix the mysterious failure that arises. Note that in this PR the failing tests return "assert 3 == 0" (so I guess 3 warnings are raised), while I think before we had "assert 2 == 0". One way to investigate this, if you cannot reproduce the error locally, could be to (temporarily) change the failing test so that it prints the warnings in the Travis log.
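One hypothetical way to do that, assuming a pytest-style test and with run_code_that_warns() standing in for whatever call triggers the warnings (it is a placeholder, not an actual metric-learn helper):

import warnings

def test_debug_print_warnings():
    # Temporarily record the warnings and print them so they show up
    # in the Travis log.
    with warnings.catch_warnings(record=True) as raised:
        warnings.simplefilter("always")
        run_code_that_warns()  # placeholder for the code under test
    for w in raised:
        print(w.category.__name__, w.message)
    assert len(raised) == 0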
The test test_raise_not_fitted_error_if_not_fitted should also be updated accordingly for quadruplets learners. And I am not sure we test this for supervised learners anywhere?
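A minimal sketch of what such a test could look like for a quadruplets learner (an assumption about the eventual test, using LSML only as an example; the actual test in the suite may differ):

import numpy as np
import pytest
from sklearn.exceptions import NotFittedError
from metric_learn import LSML  # an example quadruplets learner

def test_raise_not_fitted_error_if_not_fitted_quadruplets():
    quadruplets = np.random.RandomState(42).randn(5, 4, 3)
    # Calling predict on an unfitted quadruplets learner should raise NotFittedError.
    with pytest.raises(NotFittedError):
        LSML().predict(quadruplets)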
The failing tests have been fixed in #270, so we're just missing the above small modification and this PR will be good to merge.
Some tests are still failing.
I did not want to test for fitting at each call, so I modified the code to prevent the user from defining a threshold on an unfitted estimator, which gives it a threshold_ attribute. Please tell me if you believe it to be too restrictive.
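A minimal sketch of what that guard could look like (the attribute checked here is an assumption, not necessarily the exact check made in this PR):

from sklearn.utils.validation import check_is_fitted

def set_threshold(self, threshold):
    # Raise NotFittedError if the estimator is not fitted, so that threshold_
    # (a trailing-underscore, i.e. "fitted only", attribute) is never created
    # on an unfitted estimator. Checking 'threshold_' assumes fit always sets
    # a default threshold; the real code may check another fitted attribute.
    check_is_fitted(self, ['threshold_'])
    self.threshold_ = threshold
    return self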
Thanks. I agree with your solution to prevent the user from setting the threshold on an unfitted estimator - one additional reason is that fitting will anyway change the threshold. @RobinVogel Could you quickly double check that all the calls to check_is_fitted are necessary? @wdevazelhes Would you mind providing an additional review to make sure these changes are consistent with the previous state of things?
I checked it; they are all necessary. There are three cases:
@RobinVogel No need anymore to check the Python version for calls to check_is_fitted, see #273. Sorry for the trouble.
OK, one needs to keep in mind that some tests check the number of warnings raised by scikit-learn, and since I have an older version locally it said something of that type, which raised an error. But it's all good for Travis.
Yes, I think this is fine, as this problem only appears for version 0.22.0.
Naive question: since
Hi, sorry for the late review, the PR looks good to me.
"I modified the code to prevent the user from defining a threshold on an unfitted estimator, which gives it a threshold_ attribute. I think it makes sense and it complies with sklearn's assumption that a trailing _ on an attribute's name corresponds to a fitted estimator."
-> Yes, I think it's reasonable: the choice of a good threshold is clearly data-dependent, so it doesn't make much sense to set a threshold before fitting.
Regarding the problem of possibly redundant calls to check_is_fitted, I am losing track a bit of the path taken by all the possible method calls, but in general I wouldn't mind a bit of redundancy/conservatism if it improves simplicity/readability: for instance, a systematic check_is_fitted(self) at the start of any method for which the estimator should be fitted first is self-explanatory and means we don't have to think about which fitted attributes the method needs (see the small sketch below).
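For completeness, a tiny sketch of that fully systematic option (the method shown is hypothetical, not metric-learn's actual implementation):

from sklearn.utils.validation import check_is_fitted

def predict(self, pairs):
    # Guard every post-fit method up front: fail fast with NotFittedError
    # without having to know which fitted attributes the method relies on.
    check_is_fitted(self)  # with scikit-learn < 0.22, pass attribute names explicitly
    return self.decision_function(pairs) > self.threshold_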
On the other hand, a good middle ground could be to check_is_fitted an attribute just before using it. It might still lead to some redundancy, but not too much. For example:
def m1(self):
    self.m2()
    # v3_ is checked inside m3, where it is actually used, so there is no need
    # to call check_is_fitted(self, 'v3_') here in m1: we save one check.
    self.m3()
    # Redundant: 'v1_' was already checked inside m2, but keeping the check
    # here makes m1 easier to read on its own.
    check_is_fitted(self, 'v1_')
    self.v1_ += 1

def m2(self):
    check_is_fitted(self, 'v1_')
    self.v1_ += 1

def m3(self):
    check_is_fitted(self, 'v3_')
    self.v3_ += 1
This would stay simple/readable in a way, because each check_is_fitted would be done locally, right where we need it.
This sounds like a good idea actually. I agree it is better to prioritize readability over avoiding redundancies. @RobinVogel maybe you can quickly go through the methods and adjust the calls to check_is_fitted accordingly?
I agree with the convention and I added a missing one. However, I think there are a few legitimate exceptions where one needs to check that an estimator is fitted before {modifying the threshold, checking that the threshold exists}: in
LGTM, thanks a lot @RobinVogel!
See title and PR #267.