
Corrects the forgotten bits of PR #267 #269

Merged: 16 commits into scikit-learn-contrib:master on Jan 24, 2020

Conversation

RobinVogel (Contributor):

See title and PR #267.

bellet (Member) commented on Dec 10, 2019:

Thanks, LGTM.

Before merging we should fix the mysterious failure that arises. Note that in this PR the failing tests return "assert 3 == 0" (so I guess 3 warnings are raised), while I think we previously had "assert 2 == 0".

One way to investigate this, if you cannot reproduce the error locally, could be to (temporarily) change the failing test so that it prints the warnings in the Travis log.
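
For illustration, a minimal sketch of such a temporary debugging change (this is not the repository's actual test; fit_some_estimator is a hypothetical stand-in for the failing call):

import warnings

def test_debug_warnings():
    with warnings.catch_warnings(record=True) as raised:
        warnings.simplefilter("always")
        fit_some_estimator()  # hypothetical stand-in for the call under test
    # print the caught warnings so they show up in the Travis log
    for w in raised:
        print(w.category.__name__, str(w.message))
    assert len(raised) == 0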

bellet (Member) left a comment:

The test test_raise_not_fitted_error_if_not_fitted should also be updated accordingly for quadruplets learners.

And I am not sure we test this for supervised learners anywhere?

bellet (Member) left a comment:

The failing tests have been fixed in #270, so we are just missing the small modification above, and then this PR will be good to merge.

bellet (Member) commented on Dec 20, 2019:

Some tests are still failing... If we cannot do otherwise, maybe we can simply copy the former check_is_fitted function from sklearn, so that we can enforce specific attributes to be set.
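
For reference, a minimal sketch of what such a copied helper could look like (the former sklearn implementation has more options; this assumes only sklearn's NotFittedError):

from sklearn.exceptions import NotFittedError

def check_is_fitted(estimator, attributes):
    # mimics the older sklearn signature that took explicit attribute names
    if isinstance(attributes, str):
        attributes = [attributes]
    if not all(hasattr(estimator, attr) for attr in attributes):
        raise NotFittedError("This %s instance is not fitted yet. Call 'fit' "
                             "with appropriate arguments before using this "
                             "estimator." % type(estimator).__name__)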

RobinVogel (Contributor, Author) commented on Dec 23, 2019:

> Some tests are still failing... If we cannot do otherwise, maybe we can simply copy the former check_is_fitted function from sklearn, so that we can enforce specific attributes to be set.

I did not want to test for fitting when calling calibrate_threshold, because it calls many other methods. However, the call to self._prepare_inputs defines a preprocessor_ attribute, which breaks all of the subsequent check_is_fitted calls. The simple fix is to go back on my choice and use check_is_fitted there too, before calling that method.

I modified the code to prevent the user from setting a threshold on an unfitted estimator, since doing so would give it a threshold_ attribute. I think this makes sense, and it complies with sklearn's convention that a trailing underscore in an attribute's name means the estimator is fitted.

Please tell me if you believe it to be too restrictive.
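
A rough sketch of that guard (class and attribute names here are illustrative, not the exact metric_learn code):

from sklearn.utils.validation import check_is_fitted

class _PairsClassifierSketch:
    def fit(self, pairs, y):
        self.preprocessor_ = None  # placeholder for the real fitted state
        return self

    def set_threshold(self, threshold):
        # refuse to create threshold_ on an estimator that was never fitted
        check_is_fitted(self, 'preprocessor_')
        self.threshold_ = threshold
        return self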

bellet (Member) commented on Jan 3, 2020:

Thanks. I agree with your solution of preventing the user from setting the threshold on an unfitted estimator; one additional reason is that fitting will change the threshold anyway, through the call to calibrate_threshold.

@RobinVogel Could you quickly double-check that all calls to check_is_fitted are actually needed? It now feels like some of them might be redundant.

@wdevazelhes Would you mind providing an additional review to make sure these changes are consistent with the previous state of things?

RobinVogel (Contributor, Author) commented on Jan 6, 2020:

I checked: they are all necessary. There are three cases:

  • a fitted attribute is used later in the method;
  • we set the threshold, or check that the estimator is fitted before using the threshold (predict, set_threshold);
  • calibrate_threshold, which sets the attribute preprocessor_ inside the function. If I don't test those, the checks in the calls to decision_function pass and I get a "self.components_ doesn't exist" error.

bellet requested a review from wdevazelhes on January 9, 2020.
bellet (Member) left a comment:

@RobinVogel There is no longer any need to check the Python version for calls to check_is_fitted, see #273.
Sorry for the trouble!

RobinVogel (Contributor, Author) commented:

OK; one needs to keep in mind that some tests check the number of warnings raised by scikit-learn, and since I have an older version locally, it emitted something of this type, which raised an AssertionError during testing:

{message : FutureWarning('Passing attributes to check_is_fitted is deprecated and will be removed in 0.23. The attributes argument is ignored.'), category : 'FutureWarning', filename : '/home/robin/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py', lineno : 933, line : None}

But it's all good on Travis.

RobinVogel requested a review from bellet on January 13, 2020.
bellet (Member) commented on Jan 14, 2020:

> OK; one needs to keep in mind that some tests check the number of warnings raised by scikit-learn, and since I have an older version locally, it emitted something of this type, which raised an AssertionError during testing: [...] But it's all good on Travis.

Yes, I think this is fine, as this problem only appears with version 0.22.0.

bellet (Member) commented on Jan 14, 2020:

> I checked: they are all necessary. There are three cases:
>
> • a fitted attribute is used later in the method;
> • we set the threshold, or check that the estimator is fitted before using the threshold (predict, set_threshold);
> • calibrate_threshold, which sets the attribute preprocessor_ inside the function. If I don't test those, the checks in the calls to decision_function pass and I get a "self.components_ doesn't exist" error.

Naive question: since check_is_fitted accepts attributes again, can we reduce the number of calls by requiring the presence of more attributes in some of the checks?
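
Illustratively (attribute names here are made up, not a claim about the actual metric_learn internals), such a merged check could look like:

from sklearn.utils.validation import check_is_fitted

def predict(self, pairs):
    # one call that requires several fitted attributes at once, instead of
    # separate check_is_fitted calls scattered through the method:
    check_is_fitted(self, ['components_', 'threshold_'])
    return self.decision_function(pairs) > self.threshold_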

wdevazelhes (Member) left a comment:

Hi, sorry for the late review; the PR looks good to me.

> I modified the code to prevent the user from setting a threshold on an unfitted estimator, since doing so would give it a threshold_ attribute. I think this makes sense, and it complies with sklearn's convention that a trailing underscore in an attribute's name means the estimator is fitted.

Yes, I think it's reasonable: the choice of a good threshold is clearly data-dependent, so it doesn't make much sense to set a threshold before fitting.

Regarding the problem of possibly redundant calls to check_is_fitted, I am losing track a bit of the paths taken by all the possible method calls, but in general I wouldn't mind a bit of redundancy/conservatism if it improves simplicity/readability: for instance, a systematic check_is_fitted(self) at the start of any method for which the estimator should already be fitted is self-explanatory, and saves us from thinking about which fitted attributes the method needs.
On the other hand, a good middle ground could be to check_is_fitted an attribute just before using it. This might still lead to some redundancy, but not too much, e.g.:

def m1(self):
    self.m2()
    # v3_ is checked inside m3, where it is used, so we don't need an extra
    # check_is_fitted(self, 'v3_') here; that saves one call:
    self.m3()
    # redundant (m2 already checked v1_), but easier to read if kept here:
    check_is_fitted(self, 'v1_')
    self.v1_ += 1

def m2(self):
    check_is_fitted(self, 'v1_')
    self.v1_ += 1

def m3(self):
    check_is_fitted(self, 'v3_')
    self.v3_ += 1

This would also keep things simple and readable, because each check_is_fitted would be done locally, right where it is needed.

bellet (Member) commented on Jan 23, 2020:

> On the other hand, a good middle ground could be to check_is_fitted an attribute just before using it. This might still lead to some redundancy, but not too much (see the example above). [...] each check_is_fitted would be done locally, right where it is needed.

This sounds like a good idea actually. I agree it is better to prioritize readability over avoiding redundancies.

@RobinVogel maybe you can quickly go through the methods and adjust the calls to check_is_fitted accordingly whenever needed? Then we are good to merge.

RobinVogel (Contributor, Author) commented on Jan 24, 2020:

I agree with the convention and I added a missing check_is_fitted to reflect that.

However, I think there are a few legitimate exceptions where one needs to check that the estimator is fitted before modifying the threshold or checking that the threshold exists, in metric_learn/base_metric.py (see the sketch after this list):

  • L. 342 in predict: I want to check that the estimator is fitted before I check for the existence of a threshold, since one has to fit before setting a threshold. I don't want to call self.decision_function if there is no threshold, since that would be wasteful.
  • L. 422 in set_threshold: we forbid setting the threshold on an unfitted estimator.
  • L. 486 in calibrate_threshold: we forbid starting any computation (even just checking matrix sizes) on an unfitted estimator.
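
As a hedged sketch of the first exception above (names, messages, and the exact return value are illustrative; the real code in metric_learn/base_metric.py may differ):

from sklearn.utils.validation import check_is_fitted

def predict(self, pairs):
    # first make sure the estimator is fitted at all...
    check_is_fitted(self, 'preprocessor_')
    # ...then check that a threshold exists before paying for
    # decision_function, which would be wasteful without one:
    if not hasattr(self, 'threshold_'):
        raise AttributeError("A threshold must be set (e.g. with "
                             "calibrate_threshold) before calling predict.")
    return self.decision_function(pairs) > self.threshold_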

bellet (Member) left a comment:

LGTM, thanks a lot @RobinVogel !

bellet merged commit 2380f51 into scikit-learn-contrib:master on Jan 24, 2020.