
Corrects the forgotten bits of PR #267 #269

Merged: 16 commits into scikit-learn-contrib:master on Jan 24, 2020

Conversation

RobinVogel (Contributor):

See title and PR #267.

bellet (Member) commented on Dec 10, 2019:

Thanks, LGTM.

Before merging we should fix the mysterious failure that arises. Note that in this PR the failing tests return "assert 3 == 0" (so I guess 3 warnings are raised), while I think we previously had "assert 2 == 0".

One way to investigate this, if you cannot reproduce the error locally, could be to (temporarily) change the failing test so that it prints the warnings in the Travis log.
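
For illustration, a minimal sketch of such a temporary debugging change (this is not the repository's actual test; fit_some_estimator is a hypothetical stand-in for the failing call):

import warnings

def test_debug_warnings():
    with warnings.catch_warnings(record=True) as raised:
        warnings.simplefilter("always")
        fit_some_estimator()  # hypothetical stand-in for the call under test
    # print the caught warnings so they show up in the Travis log
    for w in raised:
        print(w.category.__name__, str(w.message))
    assert len(raised) == 0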

bellet (Member) left a comment:

The test test_raise_not_fitted_error_if_not_fitted should also be updated accordingly for quadruplets learners.

And I am not sure we test this for supervised learners anywhere?

bellet (Member) left a comment:

The failing tests have been fixed in #270, so we are just missing the small modification above, and then this PR will be good to merge.

bellet (Member) commented on Dec 20, 2019:

Some tests are still failing... If we cannot do otherwise, maybe we can simply copy the former check_is_fitted function from sklearn, so that we can enforce specific attributes to be set.
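
For reference, a minimal sketch of what such a copied helper could look like (the former sklearn implementation has more options; this assumes only sklearn's NotFittedError):

from sklearn.exceptions import NotFittedError

def check_is_fitted(estimator, attributes):
    # mimics the older sklearn signature that took explicit attribute names
    if isinstance(attributes, str):
        attributes = [attributes]
    if not all(hasattr(estimator, attr) for attr in attributes):
        raise NotFittedError("This %s instance is not fitted yet. Call 'fit' "
                             "with appropriate arguments before using this "
                             "estimator." % type(estimator).__name__)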

RobinVogel (Contributor, Author) commented on Dec 23, 2019:

> Some tests are still failing... If we cannot do otherwise, maybe we can simply copy the former check_is_fitted function from sklearn, so that we can enforce specific attributes to be set.

I did not want to test for fitting when calling calibrate_threshold, because it calls many other methods. However, the call to self._prepare_inputs defines a preprocessor_ attribute, which breaks all of the subsequent check_is_fitted calls. The simple fix is to go back on my choice and use check_is_fitted there too, before calling that method.

I modified the code to prevent the user from setting a threshold on an unfitted estimator, since doing so would give it a threshold_ attribute. I think this makes sense, and it complies with sklearn's convention that a trailing underscore in an attribute's name means the estimator is fitted.

Please tell me if you believe it to be too restrictive.
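
A rough sketch of that guard (class and attribute names here are illustrative, not the exact metric_learn code):

from sklearn.utils.validation import check_is_fitted

class _PairsClassifierSketch:
    def fit(self, pairs, y):
        self.preprocessor_ = None  # placeholder for the real fitted state
        return self

    def set_threshold(self, threshold):
        # refuse to create threshold_ on an estimator that was never fitted
        check_is_fitted(self, 'preprocessor_')
        self.threshold_ = threshold
        return self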

bellet (Member) commented on Jan 3, 2020:

Thanks. I agree with your solution of preventing the user from setting the threshold on an unfitted estimator; one additional reason is that fitting will change the threshold anyway, through the call to calibrate_threshold.

@RobinVogel Could you quickly double-check that all calls to check_is_fitted are actually needed? It now feels like some of them might be redundant.

@wdevazelhes Would you mind providing an additional review to make sure these changes are consistent with the previous state of things?

RobinVogel (Contributor, Author) commented on Jan 6, 2020:

I checked: they are all necessary. There are three cases:

  • a fitted attribute is used later in the method;
  • we set the threshold, or check that the estimator is fitted before using the threshold (predict, set_threshold);
  • calibrate_threshold, which sets the attribute preprocessor_ inside the function. If I don't test those, the checks in the calls to decision_function pass and I get a "self.components_ doesn't exist" error.

bellet requested a review from wdevazelhes on January 9, 2020.
bellet (Member) left a comment:

@RobinVogel There is no longer any need to check the Python version for calls to check_is_fitted, see #273.
Sorry for the trouble!

RobinVogel (Contributor, Author) commented:

OK; one needs to keep in mind that some tests check the number of warnings raised by scikit-learn, and since I have an older version locally, it emitted something of this type, which raised an AssertionError during testing:

{message : FutureWarning('Passing attributes to check_is_fitted is deprecated and will be removed in 0.23. The attributes argument is ignored.'), category : 'FutureWarning', filename : '/home/robin/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py', lineno : 933, line : None}

But it's all good on Travis.

RobinVogel requested a review from bellet on January 13, 2020.
bellet (Member) commented on Jan 14, 2020:

> OK; one needs to keep in mind that some tests check the number of warnings raised by scikit-learn, and since I have an older version locally, it emitted something of this type, which raised an AssertionError during testing: [...] But it's all good on Travis.

Yes, I think this is fine, as this problem only appears with version 0.22.0.

bellet (Member) commented on Jan 14, 2020:

> I checked: they are all necessary. There are three cases:
>
> • a fitted attribute is used later in the method;
> • we set the threshold, or check that the estimator is fitted before using the threshold (predict, set_threshold);
> • calibrate_threshold, which sets the attribute preprocessor_ inside the function. If I don't test those, the checks in the calls to decision_function pass and I get a "self.components_ doesn't exist" error.

Naive question: since check_is_fitted accepts attributes again, can we reduce the number of calls by requiring the presence of more attributes in some of the checks?
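
Illustratively (attribute names here are made up, not a claim about the actual metric_learn internals), such a merged check could look like:

from sklearn.utils.validation import check_is_fitted

def predict(self, pairs):
    # one call that requires several fitted attributes at once, instead of
    # separate check_is_fitted calls scattered through the method:
    check_is_fitted(self, ['components_', 'threshold_'])
    return self.decision_function(pairs) > self.threshold_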

wdevazelhes (Member) left a comment:

Hi, sorry for the late review; the PR looks good to me.

> I modified the code to prevent the user from setting a threshold on an unfitted estimator, since doing so would give it a threshold_ attribute. I think this makes sense, and it complies with sklearn's convention that a trailing underscore in an attribute's name means the estimator is fitted.

Yes, I think it's reasonable: the choice of a good threshold is clearly data-dependent, so it doesn't make much sense to set a threshold before fitting.

Regarding the problem of possibly redundant calls to check_is_fitted, I am losing track a bit of the paths taken by all the possible method calls, but in general I wouldn't mind a bit of redundancy/conservatism if it improves simplicity/readability: for instance, a systematic check_is_fitted(self) at the start of any method for which the estimator should already be fitted is self-explanatory, and saves us from thinking about which fitted attributes the method needs.
On the other hand, a good middle ground could be to check_is_fitted an attribute just before using it. This might still lead to some redundancy, but not too much, e.g.:

def m1(self):
    self.m2()
    # v3_ is checked inside m3, where it is used, so we don't need an extra
    # check_is_fitted(self, 'v3_') here; that saves one call:
    self.m3()
    # redundant (m2 already checked v1_), but easier to read if kept here:
    check_is_fitted(self, 'v1_')
    self.v1_ += 1

def m2(self):
    check_is_fitted(self, 'v1_')
    self.v1_ += 1

def m3(self):
    check_is_fitted(self, 'v3_')
    self.v3_ += 1

This would also keep things simple and readable, because each check_is_fitted would be done locally, right where it is needed.

bellet (Member) commented on Jan 23, 2020:

> On the other hand, a good middle ground could be to check_is_fitted an attribute just before using it. This might still lead to some redundancy, but not too much (see the example above). [...] each check_is_fitted would be done locally, right where it is needed.

This sounds like a good idea actually. I agree it is better to prioritize readability over avoiding redundancies.

@RobinVogel maybe you can quickly go through the methods and adjust the calls to check_is_fitted accordingly whenever needed? Then we are good to merge.

RobinVogel (Contributor, Author) commented on Jan 24, 2020:

I agree with the convention and I added a missing check_is_fitted to reflect that.

However, I think there are a few legitimate exceptions where one needs to check that the estimator is fitted before modifying the threshold or checking that the threshold exists, in metric_learn/base_metric.py (see the sketch after this list):

  • L. 342 in predict: I want to check that the estimator is fitted before I check for the existence of a threshold, since one has to fit before setting a threshold. I don't want to call self.decision_function if there is no threshold, since that would be wasteful.
  • L. 422 in set_threshold: we forbid setting the threshold on an unfitted estimator.
  • L. 486 in calibrate_threshold: we forbid starting any computation (even just checking matrix sizes) on an unfitted estimator.
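
As a hedged sketch of the first exception above (names, messages, and the exact return value are illustrative; the real code in metric_learn/base_metric.py may differ):

from sklearn.utils.validation import check_is_fitted

def predict(self, pairs):
    # first make sure the estimator is fitted at all...
    check_is_fitted(self, 'preprocessor_')
    # ...then check that a threshold exists before paying for
    # decision_function, which would be wasteful without one:
    if not hasattr(self, 'threshold_'):
        raise AttributeError("A threshold must be set (e.g. with "
                             "calibrate_threshold) before calling predict.")
    return self.decision_function(pairs) > self.threshold_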

bellet (Member) left a comment:

LGTM, thanks a lot @RobinVogel !

bellet merged commit 2380f51 into scikit-learn-contrib:master on Jan 24, 2020.