
FIX accept meta-estimator in SelfTrainingClassifier #19126


Merged (5 commits) on Jan 8, 2021

Conversation

glemaitre (Member):

closes #19119

SelfTrainingClassifier did not accept nested estimators that do not expose predict_proba before fitting.
One fix is to validate the fitted estimator instead, at which point we know whether predict_proba is available.

```diff
@@ -207,8 +207,11 @@ def fit(self, X, y):
         if self.n_iter_ == 1:
             # Only validate in the first iteration so that n_iter=0 is
-            # equivalent to the base_estimator itself.
-            _validate_estimator(self.base_estimator)
+            # equivalent to the base_estimator_ itself.
```
glemaitre (Member, Author):
@oliverrausch Do you recall the meaning of this line and the previous one?
It was not obvious to me.

@ogrisel (Member) commented on Jan 6, 2021:

May I suggest:

```python
# Only validate the fitted estimator of the first iteration.
```

Actually since the validation is really cheap, I would not mind simplifying this code and removing the if self.n_iter_ == 1: condition.
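A small demonstration of why validating the fitted estimator matters (a sketch, using StackingClassifier's default final_estimator for illustration): predict_proba on a meta-estimator is delegated to the fitted final estimator, so a hasattr check can fail before fit yet succeed after.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)

# With the default final_estimator (only created during fit), the
# delegated predict_proba cannot be resolved on the unfitted instance.
stack = StackingClassifier(estimators=[("lr", LogisticRegression())])
print(hasattr(stack, "predict_proba"))  # may be False before fit

stack.fit(X, y)
print(hasattr(stack, "predict_proba"))  # True once fitted
```

This is why moving the `_validate_estimator` call after fitting (or dropping the `if self.n_iter_ == 1:` condition entirely, since the check is cheap) makes meta-estimators acceptable.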

@glemaitre added this to the 0.24.1 milestone on Jan 6, 2021
@glemaitre added the "To backport" label (PR merged in master that needs a backport to a release branch defined based on the milestone) on Jan 6, 2021
@thomasjpfan (Member) left a comment:

LGTM

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@ogrisel (Member) left a comment:

LGTM (once the review comments are addressed).


@riyadhctg commented on Jan 6, 2021:

LGTM. For the sake of sharing, the solution I had in mind for #19119 was to swap the order of the following two lines in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/_stacking.py

```python
names, all_estimators = self._validate_estimators()
self._validate_final_estimator()
```

to

```python
self._validate_final_estimator()
names, all_estimators = self._validate_estimators()
```

@glemaitre (Member, Author):

@riyadhctg The change you propose is in StackingClassifier, and I think the code is fine there. The required change is in SelfTrainingClassifier.

glemaitre and others added 2 commits January 6, 2021 17:29
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@ogrisel ogrisel merged commit 6d3d1b8 into scikit-learn:master Jan 8, 2021
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Jan 18, 2021
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
jeremiedbb pushed a commit that referenced this pull request Jan 19, 2021
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Labels
module:semi_supervised; To backport (PR merged in master that needs a backport to a release branch defined based on the milestone)
Development

Successfully merging this pull request may close these issues.

StackClassifier is SelfTrainingClassifier throws error
4 participants