[MRG+1] test one class reduction with sample weight for classifiers (Fixes #10337) #10347
Conversation
sklearn/utils/estimator_checks.py
Outdated
def check_classifiers_one_label_sample_weights(name, classifier_orig):
    # check that classifiers accepting sample_weight fit fine or
    # raise a ValueError if the problem is reduced to one class.
    error_fit = ("Classifier can't train when only one class is present "
less than two classes are present?
scikit-learn/sklearn/utils/estimator_checks.py
Lines 1167 to 1168 in c30c503
def check_classifiers_one_label(name, classifier_orig):
    error_string_fit = "Classifier can't train when only one class is present."
I must admit that I just copied and pasted it from check_classifiers_one_label, but I can change it.
I just thought of this change as a possibility. But I guess we can wait for a core dev to review.
Hi,
I've not looked at the errors, but try to correct unless there is a very good reason not to.
sklearn/utils/estimator_checks.py
Outdated
try:
    classifier.fit(X, y, sample_weight=sample_weight)
except ValueError as e:
    # 'specified nu is infeasible' thrown by NuSVC in this case
Please improve NuSVC instead of making it a special case here
I can remove this comment; it was corrected by the modification to BaseSVC.
sklearn/utils/estimator_checks.py
Outdated
# 'specified nu is infeasible' thrown by NuSVC in this case
if (("class" not in repr(e)) and
        ("specified nu is infeasible" not in repr(e))):
    print(error_fit, classifier, e)
I don't think we should be putting anything on stdout
sklearn/utils/estimator_checks.py
Outdated
    classifier.fit(X, y, sample_weight=sample_weight)
except ValueError as e:
    # 'specified nu is infeasible' thrown by NuSVC in this case
    if (("class" not in repr(e)) and
This will pick up "classify" and "classifier" too. Use a regex with `\b`.
Yes, we have probably made this mistake elsewhere, and it should be fixed.
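For illustration (the error messages here are made up, not sklearn's), a `\b`-anchored regex matches the standalone word "class" but not substrings of "classify" or "classifier", whereas plain substring membership matches all three:

```python
import re

# Substring membership is too loose: "classify" also contains "class".
assert "class" in repr(ValueError("cannot classify this input"))

# A word-boundary regex only matches the standalone word "class".
assert re.search(r"\bclass\b", repr(ValueError("only one class is present")))
assert re.search(r"\bclass\b", repr(ValueError("cannot classify this input"))) is None
```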
sklearn/naive_bayes.py
Outdated
@@ -600,6 +603,9 @@ def fit(self, X, y, sample_weight=None):
             sample_weight = np.atleast_2d(sample_weight)
             Y *= check_array(sample_weight).T

+            # find the number of classes left after trimming, used by ComplementNB
+            self.nb_trim_classes_ = np.sum(np.sum(Y, axis=0) > 0)
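A quick sketch of what this line counts (the data below is invented): with `Y` a one-hot label matrix whose rows have been scaled by `sample_weight`, the column sums give each class's total weight, and comparing them to zero counts the classes that survive the trimming:

```python
import numpy as np

# Hypothetical weighted one-hot labels: the second class's only sample
# received weight 0, so it is effectively trimmed away.
Y = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 0.0]])

# Column sums are per-class total weights; count the strictly positive ones.
n_trim_classes = int(np.sum(np.sum(Y, axis=0) > 0))
```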
A public attribute like this should be documented. Or put the underscore at the beginning to make it a private implementation detail. We also tend to use `n` rather than `nb`.
Maybe we should keep it private (I don't know whether it would be of interest).
sklearn/naive_bayes.py
Outdated
@@ -832,7 +838,7 @@ def _joint_log_likelihood(self, X):

         X = check_array(X, accept_sparse="csr")
         jll = safe_sparse_dot(X, self.feature_log_prob_.T)
-        if len(self.classes_) == 1:
+        if len(self.classes_) == 1 or self.nb_trim_classes_ == 1:
Isn't the second condition sufficient?
Yes, it is.
sklearn/utils/estimator_checks.py
Outdated
"after sample_weight trimming.") | ||
error_predict = ("Classifier can't predict when only one class is " | ||
"present after sample_weight trimming.") | ||
if has_fit_parameter(classifier_orig, "sample_weight"): |
`has_fit_parameter` will return False if `**kwargs` are accepted. I would just try `fit` with this kwarg and continue if a `TypeError` is raised.
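A minimal sketch of the suggested pattern (the helper name and dummy estimators below are illustrative, not sklearn code): attempt the call and treat `TypeError` as "sample_weight not supported" instead of inspecting the signature:

```python
def fit_accepting_weights(classifier, X, y, sample_weight):
    """Try fitting with sample_weight; return False if it is not accepted."""
    try:
        classifier.fit(X, y, sample_weight=sample_weight)
    except TypeError:
        # The estimator's fit() rejected the keyword, so skip the check.
        # This also handles estimators whose fit() takes **kwargs, which
        # signature inspection would misreport.
        return False
    return True
```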
sklearn/utils/estimator_checks.py
Outdated
if ("class" not in repr(e)): | ||
print(error_fit, classifier, e) | ||
traceback.print_exc(file=sys.stdout) | ||
raise e |
`raise` without an argument would be better here.
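The difference, sketched (the function and message below are illustrative): a bare `raise` re-raises the exception currently being handled with its original traceback intact, whereas `raise e` rebinds the exception (and on Python 2, which sklearn still supported at the time, loses the original traceback):

```python
def refit_check():
    try:
        raise ValueError("Classifier can't train when only one class is present")
    except ValueError:
        # Preferred: a bare `raise` re-raises the active exception,
        # preserving the original traceback for pytest's report.
        raise
```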
sklearn/utils/estimator_checks.py
Outdated
    classifier.fit(X_train, y, sample_weight=sample_weight)
except ValueError as e:
    if ("class" not in repr(e)):
        print(error_fit, classifier, e)
No need to provide the last two args; they will be output by pytest.
sklearn/utils/estimator_checks.py
Outdated
else:
    return
except Exception as exc:
    print(error_fit, classifier, exc)
As above
sklearn/utils/estimator_checks.py
Outdated
try:
    assert_array_equal(classifier.predict(X_test), np.ones(10))
except Exception as exc:
    print(error_predict, classifier, exc)
As above
        assert_regex_matches(repr(e), r"\bclass(es)?\b", msg=error_fit)
        return
    except TypeError as e:
        return
Add a comment that sample_weight is not supported. Indeed, we could check the message.
Otherwise LGTM!
# Test that fit won't raise an unexpected exception
try:
    classifier.fit(X_train, y, sample_weight=sample_weight)
except ValueError as e:
Much neater, thanks!
sklearn/utils/estimator_checks.py
Outdated
# check that classifiers accepting sample_weight fit fine or
# raise a ValueError if the problem is reduced to one class.
error_fit = ("Classifier can't train when only one class is present "
             "after sample_weight trimming.")
Add a note that it may raise an error mentioning "class" instead.
sklearn/utils/estimator_checks.py
Outdated
    assert_regex_matches(repr(e), r"\bsample_weight\b",
                         msg="sample_weight not supported")
try:
    assert_regex_matches(repr(e), r"\bsample_weight\b")
Surely `if not re.search('...', repr(e)): raise` is much more readable!
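Spelled out, the reviewer's suggestion might look like this (helper name and error messages are illustrative):

```python
import re

def ensure_mentions_sample_weight(e):
    # Explicit guard, as suggested: plainer than a dedicated
    # assert_regex_matches helper.
    if not re.search(r"\bsample_weight\b", repr(e)):
        raise AssertionError("unexpected error from fit: %r" % e)
```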
yes, sorry.
Hi @Johayon, I know it has been a while... apologies for that... but you already have approval... if you are still interested in working on this, do you mind fixing conflicts? Thanks a lot for your patience!
Hello @cmarmo,
Closing as superseded by #24140.
Fixes #10337