[MRG+1] test one class reduction with sample weight for classifiers (Fixes #10337) #10347

Closed
Johayon wants to merge 16 commits

Conversation

@Johayon (Contributor) commented Dec 20, 2017

Fixes #10337

  • added a common test for all classifiers accepting sample_weight in fit, covering the case where only one class remains after weight trimming (scenario sketched below).
  • the test makes SVC and LinearSVC fail.
  • fixed bugs in GaussianNB and ComplementNB.
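
A minimal sketch of the scenario the new common test exercises (illustrative data, not the exact fixture used in the PR): y contains two classes, but sample_weight zeroes out every sample of one of them, so only one class effectively remains during fit.

import numpy as np

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(10, 2)),
               rng.normal(loc=5, size=(10, 2))])
y = np.array([0] * 10 + [1] * 10)
# all samples of class 1 carry zero weight, i.e. the class is trimmed away
sample_weight = np.array([1.0] * 10 + [0.0] * 10)

# A classifier should then either fit (and predict the surviving class)
# or raise a ValueError whose message mentions the one-class problem:
# classifier.fit(X, y, sample_weight=sample_weight)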

def check_classifiers_one_label_sample_weights(name, classifier_orig):
    # check that classifiers accepting sample_weight either fit fine or
    # raise a ValueError if the problem is reduced to one class.
    error_fit = ("Classifier can't train when only one class is present "
Contributor:

less than two classes are present?

Contributor Author:

def check_classifiers_one_label(name, classifier_orig):
    error_string_fit = "Classifier can't train when only one class is present."

I must admit that I just did a copy/paste from check_classifiers_one_label, but I can change it.

Contributor:

I just thought of this change as a possibility. But I guess we can wait for a core dev to review.

@Johayon (Contributor Author) commented Jan 2, 2018

Hi,
Should I skip the test when it fails, for the moment? Or should I try to fix the problems in the different classifiers?

@jnothman (Member) commented Jan 2, 2018 via email

@Johayon Johayon changed the title #10337 added test sample weight one class for classifiers [MRG] test one class reduction with sample weight for classifiers (Fixes #10337) Jan 3, 2018
    try:
        classifier.fit(X, y, sample_weight=sample_weight)
    except ValueError as e:
        # 'specified nu is infeasible' thrown by NuSVC in this case
Member:

Please improve NuSVC instead of making it a special case here

Contributor Author:

I can remove this comment; it was corrected by the modification on BaseSVC.

        # 'specified nu is infeasible' thrown by NuSVC in this case
        if (("class" not in repr(e)) and
                ("specified nu is infeasible" not in repr(e))):
            print(error_fit, classifier, e)
Member:

I don't think we should be putting anything on stdout

        classifier.fit(X, y, sample_weight=sample_weight)
    except ValueError as e:
        # 'specified nu is infeasible' thrown by NuSVC in this case
        if (("class" not in repr(e)) and
Member:

This will pick up "classify" and "classifier" too. Use a regex with \b.
Yes, we have probably made this mistake elsewhere, and it should be fixed.
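
A small illustration of the reviewer's point (the example strings are mine, not from the PR): a plain substring test matches inside longer words, while a word-boundary regex only matches the standalone word.

import re

msg = "cannot classify the input"
print("class" in msg)                   # True: false positive on "classify"
print(re.search(r"\bclass\b", msg))     # None: no standalone word "class"
print(re.search(r"\bclass(es)?\b",
                "only one class is present"))  # matches the word "class"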

@@ -600,6 +603,9 @@ def fit(self, X, y, sample_weight=None):
             sample_weight = np.atleast_2d(sample_weight)
             Y *= check_array(sample_weight).T
+
+        # find the number of classes remaining after trimming, used by ComplementNB
+        self.nb_trim_classes_ = np.sum(np.sum(Y, axis=0) > 0)
Member:

A public attribute like this should be documented. Or put the underscore at the beginning to make it a private implementation detail. We also tend to use n rather than nb

Contributor Author:

Maybe we keep it private (I don't know if it would be of interest to users).
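
A worked illustration (toy numbers, not from the PR) of how the new attribute counts the classes that survive weight trimming: Y is the label-binarized target already multiplied by sample_weight, so a column summing to zero is a class whose samples were all trimmed away.

import numpy as np

Y = np.array([[1., 0.],
              [2., 0.],
              [0., 0.],   # the only class-1 sample had sample_weight == 0
              [1., 0.]])
print(np.sum(np.sum(Y, axis=0) > 0))   # -> 1: a single class remains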

@@ -832,7 +838,7 @@ def _joint_log_likelihood(self, X):

         X = check_array(X, accept_sparse="csr")
         jll = safe_sparse_dot(X, self.feature_log_prob_.T)
-        if len(self.classes_) == 1:
+        if len(self.classes_) == 1 or self.nb_trim_classes_ == 1:
Member:

Isn't the second condition sufficient?

Contributor Author:

Yes, it is: with a single class in classes_, nb_trim_classes_ is also 1.

"after sample_weight trimming.")
error_predict = ("Classifier can't predict when only one class is "
"present after sample_weight trimming.")
if has_fit_parameter(classifier_orig, "sample_weight"):
Member:

has_fit_parameter will return False if **kwargs are accepted. I would just try fit with this kwarg and continue if a TypeError is raised.
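
A minimal sketch of the suggested pattern (hypothetical helper, not code from the PR): has_fit_parameter inspects the signature of fit and misses estimators that take (*args, **kwargs), so simply attempt the call and treat a TypeError as "sample_weight not supported".

def fit_accepts_sample_weight(classifier, X, y, sample_weight):
    # Returns True if fit ran with sample_weight, False if the estimator
    # rejected the keyword argument altogether.
    try:
        classifier.fit(X, y, sample_weight=sample_weight)
    except TypeError:
        # fit() does not accept sample_weight -- nothing to test here
        return False
    return True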

if ("class" not in repr(e)):
print(error_fit, classifier, e)
traceback.print_exc(file=sys.stdout)
raise e
Member:

raise without an argument would be better here.
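
A short illustration of the advice (the surrounding try/except is assumed from the snippet above): inside an except block, a bare raise re-raises the active exception with its original traceback, whereas raise e restarts the traceback under Python 2 and adds a redundant re-raise frame under Python 3.

try:
    classifier.fit(X, y, sample_weight=sample_weight)
except ValueError as e:
    if "class" not in repr(e):
        raise   # preferred over `raise e`: original traceback is preserved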

        classifier.fit(X_train, y, sample_weight=sample_weight)
    except ValueError as e:
        if "class" not in repr(e):
            print(error_fit, classifier, e)
Member:

No need to provide the second two args. They will be output by pytest

        else:
            return
    except Exception as exc:
        print(error_fit, classifier, exc)
Member:

As above

    try:
        assert_array_equal(classifier.predict(X_test), np.ones(10))
    except Exception as exc:
        print(error_predict, classifier, exc)
Member:

As above

        assert_regex_matches(repr(e), r"\bclass(es)?\b", msg=error_fit)
        return
    except TypeError as e:
        return
Member:

Add a comment that sample_weight is not supported. Indeed, you could check the message.

@jnothman (Member) left a comment:

Otherwise LGTM!

    # Test that fit won't raise an unexpected exception
    try:
        classifier.fit(X_train, y, sample_weight=sample_weight)
    except ValueError as e:
Member:

Much neater, thanks!

    # check that classifiers accepting sample_weight either fit fine or
    # raise a ValueError if the problem is reduced to one class.
    error_fit = ("Classifier can't train when only one class is present "
                 "after sample_weight trimming.")
Member:

Add a note that it may raise an error mentioning "class" instead.

@jnothman jnothman changed the title [MRG] test one class reduction with sample weight for classifiers (Fixes #10337) [MRG+1] test one class reduction with sample weight for classifiers (Fixes #10337) Jan 9, 2018
        assert_regex_matches(repr(e), r"\bsample_weight\b",
                             msg="sample_weight not supported")
    try:
        assert_regex_matches(repr(e), r"\bsample_weight\b")
Member:

Surely `if not re.search('...', repr(e)): raise` is much more readable!
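
In context, the suggested form would look roughly like this (a sketch only; the surrounding try/except is assumed from the snippets above, and the '...' pattern is spelled out here as \bsample_weight\b):

import re

try:
    classifier.fit(X, y, sample_weight=sample_weight)
except TypeError as e:
    # re-raise unless the message shows that sample_weight is unsupported
    if not re.search(r"\bsample_weight\b", repr(e)):
        raise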

Contributor Author:

Yes, sorry.

@cmarmo (Contributor) commented Jul 17, 2020

Hi @Johayon, I know it has been a while... apologies for that... but you already have approval... if you are still interested in working on this, do you mind fixing conflicts? Thanks a lot for your patience!

@Johayon (Contributor Author) commented Jul 22, 2020

Hello @cmarmo,
I don't mind; I can take a look at this over the weekend.

@cmarmo (Contributor) commented Aug 15, 2022

Closing as superseded by #24140.

@cmarmo cmarmo closed this Aug 15, 2022

Successfully merging this pull request may close these issues.

Add common test for classifiers reducing to less than two classes via sample weights during fit