[MRG+1] test one class reduction with sample weight for classifiers (Fixes #10337) #10347
Conversation
sklearn/utils/estimator_checks.py
Outdated
def check_classifiers_one_label_sample_weights(name, classifier_orig):
    # check that classifiers accepting sample_weight fit fine or
    # raise a ValueError if the problem is reduced to one class.
    error_fit = ("Classifier can't train when only one class is present "
less than two classes are present?
scikit-learn/sklearn/utils/estimator_checks.py
Lines 1167 to 1168 in c30c503
def check_classifiers_one_label(name, classifier_orig):
    error_string_fit = "Classifier can't train when only one class is present."
I must admit that I just copied and pasted it from check_classifiers_one_label, but I can change it.
I just thought of this change as a possibility. But I guess we can wait for a core dev to review.
Hi,
I've not looked at the errors, but try to correct unless there is a very good reason not to.
sklearn/utils/estimator_checks.py
Outdated
try:
    classifier.fit(X, y, sample_weight=sample_weight)
except ValueError as e:
    # 'specified nu is infeasible' thrown by NuSVC in this case
Please improve NuSVC instead of making it a special case here
I can remove this comment; it was corrected by the modification to BaseSVC.
sklearn/utils/estimator_checks.py
Outdated
# 'specified nu is infeasible' thrown by NuSVC in this case
if (("class" not in repr(e)) and
        ("specified nu is infeasible" not in repr(e))):
    print(error_fit, classifier, e)
I don't think we should be putting anything on stdout
sklearn/utils/estimator_checks.py
Outdated
    classifier.fit(X, y, sample_weight=sample_weight)
except ValueError as e:
    # 'specified nu is infeasible' thrown by NuSVC in this case
    if (("class" not in repr(e)) and
This will pick up "classify" and "classifier" too. Use a regex with `\b`.
Yes, we have probably made this mistake elsewhere, and it should be fixed.
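For illustration (the error messages here are made up, not sklearn's), a `\b`-anchored regex matches the standalone word "class" but not substrings of "classify" or "classifier", whereas plain substring membership matches all three:

```python
import re

# Substring membership is too loose: "classify" also contains "class".
assert "class" in repr(ValueError("cannot classify this input"))

# A word-boundary regex only matches the standalone word "class".
assert re.search(r"\bclass\b", repr(ValueError("only one class is present")))
assert re.search(r"\bclass\b", repr(ValueError("cannot classify this input"))) is None
```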
sklearn/naive_bayes.py
Outdated
@@ -600,6 +603,9 @@ def fit(self, X, y, sample_weight=None):
             sample_weight = np.atleast_2d(sample_weight)
             Y *= check_array(sample_weight).T

+            # find the number of classes left after trimming, used by ComplementNB
+            self.nb_trim_classes_ = np.sum(np.sum(Y, axis=0) > 0)
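A quick sketch of what this line counts (the data below is invented): with `Y` a one-hot label matrix whose rows have been scaled by `sample_weight`, the column sums give each class's total weight, and comparing them to zero counts the classes that survive the trimming:

```python
import numpy as np

# Hypothetical weighted one-hot labels: the second class's only sample
# received weight 0, so it is effectively trimmed away.
Y = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 0.0]])

# Column sums are per-class total weights; count the strictly positive ones.
n_trim_classes = int(np.sum(np.sum(Y, axis=0) > 0))
```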
A public attribute like this should be documented. Or put the underscore at the beginning to make it a private implementation detail. We also tend to use `n` rather than `nb`.
Maybe we should keep it private (I don't know whether it would be of interest).
sklearn/naive_bayes.py
Outdated
@@ -832,7 +838,7 @@ def _joint_log_likelihood(self, X):

         X = check_array(X, accept_sparse="csr")
         jll = safe_sparse_dot(X, self.feature_log_prob_.T)
-        if len(self.classes_) == 1:
+        if len(self.classes_) == 1 or self.nb_trim_classes_ == 1:
Isn't the second condition sufficient?
Yes, it is.
sklearn/utils/estimator_checks.py
Outdated
"after sample_weight trimming.") | ||
error_predict = ("Classifier can't predict when only one class is " | ||
"present after sample_weight trimming.") | ||
if has_fit_parameter(classifier_orig, "sample_weight"): |
`has_fit_parameter` will return False if `**kwargs` are accepted. I would just try `fit` with this kwarg and continue if a `TypeError` is raised.
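A minimal sketch of the suggested pattern (the helper name and dummy estimators below are illustrative, not sklearn code): attempt the call and treat `TypeError` as "sample_weight not supported" instead of inspecting the signature:

```python
def fit_accepting_weights(classifier, X, y, sample_weight):
    """Try fitting with sample_weight; return False if it is not accepted."""
    try:
        classifier.fit(X, y, sample_weight=sample_weight)
    except TypeError:
        # The estimator's fit() rejected the keyword, so skip the check.
        # This also handles estimators whose fit() takes **kwargs, which
        # signature inspection would misreport.
        return False
    return True
```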
sklearn/utils/estimator_checks.py
Outdated
if ("class" not in repr(e)): | ||
print(error_fit, classifier, e) | ||
traceback.print_exc(file=sys.stdout) | ||
raise e |
`raise` without an argument would be better here.
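The difference, sketched (the function and message below are illustrative): a bare `raise` re-raises the exception currently being handled with its original traceback intact, whereas `raise e` rebinds the exception (and on Python 2, which sklearn still supported at the time, loses the original traceback):

```python
def refit_check():
    try:
        raise ValueError("Classifier can't train when only one class is present")
    except ValueError:
        # Preferred: a bare `raise` re-raises the active exception,
        # preserving the original traceback for pytest's report.
        raise
```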
sklearn/utils/estimator_checks.py
Outdated
    classifier.fit(X_train, y, sample_weight=sample_weight)
except ValueError as e:
    if ("class" not in repr(e)):
        print(error_fit, classifier, e)
No need to provide the last two args; they will be output by pytest.
sklearn/utils/estimator_checks.py
Outdated
else:
    return
except Exception as exc:
    print(error_fit, classifier, exc)
As above
sklearn/utils/estimator_checks.py
Outdated
try:
    assert_array_equal(classifier.predict(X_test), np.ones(10))
except Exception as exc:
    print(error_predict, classifier, exc)
As above
        assert_regex_matches(repr(e), r"\bclass(es)?\b", msg=error_fit)
        return
    except TypeError as e:
        return
Add a comment that sample_weight is not supported. Indeed, we could check the message.
Otherwise LGTM!
# Test that fit won't raise an unexpected exception
try:
    classifier.fit(X_train, y, sample_weight=sample_weight)
except ValueError as e:
Much neater, thanks!
sklearn/utils/estimator_checks.py
Outdated
# check that classifiers accepting sample_weight fit fine or
# raise a ValueError if the problem is reduced to one class.
error_fit = ("Classifier can't train when only one class is present "
             "after sample_weight trimming.")
Add a note that it may raise an error mentioning "class" instead.
sklearn/utils/estimator_checks.py
Outdated
    assert_regex_matches(repr(e), r"\bsample_weight\b",
                         msg="sample_weight not supported")
try:
    assert_regex_matches(repr(e), r"\bsample_weight\b")
Surely `if not re.search('...', repr(e)): raise` is much more readable!
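Spelled out, the reviewer's suggestion might look like this (helper name and error messages are illustrative):

```python
import re

def ensure_mentions_sample_weight(e):
    # Explicit guard, as suggested: plainer than a dedicated
    # assert_regex_matches helper.
    if not re.search(r"\bsample_weight\b", repr(e)):
        raise AssertionError("unexpected error from fit: %r" % e)
```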
yes, sorry.
Hi @Johayon, I know it has been a while... apologies for that... but you already have approval... if you are still interested in working on this, do you mind fixing conflicts? Thanks a lot for your patience!
Hello @cmarmo,
Closing as superseded by #24140.
Fixes #10337