
ENH add only_non_negative parameter to _check_sample_weight #20880


Merged
merged 25 commits on Sep 8, 2021

Conversation

simonandras
Contributor

@simonandras simonandras commented Aug 28, 2021

Reference Issues/PRs

Fixes #20036.
Partially addressed #15531

What does this implement/fix? Explain your changes.

The `LinearRegression` object's `.fit()` method gives an unclear error message if the `sample_weight` parameter of `fit()` contains a negative value. The exception occurred only at the call to `linalg.lstsq(X, y)`. I think it's better if the negative-value check is done at the beginning of the `.fit()` method. The new message is: "The weights of the samples cannot be negative".

Any other comments?

I think the check for negative values should not be done inside the `_check_sample_weight()` function but inside the given `.fit()` method. Maybe there are other cases too where this exception message and the negative-weight check could be useful.
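The fail-fast check described above can be sketched as follows (a toy stand-in with illustrative names, not the actual scikit-learn implementation):

```python
import numpy as np

def fit(X, y, sample_weight=None):
    """Toy fit() that validates sample_weight up front (sketch only)."""
    if sample_weight is not None:
        sample_weight = np.asarray(sample_weight, dtype=np.float64)
        if np.any(sample_weight < 0):
            # Fail fast with a clear message instead of letting
            # linalg.lstsq raise something cryptic later on.
            raise ValueError("The weights of the samples cannot be negative")
    # ... the actual least-squares fit would happen here ...
    return sample_weight
```

The point of the sketch is only where the check happens: before any numerical work, so the user sees a message about weights rather than a linear-algebra failure.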

This is my first PR to the open-source community and I welcome any comments or advice related to this PR!

@glemaitre
Member

@simonandras Could you check the message that I left on the original issue: #20036 (comment)?

I think that we should change _check_sample_weight instead.

…ed on with the only_positive bool parameter (backward compatible). For the check the already written check_non_negative is used.
…d rename the only_positive parameter to only_non_negative, because zeros are accepted.
@simonandras
Contributor Author

simonandras commented Sep 1, 2021

Thank you for the reply!
I changed the `_check_sample_weight` function as you suggested. Now I used the already written `check_non_negative` function for the non-negativity check. This function already builds the error message; only the `whom` parameter had to be supplied. I also added the `only_non_negative` parameter to `_check_sample_weight` (I chose this name instead of `only_positive` because zeros are allowed).

@simonandras
Contributor Author

I wasn't able to make all checks successful. I think the problem is with the `test_adaboost_negative_weight_error` function and the new error message, but I'm not sure what the problem actually is or what to do about it. Any help would be appreciated!

@glemaitre glemaitre self-requested a review September 2, 2021 08:07
@glemaitre
Member

Here is a patch that also shows how to modify the current code so that the test failures should be resolved:

diff --git a/sklearn/ensemble/_weight_boosting.py b/sklearn/ensemble/_weight_boosting.py
index 77ef449ba1..ed3b0ad853 100644
--- a/sklearn/ensemble/_weight_boosting.py
+++ b/sklearn/ensemble/_weight_boosting.py
@@ -123,10 +123,10 @@ class BaseWeightBoosting(BaseEnsemble, metaclass=ABCMeta):
             y_numeric=is_regressor(self),
         )
 
-        sample_weight = _check_sample_weight(sample_weight, X, np.float64, copy=True)
+        sample_weight = _check_sample_weight(
+            sample_weight, X, np.float64, copy=True, only_non_negative=True
+        )
         sample_weight /= sample_weight.sum()
-        if np.any(sample_weight < 0):
-            raise ValueError("sample_weight cannot contain negative weights")
 
         # Check parameters
         self._validate_estimator()
@@ -136,7 +136,7 @@ class BaseWeightBoosting(BaseEnsemble, metaclass=ABCMeta):
         self.estimator_weights_ = np.zeros(self.n_estimators, dtype=np.float64)
         self.estimator_errors_ = np.ones(self.n_estimators, dtype=np.float64)
 
-        # Initializion of the random number instance that will be used to
+        # Initialization of the random number instance that will be used to
         # generate a seed at each iteration
         random_state = check_random_state(self.random_state)
 
diff --git a/sklearn/ensemble/tests/test_weight_boosting.py b/sklearn/ensemble/tests/test_weight_boosting.py
index 6927d47c11..159f83abf2 100755
--- a/sklearn/ensemble/tests/test_weight_boosting.py
+++ b/sklearn/ensemble/tests/test_weight_boosting.py
@@ -576,6 +576,6 @@ def test_adaboost_negative_weight_error(model, X, y):
     sample_weight = np.ones_like(y)
     sample_weight[-1] = -10
 
-    err_msg = "sample_weight cannot contain negative weight"
+    err_msg = "Negative values in data passed to `sample_weight`"
     with pytest.raises(ValueError, match=err_msg):
         model.fit(X, y, sample_weight=sample_weight)
diff --git a/sklearn/linear_model/_base.py b/sklearn/linear_model/_base.py
index 6e1d6acb0c..514525a14a 100644
--- a/sklearn/linear_model/_base.py
+++ b/sklearn/linear_model/_base.py
@@ -662,7 +662,7 @@ class LinearRegression(MultiOutputMixin, RegressorMixin, LinearModel):
             X, y, accept_sparse=accept_sparse, y_numeric=True, multi_output=True
         )
 
-        if sample_weight is not None:  # the weights should be non-negative
+        if sample_weight is not None:
             sample_weight = _check_sample_weight(
                 sample_weight, X, only_non_negative=True, dtype=X.dtype
             )
diff --git a/sklearn/neighbors/_kde.py b/sklearn/neighbors/_kde.py
index 328a13371b..dcfc7b51f4 100644
--- a/sklearn/neighbors/_kde.py
+++ b/sklearn/neighbors/_kde.py
@@ -191,9 +191,7 @@ class KernelDensity(BaseEstimator):
         X = self._validate_data(X, order="C", dtype=DTYPE)
 
         if sample_weight is not None:
-            sample_weight = _check_sample_weight(sample_weight, X, DTYPE)
-            if sample_weight.min() <= 0:
-                raise ValueError("sample_weight must have positive values")
+            sample_weight = _check_sample_weight(sample_weight, X, DTYPE, only_non_negative=True)
 
         kwargs = self.metric_params
         if kwargs is None:
diff --git a/sklearn/neighbors/tests/test_kde.py b/sklearn/neighbors/tests/test_kde.py
index 84f7623c8d..d4fb775c44 100644
--- a/sklearn/neighbors/tests/test_kde.py
+++ b/sklearn/neighbors/tests/test_kde.py
@@ -209,7 +209,7 @@ def test_sample_weight_invalid():
     data = np.reshape([1.0, 2.0, 3.0], (-1, 1))
 
     sample_weight = [0.1, -0.2, 0.3]
-    expected_err = "sample_weight must have positive values"
+    expected_err = "Negative values in data passed to `sample_weight`"
     with pytest.raises(ValueError, match=expected_err):
         kde.fit(data, sample_weight=sample_weight)
 
diff --git a/sklearn/utils/validation.py b/sklearn/utils/validation.py
index 554f821539..eb015d287d 100644
--- a/sklearn/utils/validation.py
+++ b/sklearn/utils/validation.py
@@ -1480,7 +1480,11 @@ def _check_psd_eigenvalues(lambdas, enable_warnings=False):
 
 
 def _check_sample_weight(
-    sample_weight, X, only_non_negative=False, dtype=None, copy=False
+    sample_weight,
+    X,
+    dtype=None,
+    copy=False,
+    only_non_negative=False,
 ):
     """Validate sample weights.
 
@@ -1497,9 +1501,6 @@ def _check_sample_weight(
     X : {ndarray, list, sparse matrix}
         Input data.
 
-    only_non_negative : if True then a non negativity check for the sample_weight
-        will be done.
-
     dtype : dtype, default=None
        dtype of the validated `sample_weight`.
        If None, and the input `sample_weight` is an array, the dtype of the
@@ -1510,6 +1511,9 @@ def _check_sample_weight(
     copy : bool, default=False
         If True, a copy of sample_weight will be created.
 
+    only_non_negative : bool, default=False
+        Whether or not the weights are expected to be non-negative.
+
     Returns
     -------
     sample_weight : ndarray of shape (n_samples,)
@@ -1546,7 +1550,7 @@ def _check_sample_weight(
             )
 
     if only_non_negative:
-        check_non_negative(sample_weight, "sample weight")
+        check_non_negative(sample_weight, "`sample_weight`")
 
     return sample_weight
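Summarized outside the diff, the patched helper boils down to roughly the following (a simplified sketch without the full dtype and sparse-input handling of the real `_check_sample_weight`):

```python
import numpy as np

def check_sample_weight(sample_weight, X, dtype=None, copy=False,
                        only_non_negative=False):
    """Simplified stand-in for sklearn's private _check_sample_weight."""
    n_samples = len(X)
    dtype = np.float64 if dtype is None else dtype
    if sample_weight is None:
        # Default: uniform unit weights.
        sample_weight = np.ones(n_samples, dtype=dtype)
    elif np.isscalar(sample_weight):
        # A scalar weight is broadcast to all samples.
        sample_weight = np.full(n_samples, sample_weight, dtype=dtype)
    else:
        sample_weight = np.asarray(sample_weight, dtype=dtype)
        if sample_weight.shape != (n_samples,):
            raise ValueError(
                f"sample_weight.shape == {sample_weight.shape}, "
                f"expected ({n_samples},)!"
            )
    if only_non_negative and (sample_weight < 0).any():
        # Same wording as the message produced via check_non_negative.
        raise ValueError("Negative values in data passed to `sample_weight`")
    if copy:
        sample_weight = sample_weight.copy()
    return sample_weight
```

With `only_non_negative=False` as the default, existing callers keep their old behavior; only estimators that opt in (AdaBoost, KernelDensity, LinearRegression) get the new rejection of negative weights.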

@glemaitre
Member

Indeed, you need to:

  • move only_non_negative as the last parameter of the _check_sample_weight signature.
  • modify AdaBoost and KDE so that we no longer check by hand for negative weights.
  • change the error message in the associated test files.

Then everything should be fine :)

@glemaitre glemaitre changed the title from "Adding informative error message for the negative weight case (linear regression)" to "ENH add only_non_negative parameter to _check_sample_weight" on Sep 2, 2021
@glemaitre
Member

In addition, we will want a test in test_validation.py to check that we raise the expected error when using _check_sample_weight.
We don't need an entry in the changelog since the tool is part of the private API (probably the dev API, though).

@rth rth self-requested a review September 2, 2021 11:11
simonandras and others added 5 commits September 2, 2021 20:19
update documentation

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
delete unnecessary comment

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Add `...` to function name in the error message

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@simonandras
Contributor Author

I made the changes and wrote the test as a new case inside the already existing `test_check_sample_weight` function. Every check passes now except one, which fails because there is no new entry in the changelog. As you said we don't need an entry for this one, so can this failed check be ignored?
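Such a test could look roughly like the following sketch. It exercises a small stub rather than the real private helper, and uses a plain try/except; in scikit-learn's test suite the same intent would be expressed with `pytest.raises(..., match=...)`:

```python
import numpy as np

def _check_sample_weight_stub(sample_weight, X, only_non_negative=False):
    # Stand-in for the real helper, just for this sketch.
    sample_weight = np.asarray(sample_weight, dtype=np.float64)
    if only_non_negative and (sample_weight < 0).any():
        raise ValueError("Negative values in data passed to `sample_weight`")
    return sample_weight

def test_check_sample_weight_only_non_negative():
    X = np.ones((5, 2))
    sample_weight = np.array([1.0, 1.0, -1.0, 1.0, 1.0])
    # The default behavior is unchanged: negative weights pass through.
    assert _check_sample_weight_stub(sample_weight, X)[2] == -1.0
    # With only_non_negative=True the helper must raise the new message.
    try:
        _check_sample_weight_stub(sample_weight, X, only_non_negative=True)
    except ValueError as exc:
        assert "Negative values in data passed to `sample_weight`" in str(exc)
    else:
        raise AssertionError("expected a ValueError for negative weights")
```

Checking both branches (default pass-through and opt-in rejection) documents the backward-compatibility guarantee as well as the new error.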

…ll make the parameter order like in the function definition
@glemaitre
Member

glemaitre commented Sep 3, 2021

Please add an entry to the change log at doc/whats_new/v1.0.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:. This will solve the failure in the CI. I will shortly make another review.

Member

@glemaitre glemaitre left a comment


LGTM. Probably @rth can have a look at this one.

Member

@ogrisel ogrisel left a comment


LGTM as well!

@glemaitre
Member

I will move this item to 1.1 since it is not a blocker for the release.

@glemaitre glemaitre self-requested a review September 8, 2021 09:08
Member

@glemaitre glemaitre left a comment


I moved the changelog entry. I will merge if the CIs pass

@glemaitre glemaitre merged commit 9c84abd into scikit-learn:main Sep 8, 2021
@simonandras simonandras deleted the my-new-branch branch September 8, 2021 12:38
@glemaitre glemaitre mentioned this pull request Oct 23, 2021
10 tasks
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
…it-learn#20880)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Successfully merging this pull request may close these issues.

Improve error message for negative values in sample_weight
4 participants