ENH add only_non_negative parameter to _check_sample_weight #20880
Conversation
@simonandras Could you check the message that I left on the original issue: #20036 (comment)? I think that we should change …
…ed on with the only_positive bool parameter (backward compatible). For the check, the already written check_non_negative is used.
d3db181 to 7370c51 (Compare)
…d rename the only_positive parameter to only_non_negative, because zeros are accepted.
Thank you for the reply!
I wasn't able to make all checks succeed. I think the problem is with the test_adaboost_negative_weight_error function and the new error message, but I'm not sure what the problem actually is or what to do about it. Any help would be appreciated!
Here is a patch that also shows how to modify the current code so that the test failures should be solved:

diff --git a/sklearn/ensemble/_weight_boosting.py b/sklearn/ensemble/_weight_boosting.py
index 77ef449ba1..ed3b0ad853 100644
--- a/sklearn/ensemble/_weight_boosting.py
+++ b/sklearn/ensemble/_weight_boosting.py
@@ -123,10 +123,10 @@ class BaseWeightBoosting(BaseEnsemble, metaclass=ABCMeta):
y_numeric=is_regressor(self),
)
- sample_weight = _check_sample_weight(sample_weight, X, np.float64, copy=True)
+ sample_weight = _check_sample_weight(
+ sample_weight, X, np.float64, copy=True, only_non_negative=True
+ )
sample_weight /= sample_weight.sum()
- if np.any(sample_weight < 0):
- raise ValueError("sample_weight cannot contain negative weights")
# Check parameters
self._validate_estimator()
@@ -136,7 +136,7 @@ class BaseWeightBoosting(BaseEnsemble, metaclass=ABCMeta):
self.estimator_weights_ = np.zeros(self.n_estimators, dtype=np.float64)
self.estimator_errors_ = np.ones(self.n_estimators, dtype=np.float64)
- # Initializion of the random number instance that will be used to
+ # Initialization of the random number instance that will be used to
# generate a seed at each iteration
random_state = check_random_state(self.random_state)
diff --git a/sklearn/ensemble/tests/test_weight_boosting.py b/sklearn/ensemble/tests/test_weight_boosting.py
index 6927d47c11..159f83abf2 100755
--- a/sklearn/ensemble/tests/test_weight_boosting.py
+++ b/sklearn/ensemble/tests/test_weight_boosting.py
@@ -576,6 +576,6 @@ def test_adaboost_negative_weight_error(model, X, y):
sample_weight = np.ones_like(y)
sample_weight[-1] = -10
- err_msg = "sample_weight cannot contain negative weight"
+ err_msg = "Negative values in data passed to `sample_weight`"
with pytest.raises(ValueError, match=err_msg):
model.fit(X, y, sample_weight=sample_weight)
diff --git a/sklearn/linear_model/_base.py b/sklearn/linear_model/_base.py
index 6e1d6acb0c..514525a14a 100644
--- a/sklearn/linear_model/_base.py
+++ b/sklearn/linear_model/_base.py
@@ -662,7 +662,7 @@ class LinearRegression(MultiOutputMixin, RegressorMixin, LinearModel):
X, y, accept_sparse=accept_sparse, y_numeric=True, multi_output=True
)
- if sample_weight is not None: # the weights should be non-negative
+ if sample_weight is not None:
sample_weight = _check_sample_weight(
sample_weight, X, only_non_negative=True, dtype=X.dtype
)
diff --git a/sklearn/neighbors/_kde.py b/sklearn/neighbors/_kde.py
index 328a13371b..dcfc7b51f4 100644
--- a/sklearn/neighbors/_kde.py
+++ b/sklearn/neighbors/_kde.py
@@ -191,9 +191,7 @@ class KernelDensity(BaseEstimator):
X = self._validate_data(X, order="C", dtype=DTYPE)
if sample_weight is not None:
- sample_weight = _check_sample_weight(sample_weight, X, DTYPE)
- if sample_weight.min() <= 0:
- raise ValueError("sample_weight must have positive values")
+ sample_weight = _check_sample_weight(sample_weight, X, DTYPE, only_non_negative=True)
kwargs = self.metric_params
if kwargs is None:
diff --git a/sklearn/neighbors/tests/test_kde.py b/sklearn/neighbors/tests/test_kde.py
index 84f7623c8d..d4fb775c44 100644
--- a/sklearn/neighbors/tests/test_kde.py
+++ b/sklearn/neighbors/tests/test_kde.py
@@ -209,7 +209,7 @@ def test_sample_weight_invalid():
data = np.reshape([1.0, 2.0, 3.0], (-1, 1))
sample_weight = [0.1, -0.2, 0.3]
- expected_err = "sample_weight must have positive values"
+ expected_err = "Negative values in data passed to `sample_weight`"
with pytest.raises(ValueError, match=expected_err):
kde.fit(data, sample_weight=sample_weight)
diff --git a/sklearn/utils/validation.py b/sklearn/utils/validation.py
index 554f821539..eb015d287d 100644
--- a/sklearn/utils/validation.py
+++ b/sklearn/utils/validation.py
@@ -1480,7 +1480,11 @@ def _check_psd_eigenvalues(lambdas, enable_warnings=False):
def _check_sample_weight(
- sample_weight, X, only_non_negative=False, dtype=None, copy=False
+ sample_weight,
+ X,
+ dtype=None,
+ copy=False,
+ only_non_negative=False,
):
"""Validate sample weights.
@@ -1497,9 +1501,6 @@ def _check_sample_weight(
X : {ndarray, list, sparse matrix}
Input data.
- only_non_negative : if True then a non negativity check for the sample_weight
- will be done.
-
dtype : dtype, default=None
dtype of the validated `sample_weight`.
If None, and the input `sample_weight` is an array, the dtype of the
@@ -1510,6 +1511,9 @@ def _check_sample_weight(
copy : bool, default=False
If True, a copy of sample_weight will be created.
+ only_non_negative : bool, default=False,
+ Whether or not the weights are expected to be non-negative.
+
Returns
-------
sample_weight : ndarray of shape (n_samples,)
@@ -1546,7 +1550,7 @@ def _check_sample_weight(
)
if only_non_negative:
- check_non_negative(sample_weight, "sample weight")
+ check_non_negative(sample_weight, "`sample_weight`")
return sample_weight
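For illustration only, here is a minimal pure-Python sketch of what the `only_non_negative` check in the patch above amounts to. The function name and error message mirror the patch; the real `_check_sample_weight` additionally handles dtype coercion, scalar weights, shape validation against `X`, and copy semantics:

```python
def check_sample_weight(sample_weight, n_samples, only_non_negative=False):
    """Toy stand-in for sklearn's _check_sample_weight (illustration only)."""
    if sample_weight is None:
        # Default behavior: uniform unit weights, one per sample.
        sample_weight = [1.0] * n_samples
    sample_weight = [float(w) for w in sample_weight]
    if len(sample_weight) != n_samples:
        raise ValueError(
            "sample_weight.shape == (%d,), expected (%d,)"
            % (len(sample_weight), n_samples)
        )
    if only_non_negative and min(sample_weight) < 0:
        # Same wording as the error message asserted in the updated tests.
        raise ValueError("Negative values in data passed to `sample_weight`")
    return sample_weight
```

Note that zeros pass the check, which is why the parameter was renamed from `only_positive` to `only_non_negative`.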
Indeed, you need to:
Then everything should be fine :)
In addition, we will want a test in …
update documentation Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
delete unnecessary comment Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Add `...` to function name in the error message Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…to my-new-branch
I made the changes and wrote the test inside the already existing test_check_sample_weight function as a new part. Every check passes now except one, which fails because there is no new entry in the changelog. As you said we don't need an entry for this one, so this failed check can be ignored?
…ll make the parameter order like in the function definition
Please add an entry to the change log at …
LGTM. Probably @rth can have a look at this one.
LGTM as well!
I will move this item to 1.1 since this is not a blocker for the release.
I moved the changelog entry. I will merge if the CIs pass
…it-learn#20880) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Reference Issues/PRs
Fixes #20036.
Partially addresses #15531.
What does this implement/fix? Explain your changes.
The LinearRegression object's .fit() method gives an unclear error message if the sample_weight parameter of fit() contains a negative value. The exception occurred only at the call to linalg.lstsq(X, y). I think it is better if the negative-value check is done at the beginning of the .fit() method. The new message is: "The weights of the samples cannot be negative".
Any other comments?
I think the check for negative values should not be done inside the _check_sample_weight() function itself but inside the given .fit() method. Maybe there are other cases too where this exception message and the negative-weight check can be useful.
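For what it's worth, the design that was eventually merged reconciles both placements: the check is implemented once in the shared validator, and each .fit() opts in via the flag. A toy sketch of that pattern (all class and method names here are hypothetical, not sklearn code):

```python
class _EstimatorBase:
    """Toy base class illustrating the opt-in validation pattern."""

    def _validate_weights(self, sample_weight, only_non_negative=False):
        # Shared implementation: one place for the check and the message.
        if only_non_negative and any(w < 0 for w in sample_weight):
            raise ValueError("Negative values in data passed to `sample_weight`")
        return list(sample_weight)


class ToyKDE(_EstimatorBase):
    def fit(self, X, sample_weight):
        # Density weights must be non-negative, so this estimator opts in.
        self.weights_ = self._validate_weights(sample_weight, only_non_negative=True)
        return self


class ToyRegressor(_EstimatorBase):
    def fit(self, X, sample_weight):
        # An estimator that tolerates signed weights keeps the default.
        self.weights_ = self._validate_weights(sample_weight)
        return self
```

Each estimator decides whether negative weights make sense for it, while the error message stays consistent across the library.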
This is my first PR to the open source community and I welcome any comments or advice related to this PR!