-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Common check for sample weight invariance with removed samples #17176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Common check for sample weight invariance with removed samples #17176
Conversation
…t-learn#16507) Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
cc @NicolasHug . We need to first merge #16963 I think. Earlier version of this PR basically implemented a simpler version of that PR. |
I guess we need to start with #17134 then :p |
@NicolasHug This PR should be good now. The coverage failure doesn't look relevant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rth, a few nits but LGTM!
sklearn/utils/estimator_checks.py
Outdated
@@ -845,7 +846,7 @@ def check_sample_weights_shape(name, estimator_orig): | |||
|
|||
|
|||
@ignore_warnings(category=FutureWarning) | |||
def check_sample_weights_invariance(name, estimator_orig): | |||
def check_sample_weights_invariance(name, estimator_orig, kind="ones"): | |||
# check that the estimators yield same results for | |||
# unit weights and no weights | |||
if (has_fit_parameter(estimator_orig, "sample_weight") and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while we're at it we might as well return early here instead of having the entire function in a if
block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually moved this _yield_all_checks
to only run check_sample_weight on estimators that support it. If find that showing a green status for a check when it doesn't apply is not great (and skipping it also produces too much noise).
sklearn/utils/estimator_checks.py
Outdated
@@ -861,25 +862,45 @@ def check_sample_weights_invariance(name, estimator_orig): | |||
X = np.array([[1, 3], [1, 3], [1, 3], [1, 3], | |||
[2, 1], [2, 1], [2, 1], [2, 1], | |||
[3, 3], [3, 3], [3, 3], [3, 3], | |||
[4, 1], [4, 1], [4, 1], [4, 1]], dtype=np.dtype('float')) | |||
[4, 1], [4, 1], [4, 1], [4, 1]], dtype=np.float64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for consistency these could be X1 and y1
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rth
sklearn/utils/estimator_checks.py
Outdated
if sparse.issparse(X_pred1): | ||
X_pred1 = X_pred1.toarray() | ||
X_pred2 = X_pred2.toarray() | ||
assert_allclose(X_pred1, X_pred2, err_msg=err_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use sklearn.utils._testing.assert_allclose_dense_sparse
here.
…t-learn#17176) Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…t-learn#17176) Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
…ple is never a support vector
Same as #16507 but taking into account latest updates in check_estimator and handling of xfail estimators.
I initially tried to fix this quickly in #17175, but in the end decided to revert the inital commit to fix CI and do a less rushed solution in this PR.