
[MRG+1] FIX Negative or null sample_weights in SVM #14286


Merged · 40 commits · Sep 24, 2019
Changes from all commits:
9f6914b
[MRG] FIX Fail to train SVM when got "warning: class label 0 specifie…
Jul 6, 2019
b5663e4
[MRG] FIX #9494 - improved remaining labels check
Jul 7, 2019
090d038
[MRG] FIX #9494 - changed comment
Jul 7, 2019
2586bcf
FIX #9494 - changed error message
Jul 7, 2019
e0b2119
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
0beacb6
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
3f90e78
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
83bcdb5
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
5e649c6
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
a8a50fb
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
1ecf05f
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
8471b33
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
a4ffddf
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
176b1f2
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
c6c9e3b
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
0ddcb18
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
f569a86
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
db9474f
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
578e830
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
b2b3a79
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
f7ec6ea
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
212f7bb
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
4900e66
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
b70acc2
Update sklearn/svm/src/libsvm/svm.cpp
alexshacked Jul 9, 2019
54d588f
put back blank line dividing functions that were removed by mistake
Jul 9, 2019
7d1d1ab
[MRG] FIX #9494 two commands under an if-case need brackets
Jul 9, 2019
7a93169
[MRG] FIX #9494 parametrized the test
Jul 9, 2019
267836f
[MRG] FIX #9494 changed comment
Jul 9, 2019
808f71c
[MRG] FIX #9494 fixed tests
alexshacked Jul 11, 2019
3546fda
[MRG] FIX #9494 changed comment 2
alexshacked Jul 11, 2019
208f46d
[MRG] FIX #9494 comment added
alexshacked Jul 11, 2019
accb79a
[MRG] FIX #9494 updated whats_new
alexshacked Jul 13, 2019
3855c7d
[MRG] FIX #9494 fixed explanation in whats_new
alexshacked Jul 13, 2019
90acfaa
Merge remote-tracking branch 'origin/master' into pr/alexshacked/14286
glemaitre Sep 19, 2019
534e1f4
Merge branch 'fix_9494' of github.com:alexshacked/scikit-learn into p…
glemaitre Sep 19, 2019
8d83f48
[MRG] FIX #9494 fixed identation in svm.cpp and improved parameteriza…
Sep 20, 2019
43a39ed
Merge remote-tracking branch 'upstream/master' into fix_9494
Sep 20, 2019
632f445
Merge remote-tracking branch 'origin/fix_9494' into fix_9494
Sep 20, 2019
21d8926
[MRG] FIX #9494 changed test to fit new error message
Sep 20, 2019
923aeee
[MRG] FIX #9494 improved error message and parameters name
Sep 23, 2019
7 changes: 7 additions & 0 deletions doc/whats_new/v0.22.rst
@@ -503,6 +503,13 @@ Changelog
`kernel='precomputed'` and fit on non-square data.
:pr:`14336` by :user:`Gregory Dexter <gdex1>`.

- |Fix| :class:`svm.SVC`, :class:`svm.SVR`, :class:`svm.NuSVR` and
  :class:`svm.OneClassSVM` produced an invalid model when the
  ``sample_weight`` values passed to ``fit()`` were negative or zero.
  This behavior occurred only in some border scenarios. In these cases,
  ``fit()`` now fails with an exception.
  :pr:`14286` by :user:`Alex Shacked <alexshacked>`.

:mod:`sklearn.tree`
...................

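The changelog entry above can be illustrated with a minimal sketch (assuming scikit-learn 0.22 or later, where this fix landed; the data below is made up for the example): a fit in which every sample weight is zero or negative now fails fast with a ``ValueError`` instead of silently producing an invalid model.

```python
import numpy as np
from sklearn import svm

X = np.array([[-2, 0], [-1, -1], [0, -2], [0, 2], [1, 1], [2, 0]])
y = np.array([1, 1, 1, 2, 2, 2])

# Every sample is masked by a non-positive weight, so no training data
# remains after libsvm filters the weights and fit() refuses to train.
err = None
try:
    svm.SVC(kernel='linear').fit(X, y, sample_weight=[0.0] * 6)
except ValueError as exc:
    err = exc
print(err)
```

Before this PR, the same call could return a fitted but meaningless estimator in some border scenarios.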
36 changes: 36 additions & 0 deletions sklearn/svm/src/libsvm/svm.cpp
@@ -3101,6 +3101,42 @@ const char *PREFIX(check_parameter)(const PREFIX(problem) *prob, const svm_param
free(count);
}

if(svm_type == C_SVC ||
svm_type == EPSILON_SVR ||
svm_type == NU_SVR ||
svm_type == ONE_CLASS)
{
PREFIX(problem) newprob;
// filter samples with negative and null weights
remove_zero_weight(&newprob, prob);

char* msg = NULL;
// all samples were removed
if(newprob.l == 0)
msg = "Invalid input - all samples have zero or negative weights.";
else if(prob->l != newprob.l &&
svm_type == C_SVC)
{
bool only_one_label = true;
int first_label = newprob.y[0];
for(int i=1;i<newprob.l;i++)
{
if(newprob.y[i] != first_label)
{
only_one_label = false;
break;
}
}
if(only_one_label == true)
msg = "Invalid input - all samples with positive weights have the same label.";
}

free(newprob.x);
free(newprob.y);
free(newprob.W);
if(msg != NULL)
return msg;
}
return NULL;
}
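The C++ check above amounts to a small decision procedure: drop every sample whose weight is not strictly positive, then reject the problem if nothing is left, or if a C-SVC classification problem is left with only one label. A stdlib-only Python sketch of the same logic (an illustration, not scikit-learn's actual code path; ``check_sample_weights`` is a hypothetical name):

```python
def check_sample_weights(y, sample_weight, is_classification=True):
    """Return an error message mimicking the libsvm check, or None if valid.

    Mirrors the svm.cpp logic: samples with zero or negative weights are
    filtered out before training, so they must not empty the problem or
    leave a classification problem with a single remaining label.
    """
    kept_labels = [label for label, w in zip(y, sample_weight) if w > 0]
    if not kept_labels:
        return "Invalid input - all samples have zero or negative weights."
    removed_some = len(kept_labels) < len(y)
    if is_classification and removed_some and len(set(kept_labels)) == 1:
        return ("Invalid input - all samples with positive weights "
                "have the same label.")
    return None


# Masking every sample is rejected.
print(check_sample_weights([1, 2], [0, -1]))
# Masking one entire class is rejected for classifiers.
print(check_sample_weights([1, 1, 2, 2], [1, 1, 0, -0.5]))
# A partially masked but still two-label problem passes.
print(check_sample_weights([1, 1, 2, 2], [1, 0.5, 1, 0.5]))
```

The single-label check only applies to C_SVC because regressors and one-class models have no competing labels to preserve.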

150 changes: 140 additions & 10 deletions sklearn/svm/tests/test_svm.py
@@ -422,24 +422,154 @@ def test_weight():
assert f1_score(y_[100:], y_pred) > .3


def test_sample_weights():
# Test weights on individual samples
# TODO: check on NuSVR, OneClass, etc.
clf = svm.SVC()
clf.fit(X, Y)
assert_array_equal(clf.predict([X[2]]), [1.])
@pytest.mark.parametrize("estimator", [svm.SVC(C=1e-2), svm.NuSVC()])
Member: This test doesn't fail in master for me. Should it?

Contributor Author: This test doesn't check zero or negative weights. I added it to the unit tests for this feature as requested by @glemaitre, who was working on another feature related to weights in SVM. That feature has probably been merged already.

Member: If you check the previous test_sample_weights, it was not testing anything about sample_weight. The added tests do not fail because there was no bug; they are added to make sure the results are right. This could have been done in another PR, but I wanted to avoid splitting it, for simplicity in a first contribution :)

def test_svm_classifier_sided_sample_weight(estimator):
# fit a linear SVM and check that giving more weight to opposed samples
# in the space will flip the decision toward these samples.
X = [[-2, 0], [-1, -1], [0, -2], [0, 2], [1, 1], [2, 0]]
estimator.set_params(kernel='linear')

# check that with unit weights, a sample is supposed to be predicted on
# the boundary
sample_weight = [1] * 6
estimator.fit(X, Y, sample_weight=sample_weight)
y_pred = estimator.decision_function([[-1., 1.]])
assert y_pred == pytest.approx(0)

# give more weights to opposed samples
sample_weight = [10., .1, .1, .1, .1, 10]
estimator.fit(X, Y, sample_weight=sample_weight)
y_pred = estimator.decision_function([[-1., 1.]])
assert y_pred < 0

sample_weight = [1., .1, 10., 10., .1, .1]
estimator.fit(X, Y, sample_weight=sample_weight)
y_pred = estimator.decision_function([[-1., 1.]])
assert y_pred > 0

sample_weight = [.1] * 3 + [10] * 3
clf.fit(X, Y, sample_weight=sample_weight)
assert_array_equal(clf.predict([X[2]]), [2.])

@pytest.mark.parametrize(
"estimator",
[svm.SVR(C=1e-2), svm.NuSVR(C=1e-2)]
)
def test_svm_regressor_sided_sample_weight(estimator):
Member: This test doesn't fail in master for me. Should it?

Contributor Author: This test doesn't check zero or negative weights. I added it to the unit tests for this feature as requested by @glemaitre, who was working on another feature related to weights in SVM. That feature has probably been merged already.

# similar test to test_svm_classifier_sided_sample_weight but for
# SVM regressors
X = [[-2, 0], [-1, -1], [0, -2], [0, 2], [1, 1], [2, 0]]
estimator.set_params(kernel='linear')

# check that with unit weights, a sample is supposed to be predicted on
# the boundary
sample_weight = [1] * 6
estimator.fit(X, Y, sample_weight=sample_weight)
y_pred = estimator.predict([[-1., 1.]])
assert y_pred == pytest.approx(1.5)

# give more weights to opposed samples
sample_weight = [10., .1, .1, .1, .1, 10]
estimator.fit(X, Y, sample_weight=sample_weight)
y_pred = estimator.predict([[-1., 1.]])
assert y_pred < 1.5

sample_weight = [1., .1, 10., 10., .1, .1]
estimator.fit(X, Y, sample_weight=sample_weight)
y_pred = estimator.predict([[-1., 1.]])
assert y_pred > 1.5


def test_svm_equivalence_sample_weight_C():
# test that rescaling all samples is the same as changing C
clf = svm.SVC()
clf.fit(X, Y)
dual_coef_no_weight = clf.dual_coef_
clf.set_params(C=100)
clf.fit(X, Y, sample_weight=np.repeat(0.01, len(X)))
assert_array_almost_equal(dual_coef_no_weight, clf.dual_coef_)
assert_allclose(dual_coef_no_weight, clf.dual_coef_)


@pytest.mark.parametrize(
"Estimator, err_msg",
[(svm.SVC,
'Invalid input - all samples have zero or negative weights.'),
(svm.NuSVC, '(negative dimensions are not allowed|nu is infeasible)'),
(svm.SVR,
'Invalid input - all samples have zero or negative weights.'),
(svm.NuSVR,
'Invalid input - all samples have zero or negative weights.'),
(svm.OneClassSVM,
'Invalid input - all samples have zero or negative weights.')
],
ids=['SVC', 'NuSVC', 'SVR', 'NuSVR', 'OneClassSVM']
)
@pytest.mark.parametrize(
"sample_weight",
[[0] * len(Y), [-0.3] * len(Y)],
ids=['weights-are-zero', 'weights-are-negative']
)
def test_negative_sample_weights_mask_all_samples(Estimator,
err_msg, sample_weight):
est = Estimator(kernel='linear')
with pytest.raises(ValueError, match=err_msg):
est.fit(X, Y, sample_weight=sample_weight)


@pytest.mark.parametrize(
"Classifier, err_msg",
[(svm.SVC,
'Invalid input - all samples with positive weights have the same label'),
(svm.NuSVC, 'specified nu is infeasible')],
ids=['SVC', 'NuSVC']
)
@pytest.mark.parametrize(
"sample_weight",
[[0, -0.5, 0, 1, 1, 1],
[1, 1, 1, 0, -0.1, -0.3]],
ids=['mask-label-1', 'mask-label-2']
)
def test_negative_weights_svc_leave_just_one_label(Classifier,
err_msg,
sample_weight):
clf = Classifier(kernel='linear')
with pytest.raises(ValueError, match=err_msg):
clf.fit(X, Y, sample_weight=sample_weight)


@pytest.mark.parametrize(
"Classifier, model",
[(svm.SVC, {'when-left': [0.3998, 0.4], 'when-right': [0.4, 0.3999]}),
(svm.NuSVC, {'when-left': [0.3333, 0.3333],
'when-right': [0.3333, 0.3333]})],
ids=['SVC', 'NuSVC']
)
@pytest.mark.parametrize(
"sample_weight, mask_side",
[([1, -0.5, 1, 1, 1, 1], 'when-left'),
([1, 1, 1, 0, 1, 1], 'when-right')],
ids=['partial-mask-label-1', 'partial-mask-label-2']
)
def test_negative_weights_svc_leave_two_labels(Classifier, model,
sample_weight, mask_side):
clf = Classifier(kernel='linear')
clf.fit(X, Y, sample_weight=sample_weight)
assert_allclose(clf.coef_, [model[mask_side]], rtol=1e-3)


@pytest.mark.parametrize(
"Estimator",
[svm.SVC, svm.NuSVC, svm.NuSVR],
ids=['SVC', 'NuSVC', 'NuSVR']
)
@pytest.mark.parametrize(
"sample_weight",
[[1, -0.5, 1, 1, 1, 1], [1, 1, 1, 0, 1, 1]],
ids=['partial-mask-label-1', 'partial-mask-label-2']
)
def test_negative_weight_equal_coeffs(Estimator, sample_weight):
Member: This test doesn't fail in master for me. Should it?

Contributor Author: This test shouldn't fail. It does not verify the main goal of the feature - throwing an exception during fit() if the samples of one label are totally removed because they all had zero or negative weights. Instead, this test validates the "healthy behaviour" of the current model: if only a few samples are removed and the ratio between samples with different labels is not severely disrupted, the classifier's model doesn't change much, and coefficients that were originally equal remain close to each other. Since this test doesn't actually verify the feature I intend to add, I can remove it if you think we are better off without it.

Contributor Author: I added the test thinking it can detect future changes to the classifier model that would alter this behaviour.

# model generates equal coefficients
est = Estimator(kernel='linear')
est.fit(X, Y, sample_weight=sample_weight)
coef = np.abs(est.coef_).ravel()
assert coef[0] == pytest.approx(coef[1], rel=1e-3)


@ignore_warnings(category=UndefinedMetricWarning)