[MRG] Fix ZeroDivisionError when using sparse data in SVM in case where support_vectors_ is empty #14894


Merged: 10 commits from sparse_svm_divide_by_zero into scikit-learn:master, Oct 7, 2019

Conversation

@danna-naser (Contributor)

Reference Issues/PRs

Fixes #14893

What does this implement/fix? Explain your changes.

When model.support_vectors_ is an empty sparse matrix, the calculation of model.dual_coef_ uses

dual_coef_indptr = np.arange(0, dual_coef_indices.size + 1,
                                         dual_coef_indices.size / n_class)

Since dual_coef_indices.size is 0 in that case, the np.arange step is 0 / n_class == 0, and np.arange with a step of zero raises ZeroDivisionError.
This change skips the calculation when there are no support vectors and makes model.dual_coef_ consistent between dense and sparse data.
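The guard can be sketched as a standalone function (the names n_SV, n_class, and dual_coef_data follow this discussion; this is a sketch of the idea, not the exact scikit-learn source):

```python
import numpy as np
import scipy.sparse as sp

def build_dual_coef(dual_coef_data, n_class, n_SV):
    """Sketch of the guarded dual_coef_ construction."""
    if not n_SV:
        # No support vectors: return an empty (n_class, 0) CSR matrix
        # instead of letting the np.arange step below become
        # 0 / n_class == 0, which is what raised ZeroDivisionError.
        return sp.csr_matrix((n_class, 0))
    dual_coef_indices = np.tile(np.arange(n_SV), n_class)
    dual_coef_indptr = np.arange(0, dual_coef_indices.size + 1,
                                 dual_coef_indices.size / n_class)
    return sp.csr_matrix((dual_coef_data, dual_coef_indices, dual_coef_indptr),
                         (n_class, n_SV))
```

With zero support vectors this returns an empty matrix directly; otherwise it runs the original computation unchanged.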

Any other comments?

@danna-naser danna-naser changed the title [WIP] Fix ZeroDivisionError when using sparse data in SVM in case where support_vectors_ is empty [MRG] Fix ZeroDivisionError when using sparse data in SVM in case where support_vectors_ is empty Sep 6, 2019
@adrinjalali (Member)

@danna-naser thanks for reporting it and the fix. But it may be the case that there's another underlying issue we need to solve.

@agramfort could you please have a look at this one? The issue is that sometimes you may get a solution from the SVR with 0 support vectors, and the output is just the intercept. The question is, do we want to raise a warning or even an error? Also, this example is very curious in the sense that the intercept is not the mean of all the data points.

import numpy as np
from scipy import sparse
from sklearn import svm
x_train = np.array([[0, 1, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1]])
y_train = np.array([0.04, 0.04, 0.10, 0.16])
model = svm.SVR(kernel='linear')
model.fit(x_train, y_train)

@danna-naser (Contributor Author)

@adrinjalali any update on this? If an underlying issue was found, please let me know and I'll close :)

@adrinjalali (Member)

@NicolasHug what do you think of this issue?

@NicolasHug (Member)

I'm not sure what the intercept should be but since the dense version also has no SV, the fix looks correct to me?
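That dense/sparse consistency can be checked directly. With a scikit-learn release that includes this fix, both fits on the example data from this thread come back with zero support vectors (a sketch of the check; the epsilon-tube explanation in the comment is a plausible reading, not something stated in the thread):

```python
import numpy as np
from scipy import sparse
from sklearn import svm

X_train = np.array([[0, 1, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0],
                    [0, 0, 0, 1]], dtype=float)
y_train = np.array([0.04, 0.04, 0.10, 0.16])

# Dense fit: the targets all fit inside the default epsilon=0.1 tube,
# which plausibly explains the zero support vectors observed here.
dense = svm.SVR(kernel='linear').fit(X_train, y_train)

# Sparse fit: before the fix this raised ZeroDivisionError in _sparse_fit;
# after it, the model mirrors the dense result (no SVs, empty dual_coef_).
sparse_model = svm.SVR(kernel='linear').fit(sparse.csr_matrix(X_train), y_train)
```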

@adrinjalali (Member) left a comment

I'm convinced this fixes the issue at hand, still not sure why this is happening (no SV I mean). But I'm happy to have this in. Thanks @danna-naser

@adrinjalali (Member) left a comment

Otherwise LGTM.

@NicolasHug (Member) left a comment

Thanks for the PR @danna-naser ! Some minor nits but LGTM anyway

    self.dual_coef_ = sp.csr_matrix(
        (dual_coef_data, dual_coef_indices, dual_coef_indptr),
        (n_class, n_SV))
    if dual_coef_indices.size == 0:
Member:

I would replace this check with if n_SV == 0 and only declare dual_coef_indices where it is actually used, i.e. in the else clause.

Contributor Author:

Replaced with a not n_SV check. If you'd prefer n_SV == 0, just let me know.

y_train = np.array([0.04, 0.04, 0.10, 0.16])
model = svm.SVR(kernel='linear')
model.fit(X_train, y_train)
assert model.support_vectors_.data.size == 0
Member:

model.support_vectors_.size is enough (same below)
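Putting that suggestion together with the test excerpt above, the full regression test might read roughly like this (a sketch; the merged test may differ in detail):

```python
import numpy as np
from scipy import sparse
from sklearn import svm

def test_sparse_fit_support_vectors_empty():
    # Regression test for issue #14893: fitting an SVR on sparse data
    # that yields zero support vectors used to raise ZeroDivisionError.
    X_train = sparse.csr_matrix(np.array([[0, 1, 0, 0],
                                          [0, 0, 0, 1],
                                          [0, 0, 1, 0],
                                          [0, 0, 0, 1]], dtype=float))
    y_train = np.array([0.04, 0.04, 0.10, 0.16])
    model = svm.SVR(kernel='linear')
    model.fit(X_train, y_train)
    assert model.support_vectors_.size == 0
```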

Contributor Author:

I made the change. Please let me know if anything else can be improved!

@@ -560,6 +560,19 @@ def test_sparse_precomputed():
assert "Sparse precomputed" in str(e)


def test_sparse_fit_support_vectors_empty():
# Regression test for #14893
Member:

Wow, GitHub is linking to the issue, that's pretty cool.

@rth (Member) left a comment

Please add an entry to the change log at doc/whats_new/v0.22.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself with :user:.
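Following the pattern of existing entries, the changelog line might look roughly like this (a sketch; the exact wording, section heading, and credit format would be matched to the surrounding entries in doc/whats_new/v0.22.rst):

```rst
- |Fix| Fixed a ZeroDivisionError in :class:`svm.SVR` and related estimators
  when fitting on sparse data that produces an empty ``support_vectors_``.
  :pr:`14894` by :user:`danna-naser`.
```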

LGTM, otherwise.

@danna-naser danna-naser force-pushed the sparse_svm_divide_by_zero branch from 3906328 to 0e58862 Compare October 7, 2019 19:29
@NicolasHug NicolasHug merged commit a89462b into scikit-learn:master Oct 7, 2019
@NicolasHug (Member)

Thanks @danna-naser !


Successfully merging this pull request may close these issues.

ZeroDivisionError in _sparse_fit for SVM with empty support_vectors_
4 participants