
Linear SVM does wrong prediction #17557


Open
C4rst3n opened this issue Jun 10, 2020 · 9 comments

C4rst3n commented Jun 10, 2020

Describe the bug

The prediction for specific training data is wrong.

Steps/Code to Reproduce

You can run the following code. While the first decision boundary is correct, for the second pair of points the decision boundary is placed completely wrong.

Example:

from sklearn import svm
from matplotlib import pyplot as plt
import numpy as np

lclf = svm.LinearSVC()

p1 = 0.25
p2 = -0.25

x_min = -2
x_max = 2
x_h = 0.01
y_min = -2
y_max = 2
y_h = 0.01
xx, yy = np.meshgrid(np.arange(x_min, x_max, x_h),
                    np.arange(y_min, y_max, y_h))

lclf.fit([[p1,0],[p2,0]],[-1,1])

plt.plot(p1, 0, "ro")
plt.plot(p2, 0, "go")

Z = lclf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.gca().contourf(xx, yy, Z, colors = ["red", "green"], alpha = 0.1)
plt.show()


# The following does not work

p3 = 0.025545042466768184
p4 = 0.0293294646367189

x_min = 0
x_max = 0.035
x_h = 0.00001
y_min = -0.01
y_max = 0.01
y_h = 0.001
xx, yy = np.meshgrid(np.arange(x_min, x_max, x_h),
                    np.arange(y_min, y_max, y_h))

lclf.fit([[p3, 0], [p4,0]], [1,-1])

plt.plot(p3, 0, "go")
plt.plot(p4, 0, "ro")

Z = lclf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.gca().contourf(xx, yy, Z, colors = ["red", "green"], alpha = 0.1)
plt.show()

Expected Results

The decision boundary should be centered between the two training points.

Actual Results

The boundary is to the left of both points.

Versions

System:
python: 3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)]
executable: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\python.exe
machine: Windows-10-10.0.18362-SP0

Python dependencies:
pip: 20.0.2
setuptools: 46.1.3
sklearn: 0.22.2.post1
numpy: 1.18.3
scipy: 1.4.1
Cython: None
pandas: 1.0.3
matplotlib: 3.2.1
joblib: 0.14.1

Built with OpenMP: True

amueller (Member) commented Jun 10, 2020

This might be an issue with the intercept being penalized and the data not scaled, but I'm not sure.
Are you saying there is an issue with the optimization or are you saying that penalizing the intercept is unexpected?

There could be an issue with the optimization but from this example it's not clear to me.

I'm a bit surprised that setting intercept_scaling to a higher number breaks the first example... that's... odd.

amueller added the Bug label and removed the Bug: triage label on Jun 10, 2020
TomDLT (Member) commented Jun 10, 2020

I don't think this is a bug.

The default parameters cannot be good parameters for every dataset, so they are probably chosen to be good parameters for data with standard scales.

Between your two examples, one works and the other does not. Leaving aside the change in intercept, if you scale your features by a factor of 0.1, you also need to rescale the regularization parameter C, and probably the tolerance tol as well.
In the given example, changing to lclf = svm.LinearSVC(tol=1e-5, C=10) solves the issue.
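For reference, a minimal sketch of that fix applied to the second pair of points from the report:

from sklearn import svm

p3 = 0.025545042466768184
p4 = 0.0293294646367189

# Larger C and tighter tol to compensate for the small feature scale, as suggested above
lclf = svm.LinearSVC(tol=1e-5, C=10)
lclf.fit([[p3, 0], [p4, 0]], [1, -1])

# Both training points should now fall on the correct side, i.e. this should print [ 1 -1]
print(lclf.predict([[p3, 0], [p4, 0]]))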

More generally, the user guide states:

Support Vector Machine algorithms are not scale invariant, so it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0,1] or [-1,+1], or standardize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vector to obtain meaningful results. This can be done easily by using a Pipeline.

In the given example, changing to lclf = make_pipeline(StandardScaler(), LinearSVC()) also solves the issue.
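And a minimal sketch of the pipeline variant (imports for make_pipeline and StandardScaler added):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

p3 = 0.025545042466768184
p4 = 0.0293294646367189

# Standardizing the features first makes the default C and tol appropriate
lclf = make_pipeline(StandardScaler(), LinearSVC())
lclf.fit([[p3, 0], [p4, 0]], [1, -1])

# Again, this should print [ 1 -1]
print(lclf.predict([[p3, 0], [p4, 0]]))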

amueller (Member) commented Jun 10, 2020

@TomDLT I was confused that setting C=100 doesn't fix this, and even more confused that setting intercept_scaling=1000 broke the first example as well. It could be that the solver is just quirky with one example but it seems not very robust.

But you're right, it might be just a tol issue here.

TomDLT (Member) commented Jun 11, 2020

C=100 does fix the problem, if you also change the tolerance to tol=1e-6.


The intercept scaling is weirder, as larger values (as low as 100) make the intercept much slower to converge. Moreover, in the first example (where the intercept should be 0), the intercept takes the value ±1 after 1 iteration, and is then slow to converge to 0. Here is a plot showing the slow convergence of the intercept:

[Figure: lclf.intercept_ as a function of max_iter, converging slowly toward 0]

I don't know if this is a bug, or if this is a limit of the intercept_scaling hack.

import warnings

import numpy as np
from matplotlib import pyplot as plt
from sklearn import svm

X = [[0.25, 0], [-0.25, 0]]
y = [-1, 1]

warnings.filterwarnings("ignore")

max_iter_array = np.int_(np.logspace(0, 5, 100))
intercepts = []
for max_iter in max_iter_array:
    lclf = svm.LinearSVC(random_state=0, tol=1e-10, C=1,
                         max_iter=max_iter, intercept_scaling=100)
    lclf.fit(X, y)
    intercepts.append(lclf.intercept_)

plt.plot(max_iter_array, intercepts)
plt.xlabel('n_iter')
plt.title('self.intercept_')
plt.show()

C4rst3n (Author) commented Jun 11, 2020

@TomDLT I thought it was a problem of parameters at first, too. But when I changed them (I varied C and tol; I didn't change max_iter, since I didn't hit that limit when I looked at n_iter_), the result became kind of random every time I ran the code. I'll try your parameters soon.

C4rst3n (Author) commented Jun 11, 2020

I tested your values and on the first run I got the correct boundary. But when I run the same code multiple times, it randomly changes to this:

[Figure: decision regions with the boundary again placed incorrectly]

amueller (Member) commented:

I assume it is caused by the intercept_scaling hack leading to non-scaled data. I'm pretty sure liblinear assumes scaled data and doesn't converge well otherwise.

I think the question is whether we are ok with our optimizer being that fragile (we have similar questions for many other linear models). We could see if there have been any updates in liblinear; we're still using a very old version, and our convergence criteria might not be great.

TomDLT (Member) commented Jun 18, 2020

I assume it is caused by the intercept_scaling hack leading to non-scaled data. I'm pretty sure liblinear assumes scaled data and doesn't converge well otherwise.

Exactly. We could maybe alleviate the brittleness by centering the data with sklearn.linear_model._base._preprocess_data.
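Not the private helper itself, but a rough sketch of the centering idea on the reported example (subtract the feature means before fitting and apply the same shift at predict time); whether this alone removes the brittleness at such a small feature scale is of course the open question, so the C/tol values mentioned earlier in the thread are kept:

import numpy as np
from sklearn.svm import LinearSVC

p3 = 0.025545042466768184
p4 = 0.0293294646367189

X = np.array([[p3, 0.0], [p4, 0.0]])
y = np.array([1, -1])

# Center the features: for this symmetric two-point problem the true intercept
# of the centered problem is zero, so penalizing the intercept should be harmless
X_mean = X.mean(axis=0)
clf = LinearSVC(C=100, tol=1e-6).fit(X - X_mean, y)

# The intercept should come out close to zero, with both points classified correctly
print(clf.intercept_, clf.coef_)
print(clf.predict(X - X_mean))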

About updating liblinear: I think we have slowly modified the code over the years, and I wonder how far we have diverged from the original. Updating liblinear may require significant work.

StephaneCanu commented Aug 11, 2020

In my opinion, the problem comes from the fit_intercept option, with which w becomes [w; w_{n+1}] and x becomes [x; bias].
In that case, the intercept is also regularized, so it is shrunk towards zero, which is not what we want (as noticed for instance in "The Elements of Statistical Learning", at the top of page 64, https://web.stanford.edu/~hastie/Papers/ESLII.pdf).

To retrieve a relevant estimate of the intercept, we know from V. Vapnik that it can be obtained by averaging the complementary slackness conditions over the support vectors (see for instance http://cs229.stanford.edu/notes/cs229-notes3.pdf, eq. (11)).

It seems to me that the problem can be hacked by recomputing the intercept as follows:

w = lclf.coef_[0][0]
lclf.intercept_ = -(p3+p4)/2*w
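For completeness, a short derivation of that formula, assuming both points are support vectors lying exactly on the margin (and writing $w$ for the first coefficient, the second feature being identically zero):

\[ y_i (w x_i + b) = 1 \quad\Longrightarrow\quad b = y_i - w x_i \]

Averaging over the two support vectors, $x_1 = p_3$ with $y_1 = +1$ and $x_2 = p_4$ with $y_2 = -1$:

\[ b = \tfrac{1}{2}\left[(1 - w p_3) + (-1 - w p_4)\right] = -\tfrac{p_3 + p_4}{2}\, w \]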

To verify the results, I solved the same SVM problem in the primal and in the dual using CVXPY.

But this fit_intercept option also modifies the values of the weight vector w.
Note also that in liblinear, for the primal solvers (-s 0, 2, 5, 6, 11), an option -R is provided to remove the (w_{n+1})^2 / 2 term.
Finally, the value of the intercept may be extracted from the "3 -- L2-regularized L1-loss support vector classification (dual)" case, since in that case it is the Lagrange multiplier of the balancing constraint at the solution.

In conclusion, I doubt that simply preprocessing the data will solve the problem. It looks like a reasonable hack.

The doc from https://github.com/cjlin1/liblinear states for multi-class classification:
0 -- L2-regularized logistic regression (primal)
1 -- L2-regularized L2-loss support vector classification (dual)
2 -- L2-regularized L2-loss support vector classification (primal)
3 -- L2-regularized L1-loss support vector classification (dual)
4 -- support vector classification by Crammer and Singer
5 -- L1-regularized L2-loss support vector classification
6 -- L1-regularized logistic regression
7 -- L2-regularized logistic regression (dual)

from sklearn import svm
from matplotlib import pyplot as plt
import numpy as np

lclf = svm.LinearSVC(dual=True, loss='hinge',verbose=2,penalty='l2',C = 100,max_iter=100000,fit_intercept=True)
#lclf = svm.LinearSVC(dual=False, penalty='l2', loss='squared_hinge',  C = 100)
#lclf = svm.LinearSVC(C = 100)

p3 = 0.025545042466768184
p4 = 0.0293294646367189

x_min = 0
x_max = 0.035
x_h = 0.00001
y_min = -0.01
y_max = 0.01
y_h = 0.001
xx, yy = np.meshgrid(np.arange(x_min, x_max, x_h),
                    np.arange(y_min, y_max, y_h))

lclf.fit([[p3, 0.], [p4,0]], [1,-1])

# recalculation of the intercept according to complementary conditions
# see for instance http://cs229.stanford.edu/notes/cs229-notes3.pdf, eq (11)
w = lclf.coef_[0][0]
lclf.intercept_ = -(p3+p4)/2*w

plt.plot(p3, 0, "go")
plt.plot(p4, 0, "ro")

Z = lclf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.gca().contourf(xx, yy, Z, colors = ["red", "green"], alpha = 0.1)
plt.show()

print(lclf.coef_)
print(lclf.intercept_)
print(w)

Solving L2-regularized L1-loss support vector classification in the dual with bias

This part requires CVXPY, NAG, and Gurobi.

import cvxpy as cvx

n = 2
Ya = np.array([-1,1])
X = np.array([[p3,0],[p4,0]])
G = np.outer(Ya,Ya)*(X@X.T)
              
e = np.ones((n,1))
alpha = cvx.Variable(n)
C = cvx.Parameter()
qf = cvx.quad_form(alpha, G)
obj =  cvx.Minimize(.5*qf - e.T@alpha)
const = [Ya.T@alpha == 0, 
         alpha >= 0,
         alpha <= C]
prob = cvx.Problem(obj,const)
C.value = 100
prob.solve(solver=cvx.GUROBI)

a = alpha.value
b = const[0].dual_value
w = C.value*p4 - C.value*p3
print(w, a,b,(p3+p4)/2*w)

lclf.intercept_ = -b
plt.plot(p3, 0, "go")
plt.plot(p4, 0, "ro")
Z = lclf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.gca().contourf(xx, yy, Z, colors = ["red", "green"], alpha = 0.1)
plt.show()

Solving L2-regularized L1-loss support vector classification in the primal with bias

X = np.array([[p3,0.],[p4,0]])
Ya = np.array([-1,1]).reshape((2,1))

w = cvx.Variable((2,1))
b = cvx.Variable()
# hinge losses (slacks), one per sample
e = cvx.pos(1 - cvx.multiply(Ya, X @ w + b))
loss = cvx.sum(e)
reg = cvx.sum_squares(w)
C = cvx.Parameter(nonneg=True)
prob = cvx.Problem(cvx.Minimize(C*loss + .5*reg))
C.value = 100
prob.solve(solver=cvx.NAG)
# it fails if C is too small

print(w.value,b.value,e.value)

plt.plot(p3, 0, "go")
plt.plot(p4, 0, "ro")
Z = (np.c_[xx.ravel(), yy.ravel()])@w.value + b.value
Z = Z.reshape(xx.shape)
plt.gca().contourf(xx, yy, np.sign(Z), colors = ["red", "green"], alpha = 0.1)
plt.show()
