# Linear SVM does wrong prediction #17557
This might be an issue with the intercept being penalized and the data not being scaled, but I'm not sure. There could be an issue with the optimization, but from this example it's not clear to me. I'm a bit surprised that setting `intercept_scaling` to a higher number breaks the first example... that's odd.

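To make the "intercept being penalized" point concrete, here is a rough sketch of what the `intercept_scaling` mechanism does, following the description in the `LinearSVC` docstring (the manual augmentation below is only an illustration, not the actual liblinear code path):

```python
# Sketch: LinearSVC appends a constant column equal to intercept_scaling to X,
# solves the augmented problem without an intercept, and reports the weight of
# that extra column (times intercept_scaling) as intercept_. Because that
# extra weight is L2-penalized like any other, the intercept is regularized.
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[0.25, 0.0], [-0.25, 0.0]])
y = [-1, 1]
scaling = 1.0

clf = LinearSVC(C=1, tol=1e-10, max_iter=100_000, random_state=0,
                intercept_scaling=scaling).fit(X, y)

X_aug = np.hstack([X, np.full((len(X), 1), scaling)])
clf_aug = LinearSVC(C=1, tol=1e-10, max_iter=100_000, random_state=0,
                    fit_intercept=False).fit(X_aug, y)

print(clf.coef_, clf.intercept_)
print(clf_aug.coef_[:, :2], clf_aug.coef_[:, 2] * scaling)  # should roughly match
```
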
I don't think this is a bug. The default parameters cannot be good parameters for all datasets; they are presumably chosen to work well for data with standard scales. Between your two examples, one works and the other does not. Yet, leaving the change of intercept aside, if you scale your features by a factor of 0.1, you need to scale the regularization parameter `C` accordingly. More generally, the user guide points out that SVMs are not scale invariant and recommends scaling the data.

In the given example, adjusting these parameters accordingly fixes the prediction (see the sketch below).

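As a quick illustration of the parameter/scale interaction (a sketch only; the point values are taken from the comments further down this thread, and the specific large `C` is just an example, not a recommendation):

```python
# With the default C=1 the reported boundary sits to the left of both points;
# with a much larger C it should move back to (roughly) the midpoint.
import numpy as np
from sklearn.svm import LinearSVC

p3, p4 = 0.025545042466768184, 0.0293294646367189
X = np.array([[p3, 0.0], [p4, 0.0]])
y = [1, -1]
midpoint = (p3 + p4) / 2

for C in (1.0, 1e4):
    clf = LinearSVC(C=C, tol=1e-8, max_iter=100_000, random_state=0).fit(X, y)
    # decision_function at the midpoint should be close to 0 when the
    # boundary is correctly centered between the two points
    print(C, clf.predict(X), clf.decision_function([[midpoint, 0.0]]))
```
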
@TomDLT I was confused that setting `intercept_scaling` higher breaks the first example. But you're right, it might be just a `tol` issue here.

The intercept scaling is weirder: larger values (as low as 100) make the intercept much slower to converge. Moreover, in the first example (where the intercept should be 0), the intercept takes the value ±1 after 1 iteration, and is then slow to converge to 0. Here is a plot showing the slow convergence of the intercept (the script that produces it is below). I don't know if this is a bug, or a limit of the solver:

```python
import warnings
import numpy as np
from matplotlib import pyplot as plt
from sklearn import svm
X = [[0.25, 0], [-0.25, 0]]
y = [-1, 1]
warnings.filterwarnings("ignore")
max_iter_array = np.int_(np.logspace(0, 5, 100))
intercepts = []
for max_iter in max_iter_array:
    lclf = svm.LinearSVC(random_state=0, tol=1e-10, C=1,
                         max_iter=max_iter, intercept_scaling=100)
    lclf.fit(X, y)
    intercepts.append(lclf.intercept_)
plt.plot(max_iter_array, intercepts)
plt.xlabel('n_iter')
plt.title('self.intercept_')
plt.show()
```

@TomDLT I thought it was a problem of parameters at first, too. But when I changed them (I varied `C` and `tol`; I didn't change `max_iter`, since I didn't hit that limit when I looked at `n_iter_`), the result became more or less random each time I ran the code. I'll try your parameters soon.

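One way to check whether that run-to-run variation is real (a sketch, not from the original poster): fix `random_state`, which controls the shuffling used by the dual coordinate descent solver, and look at `n_iter_`:

```python
# If the coefficients still differ across seeds even though n_iter_ stays
# well below max_iter, the variability comes from the conditioning of the
# problem rather than from hitting the iteration limit.
from sklearn.svm import LinearSVC

p3, p4 = 0.025545042466768184, 0.0293294646367189  # values from this thread
X = [[p3, 0.0], [p4, 0.0]]
y = [1, -1]

for seed in (0, 1, 2):
    clf = LinearSVC(C=100, tol=1e-10, max_iter=100_000, random_state=seed)
    clf.fit(X, y)
    print(seed, clf.coef_, clf.intercept_, clf.n_iter_)
```
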
I assume it is caused by the intercept handling. I think the question is whether we are OK with our optimizer being that fragile (we have similar questions for many other linear models). We could see if there have been any updates in liblinear; we're still using a very old version, and our convergence criteria might not be great.

Exactly. We could maybe alleviate the brittleness by centering the data beforehand (a rough sketch of the idea is below).

About updating liblinear: I think we have slowly modified the code over the years, and I wonder how much we have diverged from the original code. Updating liblinear may require significant work.

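A minimal sketch of the centering idea (done manually as a preprocessing step here; nothing in `LinearSVC` does this for you):

```python
# Fit on centered data so the intercept liblinear has to learn (and penalize)
# stays close to zero, then fold the mean back into the intercept.
import numpy as np
from sklearn.svm import LinearSVC

p3, p4 = 0.025545042466768184, 0.0293294646367189  # values from this thread
X = np.array([[p3, 0.0], [p4, 0.0]])
y = np.array([1, -1])

mean = X.mean(axis=0)
clf = LinearSVC(C=100, tol=1e-10, max_iter=100_000, random_state=0)
clf.fit(X - mean, y)

# express the decision function in the original coordinates:
# f(x) = w . (x - mean) + b = w . x + (b - w . mean)
intercept_orig = clf.intercept_ - clf.coef_ @ mean
print(clf.coef_, intercept_orig)
print(np.sign(X @ clf.coef_.ravel() + intercept_orig))  # ideally [1, -1]
```
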
In my opinion, the problem comes from the `fit_intercept` option, with which w becomes [w; w_{n+1}] and x becomes [x; bias]. To retrieve a relevant estimate of the intercept, we know from V. Vapnik that it can be obtained by taking the average of the complementary conditions of the support vectors (see for instance http://cs229.stanford.edu/notes/cs229-notes3.pdf, eq. (11)). It seems to me that the problem can be hacked around by recomputing the intercept as follows:

```python
w = lclf.coef_[0][0]
lclf.intercept_ = -(p3 + p4) / 2 * w
```

To verify the results, I have solved the same SVM problem in the primal and in the dual using CVX. But this `fit_intercept` option also modifies the values of the weight vector w. In conclusion, I doubt that simply preprocessing the data will solve the problem.

It looks like a reasonable hack. The doc from https://github.com/cjlin1/liblinear has a note on this for multi-class classification.

```python
from sklearn import svm
from matplotlib import pyplot as plt
import numpy as np
lclf = svm.LinearSVC(dual=True, loss='hinge', verbose=2, penalty='l2',
                     C=100, max_iter=100000, fit_intercept=True)
# lclf = svm.LinearSVC(dual=False, penalty='l2', loss='squared_hinge', C=100)
# lclf = svm.LinearSVC(C=100)
p3 = 0.025545042466768184
p4 = 0.0293294646367189
x_min = 0
x_max = 0.035
x_h = 0.00001
y_min = -0.01
y_max = 0.01
y_h = 0.001
xx, yy = np.meshgrid(np.arange(x_min, x_max, x_h),
np.arange(y_min, y_max, y_h))
lclf.fit([[p3, 0.], [p4,0]], [1,-1])
# recalculation of the intercept according to complementary conditions
# see for instance http://cs229.stanford.edu/notes/cs229-notes3.pdf, eq (11)
w = lclf.coef_[0][0]
lclf.intercept_ = -(p3+p4)/2*w
plt.plot(p3, 0, "go")
plt.plot(p4, 0, "ro")
Z = lclf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.gca().contourf(xx, yy, Z, colors = ["red", "green"], alpha = 0.1)
plt.show()
print(lclf.coef_)
print(lclf.intercept_)
print(w)
```

## Solving L2-regularized L1-loss support vector classification in the dual with bias

This part requires CVXPY, NAG and Gurobi.

```python
import cvxpy as cvx
n = 2
Ya = np.array([-1,1])
X = np.array([[p3,0],[p4,0]])
G = np.outer(Ya,Ya)*(X@X.T)
e = np.ones((n,1))
alpha = cvx.Variable(n)
C = cvx.Parameter()
qf = cvx.quad_form(alpha, G)
obj = cvx.Minimize(.5*qf - e.T@alpha)
const = [Ya.T@alpha == 0,
alpha >= 0,
alpha <= C]
prob = cvx.Problem(obj,const)
C.value = 100
prob.solve(solver=cvx.GUROBI)
a = alpha.value
b = const[0].dual_value
w = C.value*p4 - C.value*p3
print(w, a,b,(p3+p4)/2*w)
lclf.intercept_ = -b
plt.plot(p3, 0, "go")
plt.plot(p4, 0, "ro")
Z = lclf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.gca().contourf(xx, yy, Z, colors = ["red", "green"], alpha = 0.1)
plt.show()
```

## Solving L2-regularized L1-loss support vector classification in the primal with bias

```python
X = np.array([[p3,0.],[p4,0]])
Ya = np.array([-1,1]).reshape((2,1))
w = cvx.Variable((2,1))
b = cvx.Variable()
e = cvx.pos(1 - cvx.multiply(Ya, X @ w + b))
loss = cvx.sum(e)
reg = cvx.sum_squares(w)
C = cvx.Parameter(nonneg=True)
prob = cvx.Problem(cvx.Minimize(C*loss + .5*reg))
C.value = 100
prob.solve(solver=cvx.NAG)
# it fails if C is too small
print(w.value,b.value,e.value)
plt.plot(p3, 0, "go")
plt.plot(p4, 0, "ro")
Z = (np.c_[xx.ravel(), yy.ravel()])@w.value + b.value
Z = Z.reshape(xx.shape)
plt.gca().contourf(xx, yy, np.sign(Z), colors = ["red", "green"], alpha = 0.1)
plt.show()
```

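For reference, the intercept recovery used in the hack above is the standard hard-margin result from the CS229 notes cited earlier (not anything specific to scikit-learn): for a support vector $x_i$, complementary slackness gives $y_i(w^\top x_i + b) = 1$, hence, averaging one support vector from each class,

$$
b = -\frac{w^\top x_{+} + w^\top x_{-}}{2},
$$

which for the two points $(p_3, 0)$ and $(p_4, 0)$ reduces to $b = -\tfrac{p_3 + p_4}{2}\, w_1$, the expression used in the snippets above.
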
#### Describe the bug
The prediction for specific training data is wrong.
#### Steps/Code to Reproduce
You can run the following code. While the first decision boundary is correct, for the second pair of points the decision boundary is placed completely wrong.
Example:
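(The reproduction snippet did not survive the copy; based on the point values quoted in the comments above, it presumably looked roughly like the following reconstruction, which is not the reporter's exact code.)

```python
from sklearn.svm import LinearSVC

# first pair: the boundary ends up correctly centered between the points
clf1 = LinearSVC().fit([[0.25, 0.0], [-0.25, 0.0]], [-1, 1])
print(clf1.coef_, clf1.intercept_)

# second pair: the boundary ends up to the left of both points
p3, p4 = 0.025545042466768184, 0.0293294646367189
clf2 = LinearSVC().fit([[p3, 0.0], [p4, 0.0]], [1, -1])
print(clf2.coef_, clf2.intercept_)
print(clf2.predict([[p3, 0.0], [p4, 0.0]]))  # reported to mispredict here
```
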
#### Expected Results
The decision boundary should be centered between the two training points.
#### Actual Results
The boundary is placed to the left of both points.
#### Versions
```
System:
python: 3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)]
executable: C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\python.exe
machine: Windows-10-10.0.18362-SP0
Python dependencies:
pip: 20.0.2
setuptools: 46.1.3
sklearn: 0.22.2.post1
numpy: 1.18.3
scipy: 1.4.1
Cython: None
pandas: 1.0.3
matplotlib: 3.2.1
joblib: 0.14.1
Built with OpenMP: True
```