Hi,
Description
It appears LinearSVC ignores (or suppresses) sample weights, and the model remains the same regardless of the sample weight input.
This can be demonstrated when comparing a LinearSVC model to an SVC model with a linear kernel.
Steps/Code to Reproduce
Extension of the example in:
http://scikit-learn.org/stable/auto_examples/svm/plot_weighted_samples.html#sphx-glr-auto-examples-svm-plot-weighted-samples-py
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
def plot_decision_function(classifier, sample_weight, axis, title):
    # plot the decision function
    xx, yy = np.meshgrid(np.linspace(-4, 5, 500), np.linspace(-4, 5, 500))
    Z = classifier.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # plot the line, the points, and the nearest vectors to the plane
    axis.contourf(xx, yy, Z, alpha=0.75, cmap=plt.cm.bone)
    axis.scatter(X[:, 0], X[:, 1], c=y, s=100 * sample_weight, alpha=0.9,
                 cmap=plt.cm.bone, edgecolors='black')
    axis.axis('off')
    axis.set_title(title)
# we create 20 points
np.random.seed(0)
X = np.r_[np.random.randn(10, 2) + [1, 1], np.random.randn(10, 2)]
y = [1] * 10 + [-1] * 10
sample_weight_last_ten = abs(np.random.randn(len(X)))
sample_weight_constant = np.ones(len(X))
# and bigger weights to some outliers
sample_weight_last_ten[15:] *= 5
sample_weight_last_ten[9] *= 15
# one figure with four panels: SVC and LinearSVC, each fitted without and with sample weights
fig, axes = plt.subplots(1, 4, figsize=(22, 6))
# fit the model SVC
clf_weights = svm.SVC(kernel='linear')
clf_weights.fit(X, y, sample_weight=sample_weight_last_ten)
clf_no_weights = svm.SVC(kernel='linear')
clf_no_weights.fit(X, y)
plot_decision_function(clf_no_weights, sample_weight_constant, axes[0],
                       "SVC Constant weights")
plot_decision_function(clf_weights, sample_weight_last_ten, axes[1],
                       "SVC Modified weights")
# fit the model LinearSVC
clf_weights2 = svm.LinearSVC()
clf_weights2.fit(X, y, sample_weight=sample_weight_last_ten)
clf_no_weights2 = svm.LinearSVC()
clf_no_weights2.fit(X, y)
plot_decision_function(clf_no_weights2, sample_weight_constant, axes[2],
                       "LinearSVC Constant weights")
plot_decision_function(clf_weights2, sample_weight_last_ten, axes[3],
                       "LinearSVC Modified weights")
plt.show()
Results
In the 4 plots, you can see that the SVC with the linear kernel is affected by the sample weight, while the LinearSVC model is not.
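A numeric comparison may be easier to check than the plots. The sketch below refits each model with and without the weights from the script above and prints the largest absolute change in the fitted `coef_` and `intercept_`. Note that `max_param_shift` is a hypothetical helper written for this report, not part of scikit-learn, and `random_state=0` on `LinearSVC` is an added assumption so that repeated fits are comparable.

```python
import numpy as np
from sklearn import svm


def max_param_shift(make_clf, X, y, sample_weight):
    """Largest absolute change in coef_/intercept_ caused by sample_weight.

    Hypothetical helper for this report; make_clf is a zero-argument
    callable that returns a fresh, unfitted classifier.
    """
    unweighted = make_clf().fit(X, y)
    weighted = make_clf().fit(X, y, sample_weight=sample_weight)
    before = np.r_[unweighted.coef_.ravel(), unweighted.intercept_.ravel()]
    after = np.r_[weighted.coef_.ravel(), weighted.intercept_.ravel()]
    return float(np.max(np.abs(after - before)))


# Same data and weights as in the reproduction script.
np.random.seed(0)
X = np.r_[np.random.randn(10, 2) + [1, 1], np.random.randn(10, 2)]
y = [1] * 10 + [-1] * 10
sample_weight = abs(np.random.randn(len(X)))
sample_weight[15:] *= 5
sample_weight[9] *= 15

shifts = {
    "SVC(kernel='linear')": max_param_shift(
        lambda: svm.SVC(kernel="linear"), X, y, sample_weight),
    # random_state=0 is an assumption added here for repeatable fits
    "LinearSVC": max_param_shift(
        lambda: svm.LinearSVC(random_state=0), X, y, sample_weight),
}
for name, shift in shifts.items():
    print("{}: max |parameter change| = {:.6f}".format(name, shift))
```

A value near zero for one model and a clearly larger value for the other would match what the plots suggest. The two estimators use different solvers and default losses, so the magnitudes are not directly comparable between them.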
Versions
Windows-10-10.0.16299-SP0
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.19.1
Thanks!