LinearSVC ignores sample weights #10873

Closed
nimrodta opened this issue Mar 26, 2018 · 10 comments · Fixed by #15038
Comments

@nimrodta

Hi,

Description

It appears that LinearSVC ignores (or suppresses) sample weights: the fitted model stays the same regardless of the sample_weight passed to fit.
This can be demonstrated by comparing a LinearSVC model to an SVC model with a linear kernel.

Steps/Code to Reproduce

Extension of the example in:
http://scikit-learn.org/stable/auto_examples/svm/plot_weighted_samples.html#sphx-glr-auto-examples-svm-plot-weighted-samples-py

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm


def plot_decision_function(classifier, sample_weight, axis, title):
    # plot the decision function
    xx, yy = np.meshgrid(np.linspace(-4, 5, 500), np.linspace(-4, 5, 500))

    Z = classifier.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # plot the line, the points, and the nearest vectors to the plane
    axis.contourf(xx, yy, Z, alpha=0.75, cmap=plt.cm.bone)
    axis.scatter(X[:, 0], X[:, 1], c=y, s=100 * sample_weight, alpha=0.9,
                 cmap=plt.cm.bone, edgecolors='black')

    axis.axis('off')
    axis.set_title(title)


# we create 20 points
np.random.seed(0)
X = np.r_[np.random.randn(10, 2) + [1, 1], np.random.randn(10, 2)]
y = [1] * 10 + [-1] * 10
sample_weight_last_ten = abs(np.random.randn(len(X)))
sample_weight_constant = np.ones(len(X))
# assign larger weights to some outliers
sample_weight_last_ten[15:] *= 5
sample_weight_last_ten[9] *= 15

# one panel per estimator / weighting combination

fig, axes = plt.subplots(1, 4, figsize=(22, 6))

# fit the model SVC
clf_weights = svm.SVC(kernel='linear')
clf_weights.fit(X, y, sample_weight=sample_weight_last_ten)

clf_no_weights = svm.SVC(kernel='linear')
clf_no_weights.fit(X, y)


plot_decision_function(clf_no_weights, sample_weight_constant, axes[0],
                       "SVC Constant weights")
plot_decision_function(clf_weights, sample_weight_last_ten, axes[1],
                       "SVC Modified weights")

# fit the model LinearSVC
clf_weights2 = svm.LinearSVC()
clf_weights2.fit(X, y, sample_weight=sample_weight_last_ten)

clf_no_weights2 = svm.LinearSVC()
clf_no_weights2.fit(X, y)

plot_decision_function(clf_no_weights2, sample_weight_constant, axes[2],
                       "LinearSVC Constant weights")
plot_decision_function(clf_weights2, sample_weight_last_ten, axes[3],
                       "LinearSVC Modified weights")

plt.show()

Results

The four plots show that SVC with a linear kernel is affected by the sample weights, while the LinearSVC model is not.
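
The plots can be backed up with a numeric check. The sketch below builds the data the same way as the script above, fits each estimator with and without weights, and prints the largest absolute change in the learned coefficients. The helper `max_coef_shift` is a name made up for this check, not part of scikit-learn. No expected output is shown because the numbers depend on the installed scikit-learn version; on an affected version the LinearSVC shift should be essentially zero.

```python
import numpy as np
from sklearn import svm

# Data built the same way as in the reproduction script.
rng = np.random.RandomState(0)
X = np.r_[rng.randn(10, 2) + [1, 1], rng.randn(10, 2)]
y = np.array([1] * 10 + [-1] * 10)
weights = np.abs(rng.randn(len(X)))
weights[15:] *= 5
weights[9] *= 15


def max_coef_shift(make_estimator):
    """Largest absolute change in coef_ when sample weights are supplied.

    Hypothetical helper introduced for this check only.
    """
    unweighted = make_estimator().fit(X, y)
    weighted = make_estimator().fit(X, y, sample_weight=weights)
    return float(np.max(np.abs(weighted.coef_ - unweighted.coef_)))


svc_shift = max_coef_shift(lambda: svm.SVC(kernel="linear"))
linear_svc_shift = max_coef_shift(lambda: svm.LinearSVC())

print("SVC(kernel='linear') max coefficient shift:", svc_shift)
print("LinearSVC max coefficient shift:", linear_svc_shift)
```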

Versions

Windows-10-10.0.16299-SP0
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.19.1

Thanks!

@jnothman
Member

jnothman commented Mar 26, 2018 via email

@jnothman
Member

jnothman commented Mar 26, 2018 via email

@nimrodta
Author

nimrodta commented Jan 6, 2019

Hi,

It's an old issue, but it still persists in sklearn version 0.20.1.
Is there any plan or possibility to look into this?

The alternative (for SVM-based models) is to run SVC with a linear kernel, but it is much slower.

Thanks!

@melnikovsky

Supporting sample weights in the primal problem seems to be easy; see the attached patch.
L2R_L2LOSS_SVC.sample_weight.txt

The dual problem is not as straightforward.
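
To illustrate why the primal case is simple, here is a minimal NumPy sketch of an L2-regularized squared-hinge objective in which each sample's loss term is multiplied by its weight. It is not the attached patch and not liblinear's solver: the function names, the omission of an intercept, and the plain gradient-descent loop are all assumptions made for illustration.

```python
import numpy as np


def weighted_squared_hinge_objective(w, X, y, sample_weight, C=1.0):
    """Primal L2-regularized squared-hinge objective with per-sample weights.

    Sample i effectively gets the penalty C * sample_weight[i].
    """
    margins = 1.0 - y * (X @ w)
    losses = np.maximum(margins, 0.0) ** 2
    return 0.5 * w @ w + C * np.sum(sample_weight * losses)


def weighted_squared_hinge_gradient(w, X, y, sample_weight, C=1.0):
    """Gradient of the objective above with respect to w."""
    margins = 1.0 - y * (X @ w)
    active = np.maximum(margins, 0.0)  # zero for samples beyond the margin
    return w - 2.0 * C * X.T @ (sample_weight * active * y)


def fit_primal(X, y, sample_weight, C=1.0, lr=0.01, n_iter=2000):
    """Plain gradient descent; a stand-in for a real solver (assumption)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w -= lr * weighted_squared_hinge_gradient(w, X, y, sample_weight, C)
    return w


# Tiny illustrative dataset (made up for this sketch).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
uniform = np.ones(len(X))
skewed = np.array([1.0, 1.0, 1.0, 10.0])  # emphasize the last sample

print("uniform weights:", fit_primal(X, y, uniform))
print("skewed weights: ", fit_primal(X, y, skewed))
```

With integer weights this objective equals the unweighted objective on a dataset where each row is repeated that many times, which is a handy sanity check for any implementation.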

@amueller
Member

@melnikovsky do you want to send a PR with a test?

@amueller
Member

Don't we have common tests for sample weights? This seems really bad :-/

@glemaitre
Member

The fix in #15018 is right. I also think that we should raise a NotImplementedError for the configurations that do not currently support sample_weight. If someone knows how to handle sample_weight when solving the problem in the dual, we could implement it.
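
A guard along these lines could look roughly like the sketch below. The function name is hypothetical, and treating `dual=True` as the unsupported configuration is an assumption taken from the discussion above, not from scikit-learn's actual code.

```python
def check_sample_weight_supported(sample_weight, dual):
    """Fail loudly instead of silently ignoring sample_weight.

    Hypothetical helper: which configurations count as unsupported
    (here, any dual solve) is an assumption made for this sketch.
    """
    if sample_weight is not None and dual:
        raise NotImplementedError(
            "sample_weight is not supported when solving the dual problem"
        )


# No weights: always accepted.
check_sample_weight_supported(None, dual=True)

# Weights with the primal solver: accepted.
check_sample_weight_supported([1.0, 2.0, 0.5], dual=False)

# Weights with the dual solver: rejected.
try:
    check_sample_weight_supported([1.0, 2.0, 0.5], dual=True)
except NotImplementedError as exc:
    print("rejected:", exc)
```

Raising for any non-None weights, even uniform ones, keeps the check simple: a uniform weight other than 1 still rescales the loss term, so ignoring it would change the result.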

@glemaitre
Member

liblinear/libsvm provide versions that support sample_weight:
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances

We should indeed synchronize our code with these versions to get sample_weight support.

@anbai106

@glemaitre I have encountered the same issue: LinearSVC ignores sample_weight. This should be fixed as in the weighted version of libsvm: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances

@statmlben

I made a Python library that solves the SVM with sample weights, based on the same algorithm as LibLinear; see https://github.com/statmlben/Variant-SVM.

8 participants