Skip to content

[MRG+1] FIX Negative or null sample_weights in SVM #14286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
Sep 24, 2019

Conversation

alexshacked
Copy link
Contributor

@alexshacked alexshacked commented Jul 7, 2019

closes #9674
closes #9494

Issue #9494 surfaced when training svm.SVC classifier with negative sample weights. There is a PR that attempted to fix this issue - PR #9674. SVC algorithm is part of five (5) svm algorithms that are implemented in svm.cpp: SVC, NuSVC, SVR, NuSVR and OneClassSVM. Function PREFIX(train) in svm.cpp implements all 5 algorithms. This function handles the samples with negative/zero weights by removing them from the training set.This behavior can result in an invalid model after fit() returns if too many samples are removed. The problem occurs in algorithms: SVC, SVR, NuSVR and OneClassSVM. NuSVC is the only class that currently detects this and fails fit() with an exception.This PR resolves this problem by detecting this situation in all classes and behaving like NuSVC- throwing an Exception during fit(). Please see a more detailed analysis at the bottom of issue #9494

@rth
Copy link
Member

rth commented Jul 8, 2019

Thanks for tackling this, @alexshacked ! Wouldn't enforcing that all sample_weights are positive on the Python side (and raising an error if they are not) be a simpler solution?

@alexshacked
Copy link
Contributor Author

alexshacked commented Jul 8, 2019

Thanks for tackling this, @alexshacked ! Wouldn't enforcing that all sample_weights are positive on the Python side (and raising an error if they are not) be a simpler solution?

Yes, @rth. enforcing that all sample_weights are positive on the python side, in BaseLibSVM.fit() for example would be simpler. This was my original intention.
Until I saw that in the algorithm implementation in svm.cpp:2342, the negative weights are actually handled.
PREFIX(model) *PREFIX(train)(const PREFIX(problem) *prob, const svm_parameter *param, int *status) { PREFIX(problem) newprob; remove_zero_weight(&newprob, prob);
This is the current implementation in the code base. There was a concious decision by the SVM implementor to handle input with negative/zero weights, by removing this samples from the input, and training the model on the remaining samples. The SVM implementor was not shy about rejecting invalid input and in function check_parameter() it does just that for many scenarios. But in the case of negative weights the decision was not to reject the input but to handle it by removing samples with negative weights.
The only thing is that while doing so the SVM implementor ommited some border cases that arise, like the one discovered by bug #9494.
My thought was that a less intrusive fix would be to teach the implementation to handle the border cases without changing current behavior of the SVM algorithm which handles input with negative weights.
Hence, the present solution.
If on the other hand, the decision for SVM algorithms would be not to handle input where part of the samples are zero/negative anymore, I would be happy to implement. On the python side naturally:-)

@rth rth changed the title [MRG] #9494 FIX Fail to train SVM ... [MRG] #9494 FIX Negative or null sample_weights in SVM Jul 8, 2019
@rth rth changed the title [MRG] #9494 FIX Negative or null sample_weights in SVM [MRG] FIX Negative or null sample_weights in SVM Jul 8, 2019
@glemaitre glemaitre self-requested a review July 8, 2019 16:04
@alexshacked
Copy link
Contributor Author

alexshacked commented Jul 8, 2019

Hi @glemaitre. I don't know what self-request a review means. But, I"ll be glad if you will review my PR.
I see that you also have worked with sample weights lately
If you need me to do something to enable you to review please tell me.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am proposing some changes in the tests:

  • use pytest
  • improve some previous tests which were using sample_weight

The tests will fail because the ValueError raised does not match the correct error message (which is hidden when using assert_raises).

Even if I agree with the changes in the cpp file, I think that we should implement what was proposed by @rth. Validate and failing early will be better. So I would add the checking in the cpp and change slightly the tests that I propose.

@glemaitre
Copy link
Member

self-request a review means

@alexshacked it is just for me to track the PR which I am going to review :)

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style change in the CPP file

alexshacked and others added 14 commits July 9, 2019 11:58
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Aug 5, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Oct 15, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Nov 28, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Dec 12, 2020
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Feb 25, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Feb 25, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Mar 27, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jun 15, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jul 23, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Nov 10, 2021
ivannz added a commit to ivannz/scikit-learn that referenced this pull request May 15, 2022
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Jun 14, 2022
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Aug 29, 2022
ivannz added a commit to ivannz/scikit-learn that referenced this pull request Sep 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Fail to train SVM when got "warning: class label 0 specified in weight is not found"
7 participants