-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] FIX Negative or null sample_weights in SVM #14286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for tackling this, @alexshacked ! Wouldn't enforcing that all |
Yes, @rth. enforcing that all sample_weights are positive on the python side, in BaseLibSVM.fit() for example would be simpler. This was my original intention. |
Hi @glemaitre. I don't know what self-request a review means. But, I"ll be glad if you will review my PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am proposing some changes in the tests:
- use pytest
- improve some previous tests which were using sample_weight
The tests will fail because the ValueError
raised does not match the correct error message (which is hidden when using assert_raises
).
Even if I agree with the changes in the cpp
file, I think that we should implement what was proposed by @rth. Validate and failing early will be better. So I would add the checking in the cpp
and change slightly the tests that I propose.
@alexshacked it is just for me to track the PR which I am going to review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style change in the CPP file
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
closes #9674
closes #9494
Issue #9494 surfaced when training svm.SVC classifier with negative sample weights. There is a PR that attempted to fix this issue - PR #9674. SVC algorithm is part of five (5) svm algorithms that are implemented in svm.cpp: SVC, NuSVC, SVR, NuSVR and OneClassSVM. Function PREFIX(train) in svm.cpp implements all 5 algorithms. This function handles the samples with negative/zero weights by removing them from the training set.This behavior can result in an invalid model after fit() returns if too many samples are removed. The problem occurs in algorithms: SVC, SVR, NuSVR and OneClassSVM. NuSVC is the only class that currently detects this and fails fit() with an exception.This PR resolves this problem by detecting this situation in all classes and behaving like NuSVC- throwing an Exception during fit(). Please see a more detailed analysis at the bottom of issue #9494