Inflated results on random-data with SVM #25631
Comments
The way you are evaluating is equivalent to using cross_val_predict. It is known to not be appropriate for evaluating a model; the user guide carries a warning about exactly this:

Warning: Note on inappropriate usage of cross_val_predict: The result of cross_val_predict may be different from those obtained using cross_val_score as the elements are grouped in different ways. The function cross_val_score takes an average over cross-validation folds, whereas cross_val_predict simply returns the labels (or probabilities) from several distinct models undistinguished. Thus, cross_val_predict is not an appropriate measure of generalization error.

So I think that what you observe here is this problem.
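For illustration (this example is not part of the original comment), a minimal sketch contrasting the two evaluation styles mentioned above — averaging per-fold scores with cross_val_score versus scoring pooled out-of-fold predictions from cross_val_predict — using a generic classifier on random data:

```python
# Sketch: contrast fold-averaged scoring with pooled out-of-fold scoring.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = rng.randint(0, 2, size=100)

clf = LogisticRegression()

# cross_val_score: one score per fold, then averaged.
fold_scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print("mean of per-fold AUCs:", fold_scores.mean())

# cross_val_predict: pool out-of-fold probabilities, then score once.
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
print("AUC on pooled predictions:", roc_auc_score(y, proba))
```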
Hi, thanks for your fast reply. However, I don't think this is the issue. If I read that warning correctly, it should not explain what I am seeing here. To verify this, I rewrote the code (a lot simpler) to this (which should be better, right?):
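The rewritten snippet itself does not appear above; a minimal sketch of what such a simpler, explicit LeaveOneOut evaluation could look like, assuming random features, balanced 0/1 labels, and SVC(C=0.01, probability=True):

```python
# Sketch of a simpler fold-wise evaluation (not the poster's original code):
# fit an SVC inside an explicit LeaveOneOut loop and collect the
# out-of-fold probability for each left-out sample.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.RandomState(0)
n = 50
X = rng.normal(size=(n, 10))           # random features
y = np.array([0, 1] * (n // 2))        # perfectly balanced labels

probs = np.empty(n)
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = SVC(C=0.01, probability=True, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    probs[test_idx] = clf.predict_proba(X[test_idx])[:, 1]

print("Brier:", brier_score_loss(y, probs))
print("AUC:  ", roc_auc_score(y, probs))
```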
but the issue still persists. And only if the mean of y_labels is 0.5. Could it be something with the internal cross-validation of the SVM because of probability = True?
This is strange. When replacing the SVC's built-in calibration (probability = True) with an explicit CalibratedClassifierCV wrapped around the SVC, the results are as one would expect, while, if I understand the documentation correctly, this is what probability = True does internally.
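The exact snippets are not preserved above; a sketch of the kind of swap being described, assuming the built-in libsvm calibration is replaced by an explicit CalibratedClassifierCV around an uncalibrated SVC:

```python
# Sketch: two ways to obtain Platt-scaled probabilities from an SVC.
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV

# 1) Built-in calibration: libsvm fits the sigmoid via its own internal CV.
svm_builtin = SVC(C=0.01, probability=True, random_state=0)

# 2) Explicit calibration: scikit-learn fits the sigmoid on stratified folds.
svm_wrapped = CalibratedClassifierCV(SVC(C=0.01), method="sigmoid", cv=5)

# Both expose predict_proba and can be dropped into the same evaluation loop.
```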
This uses the same Platt method. However, the two implementations are not identical.
I think the difference is that one of them (I think scikit-learn?) averages the cross-validated results while libsvm refits the SVM on the whole data and uses the fitted sigmoid model? Though that wouldn't explain the mismatch I think? libsvm's Platt scaling had some interesting edge cases I think, but I don't remember which one would explain this behavior. Also see #16145. My confusion on the same issue four years ago can be found here: #13662 (comment)
I think the conclusion there was that CalibratedClassifierCV uses stratified sampling and libsvm does not, and the LOO is indeed the culprit here.
thanks for these links! While these are definitely useful and closely related, they do not seem to mention the specific issue raised here (but I might be missing/misunderstanding something of course). What I think happens, at a higher level, is that there is something wrong/strange in libsvm's Platt scaling in these specific circumstances (low C, LOO-CV, balanced y_labels). I was wondering how it is possible that the model scores this well on random data, given that the label of the left-out sample is never passed to the fit. Hope this makes sense?

EDIT: and maybe good to emphasize, with higher values for C the results are near chance, as expected.
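One concrete ingredient of the LOO discussion above can be checked directly (a sketch added for illustration, not from the original comment): with perfectly balanced labels, every LeaveOneOut training fold is short exactly one sample of the left-out class, so the class composition of each training fold systematically encodes the held-out label — the kind of signal an internal calibration step could pick up.

```python
# Sketch: with balanced labels, each LOO training fold's class counts
# reveal the held-out label (the held-out class is always the minority).
import numpy as np
from sklearn.model_selection import LeaveOneOut

y = np.array([0, 1] * 10)  # 20 perfectly balanced labels

for train_idx, test_idx in LeaveOneOut().split(y.reshape(-1, 1)):
    counts = np.bincount(y[train_idx], minlength=2)
    held_out = y[test_idx][0]
    # The held-out class always has one fewer sample in the training fold.
    assert counts[held_out] == counts[1 - held_out] - 1

print("every LOO training fold is tilted against the left-out class")
```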
I think this is closely related to an issue that AutoGluon has seen in their stacking. Essentially, calibration is stacking, and we're facing the same information leakage here.
Describe the bug
When trying to train and evaluate a support vector machine in scikit-learn, I am experiencing some unexpected behaviour, and I am wondering whether I am doing something wrong or whether this is a possible bug.
In a very specific subset of circumstances, namely:

- LeaveOneOut() is used as the cross-validation procedure
- probability = True
- and a small C, such as 0.01
The results of the trained SVM are very good on randomly generated data, while they should be near chance. If the y labels are a bit different, or the SVM is swapped out for a LogisticRegression, it gives the expected results (a Brier score of about 0.25 and an AUC near 0.5). But under the circumstances named above, the Brier score is roughly 0.10-0.15 and the AUC is above 0.9 if the y labels are balanced.
Steps/Code to Reproduce
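The original reproduction script is not included above; as a stand-in, a minimal sketch following the description in this report (random features, balanced labels, LeaveOneOut, SVC(C=0.01, probability=True), with a LogisticRegression baseline for comparison):

```python
# Sketch of a reproduction following the description in this report
# (not the poster's original script).
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.RandomState(42)
n = 50
X = rng.normal(size=(n, 10))          # purely random features
y = np.array([0, 1] * (n // 2))       # balanced labels, np.mean(y) == 0.5

models = {
    "SVC, C=0.01, probability=True": SVC(C=0.01, probability=True, random_state=0),
    "LogisticRegression": LogisticRegression(),
}

for name, model in models.items():
    # Pool the out-of-fold probabilities from LeaveOneOut and score them.
    proba = cross_val_predict(model, X, y, cv=LeaveOneOut(), method="predict_proba")[:, 1]
    print(name)
    print("  Brier:", round(brier_score_loss(y, proba), 3))
    print("  AUC:  ", round(roc_auc_score(y, proba), 3))
```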
Expected Results
I would expect that all results would be somewhat similar, with a Brier ~0.25 and AUC ~0.5.
Actual Results
Here, you can see that if the np.mean of the y_labels is 0.5, the results are actually really, really good, even though the data is randomly generated in each of the 500 repetitions.
Versions