Describe the bug
Calling CalibratedClassifierCV() with cv=KFold(n_splits=n) (where n is the number of samples) can give different results than using cv=LeaveOneOut(), but the docs for LeaveOneOut() say these should be equivalent.
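As a quick sanity check of the documented equivalence, the sketch below compares the test folds the two splitters produce on a toy array. It assumes an unshuffled KFold, and the array X here is a made-up placeholder rather than the data from the reproduction further down.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Placeholder data: 20 samples, 2 features (only the number of rows matters).
X = np.arange(40).reshape(20, 2)

# Collect the held-out index of every fold from each splitter.
loo_test = [int(test[0]) for _, test in LeaveOneOut().split(X)]
kfold_test = [int(test[0]) for _, test in KFold(n_splits=len(X)).split(X)]

# Without shuffling, both splitters hold out samples 0, 1, ..., n-1 in order.
same_splits = loo_test == kfold_test
print(same_splits)
```

With shuffle=True the folds come out in a different order, but each fold still holds out exactly one sample, so the set of splits is the same.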
In particular, the KFold class has an "n_splits" attribute, so the fold-count branch runs when setting up sigmoid calibration, and the ValueError shown under Actual Results can then be raised. With LeaveOneOut(), n_folds is set to None and that error is never hit.
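To make the difference concrete, here is a minimal sketch of the check as I understand it from reading calibration.py. The helper names infer_n_folds and class_count_check_fails are hypothetical and only paraphrase the logic. They are not sklearn functions. The 10/10 label vector is an assumed stand-in for the make_classification output.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut


def infer_n_folds(cv):
    """Hypothetical helper: derive a fold count from `cv`, or None if unknown."""
    if isinstance(cv, int):
        return cv
    if hasattr(cv, "n_splits"):
        return cv.n_splits
    return None


def class_count_check_fails(y, n_folds):
    """Hypothetical helper: True if any class has fewer than n_folds samples.

    The check is skipped entirely when n_folds is None.
    """
    if not n_folds:
        return False
    _, counts = np.unique(y, return_counts=True)
    return bool(np.any(counts < n_folds))


# Assumed labels: 20 samples, two classes with 10 samples each.
y = np.array([0] * 10 + [1] * 10)

kfold_n = infer_n_folds(KFold(n_splits=20))   # KFold exposes n_splits
loo_n = infer_n_folds(LeaveOneOut())          # LeaveOneOut has no n_splits attribute

kfold_fails = class_count_check_fails(y, kfold_n)
loo_fails = class_count_check_fails(y, loo_n)
print(kfold_n, loo_n, kfold_fails, loo_fails)
```

Under these assumptions the KFold path trips the check because each class has only 10 samples against 20 requested folds, while the LeaveOneOut path skips it.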
I'm not sure whether that error is correct/desirable in every case (see the code to reproduce for my use case where I think(?) the error may be unnecessary) but, either way, the two different cv values seem like they should behave equivalently.
Steps/Code to Reproduce
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=20, random_state=42)

# Leave-one-out CV: fit succeeds.
pipeline = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(
        SVC(probability=False),
        ensemble=False,
        cv=LeaveOneOut(),
    ),
)
pipeline.fit(X, y)

# KFold with n_splits equal to the number of samples: fit raises ValueError.
pipeline2 = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(
        SVC(probability=False),
        ensemble=False,
        cv=KFold(n_splits=20, shuffle=True),
    ),
)
pipeline2.fit(X, y)
Expected Results
pipeline and pipeline2 should function identically. Instead, pipeline.fit() succeeds and pipeline2.fit() raises a ValueError.
Actual Results
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/sklearn/pipeline.py", line 427, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/sklearn/calibration.py", line 419, in fit
    raise ValueError(
ValueError: Requesting 20-fold cross-validation but provided less than 20 examples for at least one class.
Versions
System:
    python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
    machine: macOS-14.4.1-arm64-arm-64bit

Python dependencies:
    sklearn: 1.3.2
    pip: 24.0
    setuptools: 69.0.2
    numpy: 1.26.2
    scipy: 1.11.4
    Cython: None
    pandas: 2.1.3
    matplotlib: 3.8.2
    joblib: 1.3.2
    threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
    user_api: openmp
    internal_api: openmp
    num_threads: 12
    prefix: libomp
    version: None

    user_api: blas
    internal_api: openblas
    num_threads: 12
    prefix: libopenblas
    version: 0.3.23.dev
    threading_layer: pthreads
    architecture: armv8

    user_api: blas
    internal_api: openblas
    num_threads: 12
    prefix: libopenblas
    version: 0.3.21.dev
    threading_layer: pthreads
    architecture: armv8