
KFold(n_splits=n) not equivalent to LeaveOneOut() cv in CalibratedClassifierCV() #29000

Closed
@ethanresnick

Description


Describe the bug

Calling CalibratedClassifierCV() with cv=KFold(n_splits=n), where n is the number of samples, can behave differently from cv=LeaveOneOut(): in the reproduction below, the KFold version raises an error while the LeaveOneOut version fits. The docs for LeaveOneOut() state that it is equivalent to KFold(n_splits=n).
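A minimal sketch of the documented equivalence, assuming only the public KFold and LeaveOneOut splitters: with n_splits equal to the number of samples, both produce the same set of single-sample test folds (shuffle only changes their order), yet only KFold exposes an n_splits attribute. The toy array X is illustrative and not part of the original report.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(20).reshape(-1, 1)  # 20 toy samples, 1 feature
n = X.shape[0]

# Collect each splitter's test folds as sorted tuples of sample indices.
loo_tests = sorted(tuple(test.tolist()) for _, test in LeaveOneOut().split(X))
kf_tests = sorted(
    tuple(test.tolist())
    for _, test in KFold(n_splits=n, shuffle=True, random_state=0).split(X)
)

print(loo_tests == kf_tests)  # True: every sample is held out exactly once
print(KFold(n_splits=n).get_n_splits(), LeaveOneOut().get_n_splits(X))  # 20 20

# The difference that matters here: only KFold stores n_splits as an attribute.
print(hasattr(KFold(n_splits=n), "n_splits"), hasattr(LeaveOneOut(), "n_splits"))  # True False
```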

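In particular, KFold has an n_splits attribute, so the hasattr(self.cv, "n_splits") branch in CalibratedClassifierCV.fit() runs (here with the default sigmoid calibration) and sets n_folds to cv.n_splits. The per-class check that follows can then raise "Requesting 20-fold cross-validation but provided less than 20 examples for at least one class." LeaveOneOut() has no n_splits attribute, so n_folds is set to None and that error is never raised.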
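The check described above can be paraphrased roughly as follows. This is a hypothetical sketch, not scikit-learn's code: the helper names requested_n_folds and would_raise are invented for illustration, and it assumes the 1.3.x logic only inspects an int cv or an n_splits attribute before comparing per-class counts with the fold count.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, LeaveOneOut


def requested_n_folds(cv):
    """Hypothetical paraphrase of how the fold count is inferred from cv."""
    if isinstance(cv, int):
        return cv
    if hasattr(cv, "n_splits"):  # KFold, StratifiedKFold, ... store this attribute
        return cv.n_splits
    return None  # LeaveOneOut, iterables of splits, ...: the check is skipped


def would_raise(cv, y):
    """True if some class has fewer samples than the inferred number of folds."""
    n_folds = requested_n_folds(cv)
    return bool(n_folds) and any(np.sum(y == c) < n_folds for c in np.unique(y))


_, y = make_classification(n_samples=20, random_state=42)
print(would_raise(KFold(n_splits=20, shuffle=True), y))  # True
print(would_raise(LeaveOneOut(), y))  # False
```

Both splitters hold out one sample per fold, but only the one carrying an n_splits attribute reaches the per-class comparison.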

I'm not sure whether that error is correct or desirable in every case (in the reproduction below, I think the error may be unnecessary), but either way, the two cv values should behave equivalently.

Steps/Code to Reproduce

from sklearn.pipeline import make_pipeline
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import KFold, LeaveOneOut
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=20, random_state=42)  # 20 samples, 2 classes: fewer than 20 per class

pipeline = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(
        SVC(probability=False),
        ensemble=False,
        cv=LeaveOneOut()
    )
)
pipeline.fit(X, y)

pipeline2 = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(
        SVC(probability=False),
        ensemble=False,
        cv=KFold(n_splits=20, shuffle=True)  # n_splits == n_samples: one held-out sample per fold, like LeaveOneOut()
    )
)
pipeline2.fit(X, y)

Expected Results

pipeline and pipeline2 should behave identically. Instead, pipeline.fit() succeeds while pipeline2.fit() raises a ValueError.

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/sklearn/pipeline.py", line 427, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/python3.11/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/sklearn/calibration.py", line 419, in fit
    raise ValueError(
ValueError: Requesting 20-fold cross-validation but provided less than 20 examples for at least one class.
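To confirm that the error comes from inspecting the n_splits attribute rather than from the splits themselves, a minimal sketch (assuming scikit-learn 1.3.x, where cv also accepts an iterable of (train, test) index pairs) passes the same KFold splits as a plain list. The name pipeline3 and random_state=0 are additions for illustration only.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=20, random_state=42)

# Materialize the exact (train, test) index pairs the KFold object yields.
# A plain list has no n_splits attribute, so the per-class fold-count check is skipped.
splits = list(KFold(n_splits=20, shuffle=True, random_state=0).split(X))

pipeline3 = make_pipeline(
    StandardScaler(),
    CalibratedClassifierCV(SVC(probability=False), ensemble=False, cv=splits),
)
pipeline3.fit(X, y)  # same 20 single-sample folds, but no ValueError
print(pipeline3.predict_proba(X[:3]).shape)  # (3, 2)
```

If this fits, the 20-fold splits themselves are usable for calibration, which supports treating KFold(n_splits=n) and LeaveOneOut() the same way.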

Versions

System:
    python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
   machine: macOS-14.4.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.3.2
          pip: 24.0
   setuptools: 69.0.2
        numpy: 1.26.2
        scipy: 1.11.4
       Cython: None
       pandas: 2.1.3
   matplotlib: 3.8.2
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 12
         prefix: libomp
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 12
         prefix: libopenblas
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: blas
   internal_api: openblas
    num_threads: 12
         prefix: libopenblas
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: armv8
