check_estimator() not respecting custom estimator constraints via check_estimators_fit_returns_self() #23885

Open
AlexVialaBellander opened this issue Jul 12, 2022 · 1 comment
Labels
Bug Developer API Third party developer API related module:test-suite everything related to our tests

Comments

@AlexVialaBellander

Describe the bug

check_estimators_fit_returns_self() does not pass when the estimator constrains the number of input features.

I am building a custom estimator that extends BaseEstimator and validates its input with certain constraints. Example of the constraints used in the test:

X = check_array(
    X, 
    accept_sparse = False,
    accept_large_sparse = False,
    ensure_min_samples = 2,
    ensure_min_features = 3,
    force_all_finite = 'allow-nan')

With these constraints on the estimator's input and output, an error is thrown from within check_estimator.

It appears that the tests run by check_estimator use their own fixed input data, for instance with n_features = 2, so check_array with the above constraints throws an error. The problem is not limited to the number of features: it also affects whether force_all_finite allows NaN. Some tests check with this value set to True.

The test check_estimators_fit_returns_self() generates blobs with n_features = 2. Thus, this test will fail for any estimator that requires more than 2 features.

See: https://github.com/scikit-learn/scikit-learn/blob/baf0ea25d/sklearn/utils/estimator_checks.py#L2581-L2595
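
The mismatch can be sketched in isolation. This is a minimal illustration, assuming the check builds its data with make_blobs and its default number of features, as the (21, 2) shape in the error suggests; only ensure_min_features is passed so the snippet does not depend on the other keyword arguments:

```python
from sklearn.datasets import make_blobs
from sklearn.utils import check_array

# 21 samples with the default number of features (assumed to mirror the check).
X, y = make_blobs(n_samples=21, random_state=0)
print(X.shape)  # (21, 2): make_blobs defaults to n_features=2

# Validation that requires at least 3 features rejects this data.
message = None
try:
    check_array(X, ensure_min_features=3)
except ValueError as exc:
    message = str(exc)
print(message)
```

The data passed in by the check never reaches the estimator's own logic, because validation fails first.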

TL;DR

It appears that even when an estimator sets constraints on its input X, y, such as force_all_finite and ensure_min_features, some tests still generate data that only satisfies the default values of the check_array function, so check_estimator fails.

The full error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 check_estimator(model)

File ~/miniforge3/lib/python3.9/site-packages/sklearn/utils/estimator_checks.py:623, in check_estimator(estimator, generate_only, Estimator)
    621 for estimator, check in checks_generator():
    622     try:
--> 623         check(estimator)
    624     except SkipTest as exception:
    625         # SkipTest is thrown when pandas can't be imported, or by checks
    626         # that are in the xfail_checks tag
    627         warnings.warn(str(exception), SkipTestWarning)

File ~/miniforge3/lib/python3.9/site-packages/sklearn/utils/_testing.py:318, in _IgnoreWarnings.__call__.<locals>.wrapper(*args, **kwargs)
    316 with warnings.catch_warnings():
    317     warnings.simplefilter("ignore", self.category)
--> 318     return fn(*args, **kwargs)

File ~/miniforge3/lib/python3.9/site-packages/sklearn/utils/estimator_checks.py:2595, in check_estimators_fit_returns_self(name, estimator_orig, readonly_memmap)
   2592     X, y = create_memmap_backed_data([X, y])
   2594 set_random_state(estimator)
-> 2595 assert estimator.fit(X, y) is estimator

File ~/Documents/VOIDEV/tandem-riding-detection-service/service/source/models/physics/_base.py:42, in PhysicsRegressor.fit(self, X, y)
     20 """A fitting function which only checks the format of X and y.
     21 
     22 About the expected input data
   (...)
     38     Returns self.
     39 """
     40 self.n_features_in_ = 3
---> 42 X, y = check_X_y(
     43     X,
     44     y,
     45     accept_sparse = False,
     46     accept_large_sparse = False,
     47     ensure_min_samples = 2,
     48     ensure_min_features = 3,
     49     force_all_finite = True)
     51 #assert X.shape[-1] == 3, f"Expected 3 features. Received {X.shape[-1]}"
     53 return self

File ~/miniforge3/lib/python3.9/site-packages/sklearn/utils/validation.py:1074, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
   1069         estimator_name = _check_estimator_name(estimator)
   1070     raise ValueError(
   1071         f"{estimator_name} requires y to be passed, but the target y is None"
   1072     )
-> 1074 X = check_array(
   1075     X,
   1076     accept_sparse=accept_sparse,
   1077     accept_large_sparse=accept_large_sparse,
   1078     dtype=dtype,
   1079     order=order,
   1080     copy=copy,
   1081     force_all_finite=force_all_finite,
   1082     ensure_2d=ensure_2d,
   1083     allow_nd=allow_nd,
   1084     ensure_min_samples=ensure_min_samples,
   1085     ensure_min_features=ensure_min_features,
   1086     estimator=estimator,
   1087     input_name="X",
   1088 )
   1090 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
   1092 check_consistent_length(X, y)

File ~/miniforge3/lib/python3.9/site-packages/sklearn/utils/validation.py:918, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    916     n_features = array.shape[1]
    917     if n_features < ensure_min_features:
--> 918         raise ValueError(
    919             "Found array with %d feature(s) (shape=%s) while"
    920             " a minimum of %d is required%s."
    921             % (n_features, array.shape, ensure_min_features, context)
    922         )
    924 if copy and np.may_share_memory(array, array_orig):
    925     array = np.array(array, dtype=dtype, order=order)

ValueError: Found array with 2 feature(s) (shape=(21, 2)) while a minimum of 3 is required.

Reproduction

Let's create a very basic estimator that requires at least 3 features, then run check_estimator on an instance of it.

from sklearn.base import BaseEstimator
from sklearn.utils import check_array, check_X_y


class DummyEstimator(BaseEstimator):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        # Validate the training data: dense input, at least 3 features, NaN allowed.
        X, y = check_X_y(
            X,
            y,
            accept_sparse=False,
            accept_large_sparse=False,
            ensure_min_features=3,
            force_all_finite='allow-nan')
        return self

    def predict(self, X):
        X = check_array(
            X,
            accept_sparse=False,
            accept_large_sparse=False,
            ensure_min_features=3,
            force_all_finite='allow-nan')

        # very dumb example but it does not matter
        # (columns 0, 1 and 2 are the three required features)
        y_pred = X[:, 0] + X[:, 1] + X[:, 2]

        # y_pred is 1D, so ensure_min_features has no effect here.
        y_pred = check_array(
            y_pred,
            accept_sparse=False,
            accept_large_sparse=False,
            ensure_min_features=3,
            ensure_2d=False,
            force_all_finite='allow-nan')

        return y_pred

Error produced:
ValueError: Found array with 2 feature(s) (shape=(21, 2)) while a minimum of 3 is required.
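
To separate the estimator's behaviour from the test data, here is a minimal sketch that fits the same kind of estimator on 3-feature and on 2-feature blobs. The class name MinThreeFeatures is hypothetical and only ensure_min_features is kept, so this is an illustration of the constraint rather than the estimator from the report:

```python
from sklearn.base import BaseEstimator
from sklearn.datasets import make_blobs
from sklearn.utils import check_X_y


class MinThreeFeatures(BaseEstimator):
    """Hypothetical estimator that only validates its input."""

    def fit(self, X, y=None):
        # Require at least 3 features, as in the report.
        X, y = check_X_y(X, y, ensure_min_features=3)
        return self


est = MinThreeFeatures()

# With 3 features, fit succeeds and returns the estimator itself.
X3, y3 = make_blobs(n_samples=21, n_features=3, random_state=0)
print(est.fit(X3, y3) is est)

# With the default 2 features, validation raises before fit can return.
X2, y2 = make_blobs(n_samples=21, random_state=0)
try:
    est.fit(X2, y2)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

On valid data the "fit returns self" property holds, which suggests the failure comes from the shape of the generated data and not from the property being tested.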

Expected Results

Passing all tests.

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 check_estimator(DummyEstimator())

File ~/miniforge3/lib/python3.9/site-packages/sklearn/utils/estimator_checks.py:623, in check_estimator(estimator, generate_only, Estimator)
    621 for estimator, check in checks_generator():
    622     try:
--> 623         check(estimator)
    624     except SkipTest as exception:
    625         # SkipTest is thrown when pandas can't be imported, or by checks
    626         # that are in the xfail_checks tag
    627         warnings.warn(str(exception), SkipTestWarning)

File ~/miniforge3/lib/python3.9/site-packages/sklearn/utils/_testing.py:318, in _IgnoreWarnings.__call__.<locals>.wrapper(*args, **kwargs)
    316 with warnings.catch_warnings():
    317     warnings.simplefilter("ignore", self.category)
--> 318     return fn(*args, **kwargs)

File ~/miniforge3/lib/python3.9/site-packages/sklearn/utils/estimator_checks.py:2595, in check_estimators_fit_returns_self(name, estimator_orig, readonly_memmap)
   2592     X, y = create_memmap_backed_data([X, y])
   2594 set_random_state(estimator)
-> 2595 assert estimator.fit(X, y) is estimator

File ~/Documents/VOIDEV/tandem-riding-detection-service/service/source/models/physics/_base.py:15, in DummyEstimator.fit(self, X, y)
     14 def fit(self, X, y=None):
---> 15     X, y = check_X_y(
     16         X,
     17         y,
     18         accept_sparse = False,
     19         accept_large_sparse = False,
     20         ensure_min_features = 3,
     21         force_all_finite = 'allow-nan')
     22     return self

File ~/miniforge3/lib/python3.9/site-packages/sklearn/utils/validation.py:1074, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
   1069         estimator_name = _check_estimator_name(estimator)
   1070     raise ValueError(
   1071         f"{estimator_name} requires y to be passed, but the target y is None"
   1072     )
-> 1074 X = check_array(
   1075     X,
   1076     accept_sparse=accept_sparse,
   1077     accept_large_sparse=accept_large_sparse,
   1078     dtype=dtype,
   1079     order=order,
   1080     copy=copy,
   1081     force_all_finite=force_all_finite,
   1082     ensure_2d=ensure_2d,
   1083     allow_nd=allow_nd,
   1084     ensure_min_samples=ensure_min_samples,
   1085     ensure_min_features=ensure_min_features,
   1086     estimator=estimator,
   1087     input_name="X",
   1088 )
   1090 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
   1092 check_consistent_length(X, y)

File ~/miniforge3/lib/python3.9/site-packages/sklearn/utils/validation.py:918, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    916     n_features = array.shape[1]
    917     if n_features < ensure_min_features:
--> 918         raise ValueError(
    919             "Found array with %d feature(s) (shape=%s) while"
    920             " a minimum of %d is required%s."
    921             % (n_features, array.shape, ensure_min_features, context)
    922         )
    924 if copy and np.may_share_memory(array, array_orig):
    925     array = np.array(array, dtype=dtype, order=order)

ValueError: Found array with 2 feature(s) (shape=(21, 2)) while a minimum of 3 is required.

Versions

System:
    python: 3.9.7 (default, Sep 16 2021, 23:53:23)  [Clang 12.0.0 ]
executable: /Users/alexandervialabellander/miniforge3/bin/python
   machine: macOS-12.2.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.1.1
          pip: 21.2.4
   setuptools: 58.0.4
        numpy: 1.22.3
        scipy: 1.7.3
       Cython: None
       pandas: 1.3.5
   matplotlib: 3.5.0
       joblib: 1.1.0
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libomp
       filepath: /Users/alexandervialabellander/miniforge3/lib/python3.9/site-packages/sklearn/.dylibs/libomp.dylib
        version: None
    num_threads: 8

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /Users/alexandervialabellander/miniforge3/lib/python3.9/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.18
threading_layer: pthreads
   architecture: armv8
    num_threads: 8

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /Users/alexandervialabellander/miniforge3/lib/libopenblasp-r0.3.17.dylib
        version: 0.3.17
threading_layer: pthreads
   architecture: armv8
    num_threads: 8
AlexVialaBellander added the Bug and Needs Triage labels on Jul 12, 2022
thomasjpfan added the module:test-suite label and removed the Needs Triage label on Jul 19, 2022
@thomasjpfan
Member

I agree this is an issue with check_estimator and is related to the strictness of estimator checks: #16241. The test suite needs a refactor so it can accommodate estimators with restrictions such as ensure_min_features=3 in your case.

@adrinjalali adrinjalali added the Developer API Third party developer API related label Sep 10, 2024