StandardScaler is stateless #30840

Open
benHeid opened this issue Feb 15, 2025 · 4 comments

@benHeid

benHeid commented Feb 15, 2025

Describe the bug

StandardScaler is reported as stateless in version 1.6.1, but if I understand correctly, fit changes the state of the StandardScaler.

Steps/Code to Reproduce

from sklearn.preprocessing import StandardScaler
StandardScaler()._get_tags()["stateless"]

Expected Results

False

Actual Results

True

Versions

System:
    python: 3.10.14 (main, Jul 18 2024, 22:40:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]
executable: ****/python
   machine: macOS-15.2-arm64-arm-64bit

Python dependencies:
      sklearn: 1.6.1
          pip: 24.1.2
   setuptools: 71.0.3
        numpy: 1.26.4
        scipy: 1.13.1
       Cython: 3.0.11
       pandas: 2.2.3
   matplotlib: 3.9.2
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: ****.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: *****
        version: 0.3.27
threading_layer: pthreads
   architecture: neoversen1

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: *****
        version: None

benHeid added the Bug and Needs Triage labels on Feb 15, 2025

@StefanieSenger
Contributor

Hello @benHeid,

thanks for reporting.

I have checked and I also think it is a bug. This code returned False up to version 1.5.2 and has returned True since version 1.6:

from sklearn.preprocessing import StandardScaler, MinMaxScaler
print(StandardScaler()._get_tags()["stateless"])
print(MinMaxScaler()._get_tags()["stateless"])

As a workaround please use the "requires_fit" tag, which is supposed to replace the "stateless" tag.

The issue is related to #30327.

@glemaitre
Member

As a workaround please use the "requires_fit" tag

Actually, that is the right way to do it with the new tag infrastructure:

from sklearn.utils import get_tags
from sklearn.preprocessing import StandardScaler

get_tags(StandardScaler()).requires_fit

And indeed, there is a bug in the conversion to the old tag infrastructure that we need to fix.

@glemaitre
Member

So the bug is here:

https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_tags.py#L590

We should change to:

        "stateless": not new_tags.requires_fit,
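
To illustrate why the negation matters, here is a standalone sketch that does not use scikit-learn. `NewTags`, `to_old_tags_buggy` and `to_old_tags_fixed` are hypothetical stand-ins, not the real code in `_tags.py`, and the buggy variant is only an assumption about what the current line looks like (the flag copied without `not`).

```python
from dataclasses import dataclass


@dataclass
class NewTags:
    """Hypothetical stand-in for the new tag object; only one field is modeled."""

    requires_fit: bool = True


def to_old_tags_buggy(new_tags: NewTags) -> dict:
    # Assumed current behaviour: the flag is copied without being inverted.
    return {"stateless": new_tags.requires_fit}


def to_old_tags_fixed(new_tags: NewTags) -> dict:
    # "stateless" is the logical opposite of "requires_fit".
    return {"stateless": not new_tags.requires_fit}


scaler_like = NewTags(requires_fit=True)  # an estimator that learns state in fit
print(to_old_tags_buggy(scaler_like)["stateless"])  # True, matches the report
print(to_old_tags_fixed(scaler_like)["stateless"])  # False, the expected result
```

With the negation in place, an estimator that requires fitting maps to `"stateless": False` in the old-style dictionary, which is the expected result from the report.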

@StefanieSenger
Contributor

@EmilyXinyi, would you like to take care of that? Don't feel obliged though, only if you like.
