StandardScaler is stateless #30840

Closed
benHeid opened this issue Feb 15, 2025 · 8 comments

benHeid commented Feb 15, 2025

Describe the bug

In version 1.6.1, StandardScaler reports itself as stateless. However, if I understand correctly, fit changes the state of the StandardScaler, so the tag should be False.

Steps/Code to Reproduce

from sklearn.preprocessing import StandardScaler
StandardScaler()._get_tags()["stateless"]

Expected Results

False

Actual Results

True

Versions

System:
    python: 3.10.14 (main, Jul 18 2024, 22:40:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]
executable: ****/python
   machine: macOS-15.2-arm64-arm-64bit

Python dependencies:
      sklearn: 1.6.1
          pip: 24.1.2
   setuptools: 71.0.3
        numpy: 1.26.4
        scipy: 1.13.1
       Cython: 3.0.11
       pandas: 2.2.3
   matplotlib: 3.9.2
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: ****.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: *****
        version: 0.3.27
threading_layer: pthreads
   architecture: neoversen1

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: *****
        version: None

benHeid added the Bug and Needs Triage labels Feb 15, 2025

@StefanieSenger
Contributor

Hello @benHeid,

thanks for reporting.

I have checked, and I also think it is a bug. The following code returned False up to version 1.5.2; since version 1.6 it returns True:

from sklearn.preprocessing import StandardScaler, MinMaxScaler
print(StandardScaler()._get_tags()["stateless"])
print(MinMaxScaler()._get_tags()["stateless"])

As a workaround, please use the "requires_fit" tag, which is supposed to replace the "stateless" tag.

The issue is related to #30327.

@glemaitre
Member

As a workaround please use the "requires_fit" tag

Actually, that is the right way to do it with the new tag infrastructure:

from sklearn.utils import get_tags
from sklearn.preprocessing import StandardScaler

get_tags(StandardScaler()).requires_fit

And indeed, there is a bug in the conversion of the old tag infrastructure that we need to solve.

@glemaitre
Member

So the bug is here:

https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_tags.py#L590

We should change to:

        "stateless": not new_tags.requires_fit,
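
To illustrate why the negation matters, here is a minimal sketch that does not use scikit-learn at all. `NewTags`, `to_old_tags_buggy`, and `to_old_tags_fixed` are hypothetical names invented for this example, and the buggy variant is only an assumption about the shape of the problem (the flag copied without `not`), not a quote of the library source:

```python
from dataclasses import dataclass


@dataclass
class NewTags:
    """Hypothetical stand-in for a new-style tags object (not sklearn's class)."""

    requires_fit: bool = True


def to_old_tags_buggy(new_tags: NewTags) -> dict:
    # Assumed shape of the bug: the flag is copied without negation.
    return {"stateless": new_tags.requires_fit}


def to_old_tags_fixed(new_tags: NewTags) -> dict:
    # Mapping proposed above: "stateless" is the negation of requires_fit.
    return {"stateless": not new_tags.requires_fit}


# An estimator that learns state in fit, like StandardScaler.
scaler_like = NewTags(requires_fit=True)
buggy = to_old_tags_buggy(scaler_like)
fixed = to_old_tags_fixed(scaler_like)
print(buggy["stateless"], fixed["stateless"])
```

With the negation in place, an estimator that requires fit is reported as not stateless, which matches the expected result in the report.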

@StefanieSenger
Contributor

@EmilyXinyi, would you like to take care of that? Don't feel obliged though, only if you like.

@EmilyXinyi
Contributor

Hi @StefanieSenger thanks for flagging! I can probably take care of this next weekend, but if it's an urgent change that needs to go out before then, I most likely won't have time during the work week.
(I will come back to this issue next weekend to check for updates and possibly put in a fix.)

@EmilyXinyi
Contributor

I just put in a PR for a fix. Out of curiosity, what is the purpose of these tags? I have not used this functionality before and I can't quite understand it just by reading the associated code, so if someone could enlighten me, that would be greatly appreciated 😸

@glemaitre
Member

Maybe the documentation can help: https://scikit-learn.org/stable/developers/develop.html#estimator-tags

In short, tags were first used to describe the capabilities of an estimator, and the common tests relied on them to check whether an estimator is compatible with the scikit-learn API.

Here, the stateless tag enforces a specific behaviour that is explicitly checked in the common tests: a stateless estimator can be used by calling transform without a previous call to fit.

Nowadays, we sometimes use the tags in the source code as well (not only in tests).
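
The behaviour that the stateless tag describes can be sketched with two existing transformers. This is only an illustration, not the actual common test; it assumes that `FunctionTransformer` with default arguments is a fair example of a transformer that learns nothing in fit:

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import FunctionTransformer, StandardScaler

X = np.array([[1.0, 2.0], [3.0, 4.0]])

# FunctionTransformer with default arguments applies the identity function
# and learns nothing in fit, so transform works on a fresh instance.
passthrough = FunctionTransformer().transform(X)

# StandardScaler learns the mean and scale in fit, so calling transform on
# an unfitted instance raises NotFittedError.
try:
    StandardScaler().transform(X)
    scaler_needs_fit = False
except NotFittedError:
    scaler_needs_fit = True

print(np.array_equal(passthrough, X), scaler_needs_fit)
```

In this sense StandardScaler is not stateless, which is why the True reported in this issue is wrong.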

@glemaitre
Member

Closing, since we are ending the deprecation cycle and this will be fixed in the upcoming 1.7 release.
