StandardScaler is `stateless` #30840

benHeid · 2025-02-15T18:58:02Z

Describe the bug

The StandardScaler is reported as stateless in version 1.6.1, but fit changes the state of the StandardScaler, if I understand it correctly.

Steps/Code to Reproduce

from sklearn.preprocessing import StandardScaler
StandardScaler()._get_tags()["stateless"]

Expected Results

False

Actual Results

True

Versions

System:
    python: 3.10.14 (main, Jul 18 2024, 22:40:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]
executable: ****/python
   machine: macOS-15.2-arm64-arm-64bit

Python dependencies:
      sklearn: 1.6.1
          pip: 24.1.2
   setuptools: 71.0.3
        numpy: 1.26.4
        scipy: 1.13.1
       Cython: 3.0.11
       pandas: 2.2.3
   matplotlib: 3.9.2
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: ****.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: *****
        version: 0.3.27
threading_layer: pthreads
   architecture: neoversen1

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: *****
        version: None


StefanieSenger · 2025-02-16T07:35:39Z

Hello @benHeid,

thanks for reporting.

I have checked and also think it is a bug. The following code returned False up to version 1.5.2, and since version 1.6 it returns True:

from sklearn.preprocessing import StandardScaler, MinMaxScaler
print(StandardScaler()._get_tags()["stateless"])
print(MinMaxScaler()._get_tags()["stateless"])

As a workaround please use the "requires_fit" tag, which is supposed to replace the "stateless" tag.

The issue is related to #30327.

glemaitre · 2025-02-16T09:19:25Z

> As a workaround please use the "requires_fit" tag

Actually, that is the right way to do it with the new tag infrastructure:

from sklearn.utils import get_tags
from sklearn.preprocessing import StandardScaler

get_tags(StandardScaler()).requires_fit

And indeed, there is a bug in the conversion to the old tag infrastructure that we need to fix.

glemaitre · 2025-02-16T09:27:03Z

So the bug is here:

https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_tags.py#L590

We should change to:

        "stateless": not new_tags.requires_fit,
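To illustrate the intent of that one-line change, here is a minimal sketch of such a conversion. The `NewTags` dataclass and the `to_old_tags` helper are hypothetical stand-ins invented for this example, not scikit-learn's actual code; only the `"stateless": not new_tags.requires_fit` mapping comes from the suggestion above.

```python
from dataclasses import dataclass


@dataclass
class NewTags:
    """Hypothetical stand-in for a new-style tags object."""

    requires_fit: bool = True


def to_old_tags(new_tags: NewTags) -> dict:
    """Illustrative conversion to an old-style tag dict (not sklearn's real function)."""
    return {
        # An estimator is stateless exactly when it does not require fit.
        "stateless": not new_tags.requires_fit,
    }


# A scaler-like estimator needs fit, so it must not be reported as stateless.
scaler_like = to_old_tags(NewTags(requires_fit=True))
# A purely functional transformer does not need fit, so it is stateless.
function_like = to_old_tags(NewTags(requires_fit=False))
print(scaler_like["stateless"], function_like["stateless"])
```

With this mapping, the old-style "stateless" value is always the negation of `requires_fit`, which is the relationship reported as broken in 1.6.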

StefanieSenger · 2025-02-18T10:55:09Z

@EmilyXinyi, would you like to take care of that? Don't feel obliged though, only if you like.

EmilyXinyi · 2025-02-24T12:47:11Z

Hi @StefanieSenger thanks for flagging! I can probably take care of this next weekend, but if it's an urgent change that needs to go out before then, I most likely won't have time during the work week.
(I will come back to this issue next weekend to check for updates and possibly put in a fix)

EmilyXinyi · 2025-03-03T01:19:56Z

I just put in a PR for a fix. Out of curiosity, what is the purpose of these tags? I have not used this functionality before and can't quite understand it just from reading the associated code, so if someone could enlighten me, that would be greatly appreciated 😸

glemaitre · 2025-03-03T07:57:25Z

Maybe the documentation can help: https://scikit-learn.org/stable/developers/develop.html#estimator-tags

In short, the first use of it was to give some information regarding the capabilities of an estimator and it was used in the common tests to know whether an estimator is compatible with the scikit-learn API.

Here, the stateless tag enforces a specific behaviour of the estimator that is specifically tested in the common tests: an estimator can be used by calling transform without a previous call to fit.
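The difference in behaviour can be sketched as follows. This is only an illustration, not the actual common test: the choice of `FunctionTransformer` with `np.log1p` as the stateless example and the toy array `X` are assumptions made for this sketch.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import FunctionTransformer, StandardScaler

X = np.array([[0.0, 1.0], [2.0, 3.0]])

# FunctionTransformer keeps no learned state, so transform works without fit.
stateless_out = FunctionTransformer(np.log1p).transform(X)

# StandardScaler learns its statistics in fit, so transform fails before fit.
try:
    StandardScaler().transform(X)
    scaler_needs_fit = False
except NotFittedError:
    scaler_needs_fit = True

print(stateless_out.shape, scaler_needs_fit)
```

An estimator like StandardScaler, which raises NotFittedError here, is therefore not expected to carry a stateless tag set to True.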

Nowadays, we sometimes use the tags in the source code as well (not only in tests).

glemaitre · 2025-05-05T16:27:59Z

Closing since we are ending the deprecation cycle, and this will be fixed in the upcoming 1.7 release.

benHeid added the Bug and Needs Triage labels Feb 15, 2025

glemaitre removed the Needs Triage label Feb 16, 2025

ogrisel added the Regression label Feb 24, 2025

EmilyXinyi mentioned this issue Mar 3, 2025

FIX stateless tag default value for estimator. #30925

Closed

glemaitre closed this as completed May 5, 2025
