
Consistency issue in StandardScaler #11234

Closed

Description

@glemaitre

There is a consistency issue in StandardScaler between the sparse and dense cases when both with_mean=False and with_std=False.

  1. Does it make sense to support this case at all? It makes the transformer return its input unchanged (an identity transform), which is not the use case for StandardScaler. If we want a transformer that does this, one should use FunctionTransformer instead, I assume.

  2. If we consider this behaviour normal, we need to:

    • In the dense case, force self.mean_ to be None after each call to partial_fit.
    • In the sparse case, count the non-NaN values and update self.n_samples_seen_, which is currently not computed. This currently leads to an error when fit is called twice (i.e. del self.n_samples_seen_ fails).
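A minimal sketch of the degenerate configuration described above (assuming scikit-learn and SciPy are installed; the exact failure mode depends on the version, so this only demonstrates the no-op behaviour, not the double-fit error):

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2.0], [3.0, 4.0]])

# Dense case: with both options disabled, the scaler is an identity transform.
scaler = StandardScaler(with_mean=False, with_std=False)
X_dense = scaler.fit_transform(X)
print(np.allclose(X_dense, X))

# Sparse case: same configuration on a CSR matrix.
scaler_sp = StandardScaler(with_mean=False, with_std=False)
X_sp = scaler_sp.fit_transform(sparse.csr_matrix(X))
print(np.allclose(X_sp.toarray(), X))
```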

IMO, we should add a check at fit time and raise an error.

@jnothman @ogrisel WDYT?
