
preprocessing.scale provides consistent results on arrays with zero variance #3722

Closed
@maximsch2

Description


I'm using Python 2.7, NumPy 1.8.2 and scikit-learn 0.14.1 on x64 Linux (all installed through Anaconda) and getting very inconsistent results from the preprocessing.scale function:

print preprocessing.scale(np.zeros(6) + np.log(1e-5))
[ 0. 0. 0. 0. 0. 0.]

print preprocessing.scale(np.zeros(8) + np.log(1e-5))
[-1. -1. -1. -1. -1. -1. -1. -1.]

print preprocessing.scale(np.zeros(22) + np.log(1e-5))
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

I would guess this is not what is supposed to be happening. A quick investigation points to the fact that np.std() of the second and third arrays is not exactly zero, but very close to machine zero. sklearn still divides the data by it (the code never takes the "std == 0.0" branch).

Note that for a single 1D array this is easily worked around by passing with_std=False, but when it happens for one of many features in a 2D matrix that is not an option.
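To illustrate a possible fix: instead of comparing the standard deviation to exactly 0.0, treat any column whose std is negligible relative to its mean as constant and skip the division. The helper below is a minimal sketch of that idea (safe_scale and the eps tolerance are hypothetical names, not sklearn's API):

```python
import numpy as np

def safe_scale(X, eps=1e-12):
    """Center and scale along axis 0, treating near-zero std as constant.

    Hypothetical helper: columns whose std is below a relative tolerance
    are divided by 1.0 instead of their (numerically noisy) tiny std.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Replace a std that is negligible relative to the magnitude of the
    # data with 1.0, so constant columns come out as (exactly) zeros.
    std = np.where(std < eps * np.maximum(np.abs(mean), 1.0), 1.0, std)
    return (X - mean) / std
```

With this check, safe_scale(np.zeros(8) + np.log(1e-5)) returns an array of zeros rather than all -1s, while genuinely varying features are still scaled to unit variance.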

Metadata

Assignees: No one assigned

Labels: Bug; Easy (well-defined and straightforward way to resolve)
