
preprocessing.scale provides consistent results on arrays with zero variance #3722

Closed
@maximsch2

Description


I'm using Python 2.7, NumPy 1.8.2 and scikit-learn 0.14.1 on x64 Linux (all installed through Anaconda) and getting very inconsistent results from the preprocessing.scale function:

print preprocessing.scale(np.zeros(6) + np.log(1e-5))
[ 0. 0. 0. 0. 0. 0.]

print preprocessing.scale(np.zeros(8) + np.log(1e-5))
[-1. -1. -1. -1. -1. -1. -1. -1.]

print preprocessing.scale(np.zeros(22) + np.log(1e-5))
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

I would guess this is not what is supposed to be happening. A quick investigation points to the fact that np.std() of the second and third arrays is not exactly zero, but very close to machine zero. sklearn still divides the data by it (the code never takes the "std == 0.0" branch).

Note that for a single 1D array this is easily worked around by passing with_std=False, but when it happens for one of many features in a 2D matrix that is not an option.
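To illustrate a possible fix: instead of comparing the standard deviation to exactly 0.0, treat any column whose std is negligible relative to its mean as constant and skip the division. The helper below is a minimal sketch of that idea (safe_scale and the eps tolerance are hypothetical names, not sklearn's API):

```python
import numpy as np

def safe_scale(X, eps=1e-12):
    """Center and scale along axis 0, treating near-zero std as constant.

    Hypothetical helper: columns whose std is below a relative tolerance
    are divided by 1.0 instead of their (numerically noisy) tiny std.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Replace a std that is negligible relative to the magnitude of the
    # data with 1.0, so constant columns come out as (exactly) zeros.
    std = np.where(std < eps * np.maximum(np.abs(mean), 1.0), 1.0, std)
    return (X - mean) / std
```

With this check, safe_scale(np.zeros(8) + np.log(1e-5)) returns an array of zeros rather than all -1s, while genuinely varying features are still scaled to unit variance.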

Metadata

Assignees: No one assigned

Labels: Bug; Easy (well-defined and straightforward way to resolve)
