StandardScaler fit overflows on float16 #13007

baluyotraf · 2019-01-18T00:07:44Z

Description

When using StandardScaler on a large float16 NumPy array, the mean and std calculations overflow. I could convert the array to a higher precision, but on larger datasets the memory saved by storing small numbers as float16 matters. The overflow itself happens inside NumPy. Passing a dtype to the mean/std calculation fixes it, but I'm not sure that is how people here would like to do it.

Steps/Code to Reproduce

import numpy as np
from sklearn.preprocessing import StandardScaler

sample = np.full([10_000_000, 1], 10.0, dtype=np.float16)
StandardScaler().fit_transform(sample)

Expected Results

The normalized array

Actual Results

/opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py:36: RuntimeWarning: overflow encountered in reduce
  return umr_sum(a, axis, dtype, out, keepdims, initial)
/opt/conda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:86: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py:36: RuntimeWarning: overflow encountered in reduce
  return umr_sum(a, axis, dtype, out, keepdims, initial)
/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/data.py:765: RuntimeWarning: invalid value encountered in true_divide
  X /= self.scale_

array([[nan],
       [nan],
       [nan],
       ...,
       [nan],
       [nan],
       [nan]], dtype=float16)
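
For context, a minimal sketch of why the reduction overflows, assuming the sum is returned in the input dtype (NumPy's default for floating-point arrays). The largest finite float16 value is 65504, while the true sum of the sample above is 1e8. The smaller array `small` is an illustrative stand-in, not part of the original report.

```python
import numpy as np

# Largest finite value representable in float16.
f16_max = float(np.finfo(np.float16).max)

# True sum of the sample in the report: 10,000,000 rows of 10.0.
true_sum = 10_000_000 * 10.0
exceeds_f16 = true_sum > f16_max

# A much smaller array (hypothetical, for illustration) already overflows,
# because its true sum of 100,000 is above the float16 maximum.
small = np.full(10_000, 10.0, dtype=np.float16)
with np.errstate(over="ignore"):
    f16_sum = small.sum()  # result dtype is float16

# Asking for a wider accumulator gives the exact sum.
f64_sum = small.sum(dtype=np.float64)
```

Once the sum is inf, the mean and variance derived from it are inf or nan, which is consistent with the all-nan output shown above.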

Versions

System:
    python: 3.6.6 |Anaconda, Inc.| (default, Oct  9 2018, 12:34:16)  [GCC 7.3.0]
executable: /opt/conda/bin/python
   machine: Linux-4.9.0-5-amd64-x86_64-with-debian-9.4

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /opt/conda/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 18.1
setuptools: 39.1.0
   sklearn: 0.20.2
     numpy: 1.16.0
     scipy: 1.1.0
    Cython: 0.29.2
    pandas: 0.23.4


jnothman · 2019-01-18T04:04:48Z

If adding dtype on the mean calculation is sufficient, that's probably a good idea. Pull request?
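
A rough sketch of the dtype approach discussed here, done by hand outside scikit-learn. It assumes that passing `dtype=np.float64` to `np.mean` and `np.std` is enough for this case; it is not the code from the eventual pull request, whose helper `_safe_accumulator_op` is only named in the commit messages of this thread. The alternating 9/11 data is made up for illustration so that the standard deviation is non-zero.

```python
import numpy as np

# Hypothetical float16 data: alternating 9 and 11, so mean is 10 and std is 1.
# The true column sum (1,000,000) is far above the float16 maximum of 65504.
sample = np.tile(np.array([9.0, 11.0], dtype=np.float16), 50_000).reshape(-1, 1)

# Accumulate in float64 while leaving the stored data as float16.
mean = np.mean(sample, axis=0, dtype=np.float64)
std = np.std(sample, axis=0, dtype=np.float64)

# Standardize, then cast back to float16 to keep the memory footprint small.
scaled = ((sample - mean) / std).astype(np.float16)
```

Only the per-column statistics are held in float64 after fitting; the intermediate `(sample - mean) / std` is temporarily float64, so on very large arrays one might process it in chunks.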

baluyotraf added a commit to baluyotraf/scikit-learn that referenced this issue Jan 18, 2019

Fixed overflows on float16 when working with operations involving accumulators (scikit-learn#13007)

8b7a240

baluyotraf added a commit to baluyotraf/scikit-learn that referenced this issue Jan 18, 2019

Fixed overflows on float16 when working with operations involving accumulators (scikit-learn#13007)

cdd00c3

baluyotraf mentioned this issue Jan 18, 2019

[MRG] Fix for float16 overflow on accumulator operations #13010

Merged

baluyotraf added a commit to baluyotraf/scikit-learn that referenced this issue Jan 19, 2019

Added test for checking StandardScaler float16 overflow (scikit-learn#13007)

2af26a5

baluyotraf added a commit to baluyotraf/scikit-learn that referenced this issue Jan 19, 2019

Renamed safe_acc_op to _safe_accumulator_op and moved it to sklearn.utils.extmath. Also fixed some line lengths to fit the 80 limit (scikit-learn#13007)

fd85cc2

baluyotraf added a commit to baluyotraf/scikit-learn that referenced this issue Jan 20, 2019

Changed multilines to parentheses and removed underscore number separator on the test (scikit-learn#13007)

0a39f2c

baluyotraf added a commit to baluyotraf/scikit-learn that referenced this issue Jan 21, 2019

Added a test to verify that both the float64 and float16 has same result with respect to their precisions (scikit-learn#13007)

2aee838

rth closed this as completed in #13010 Jan 26, 2019

DeastinY mentioned this issue Aug 16, 2021

Scaling Issues sktime/pytorch-forecasting#528

Closed
