
MaxAbsScaler Upcasts Pandas to float64 #15093

Closed
@danhitchcock

Description

I am working with the ColumnTransformer and, to keep memory usage down, am trying to produce a float32 sparse matrix. Unfortunately, regardless of the pandas input dtype, the output is always float64.

I've identified one of the pipeline scalers, MaxAbsScaler, as the culprit. Other preprocessing classes, such as OneHotEncoder, accept an optional dtype argument, but MaxAbsScaler (among others) does not. The upcasting appears to happen when check_array is executed.
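
For reference, this is the kind of pattern I mean (a minimal illustration only; the dtype value here is just an example):

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({'DOW': [0, 1, 2, 3, 4, 5, 6]})
# OneHotEncoder exposes a dtype parameter, so its (sparse) output stays float32
enc = OneHotEncoder(categories='auto', dtype=np.float32)
print(enc.fit_transform(df).dtype)  # float32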

Is it possible to specify a dtype? Or is there a commonly accepted way to do this from the ColumnTransformer?
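
One workaround I can think of is appending a step that casts the scaler output back down, e.g. a FunctionTransformer (sketch below; I'm not sure this is the intended approach, especially from inside a ColumnTransformer):

import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, MaxAbsScaler

df = pd.DataFrame({'DOW': [0, 1, 2, 3, 4, 5, 6]}).astype('float32')

# Downcasting step; .astype works on dense arrays and scipy sparse matrices alike
to_float32 = FunctionTransformer(lambda X: X.astype(np.float32), validate=False)
pipe = Pipeline([('scale', MaxAbsScaler()), ('downcast', to_float32)])

print(pipe.fit_transform(df).dtype)  # float32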

Thank you!

Steps/Code to Reproduce

Example:

import pandas as pd
from sklearn.preprocessing import MaxAbsScaler

df = pd.DataFrame({
    'DOW': [0, 1, 2, 3, 4, 5, 6],
    'Month': [3, 2, 4, 3, 2, 6, 7],
    'Value': [3.4, 4., 8, 5, 3, 6, 4]
})
df = df.astype('float32')
print(df.dtypes)
a = MaxAbsScaler()
scaled = a.fit_transform(df)  # passing df.values instead produces the correct float32 output
print('Transformed Type: ', scaled.dtype)

Expected Results

DOW      float32
Month    float32
Value    float32
dtype: object
Transformed Type: float32

Actual Results

DOW      float32
Month    float32
Value    float32
dtype: object
Transformed Type: float64

Versions

Darwin-18.7.0-x86_64-i386-64bit
Python 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:07:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.1
SciPy 1.3.1
Scikit-Learn 0.20.3
Pandas 0.25.1
