check_array unexpectedly upcasts numeric types in pandas Series #25145

Closed
@benfogelson

Describe the bug

This is an unexpected (and, I would argue, undesirable) behavior change introduced in 1.2.0 by #25080.

The issue is that check_array applied to a pandas Series of dtype bool now returns a result of dtype float64 instead of bool. I would guess that other numeric dtypes are upcast in a similar way. This is a change from version 1.1.3 and can cause unexpected downstream failures: I found it because I tried to use the invert operator ~ on the result of check_array, which works for bool but not for float64.
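
To illustrate why the upcast breaks downstream code, here is a minimal sketch using plain NumPy arrays (no scikit-learn involved). It assumes only that the data ends up as float64 somewhere upstream; the variable names are illustrative.

```python
import numpy as np

# A bool array supports the invert operator: ~ acts as logical NOT.
bool_arr = np.array([False, True])
print(~bool_arr)

# The same values as float64 do not: NumPy's invert ufunc has no
# floating-point implementation, so ~ raises a TypeError.
float_arr = bool_arr.astype("float64")
try:
    ~float_arr
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Any code that relies on boolean-only operations after validation fails in the same way once the dtype has silently changed.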

Steps/Code to Reproduce

from sklearn.utils import check_array
import pandas as pd

ser = pd.Series([False, True])

Expected Results

I would expect the dtype to be preserved (it is preserved in 1.1.3)

> print(check_array(ser, ensure_2d=False, force_all_finite=False, dtype=None).dtype)
bool

Actual Results

The series is upcast from bool to float64:

> print(check_array(ser, ensure_2d=False, force_all_finite=False, dtype=None).dtype)
float64
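
To check the guess that other numeric dtypes are affected too, a small probe along these lines could help. This is only a sketch: `find_dtype_changes` is a hypothetical helper, not part of scikit-learn, and the dtype list is an arbitrary choice. It is exercised here with `np.asarray` as a baseline; the commented-out lines show how it might be pointed at check_array.

```python
import numpy as np
import pandas as pd


def find_dtype_changes(convert, dtypes=("bool", "int32", "int64", "float32", "float64")):
    """Return {input dtype: output dtype} for every dtype that `convert` alters."""
    changed = {}
    for name in dtypes:
        ser = pd.Series([0, 1]).astype(name)
        out = convert(ser)
        if out.dtype != ser.dtype:
            changed[name] = str(out.dtype)
    return changed


# Baseline: a plain NumPy conversion leaves every dtype unchanged.
print(find_dtype_changes(np.asarray))

# To probe check_array itself (requires scikit-learn), one could pass:
# from sklearn.utils import check_array
# find_dtype_changes(
#     lambda s: check_array(s, ensure_2d=False, force_all_finite=False, dtype=None)
# )
```

An empty dict means every dtype was preserved; any entry names a dtype that was changed and what it became.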

Versions

System:
    python: 3.9.14 (main, Oct 14 2022, 16:22:46)  [Clang 14.0.0 (clang-1400.0.29.102)]
executable: /Users/ben.fogelson/.pyenv/versions/sklearn-bug/bin/python
   machine: macOS-12.6.1-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.2.0
          pip: 22.3.1
   setuptools: 65.6.3
        numpy: 1.23.5
        scipy: 1.9.3
       Cython: None
       pandas: 1.5.2
   matplotlib: None
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libomp
       filepath: /Users/ben.fogelson/.pyenv/versions/3.9.14/envs/sklearn-bug/lib/python3.9/site-packages/sklearn/.dylibs/libomp.dylib
        version: None
    num_threads: 12

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /Users/ben.fogelson/.pyenv/versions/3.9.14/envs/sklearn-bug/lib/python3.9/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.20
threading_layer: pthreads
   architecture: Haswell
    num_threads: 6

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /Users/ben.fogelson/.pyenv/versions/3.9.14/envs/sklearn-bug/lib/python3.9/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.18
threading_layer: pthreads
   architecture: Haswell
    num_threads: 6
