check_array can call array.astype(None), raising ValueError if pandas extension types are present in a pd.DataFrame array
#25798
Describe the bug
At check_array, dtype_orig is determined for array objects that are pandas DataFrames by checking all(isinstance(dtype_iter, np.dtype) for dtype_iter in dtypes_orig). This excludes the pandas nullable extension types such as boolean, Int64, and Float64, resulting in a dtype_orig of None.

If pandas_requires_conversion, then there ends up being a call to array = array.astype(None), which pandas will take to mean a conversion to float64 should be attempted. If non numeric/boolean data is present in array, this can result in a ValueError: could not convert string to float: being raised if the data has the object dtype with string data or ValueError: Cannot cast object dtype to float64 if the data has the category dtype with object categories.

I first found this in using the imblearn SMOTEN and SMOTENC oversamplers, but this could happen from other uses of check_array.

Steps/Code to Reproduce
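The dtype check quoted under Describe the bug can be observed in isolation, without scikit-learn. The following is only a minimal sketch: the helper name all_numpy_dtypes and the sample frames are hypothetical, and the function mirrors the quoted expression rather than the actual check_array source.

```python
import numpy as np
import pandas as pd


def all_numpy_dtypes(df):
    """Hypothetical helper mirroring the check quoted in this issue.

    Returns True only if every column dtype is a plain numpy dtype.
    """
    return all(isinstance(dtype_iter, np.dtype) for dtype_iter in df.dtypes)


# Non-nullable numpy-backed columns.
plain = pd.DataFrame(
    {
        "a": pd.Series([True, False], dtype="bool"),
        "b": pd.Series([1, 2], dtype="int64"),
        "c": pd.Series([1.0, 2.0], dtype="float64"),
    }
)

# The same data using the pandas nullable extension dtypes.
nullable = plain.astype({"a": "boolean", "b": "Int64", "c": "Float64"})

plain_ok = all_numpy_dtypes(plain)
nullable_ok = all_numpy_dtypes(nullable)

print("numpy dtypes only:", plain_ok)
print("nullable dtypes:  ", nullable_ok)
```

The nullable columns are pandas extension dtypes rather than np.dtype instances, so the check fails for them, which is the condition that leads to a dtype_orig of None as described above.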
Reproduction via oversamplers
Reproduction via check_array directly
Expected Results
We should get the same behavior that's seen with the non nullable equivalents ["bool", "int64", "float64"], which is no error.

Actual Results
The actual result is a ValueError: could not convert string to float: being raised if the data has the object dtype with string data, or ValueError: Cannot cast object dtype to float64 if the data has the category dtype with object categories.

Versions