Support nullable pandas dtypes in `unique_labels` #25634

tamargrey · 2023-02-17T19:07:11Z

Describe the workflow you want to enable

I would like to be able to pass the nullable pandas dtypes ("Int64", "Float64", "boolean") into sklearn's `unique_labels` function. Because these dtypes become object dtype when converted to NumPy arrays, we get `ValueError: Mix type of y not allowed, got types {'binary', 'unknown'}`:

Repro with sklearn 1.2.1

    import pandas as pd
    import pytest
    from sklearn.utils.multiclass import unique_labels
    
    for dtype in ["Int64", "Float64", "boolean"]:
        y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
        y_predicted = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 1], dtype="int64")

        with pytest.raises(ValueError, match="Mix type of y not allowed, got types"):
            unique_labels(y_true, y_predicted)

Describe your proposed solution

We should get the same behavior as when int64, float64, and bool dtypes are used, which is no error:

    import pandas as pd
    from sklearn.utils.multiclass import unique_labels
    
    for dtype in ["int64", "float64", "bool"]:
        y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
        y_predicted = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 1], dtype="int64")

        unique_labels(y_true, y_predicted)

Describe alternatives you've considered, if relevant

Our current workaround is to convert the data to NumPy arrays with the corresponding non-nullable dtype before passing it into unique_labels.
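
The workaround described above could look roughly like the sketch below. It is only an illustration, not scikit-learn's implementation: the helper name `to_numpy_labels` and the `NULLABLE_TO_NUMPY` mapping are hypothetical, and the conversion assumes the series contains no missing values.

```python
import numpy as np
import pandas as pd
from sklearn.utils.multiclass import unique_labels

# Hypothetical mapping from pandas nullable dtype names to NumPy dtype names.
NULLABLE_TO_NUMPY = {"Int64": "int64", "Float64": "float64", "boolean": "bool"}


def to_numpy_labels(y: pd.Series) -> np.ndarray:
    """Convert a label series to a NumPy array, avoiding object dtype.

    Hypothetical helper; assumes `y` contains no missing values.
    """
    target = NULLABLE_TO_NUMPY.get(str(y.dtype))
    if target is None:
        # Not one of the nullable dtypes listed above: use the default conversion.
        return y.to_numpy()
    return y.to_numpy(dtype=target)


y_predicted = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 1], dtype="int64")

results = {}
for dtype in ["Int64", "Float64", "boolean"]:
    y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
    y_true_np = to_numpy_labels(y_true)
    # The converted array has a plain NumPy dtype, so unique_labels accepts it.
    labels = unique_labels(y_true_np, y_predicted.to_numpy())
    results[dtype] = (y_true_np.dtype, labels)
```

Doing the conversion explicitly keeps the dtype choice visible at the call site, at the cost of repeating it before every call that does not yet understand nullable dtypes.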

Additional context

No response

thomasjpfan · 2023-02-17T22:20:51Z

Thank you for opening the issue! The underlying issue is that pandas' `__array__` returns object-dtype arrays for nullable dtypes:

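    import numpy as np
    import pandas as pd

    # Nullable "Int64" is exposed to NumPy as an object array.
    y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype="Int64")
    print(np.asarray(y_true).dtype)
    # object

    # Plain "int64" keeps its NumPy dtype.
    y_pred = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype="int64")
    print(np.asarray(y_pred).dtype)
    # int64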

Scikit-learn already has `check_array`, which handles converting pandas nullable dtypes into their corresponding NumPy dtypes, so I opened #25638 to use `check_array` to fix this issue.

tamargrey added the Needs Triage and New Feature labels Feb 17, 2023

tamargrey mentioned this issue Feb 17, 2023

confusion_matrix: Remove nullable type handling when possible alteryx/evalml#4020

Closed

thomasjpfan mentioned this issue Feb 17, 2023

ENH Allows target to be pandas nullable dtypes #25638

Merged

thomasjpfan added the Pandas compatibility label and removed the Needs Triage label Feb 17, 2023

This was referenced Feb 17, 2023

Support nullable pandas dtypes in LabelBinarizer #25637

Closed

Support nullable pandas dtypes in confusion_matrix #25635

Closed

lorentzenchr closed this as completed in #25638 Feb 23, 2023
