Support nullable pandas dtypes in confusion_matrix #25635

Closed
tamargrey opened this issue Feb 17, 2023 · 1 comment · Fixed by #25638
Comments

@tamargrey

Describe the workflow you want to enable

I would like to be able to pass the nullable pandas dtypes ("Int64", "Float64", "boolean") into sklearn's confusion_matrix function. Because these dtypes become object dtype when converted to NumPy arrays, the call currently fails with ValueError: Classification metrics can't handle a mix of unknown and binary targets.

Repro with sklearn 1.2.1:

    import pandas as pd
    import pytest
    from sklearn.metrics import confusion_matrix

    for dtype in ["Int64", "Float64", "boolean"]:
        y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
        y_predicted = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 1], dtype="int64")

        with pytest.raises(ValueError, match="Classification metrics can't handle a mix of unknown and binary targets"):
            confusion_matrix(y_true, y_predicted)
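
To see the conversion described above, the following minimal sketch reports the NumPy dtype that np.asarray produces for each nullable dtype. The helper name converted_dtypes is hypothetical and used only for illustration. With the pandas versions current when this issue was filed I would expect object in each case; newer pandas releases may convert NA-free nullable data to a native NumPy dtype instead, so no expected output is shown.

```python
import numpy as np
import pandas as pd


def converted_dtypes(values):
    """Map each nullable dtype name to the NumPy dtype np.asarray yields.

    Hypothetical helper for illustration; not part of pandas or sklearn.
    """
    result = {}
    for name in ["Int64", "Float64", "boolean"]:
        series = pd.Series(values, dtype=name)
        # np.asarray is a stand-in for the array conversion that happens
        # inside sklearn's input validation.
        result[name] = np.asarray(series).dtype
    return result


observed = converted_dtypes([1, 0, 0, 1])
for name, dtype in observed.items():
    # The printed dtype depends on the installed pandas version.
    print(f"{name} -> {dtype}")
```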

Describe your proposed solution

We should get the same behavior as when int64, float64, and bool dtypes are used, which is no error:

    import pandas as pd
    from sklearn.metrics import confusion_matrix

    for dtype in ["int64", "float64", "bool"]:
        y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
        y_predicted = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 1], dtype="int64")

        confusion_matrix(y_true, y_predicted)

Describe alternatives you've considered, if relevant

Our current workaround is to convert the data to NumPy arrays with the corresponding non-nullable dtype (int64, float64, or bool) before passing it into confusion_matrix.
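
A minimal sketch of that workaround, assuming the data contains no missing values. The mapping NUMPY_EQUIVALENT and the helper to_plain_numpy are hypothetical names introduced here for illustration, not part of pandas or sklearn.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical mapping from nullable dtype names to NumPy equivalents.
NUMPY_EQUIVALENT = {"Int64": "int64", "Float64": "float64", "boolean": "bool"}


def to_plain_numpy(series):
    """Convert a Series to a NumPy array, using the non-nullable dtype
    when the Series has one of the nullable dtypes listed above."""
    target = NUMPY_EQUIVALENT.get(str(series.dtype))
    if target is None:
        # Already a regular NumPy-backed dtype; convert as-is.
        return series.to_numpy()
    # Assumes no pd.NA values are present in the Series.
    return series.to_numpy(dtype=target)


y_predicted = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 1], dtype="int64")

matrices = {}
for dtype in NUMPY_EQUIVALENT:
    y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
    matrices[dtype] = confusion_matrix(
        to_plain_numpy(y_true), to_plain_numpy(y_predicted)
    )
```

If a Series does contain pd.NA, I would expect the to_numpy call with a non-nullable target dtype to raise, so missing values would need to be handled before the conversion.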

Additional context

No response

@thomasjpfan
Member

As noted in #25634 (comment), I opened #25638 to resolve this issue.

@thomasjpfan added the Pandas compatibility label and removed the Needs Triage label on Feb 17, 2023