FIX Convert boolean pd.Series to boolean ndarrays #25147

Merged: 6 commits, Dec 12, 2022

@betatim betatim commented Dec 9, 2022

For types that are not pandas extension dtypes, we should ask numpy to tell us the best dtype, so that we preserve the behaviour of boolean Series being converted to boolean arrays.
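To illustrate the idea (a minimal sketch, not the code in this PR): `numpy_dtype_for` is a hypothetical helper that asks NumPy for the dtype via `np.result_type` only when the Series does not use a pandas extension dtype, so a plain boolean Series stays boolean after conversion.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def numpy_dtype_for(series):
    """Hypothetical helper for illustration only, not scikit-learn's logic.

    Returns the dtype NumPy reports for a Series with a plain NumPy dtype,
    or None for pandas extension dtypes, which need separate handling.
    """
    if isinstance(series.dtype, ExtensionDtype):
        return None
    return np.result_type(series.dtype)


# A plain boolean Series: NumPy reports bool, so the array stays boolean.
bool_series = pd.Series([True, False, True])
bool_array = np.asarray(bool_series, dtype=numpy_dtype_for(bool_series))

# A nullable "boolean" Series uses an extension dtype, so the helper defers.
nullable_series = pd.Series([True, False, None], dtype="boolean")
nullable_dtype = numpy_dtype_for(nullable_series)
```

Here `bool_array.dtype` is `bool` and `nullable_dtype` is `None`; how extension dtypes are then converted is outside this sketch.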

The story is a bit confused by categorical dtypes :-/ So while this fixes the regression and doesn't break any existing tests in test_validation.py, it feels like we are adding a layer on top of several layers of "fixes" and exceptions in the conversion logic. Ideas welcome.

Closes #25145

@betatim changed the title from "Convert boolean pd.Series to boolean ndarrays" to "FIX Convert boolean pd.Series to boolean ndarrays" on Dec 9, 2022

@thomasjpfan thomasjpfan left a comment

Thank you for the PR! This needs a whats_new entry in 1.2.1.

@thomasjpfan thomasjpfan added this to the 1.2.1 milestone Dec 9, 2022

betatim commented Dec 9, 2022

Comments done and what's new added

@thomasjpfan thomasjpfan left a comment

Thank you for the update! LGTM

@betatim added the "Waiting for Second Reviewer" label (First reviewer is done, need a second one!) on Dec 9, 2022

@ogrisel ogrisel left a comment

The logic is complex but I am not sure how to do better. Also it matches what the docstring says for dtype="numeric" which is the default so LGTM.
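For reference, a small sketch of the documented `dtype="numeric"` behaviour of `check_array` (dtype is preserved unless the input dtype is object). It uses only NumPy inputs so it does not depend on the pandas handling changed here; the Series line at the end is expected to give a boolean array only on releases that include this fix.

```python
import numpy as np
import pandas as pd
from sklearn.utils import check_array

# dtype="numeric" is the default: a non-object dtype is preserved.
bool_input = np.array([True, False, True])
bool_checked = check_array(bool_input, ensure_2d=False)

# An object array is converted to a float dtype instead.
object_input = np.array([1, 2, 3], dtype=object)
object_checked = check_array(object_input, ensure_2d=False)

# Boolean Series: the resulting dtype depends on the installed release
# (assumption: bool on versions that include this fix).
series_checked = check_array(pd.Series([True, False, True]), ensure_2d=False)
print(bool_checked.dtype, object_checked.dtype, series_checked.dtype)
```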

Thanks for the fix.

@ogrisel ogrisel merged commit a576bcc into scikit-learn:main Dec 12, 2022
@ogrisel added the "To backport" label (PR merged in master that need a backport to a release branch defined based on the milestone) and removed the "Waiting for Second Reviewer" label (First reviewer is done, need a second one!) on Dec 12, 2022
Vincent-Maladiere pushed a commit to Vincent-Maladiere/scikit-learn that referenced this pull request Dec 14, 2022
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 3, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
Labels
module:utils
To backport (PR merged in master that need a backport to a release branch defined based on the milestone)
Development

Successfully merging this pull request may close these issues.

check_array unexpectedly upcasts numeric types in pandas Series
3 participants