TST mark test as xfail due to bug fix in pandas-dev #26344

Closed
wants to merge 1 commit into from

glemaitre
Member

Partially addresses #26154

Solving the issue pointed out here: #26154 (comment)

In short, pandas will better infer dtypes during DataFrame concatenation when missing values are present. Previously, because the liac-arff parser reads the data in chunks, we could end up with both None and np.nan in the same column. The new version of pandas will identify both values as missing values.
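To make the None / np.nan mix concrete, here is a minimal sketch. It does not use the actual liac-arff reader; the two small frames (`chunk_a`, `chunk_b`) and the column name `col` are hypothetical stand-ins for chunks read separately and then concatenated.

```python
import numpy as np
import pandas as pd

# Hypothetical chunks: one stores the missing entry as None, the other as np.nan.
chunk_a = pd.DataFrame({"col": pd.Series(["x", None], dtype=object)})
chunk_b = pd.DataFrame({"col": pd.Series([np.nan, "y"], dtype=object)})

combined = pd.concat([chunk_a, chunk_b], ignore_index=True)

# pandas treats both markers as missing, whatever Python object encodes them.
n_missing = int(combined["col"].isna().sum())

# Converting to a categorical keeps only the real values as categories.
as_category = combined["col"].astype("category")
categories = list(as_category.cat.categories)

print(n_missing, categories)
```

Which Python objects end up in the concatenated column, and which dtype is inferred for it, can depend on the pandas version, which is the point of this PR.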

Since the new behaviour is what one would expect, but we cannot backport it, one option is to mark the test as xfail.
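As a rough illustration of the xfail approach (not the actual test touched by this PR), a test that depends on the old concatenation behaviour could be marked as below. The test name, its body, and the reason string are hypothetical.

```python
import numpy as np
import pandas as pd
import pytest


# Hypothetical test: it asserts the old behaviour, so it may fail on newer pandas.
@pytest.mark.xfail(
    reason="pandas infers missing values differently during concatenation",
    strict=False,  # passing on older pandas is not reported as an error
)
def test_missing_markers_stay_distinct():
    column = pd.concat(
        [
            pd.Series(["x", None], dtype=object),
            pd.Series([np.nan, "y"], dtype=object),
        ],
        ignore_index=True,
    )
    kinds = {type(value).__name__ for value in column[column.isna()]}
    assert kinds == {"NoneType", "float"}
```

With `strict=False`, pytest reports the test as XFAIL when it fails and XPASS when it passes, so the suite stays green on either pandas version.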

@adrinjalali
Member

So this means we won't really be supporting as_frame=True with parser="liac-arff", right? Should we then at least deprecate that usage?

@glemaitre
Member Author

> So this means we won't be really supporting as_frame=True with parser="liac-arff", right?

This does not change. Here, it is just that the number of columns detected as numerical or categorical will change depending on whether None is mapped to a proper missing value (which was not the case in the past).

@thomasjpfan
Member

I think we can adjust the implementation to infer better dtypes. I opened #26386 as an alternative to this PR.

@lesteve
Member

lesteve commented May 23, 2023

Closing since #26386 has been merged.

@lesteve lesteve closed this May 23, 2023