Skip to content

MNT Ignore pandas 2.2 warnings including Pyarrow dependency warnings #28305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jan 31, 2024

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Jan 29, 2024

Maybe slightly simpler alternative to #28258. I think I found a way to ignore the warning in sklearn/conftest.py. This is a bit hacky (yet another place for warning handling on top of setup.cfg and CI test_script.sh), but this may be a reasonable trade-off in the short to medium term until the pandas Pyarrow dependency is a bit more clear.

@lesteve lesteve changed the title MNT Ignore pandas Pyarrow DeprecationWarning in sklearn/conftest.py MNT Yet another way to ignore pandas Pyarrow DeprecationWarning Jan 29, 2024
Copy link

github-actions bot commented Jan 29, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: f5ec234. Link to the linter CI: here

@adrinjalali
Copy link
Member

This is spreading warning ignore code even more in different places. I like that in #28258 they're all moving to the same place.

@lesteve
Copy link
Member Author

lesteve commented Jan 29, 2024

Yes completely agreed on yet another place for handling warnings, I was editing my message at the same time 😉.

IMO we spent already way too much time for something that should have been easy (ignoring a warning).

I am kind of hoping that the pandas situation improves in the short-term, either they remove the newline in the warning which makes it easy to ignore with -W or they go back on requiring Pyarrow, or something else.

@adrinjalali
Copy link
Member

Yes, but I really don't want to add more tech-debt here for warnings.

@adrinjalali
Copy link
Member

wanna push to my branch instead? we're doing the same thing it seems 😁 and I'm -1 on ignoring the warning this way.

@lesteve
Copy link
Member Author

lesteve commented Jan 29, 2024

Personally I would avoid running locally with warnings turned into errors. It could lead to contributors (including us) finding some combination of packages versions that create warnings and us to chase not so important things (as we are doing right now IMO 😉) ...

I would prioritise unlocking the situation to get to up-to-date lock-files with a green CI and that issues can be solved separately and things can be cleaned-up soonish

  • pandas 2.2 warnings (actually this was done in this PR)
  • pytest 8 behaviour changes
  • python 3.12 datetime.datetime warnings (I think you are seeing them in your branch somehow, this has been known locally too for some time)
  • having cython 3.0.8 as the min version

@adrinjalali
Copy link
Member

Everytime we do a "quick fix" on the CI, next time it takes longer, and the bus factor goes lower. Our CI is already quite complicated, and I'm trying to make it a bit easier for all of us to understand it and replicate it locally.

@lesteve
Copy link
Member Author

lesteve commented Jan 29, 2024

I don't know about this, I am cleaning lock-files regularly for temporary work-arounds, maybe I am the only one that cares but I still do it 😉 ...

@lesteve
Copy link
Member Author

lesteve commented Jan 29, 2024

I'm trying to make it a bit easier for all of us to understand it and replicate it locally.

Yeah, I agree that this is definitely something we want!

@lesteve
Copy link
Member Author

lesteve commented Jan 29, 2024

In the meantime, CI is green 🎉. I'd try to look at #28258 to see what can be back-ported, it seems at least some tweaks in examples to avoid pandas 2.2 warnings.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -1 on this as a permanent solution, but happy to merge this, and try to fix #28258 tomorrow and merge that one. And I don't think the pandas issue is going away soon.

@@ -80,7 +80,10 @@

docstring_test_dependencies = ["sphinx", "numpydoc"]

default_package_constraints = {}
default_package_constraints = {
# XXX: Temporary work-around for pytest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link to an issue for what's happening?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added with reference to Pytest PR. IMO this change makes sense, we need to tweak slightly our tests to take it into account.

Comment on lines +107 to +112
X["weather"] = (
X["weather"]
.astype(object)
.replace(to_replace="heavy_rain", value="rain")
.astype("category")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or

Suggested change
X["weather"] = (
X["weather"]
.astype(object)
.replace(to_replace="heavy_rain", value="rain")
.astype("category")
)
X.loc[X["weather"] == "heavy_rain", "col"] = "rain"
X["weather"] = X["weather"].astype("category").cat.remove_unused_categories()

but don't know which one's better

Copy link
Member Author

@lesteve lesteve Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like mine, but I haven't found anything better after looking for 20 minutes on how to merge two categorical values ...

I find the flow is more readable somehow than your suggestion. I agree it is not super clear why you need .astype(object) and then .astype('category'). If you want to know: avoid yet another pandas warning, replace is not supposed to change categories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I found this which maybe is a bit simpler?

X["weather"] = X["weather"].cat.remove_categories("heavy_rain").fillna("rain")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there are already missing values, this removes them though. So I don't think we'd want to encourage the pattern in our examples.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm good point actually ... I reverted this commit and went with my first proposal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lesteve lesteve changed the title MNT Yet another way to ignore pandas Pyarrow DeprecationWarning MNT ignore pandas 2.2 warnings, including Pyarrow dependency warnings Jan 30, 2024
@lesteve
Copy link
Member Author

lesteve commented Jan 30, 2024

FYI, I have some local fixes for Pytest 8, and I will open a PR with them and the lock-file update, when this PR gets merged.

@lesteve lesteve changed the title MNT ignore pandas 2.2 warnings, including Pyarrow dependency warnings MNT Ignore pandas 2.2 warnings, including Pyarrow dependency warnings Jan 30, 2024
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this, and then we need to figure out why the debian run on my PR fails.

@lesteve lesteve changed the title MNT Ignore pandas 2.2 warnings, including Pyarrow dependency warnings MNT Ignore pandas 2.2 warnings including Pyarrow dependency warnings Jan 31, 2024
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with this version because it is green.

@glemaitre glemaitre merged commit bb87768 into scikit-learn:main Jan 31, 2024
@glemaitre
Copy link
Member

Thanks @lesteve

@lesteve lesteve deleted the ignore-pandas-pyarrow-warning branch January 31, 2024 11:00
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 13, 2024
glemaitre pushed a commit that referenced this pull request Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants