FIX Do not use deprecated API in fetch_20newsgroups_vectorized #21216
"col1": pd.arrays.SparseArray([0, 1, 0], dtype=ntype1, fill_value=0),
"col2": pd.arrays.SparseArray([1, 0, 1], dtype=ntype2, fill_value=0),
These are causing issues on pandas because the fill value differs depending on the dtype. This raises a ValueError on the dev version of pandas.
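To illustrate the dtype dependence, here is a minimal sketch, assuming pandas >= 1.0. The concrete dtypes (`int64`, `float64`) and the helper `make_sparse_frame` are hypothetical stand-ins for the `ntype1`/`ntype2` parametrization of the col1/col2 test data. The sketch does not reproduce the ValueError itself; it only shows that the default fill value changes with the dtype, while passing `fill_value=0` explicitly makes both columns agree.

```python
import math

import pandas as pd

# Without an explicit fill_value, pandas derives one from the dtype:
# 0 for integer dtypes, NaN for float dtypes.
int_col = pd.arrays.SparseArray([0, 1, 0], dtype="int64")
float_col = pd.arrays.SparseArray([0, 1, 0], dtype="float64")


def make_sparse_frame(ntype1, ntype2):
    """Hypothetical helper mirroring the col1/col2 test data.

    fill_value=0 is passed explicitly so both columns share the same
    fill value whatever their dtypes are.
    """
    return pd.DataFrame(
        {
            "col1": pd.arrays.SparseArray([0, 1, 0], dtype=ntype1, fill_value=0),
            "col2": pd.arrays.SparseArray([1, 0, 1], dtype=ntype2, fill_value=0),
        }
    )


df = make_sparse_frame("int64", "float64")
# Read each column's fill value through the Series.sparse accessor.
fill_values = {name: df[name].sparse.fill_value for name in df.columns}
```

Being explicit removes any reliance on pandas' dtype-dependent default, which is what the "TST Be explicit about fill value" commit in this PR is about.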
Does your change fix the issue, then? I can't tell from your explanation.
I would guess that this fixes the issue, but to double-check I pushed an empty commit with [scipy-dev] to trigger the scipy-dev nightly build in this PR.
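The trick relies on CI noticing the [scipy-dev] marker in the latest commit message. A minimal sketch of both halves, assuming the marker only has to appear somewhere in the message: `empty_commit_args` builds the real `git commit --allow-empty` invocation (the message text is taken from this PR's commit list), while `requests_scipy_dev_build` is a hypothetical stand-in for whatever check the CI actually performs.

```python
from typing import List

SCIPY_DEV_MARKER = "[scipy-dev]"


def empty_commit_args(message: str) -> List[str]:
    """Build the git command that records a commit with no file changes."""
    return ["git", "commit", "--allow-empty", "-m", message]


def requests_scipy_dev_build(commit_message: str, marker: str = SCIPY_DEV_MARKER) -> bool:
    """Hypothetical CI check: does the commit message ask for the scipy-dev build?"""
    return marker in commit_message


args = empty_commit_args(f"{SCIPY_DEV_MARKER} trigger nightly build")
# To actually create the commit inside a git checkout:
#     import subprocess
#     subprocess.run(args, check=True)
```

An empty commit changes no files, so it only re-runs CI with the new message.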
Yeah, it fixes the original issue. There were two issues: fixing fetch_20newsgroups_vectorized allowed the tests to run, which revealed the pandas issue.
Also, @thomasjpfan, I am quite curious how you made the connection between the original cryptic pytest error and
Turned off pytest-xdist by setting
…t-learn#21216)
* FIX Do not use deprecated API in fetch_20newsgroups_vectorized
* BLD [scipy-dev]
* TST Be explicit about fill value [scipy-dev]
* TST Fixes tests for fill value
* [scipy-dev] trigger nightly build
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Reference Issues/PRs
Fixes #21212
What does this implement/fix? Explain your changes.
The test run errored at collection time because of the deprecation warning.
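Why would a deprecation warning break collection? One plausible mechanism, assuming the run turns DeprecationWarning into an error (as a `warnings` filter such as pytest's `-W error::DeprecationWarning` does): any deprecated call made while collecting then raises instead of merely warning. `deprecated_helper` and `collect` below are hypothetical names.

```python
import warnings


def deprecated_helper():
    """Hypothetical API that still works but emits a DeprecationWarning."""
    warnings.warn("deprecated_helper is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def collect():
    """Hypothetical collection step that calls the deprecated API.

    Returns the helper's result, or the exception type if the call raised.
    """
    try:
        return deprecated_helper()
    except DeprecationWarning as exc:
        return type(exc)


# Lenient run: the warning is ignored and the call succeeds.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    lenient_result = collect()

# Strict run: the same warning is raised as an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    strict_result = collect()
```

Replacing the deprecated call, as this PR does, fixes both modes, whereas silencing the warning would only hide it.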
Any other comments?
During test collection, when SKLEARN_SKIP_NETWORK_TESTS='0', datasets are downloaded first so that they are cached. This is done because when multiple processes call fetch_* at the same time, they can write to the same location on the filesystem, which used to cause errors.
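A minimal sketch of this caching strategy, assuming each fetcher writes to a shared cache directory only when its file is missing and that network tests are skipped unless SKLEARN_SKIP_NETWORK_TESTS is '0'. `make_fake_fetcher`, `prefetch_datasets`, and the dataset names are hypothetical; this is not scikit-learn's actual conftest. Running every fetcher once, serially, before workers start means later calls (for example from parallel pytest-xdist workers) only read the cache instead of racing to write it.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Mapping


def make_fake_fetcher(cache_dir: Path, name: str, downloads: Dict[str, int]) -> Callable[[], str]:
    """Hypothetical fetcher: 'downloads' (writes) the file only if it is not cached yet."""

    def fetch() -> str:
        path = cache_dir / f"{name}.txt"
        if not path.exists():
            downloads[name] = downloads.get(name, 0) + 1  # count real downloads
            path.write_text(f"contents of {name}")
        return path.read_text()

    return fetch


def prefetch_datasets(env: Mapping[str, str], fetchers: Dict[str, Callable[[], str]]) -> bool:
    """Run every fetcher once, serially, so the cache is warm before workers start."""
    if env.get("SKLEARN_SKIP_NETWORK_TESTS", "1") != "0":
        return False  # network tests skipped: nothing to download
    for fetch in fetchers.values():
        fetch()
    return True


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    downloads: Dict[str, int] = {}
    fetchers = {
        name: make_fake_fetcher(cache_dir, name, downloads)
        for name in ("20newsgroups_vectorized", "other_dataset")
    }
    skipped = not prefetch_datasets({"SKLEARN_SKIP_NETWORK_TESTS": "1"}, fetchers)
    prefetched = prefetch_datasets({"SKLEARN_SKIP_NETWORK_TESTS": "0"}, fetchers)
    # Later calls (e.g. from test workers) hit the cache and never re-download.
    contents = [fetch() for fetch in fetchers.values()]
```

Warming the cache serially avoids the race without any file locking, at the cost of downloading everything up front.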