Skip to content

FIX Do not use deprecated API in fetch_20newsgroups_vectorized #21216

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Oct 5, 2021

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Oct 1, 2021

Reference Issues/PRs

Fixes #21212

What does this implement/fix? Explain your changes.

Errored during test collection time because of the deprecation warning.

Any other comments?

During test collection time and SKLEARN_SKIP_NETWORK_TESTS='0', datasets are downloaded first to cache them. This is done because when multiple processes call fetch_* they can both write to the same location on the filesystem, which use to cause errors.

@thomasjpfan thomasjpfan added this to the 1.0.1 milestone Oct 1, 2021
Comment on lines +1417 to +1418
"col1": pd.arrays.SparseArray([0, 1, 0], dtype=ntype1, fill_value=0),
"col2": pd.arrays.SparseArray([1, 0, 1], dtype=ntype2, fill_value=0),
Copy link
Member Author

@thomasjpfan thomasjpfan Oct 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are causing issues on pandas because the fill value is different depending on the dtype.

https://pandas.pydata.org/docs/reference/api/pandas.arrays.SparseArray.html#pandas-arrays-sparsearray

And this is raising a ValueError on the dev version of pandas

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does your change then fix the issue? Can't tell from your explanation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess that this fixes the issue but to double-check I pushed an empty commit with [scipy-dev] to trigger the scipy-dev nightly build in this PR.

Copy link
Member Author

@thomasjpfan thomasjpfan Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea it fixes the original issue.

There were two issues. With the fix to fetch_20newsgroups_vectorized, it alloed the tests to run, which revealed the pandas issue.

@lesteve
Copy link
Member

lesteve commented Oct 4, 2021

Also @thomasjpfan I am quite curious how you made the connection between the original cryptic pytest error and get_features_names ...

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Oct 4, 2021

Also @thomasjpfan I am quite curious how you made the connection between the original cryptic pytest error and get_features_names ...

Turned off pytest-xdist by setting PYTEST_XDIST_VERSION="none" in azure-pipelines.yml. (It would be nice to have pytest-xdist show the original exception.)

@adrinjalali adrinjalali merged commit 6f91cbe into scikit-learn:main Oct 5, 2021
@glemaitre glemaitre mentioned this pull request Oct 23, 2021
10 tasks
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Oct 23, 2021
…t-learn#21216)

* FIX Do not use deprecated API in fetch_20newsgroups_vectorized

* BLD [scipy-dev]

* TST Be explicit about fill value [scipy-dev]

* TST Fixes tests for fill value

* [scipy-dev] trigger nightly build

Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
glemaitre pushed a commit that referenced this pull request Oct 25, 2021
* FIX Do not use deprecated API in fetch_20newsgroups_vectorized

* BLD [scipy-dev]

* TST Be explicit about fill value [scipy-dev]

* TST Fixes tests for fill value

* [scipy-dev] trigger nightly build

Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
…t-learn#21216)

* FIX Do not use deprecated API in fetch_20newsgroups_vectorized

* BLD [scipy-dev]

* TST Be explicit about fill value [scipy-dev]

* TST Fixes tests for fill value

* [scipy-dev] trigger nightly build

Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linux_Nightly pylatest_pip_scipy_dev failure
3 participants