TST Suppress multiple active versions of dataset warnings in test_openml.py
#19373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
test_openml.py
#DataUmbrella sprint
@pytest.mark.filterwarnings(
    "ignore:Multiple active versions of the dataset matching the name"
)
I think we should explain why it's fine to ignore this warning with a comment such as:
@pytest.mark.filterwarnings(
    # This test intentionally passes both data_id and data_name + version,
    # which causes _fetch_dataset_from_openml to trigger a spurious warning.
    "ignore:Multiple active versions of the dataset matching the name"
)
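For context, here is a minimal standalone sketch (not scikit-learn code; the fetch function below is a stand-in) of how a message-based filter like the one in `pytest.mark.filterwarnings` behaves: the `"ignore:<message>"` spec matches the start of the warning message as a regex, the same way `warnings.filterwarnings` does.

```python
import warnings

def fetch_with_spurious_warning():
    # Stand-in for a fetch that triggers the warning under discussion.
    warnings.warn(
        "Multiple active versions of the dataset matching the name iris exist.",
        UserWarning,
    )
    return "data"

with warnings.catch_warnings():
    warnings.simplefilter("error")  # any unfiltered warning becomes an error
    # The message argument is a regex matched against the start of the
    # warning message, mirroring pytest's "ignore:<message>" syntax.
    warnings.filterwarnings(
        "ignore",
        message="Multiple active versions of the dataset matching the name",
    )
    result = fetch_with_spurious_warning()  # warning is suppressed

print(result)
```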
@glemaitre I am not sure about this explanation. You investigated the cause.
Isn't there a simple way to just avoid triggering the warning in the first place? If not, please expand on the suggested explanation, because I do not find it satisfying.
Is the problem related to monkeypatching?
Also do we really want to pass both data_id and data_name + versions in those tests? Why not just data_id and put the name as a comment?
Actually after discussion with @glemaitre I think we should instead change _fetch_dataset_from_openml
to avoid raising that specific warning in the first place:
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    fetch_openml(...)
Then we will also need to update test_fetch_openml_iris to directly call fetch_openml("iris") instead of using _fetch_dataset_from_openml.
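A runnable sketch of that suggestion, with a stub in place of the real fetch_openml (names below are illustrative, not the actual scikit-learn test code): the known spurious UserWarning is suppressed inside the helper, so no filterwarnings mark is needed on the tests.

```python
import warnings

def fetch_openml_stub(name):
    # Stand-in for sklearn.datasets.fetch_openml, which can emit the
    # "Multiple active versions ..." UserWarning for some call patterns.
    warnings.warn(
        f"Multiple active versions of the dataset matching the name {name} exist.",
        UserWarning,
    )
    return {"name": name}

def _fetch_dataset_from_openml(name):
    # Suppress the known spurious warning inside the helper itself,
    # as suggested, instead of decorating every test.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=UserWarning)
        return fetch_openml_stub(name)

bunch = _fetch_dataset_from_openml("iris")
print(bunch["name"])
```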
Assuming the tests pass, this looks much better to me :)
Two more warnings remain, though, regarding the Australian dataset's version 1 being outdated.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Is the right fix to specify a version for the dataset that is triggering that warning? Otherwise this PR looks alright.
True, but our test here should be refactored after this PR.
So this function is more of a checker than a fetcher, and it might do a bit too much. We could think about breaking the fetching apart and writing tests that only check that the fetching works, with proper warning management. But that would be for another PR.
test_openml.py
If you can change the order of the warnings, this will be good to merge.
@mjkanji Thank you. This is good to merge.
Reference Issues/PRs
References #19349
What does this implement/fix? Explain your changes.
Suppresses the
UserWarning: Multiple active versions of the dataset ... exist
warnings, based on @glemaitre's recommendation.
Any other comments?
Two other UserWarnings still remain.
These stem from the calls that fetch the Australian dataset on lines 511 and 531.
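Following the earlier suggestion to pin a version, here is a hedged sketch of why an explicit version avoids "version ... is inactive/outdated" style warnings. fetch_openml_stub is a stand-in; the real call would be sklearn.datasets.fetch_openml, which does accept a version parameter.

```python
import warnings

def fetch_openml_stub(name, version="active"):
    # Stand-in: warn only when the caller lets the version resolve
    # implicitly, mimicking the outdated-version warning in the tests.
    if version == "active":
        warnings.warn(
            f"Version 1 of dataset {name} is inactive.", UserWarning
        )
    return {"name": name, "version": version}

with warnings.catch_warnings():
    warnings.simplefilter("error")  # fail loudly if any warning fires
    # Pinning an explicit version: no warning should be raised.
    bunch = fetch_openml_stub("Australian", version=4)

print(bunch["version"])
```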