TST Suppress multiple active versions of dataset warnings in test_openml.py #19373


Merged
merged 4 commits into scikit-learn:main on Feb 7, 2021

Conversation

mjkanji
Contributor

@mjkanji mjkanji commented Feb 6, 2021

Reference Issues/PRs

References #19349

What does this implement/fix? Explain your changes.

Suppresses the UserWarning: Multiple active versions of the dataset ... exist warnings, based on @glemaitre's recommendation.
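For context, the pytest.mark.filterwarnings mark used in this PR matches warnings by a regex against the start of the message, the same mechanism as the stdlib warnings module. A minimal stdlib sketch of that matching (the fetch function here is a hypothetical stand-in, not scikit-learn code):

```python
import warnings

def fetch_dataset():
    # Hypothetical stand-in for an OpenML fetch that emits the version warning.
    warnings.warn(
        "Multiple active versions of the dataset matching the name iris exist.",
        UserWarning,
    )
    return "data"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Same message-prefix regex that the filterwarnings mark in this PR uses.
    warnings.filterwarnings(
        "ignore",
        message="Multiple active versions of the dataset matching the name",
    )
    result = fetch_dataset()

print(result, len(caught))  # data 0 (the warning was filtered out)
```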

Any other comments?

Two other UserWarnings still remain.

UserWarning: Version 1 of dataset Australian is inactive, meaning that issues have been found in the dataset.

These stem from the calls to fetch the Australian dataset on lines 511 and 531.

@mjkanji mjkanji changed the title ignore multiple active version warnings Suppress multiple active versions of dataset warnings in test_openml.py Feb 6, 2021
@reshamas
Member

reshamas commented Feb 6, 2021

#DataUmbrella sprint

Comment on lines 800 to 802
@pytest.mark.filterwarnings(
"ignore:Multiple active versions of the dataset matching the name"
)
Member

I think we should explain why it's fine to ignore this warning with a comment such as:

Suggested change
@pytest.mark.filterwarnings(
"ignore:Multiple active versions of the dataset matching the name"
)
@pytest.mark.filterwarnings(
# This test intentionally passes both data_id and data_name + version,
# which causes _fetch_dataset_from_openml to trigger a spurious
# warning.
"ignore:Multiple active versions of the dataset matching the name"
)

@glemaitre I am not sure about this explanation. You were investigating the cause.

Isn't there a simple way to just avoid triggering the warning in the first place? If not, please expand on the suggested explanation, because I do not find it satisfying.

Is the problem related to monkeypatching?

Member

Also do we really want to pass both data_id and data_name + versions in those tests? Why not just data_id and put the name as a comment?

Member

@ogrisel ogrisel Feb 6, 2021

Actually after discussion with @glemaitre I think we should instead change _fetch_dataset_from_openml to avoid raising that specific warning in the first place:

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    fetch_openml(...)

Then we will also need to update test_fetch_openml_iris to directly call fetch_openml("iris") instead of using _fetch_dataset_from_openml.
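A self-contained illustration of why the suggested pattern works: warnings.catch_warnings() saves and restores the filter state, so the UserWarning is silenced only inside the block. The fetch function below is a hypothetical stand-in for fetch_openml:

```python
import warnings

def fetch_openml_stub(name):
    # Hypothetical stand-in for fetch_openml that always emits the warning.
    warnings.warn(
        f"Multiple active versions of the dataset {name} exist.", UserWarning
    )
    return {"name": name}

# Inside the context manager the UserWarning is silenced...
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    bunch = fetch_openml_stub("iris")

# ...and outside it the previous filters are restored, so it fires again.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fetch_openml_stub("iris")

print(bunch["name"], len(caught))  # iris 1
```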

Contributor Author

Assuming the tests pass, this looks much better to me :)

Two more warnings remain, though, regarding the Australian dataset's version 1 being outdated.

Member

@ogrisel ogrisel left a comment

Assuming the tests pass, this looks much better to me :)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@jnothman
Member

jnothman commented Feb 7, 2021

Is the right fix to specify a version for the dataset that is triggering that warning? Otherwise this PR looks alright.

@glemaitre
Member

Is the right fix to specify a version for the dataset that is triggering that warning? Otherwise this PR looks alright.

True, but here our test should be refactored after this PR.

_fetch_dataset_from_openml is not only used to fetch a dataset. It actually checks that three different ways of fetching the dataset work (one of them passes the name without a version, which can raise a warning) and performs some additional checks.

So this function is more of a checker than a fetcher, and it might do a bit too much. We could consider breaking out the fetching and writing tests that only check that fetching works, and handling the warnings properly. But that would be in another PR.
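One hedged sketch of the checker/fetcher split described above: plain fetch helpers, plus a checker that exercises both access paths and silences the known warning only around the one call that can emit it. All names and functions here are illustrative assumptions, not the actual test code:

```python
import warnings

def fetch_by_id(data_id):
    # Stand-in for fetching by data_id: no version ambiguity, no warning.
    return {"id": data_id}

def fetch_by_name(name):
    # Stand-in for fetching by name without a version: may warn.
    warnings.warn(
        f"Multiple active versions of the dataset {name} exist.", UserWarning
    )
    return {"name": name}

def check_dataset(data_id, name):
    # Checker: exercises both access paths, suppressing only the known
    # warning around the single call that can emit it.
    by_id = fetch_by_id(data_id)
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", message="Multiple active versions of the dataset"
        )
        by_name = fetch_by_name(name)
    return by_id, by_name

by_id, by_name = check_dataset(61, "iris")
print(by_id["id"], by_name["name"])  # 61 iris
```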

@glemaitre glemaitre changed the title Suppress multiple active versions of dataset warnings in test_openml.py TST Suppress multiple active versions of dataset warnings in test_openml.py Feb 7, 2021
Member

@glemaitre glemaitre left a comment

If you can change the order of the warnings, this will be good to merge.

@glemaitre glemaitre merged commit 4220043 into scikit-learn:main Feb 7, 2021
@glemaitre
Member

@mjkanji Thank you. This is good to be merged.

@glemaitre glemaitre mentioned this pull request Apr 22, 2021