TST Suppress multiple active versions of dataset warnings in test_openml.py
#19373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
test_openml.py
#DataUmbrella sprint
@pytest.mark.filterwarnings(
    "ignore:Multiple active versions of the dataset matching the name"
)
I think we should explain why it's fine to ignore this warning with a comment such as:
@pytest.mark.filterwarnings(
    # This test intentionally passes both data_id and data_name + version,
    # which causes _fetch_dataset_from_openml to trigger a spurious warning.
    "ignore:Multiple active versions of the dataset matching the name"
)
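For context, here is a minimal standalone sketch (not scikit-learn code; the fetch function below is a stand-in) of how a message-based filter like the one in `pytest.mark.filterwarnings` behaves: the `"ignore:<message>"` spec matches the start of the warning message as a regex, the same way `warnings.filterwarnings` does.

```python
import warnings

def fetch_with_spurious_warning():
    # Stand-in for a fetch that triggers the warning under discussion.
    warnings.warn(
        "Multiple active versions of the dataset matching the name iris exist.",
        UserWarning,
    )
    return "data"

with warnings.catch_warnings():
    warnings.simplefilter("error")  # any unfiltered warning becomes an error
    # The message argument is a regex matched against the start of the
    # warning message, mirroring pytest's "ignore:<message>" syntax.
    warnings.filterwarnings(
        "ignore",
        message="Multiple active versions of the dataset matching the name",
    )
    result = fetch_with_spurious_warning()  # warning is suppressed

print(result)
```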
@glemaitre I am not sure about this explanation. You investigated the cause.
Isn't there a simple way to just avoid triggering the warning in the first place? If not, please expand on the suggested explanation, because I do not find it satisfying.
Is the problem related to monkeypatching?
Also do we really want to pass both data_id and data_name + versions in those tests? Why not just data_id and put the name as a comment?
Actually after discussion with @glemaitre I think we should instead change _fetch_dataset_from_openml
to avoid raising that specific warning in the first place:
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    fetch_openml(...)
Then we will also need to update test_fetch_openml_iris to directly call fetch_openml("iris") instead of using _fetch_dataset_from_openml.
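A runnable sketch of that suggestion, with a stub in place of the real fetch_openml (names below are illustrative, not the actual scikit-learn test code): the known spurious UserWarning is suppressed inside the helper, so no filterwarnings mark is needed on the tests.

```python
import warnings

def fetch_openml_stub(name):
    # Stand-in for sklearn.datasets.fetch_openml, which can emit the
    # "Multiple active versions ..." UserWarning for some call patterns.
    warnings.warn(
        f"Multiple active versions of the dataset matching the name {name} exist.",
        UserWarning,
    )
    return {"name": name}

def _fetch_dataset_from_openml(name):
    # Suppress the known spurious warning inside the helper itself,
    # as suggested, instead of decorating every test.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=UserWarning)
        return fetch_openml_stub(name)

bunch = _fetch_dataset_from_openml("iris")
print(bunch["name"])
```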
Assuming the tests pass, this looks much better to me :)
Two more warnings remain, though, regarding the Australian dataset's version 1 being outdated.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Is the right fix to specify a version for the dataset that is triggering that warning? Otherwise this PR looks alright.
True, but our test here should be refactored after this PR.
So this function is more of a checker than a fetcher, and it might do a bit too much. We could think about breaking the fetching apart and writing tests that only check that the fetching works, with proper warning management. But that would be for another PR.
test_openml.py
If you can change the order of the warnings, this will be good to merge.
@mjkanji Thank you. This is good to merge.
Reference Issues/PRs
References #19349
What does this implement/fix? Explain your changes.
Suppresses the
UserWarning: Multiple active versions of the dataset ... exist
warnings, based on @glemaitre's recommendation.
Any other comments?
Two other UserWarnings still remain.
These stem from the calls that fetch the Australian dataset on lines 511 and 531.
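Following the earlier suggestion to pin a version, here is a hedged sketch of why an explicit version avoids "version ... is inactive/outdated" style warnings. fetch_openml_stub is a stand-in; the real call would be sklearn.datasets.fetch_openml, which does accept a version parameter.

```python
import warnings

def fetch_openml_stub(name, version="active"):
    # Stand-in: warn only when the caller lets the version resolve
    # implicitly, mimicking the outdated-version warning in the tests.
    if version == "active":
        warnings.warn(
            f"Version 1 of dataset {name} is inactive.", UserWarning
        )
    return {"name": name, "version": version}

with warnings.catch_warnings():
    warnings.simplefilter("error")  # fail loudly if any warning fires
    # Pinning an explicit version: no warning should be raised.
    bunch = fetch_openml_stub("Australian", version=4)

print(bunch["version"])
```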