Skip to content

ENH Add a retry mechanism in fetch_openml #21901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 22, 2021

Conversation

Rileran
Copy link
Contributor

@Rileran Rileran commented Dec 6, 2021

Reference Issues/PRs

Fixes #21397
Retry mechanism for other fetch_* functions : #21691

What does this implement/fix? Explain your changes.

Added a retry mechanism for the function fetch_openml in case of a network error.

This is done by adding n_retries and delay arguments to the function. (similar to #21691).

If a call to urlopen would result in a network error (URLError), we will call the function again, up to n_retries times and with a delay between each call.

Any other comments?

As this is specific to OpenML, the retry mechanism will be bypassed if the network error has a status code 412, as it is the generic error returned by the OpenML API.

@ogrisel
Copy link
Member

ogrisel commented Dec 10, 2021

I have not checked this PR but I recently discovered that there is already a "retry once" mechanism named _retry_with_clean_cache.

I am not sure it plays well with the new concurrency aware download and cache mechanism of #21833 though.

@thomasjpfan
Copy link
Member

The concurrency aware download works as expected with _retry_with_clean_cache. If _open_openml_url fails for a non-HTTPError reason, _retry_with_clean_cache will try again. The purpose of _retry_with_clean_cache was to catch errors from data corruption.

As for this PR, _retry_on_network_error will work with _retry_with_clean_cache, as long as we update:

except HTTPError:
raise

to catch URLError (which is the parent class of HTTPError). This way _retry_on_network_error can retry and if it fails _retry_with_clean_cache will not retry and reraise the error.

@Rileran
Copy link
Contributor Author

Rileran commented Dec 12, 2021

Thank you @thomasjpfan for clarifying the purpose of those two decorators. I have applied the correct changes and now _retry_with_clean_cache should re raise the error if the function fails because of a network error.

I have looked into the failed pipeline and I am not sure about the trouble. The error message caught by pytest look like one from another test.

+ unclosed file <_io.FileIO name='/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/popen-gw2/test_fetch_openml_verify_check0/test_invalid_checksum.arff' mode='rb' closefd=True>

Is it possible that another test is failing to close a file and that the error message is caught by our test ? I am not very familiar with pytest and therefore am having trouble isolating the problem.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update @Rileran!

I opened #22005 to address the FileIO issue.

Comment on lines 1541 to 1550
for r in record:
assert (
r.message.args[0]
== "A network error occured while downloading a file. Retrying..."
)
assert len(record) == 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since match= it set in the context manager, the message is already matched. Here only need to check the number of retries:

assert len(record) == 3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected in last commit. Thanks a lot for reviewing my work !

@ogrisel
Copy link
Member

ogrisel commented Dec 17, 2021

I merged #22005 in main, so it should be possible to merge an updated main into this branch to get a clean CI.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the comments below and the one remaining by @thomasjpfan (https://github.com/scikit-learn/scikit-learn/pull/21901/files#r770951394) are addressed.

@Rileran Rileran force-pushed the feat/retry-in-fetch-openml branch from 5e14d8e to a48d74c Compare December 17, 2021 09:09
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@Rileran Rileran force-pushed the feat/retry-in-fetch-openml branch from 9460520 to 46f6190 Compare December 17, 2021 10:59
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update @Rileran!

A minor comment, otherwise LGTM!

@thomasjpfan thomasjpfan changed the title [MRG] Add a retry mechanism in fetch_openml ENH Add a retry mechanism in fetch_openml Dec 17, 2021
@Rileran Rileran force-pushed the feat/retry-in-fetch-openml branch 2 times, most recently from a6f9302 to 6691632 Compare December 22, 2021 10:45
FIX Changed default parameter type to float

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@Rileran Rileran force-pushed the feat/retry-in-fetch-openml branch from 6691632 to d6ea453 Compare December 22, 2021 10:51
@thomasjpfan thomasjpfan merged commit 0882bd3 into scikit-learn:main Dec 22, 2021
venkyyuvy pushed a commit to venkyyuvy/scikit-learn that referenced this pull request Jan 1, 2022
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@lesteve lesteve mentioned this pull request May 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry mechanism in fetch_openml
3 participants