fetch_openml can raise "PermissionError: [WinError 32] The process cannot access the file because it is being used by another process" #21798
Comments
I am not sure how this code works, but to make the
|
The retry mechanism for |
I do not think any of the |
I cannot reproduce the error on Linux. Is the error Windows-specific? Other than pytest -n itself, is there any scenario in which any of the fetch_* functions would need to run concurrently in sklearn? |
If a user spawns multiple Python processes (using |
Any function, class or IO operation in sklearn or any other code might be used concurrently and implemented in different ways, and a large proportion of them might not be thread-safe. My suggestion is this: instead of changing the original code and moving this level of complexity into fetch_*, deal with it in the part of the code that creates this problem: the tests. |
We have introduced some complexity in our test code for https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/conftest.py#L81-L82 This code downloads all the necessary files before
We have to choose what we want to be threadsafe and I would prefer to have Given all that, I think it is important to fix the tests so the CI is stable. I opened #21806 as a quick workaround to fix the tests. |
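One way to keep concurrent downloads out of the test suite is to let a single pytest-xdist worker do the fetching. The snippet below is only a sketch of that idea and is not a copy of what sklearn/conftest.py does. It assumes the `PYTEST_XDIST_WORKER` environment variable that pytest-xdist sets in each worker; the helper name `is_download_worker` is hypothetical.

```python
import os


def is_download_worker(environ=os.environ) -> bool:
    """Hypothetical helper: True only for the first xdist worker.

    pytest-xdist sets PYTEST_XDIST_WORKER to "gw0", "gw1", ... in each
    worker process. The variable is absent when tests run without xdist,
    in which case the single process is treated as the download worker.
    """
    return environ.get("PYTEST_XDIST_WORKER", "gw0") == "gw0"
```

A gate like this avoids two processes writing the same cache file, but on its own it does not make the other workers wait until the download has finished.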
So far the script I've come up with for tempfile naming is:

import os
from random import Random
from string import ascii_lowercase, digits
from threading import get_native_id

def _get_tempfile_local_path(dir_name: str) -> str:
    # Seed with the native thread id so the same thread always gets the same name.
    rng = Random(get_native_id())
    tempfile_name = ''.join(rng.choices(ascii_lowercase + digits, k=10))
    tempfile_local_path = os.path.join(dir_name, tempfile_name + '.gz')
    return tempfile_local_path
|
The code was for a naming scheme that would be reproducible from the same thread but it is unnecessary. |
I think this issue was fixed in #21833 |
Describe the bug
On Windows, if fetch_openml is run concurrently in 2 processes, for instance when running the tests with pytest-xdist, one sometimes gets errors such as:
Full error log:
https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=35377&view=logs&j=18b0749f-dd9a-5274-d197-77895e43d4e4&t=ba53dc33-2c0b-592b-6f69-b1c7af7ca977
Steps/Code to Reproduce
Run
pytest -x -n 4 --pyargs sklearn
many times.
Expected Results
No crash; fetch_openml should be safe to run concurrently.
Actual Results
See error report above.
Versions