-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[Windows] PermissionError in datasets fetchers when trying to remove the downloaded archive #9820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Figured out what the problem is, suggested a pull request to fix it. |
Thank you.
|
I am going to reopen this issue. @lucianocrt we tend to leave the issue open until the PR has been fixed. By using "Fix #issueNumber" in your PR description, the associated issue gets closed automatically when the PR is merged. For more details, look at this. |
BTW this could be a problem that exists on Windows in other datasets fetchers, we should probably double-check all of them. |
I quickly checked with a snippet I used for the figshare work, the fetchers affected by this are:
@massich maybe you could fix them since you worked on the fetchers when moving them to figshare. I think we should use context managers (e.g. |
Well, the right way to create a file and remove it is using a context
manager over a tempfile.NamedTemporaryFile or a tempfile.TemporaryFile.
… |
@lesteve, you've marked this for 0.19.1. Is it a regression? |
I wanted to double-check this and forgot sorry. This is a regression for |
So then we would like this resolved within the next few days ideally as I think we should be releasing. Is @massich taking this up? I'll mark it as needing contributor in any case. |
I haven't heard from him yet. I'll chat with him to see whether he has some spare bandwith to tackle this one. |
Hy, I'm a bit out. But I'm looking into this one. |
OK thanks, I can do it, if you don't have the time, just le me know. My understanding is that Joel was hoping to do 0.19.1 kind of this week. |
That's fine, if I'm not mistaken something like this should do the trick fileobj = tempfile.NamedTemporaryFile()
with tarfile.open(mode="r:gz", name=archive_path).extractfile(
'CaliforniaHousing/cal_housing.data') as f:
shutil.copyfileobj(f, fileobj)
fileobj.seek(0)
remove(archive_path)
cal_housing = np.loadtxt(fileobj, delimiter=',')
fileobj.close() But I think that it should be a way to make it nicer, either move the remove to the end and using a named tempfile to delete afterwards or something. I'll make a PR tonight. |
Looks like this could work. Maybe this can be simplified a bit, can you not just do something like: with tarfile.open(...).extractfile(...) as f:
np.loadtxt(f, delimiter=',') |
so i have the sane problem - I am assuming we just have to wait for this to be fixed externally for it to work then? |
You will have to wait for 0.19.1 to be released. In the mean time as a work-around you can find the file and comment out the line with the For example this is the stack-trace I get when trying to use
This tells me that to work around the problem I have to comment out the line Notes:
|
Version: scikit-learn (0.19.0) |
As I mentioned in my comment above, please use scikit-learn 0.19.1. Patching your code was just a temporary work-around that is no longer needed if you use scikit-learn 0.19.1. |
Description
I have just intalled scikit-learn though pip. I am trying to load a dataset (california_housing) but I keep getting a permission error when scikit-learn tries to delete the file.
I can delete the file myself from the file explorer, which suggests that it is python that locks the file.
Steps/Code to Reproduce
import numpy as np
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
Expected Results
No error should be thrown and data should be loaded correctly
Actual Results
Downloading Cal. housing from https://ndownloader.figshare.com/files/5976036 to C:\Users\lucia\scikit_learn_data
Traceback (most recent call last):
File "test.py", line 4, in
housing = fetch_california_housing()
File "C:\Users\lucia\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\datasets\california_housing.py", line 109, in fetch_california_housing
remove(archive_path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\lucia\scikit_learn_data\cal_housing.tgz'
Versions
Windows-10-10.0.15063-SP0
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0
The text was updated successfully, but these errors were encountered: