Labels: high priority, module: dataloader, module: dependency bug, module: memory usage, module: molly-guard, module: multiprocessing, triaged
Description
Editor note: There is a known workaround further down in this issue, which is NOT to use Python lists but something else instead, e.g., torch.tensor directly. See #13246 (comment). A numpy array also works, but it only fixes the issue for the fork start method. See #13246 (comment) for more details.
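A minimal sketch of the numpy-array variant of that workaround (it only helps with the fork start method; the class and field names below are illustrative, not taken from this issue):

import numpy as np
from torch.utils.data import Dataset

class PathDataset(Dataset):
    """Keeps the image paths in one contiguous numpy byte array instead of
    a Python list of str objects, so forked workers do not dirty
    copy-on-write pages through per-object reference-count updates."""

    def __init__(self, paths):
        # astype(np.bytes_) packs the strings into a single buffer;
        # this assumes the paths are ASCII-only.
        self.paths = np.array(paths).astype(np.bytes_)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        path = self.paths[idx].decode("utf-8")
        # ...open the image at `path` and return a tensor, as in the
        # reproduction snippet below...
        return path

The reproduction snippet below could store its paths this way instead of in a plain Python list.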
🐛 Bug
CPU memory will leak if the DataLoader num_workers > 0.
To Reproduce
Run the following snippet:
from torch.utils.data import Dataset, DataLoader
from PIL import Image
from torchvision import transforms
import os

class DataIter(Dataset):
    def __init__(self):
        path = "path/to/data"
        self.data = []
        for cls in os.listdir(path):
            for img in os.listdir(os.path.join(path, cls)):
                self.data.append(os.path.join(path, cls, img))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        with Image.open(self.data[idx]) as img:
            img = img.convert('RGB')
            return transforms.functional.to_tensor(img)

train_data = DataIter()
train_loader = DataLoader(train_data, batch_size=300,
                          shuffle=True,
                          drop_last=True,
                          pin_memory=False,
                          num_workers=18)

for i, item in enumerate(train_loader):
    if i % 200 == 0:
        print(i)
Expected behavior
CPU memory gradually increases, eventually filling up the whole RAM: the process starts at around 15GB of RAM usage and ends up consuming the entire 128GB available on the system. When num_workers=0, RAM usage stays constant.
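One way to observe this (a sketch, not part of the original report; it assumes the third-party psutil package is installed) is to log the combined resident set size of the main process and its DataLoader workers every few hundred batches:

import os
import psutil

def log_total_rss(tag=""):
    # Sum the resident set size of the main process and all of its
    # children, which include the DataLoader worker processes.
    main = psutil.Process(os.getpid())
    rss = main.memory_info().rss
    for child in main.children(recursive=True):
        try:
            rss += child.memory_info().rss
        except psutil.NoSuchProcess:
            pass  # a worker exited between listing and querying it
    print("{} total RSS: {:.2f} GiB".format(tag, rss / 1024.0 ** 3))

# In the loop from the snippet above:
# for i, item in enumerate(train_loader):
#     if i % 200 == 0:
#         log_total_rss("batch {}".format(i))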
Environment
PyTorch version: 1.0.0.dev20181028
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 3.5
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
Nvidia driver version: 390.67
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.4
Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect
PIL.__version__: '5.3.0'
Additional info
There are around 24 million images in the dataset and all image paths are loaded into a single list as presented in the above code snippet.
I have also tried multiple PyTorch versions (0.4.0 and 0.4.1), and the effect is the same.
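This pattern is consistent with copy-on-write pages being dirtied by CPython reference-count updates when forked workers read elements of a large Python list, which is why the workaround above avoids Python lists. A standalone sketch that illustrates the mechanism (assuming Linux, the fork start method, and the third-party psutil package; the sizes and names are illustrative):

import multiprocessing as mp
import numpy as np
import psutil

def read_all(seq, name):
    proc = psutil.Process()
    before = proc.memory_full_info().uss  # memory unique to this child
    total = 0
    for item in seq:
        # For Python objects, merely reading them writes to their
        # reference-count field and turns shared pages into private ones.
        total += len(item)
    after = proc.memory_full_info().uss
    print("%s: child-private memory grew by %.2f GiB"
          % (name, (after - before) / 1024.0 ** 3))

if __name__ == "__main__":
    n = 5 * 10 ** 6  # number of fake paths; illustrative size
    paths_list = ["path/to/data/cls/img_%d.jpg" % i for i in range(n)]
    paths_array = np.array(paths_list).astype(np.bytes_)  # one buffer
    ctx = mp.get_context("fork")  # the Linux default start method
    for name, seq in (("python list", paths_list),
                      ("numpy bytes array", paths_array)):
        p = ctx.Process(target=read_all, args=(seq, name))
        p.start()
        p.join()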
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @ssnl @VitalyFedyunin @ejguan