MemoryError #224


Closed
venkatesh71097 opened this issue May 31, 2020 · 13 comments · Fixed by #226
@venkatesh71097

While trying to read waveform data with rdsamp / rdrecord, I run into this MemoryError many times. I'm using 64-bit Python and I'm not sure how to solve it. Any leads?

@Lucas-Mc
Contributor

Lucas-Mc commented Jun 1, 2020

That is really interesting... Do you have a stack trace?

@Lucas-Mc Lucas-Mc self-assigned this Jun 1, 2020
@venkatesh71097
Author

Hello, thank you so much for getting back! I'm trying to read a very large waveform file. When I set sampto=50500000 I can read the file, but I need the entire data file. This is the code:
signals, fields = wfdb.rdsamp('../../physionet.org/files/mimic3wdb-matched/1.0/p00/p002029/p002029-2160-02-27-17-40')

Error trace:


MemoryError Traceback (most recent call last)
in ()
9 import wfdb
10
---> 11 signals, fields = wfdb.rdsamp('../../physionet.org/files/mimic3wdb-matched/1.0/p00/p002029/p002029-2160-02-27-17-40')

~/anaconda3/envs/python3/lib/python3.6/site-packages/wfdb/io/record.py in rdsamp(record_name, sampfrom, sampto, channels, pb_dir, channel_names, warn_empty)
1392 sampto=sampto, channels=channels, physical=True,
1393 pb_dir=pb_dir, m2s=True, channel_names=channel_names,
-> 1394 warn_empty=warn_empty)
1395
1396 signals = record.p_signal

~/anaconda3/envs/python3/lib/python3.6/site-packages/wfdb/io/record.py in rdrecord(record_name, sampfrom, sampto, channels, physical, pb_dir, m2s, smooth_frames, ignore_skew, return_res, force_channels, channel_names, warn_empty)
1301 os.path.join(dir_name, record.seg_name[seg_num]),
1302 sampfrom=seg_ranges[i][0], sampto=seg_ranges[i][1],
-> 1303 channels=seg_channels[i], physical=physical, pb_dir=pb_dir)
1304
1305 # Arrange the fields of the layout specification segment, and

~/anaconda3/envs/python3/lib/python3.6/site-packages/wfdb/io/record.py in rdrecord(record_name, sampfrom, sampto, channels, physical, pb_dir, m2s, smooth_frames, ignore_skew, return_res, force_channels, channel_names, warn_empty)
1239 if physical:
1240 # Perform inplace dac to get physical signal
-> 1241 record.dac(expanded=False, return_res=return_res, inplace=True)
1242
1243 # Return each sample of the signals with multiple samples per frame

~/anaconda3/envs/python3/lib/python3.6/site-packages/wfdb/io/_signal.py in dac(self, expanded, return_res, inplace)
485 # Do float conversion immediately to avoid potential under/overflow
486 # of efficient int dtype
--> 487 self.d_signal = self.d_signal.astype(floatdtype, copy=False)
488 np.subtract(self.d_signal, self.baseline, self.d_signal)
489 np.divide(self.d_signal, self.adc_gain, self.d_signal)

MemoryError:

@Lucas-Mc
Contributor

Lucas-Mc commented Jun 1, 2020

Hey @venkatesh71097, I ran this locally and it crashed on the first try, just like yours. Then I closed out a lot of programs and got it to run successfully, though it was a wild ride!

I first ran this command:
signals, fields = wfdb.rdsamp('p002029-2160-02-27-17-40', pb_dir='mimic3wdb/matched/p00/p002029')
Look at the memory usage (quickly going up to 15 GB)!
[screenshot: memory usage climbing toward 15 GB]
And the size!
[screenshot: size of the loaded array]
At first I thought it was a numpy error with large arrays, and I think it still is. I need to work on optimizing the computations and doing them in sections when memory issues are expected. If you still can't get it to work, maybe try reading each individual segment file for now until the memory usage is improved? You could also edit the main header file and read half of the record at a time. Hope this helps!
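The "read it in sections" workaround can be sketched roughly like this; `chunk_ranges` and `read_in_chunks` are hypothetical helper names, not wfdb APIs, and chunking only helps if each chunk is processed or downcast before the next one is loaded:

```python
# Sketch of reading a long record in sections rather than all at once;
# chunk_ranges and read_in_chunks are hypothetical helpers, not wfdb APIs.

def chunk_ranges(sig_len, chunk_len):
    """Return (sampfrom, sampto) pairs covering [0, sig_len)."""
    return [(start, min(start + chunk_len, sig_len))
            for start in range(0, sig_len, chunk_len)]

def read_in_chunks(record_name, chunk_len=10_000_000):
    import wfdb  # imported here so chunk_ranges stays usable without wfdb installed
    sig_len = wfdb.rdheader(record_name).sig_len
    chunks = []
    for sampfrom, sampto in chunk_ranges(sig_len, chunk_len):
        signals, _ = wfdb.rdsamp(record_name, sampfrom=sampfrom, sampto=sampto)
        chunks.append(signals.astype('float32'))  # downcast each chunk right away
    return chunks
```

Downcasting each chunk as it arrives keeps the peak footprint near one chunk of float64 plus the accumulated float32 data, instead of the whole record at float64.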

@Lucas-Mc Lucas-Mc added the bug label Jun 1, 2020
@venkatesh71097
Author

venkatesh71097 commented Jun 1, 2020

Hey lucas!

Thank you so much! I'm running on AWS, and I still get this error even after shutting down everything else that was running. As you said, I can remove a few headers from the files and run them, but this error pops up in many other files as well. I'm running my code to read the signals of one particular channel name across the MIMIC matched subset data (approx. 10,300 files). Could there be a way around this memory error, such as an option for precision (int/float, as in MATLAB's WFDB toolbox) or an option for downsampling the signal? In the worst case, I'm wondering if there's a way to skip the files that run into MemoryError (which I'd rather not do, as it might deprive me of important files), or to stop reading samples just before the error occurs.

To give you some background on what I'm doing: I've classified the patients by one particular disease using the icd9_code in the metadata, and I'm running the entire dataset through this. I couldn't map the seq_num (in diagnoses_icd) to the segment number in the matched subset data, as they don't match. If needed, I can share more details in a personal conversation. Thanks in advance!
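The "skip the files that fail" idea mentioned above could look roughly like this; `read_all` and `read_record` are hypothetical names standing in for a loop around wfdb.rdsamp, not wfdb APIs:

```python
# Rough sketch of skipping records that raise MemoryError instead of crashing;
# read_record stands in for the actual wfdb.rdsamp call.

def read_all(record_names, read_record):
    results, skipped = {}, []
    for name in record_names:
        try:
            results[name] = read_record(name)
        except MemoryError:
            skipped.append(name)  # log it so no file is silently lost
    return results, skipped
```

Keeping the skipped list means the problem files can be retried later (for example with a smaller sampto) instead of being lost.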

@Lucas-Mc
Contributor

Lucas-Mc commented Jun 1, 2020

Hey @venkatesh71097, good idea! I was thinking this too, since it's running at float64 right now, which is way more precision than needed, especially for a large file like the one you're reading. And the good news is that going from float64 to float16 gives a 4x memory reduction, at the cost of only a little precision! I'll open an issue for letting the user specify the numpy datatype they want. I think downsampling is a good one to work on later, but hopefully the datatype conversion will be a simple fix first. Thanks for posting!
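The 4x figure is easy to check with numpy; note the cast is not lossless, since float16 carries only about three decimal digits of precision:

```python
import numpy as np

# float64 -> float16 cuts memory 4x (8 bytes -> 2 bytes per sample),
# but it is lossy: float16 carries only ~3 decimal digits of precision.
sig64 = np.linspace(-1.0, 1.0, 1_000_000)   # stand-in for a physical signal
sig16 = sig64.astype(np.float16)

print(sig64.nbytes // sig16.nbytes)          # 4
print(float(np.abs(sig64 - sig16.astype(np.float64)).max()))  # small but nonzero
```

For many plotting and screening tasks that rounding error is negligible, but it matters for fine-grained morphology analysis.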

@venkatesh71097
Author

Hey Lucas! Thank you so much. It'll be a big breakthrough for my research if this is sorted out soon. I'd been struggling with this issue over the weekend: some files get stuck in rdrecord, and some get stuck in rdsamp. Thanks again! :)

@Lucas-Mc
Contributor

Lucas-Mc commented Jun 1, 2020

Some preliminary results. Hopefully this compression helps out! I'll work on this after finishing up a commit I'm working on for EDF2MIT! 👍
[screenshot: preliminary memory-reduction results]

@venkatesh71097
Author

Wow, awesome! Thank you so much, Lucas! :)

@venkatesh71097
Author

venkatesh71097 commented Jun 1, 2020

Hey Lucas! There's one more follow-up issue. For another file, p000109, I tried reading and plotting the data from one channel, 'PLETH'. I was able to read the file once I restricted it to the channel_name 'PLETH' (for many other files, as we discussed, I can't read them even when limiting to the 'PLETH' channel). But I can't plot it, because I run into a MemoryError again. PFA the code and trace. Plotting isn't strictly required for my use case, but it helps in visualizing the data; I just thought I'd report it since it might help others as well.

Code:
signal, fields = wfdb.rdsamp('../../physionet.org/files/mimic3wdb-matched/1.0/p00/p000109/p000109-2141-10-21-02-00', channel_names=['PLETH'])
q = signal.astype('float16')
plt.plot(q)

Trace:

MemoryError Traceback (most recent call last)
in ()
11 signal, fields = wfdb.rdsamp('../../physionet.org/files/mimic3wdb-matched/1.0/p00/p000109/p000109-2141-10-21-02-00', channel_names = ['PLETH'])
12 q = signal.astype('float16')
---> 13 plt.plot(q)

~/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/pyplot.py in plot(scalex, scaley, data, *args, **kwargs)
2809 return gca().plot(
2810 *args, scalex=scalex, scaley=scaley, **({"data": data} if data
-> 2811 is not None else {}), **kwargs)
2812
2813

~/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
1808 "the Matplotlib list!)" % (label_namer, func.__name__),
1809 RuntimeWarning, stacklevel=2)
-> 1810 return func(ax, *args, **kwargs)
1811
1812 inner.__doc__ = _add_data_doc(inner.__doc__,

~/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_axes.py in plot(self, scalex, scaley, *args, **kwargs)
1610
1611 for line in self._get_lines(*args, **kwargs):
-> 1612 self.add_line(line)
1613 lines.append(line)
1614

~/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_base.py in add_line(self, line)
1893 line.set_clip_path(self.patch)
1894
-> 1895 self._update_line_limits(line)
1896 if not line.get_label():
1897 line.set_label('_line%d' % len(self.lines))

~/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/axes/_base.py in _update_line_limits(self, line)
1915 Figures out the data limit of the given line, updating self.dataLim.
1916 """
-> 1917 path = line.get_path()
1918 if path.vertices.size == 0:
1919 return

~/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/lines.py in get_path(self)
943 """
944 if self._invalidy or self._invalidx:
--> 945 self.recache()
946 return self._path
947

~/anaconda3/envs/python3/lib/python3.6/site-packages/matplotlib/lines.py in recache(self, always)
647 y = self._y
648
--> 649 self._xy = np.column_stack(np.broadcast_arrays(x, y)).astype(float)
650 self._x, self._y = self._xy.T # views
651

~/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/lib/shape_base.py in column_stack(tup)
367 arr = array(arr, copy=False, subok=True, ndmin=2).T
368 arrays.append(arr)
--> 369 return _nx.concatenate(arrays, 1)
370
371 def dstack(tup):

MemoryError:

@Lucas-Mc
Contributor

Lucas-Mc commented Jun 1, 2020

Hey @venkatesh71097! That seems like an issue with matplotlib.pyplot, since the array you're trying to plot is so large. Maybe you can reduce its size with plt.plot(q[::100]), or whatever downsampling step you want? You may have to play around with it, and 100 may not be large enough, but it should help you out!
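A quick sketch of the decimation, using a synthetic signal in place of the actual PLETH data:

```python
import numpy as np

# Decimate before plotting: q[::step] keeps every step-th sample, shrinking
# the array matplotlib has to turn into a path. Synthetic signal shown here.
q = np.sin(np.linspace(0.0, 100.0, 1_000_000)).astype(np.float16)
step = 100
q_small = q[::step]

print(q_small.shape[0])   # 10000
# plt.plot(q_small)       # now plots 10,000 points instead of 1,000,000
```

Plain striding can alias fast transients; for a signal like PLETH, picking a step well below the waveform period (or averaging within each window) keeps the plotted shape faithful.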

@venkatesh71097
Author

Hey Lucas, sure! I'll plot using smaller slices; that should be fine. Do let me know once the datatype option has been added to the repo! Thanks in advance! :)

@Lucas-Mc
Contributor

Lucas-Mc commented Jun 2, 2020

Hey @venkatesh71097! Some preliminary results here (and I think you'll like them)!

[screenshot: timing comparison before and after the datatype change]

Over half the time!! Pull request coming soon!

@venkatesh71097
Author

Hey! Thank you so much. It's great that you're taking computation time into consideration too, since it currently takes a long time to process the data; reading certain files on my local drive took a few hours. If you could reduce the time consumption even more, that would be really great!

Lucas-Mc added a commit that referenced this issue Jun 2, 2020
Adds a datatype parameter to the rdrecord and rdsamp functions, allowing the user to increase computation speed at the expense of accuracy and significant figures in the signal data. Fixes #224 and #225.
Lucas-Mc added a commit that referenced this issue Jun 2, 2020
Adds datatype parameter in rdrecord/rdsamp #224 #225