rdrecord: smooth frames by invoking smooth_frames().

Benjamin Moody · Benjamin Moody · commit ca20b1a360b1 · 2022-03-15T12:35:54.000-04:00
To simplify the implementation of _rd_segment and _rd_dat_signals, we
want to eliminate the smooth_frames argument, so that the return
values of these two functions will always have the same type (a list
of numpy arrays).

Therefore, if the application requested frame smoothing, then instead
of calling _rd_segment with smooth_frames=True, we will call
_rd_segment with smooth_frames=False, and post-process the result by
calling Record.smooth_frames.

Record.smooth_frames (SignalMixin.smooth_frames) will give a result
equivalent to what _rd_segment gives with smooth_frames=True, but
there are likely differences in performance:

 - Record.smooth_frames performs the computation by slicing along the
   "long" axis and storing the intermediate results in an int64 numpy
   array.  _rd_dat_signals slices along the "short" axis and stores
   the intermediate results in a Python list.  Record.smooth_frames
   should therefore be faster for large inputs.

 - Record.smooth_frames only operates on the channels present in
   e_d_signal, whereas _rd_dat_signals smooths all of the signals in
   the input file.  Record.smooth_frames therefore saves memory and
   time when reading a subset of channels.

 - Record.smooth_frames always returns an int64 array, whereas
   _rd_dat_signals returns an array of the same type as the original
   data.  Record.smooth_frames therefore uses more memory in many
   cases.  (Note that rdrecord will post-process the result in any
   case, making this change invisible to applications; the issue of
   increased temporary memory usage can be addressed separately.)

 - If there are multiple channels in a signal file, then calling
   _rd_dat_signals with smooth_frames=False requires making an extra
   copy of each signal that has multiple samples per frame (because of
   the "reshape(-1)").  This could be addressed in the future by
   allowing _rd_segment, or at least _rd_dat_signals, to return a list
   of *two-dimensional* arrays instead.
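
As a rough illustration of the first and third points above, the two
slicing strategies can be sketched as follows.  This is not the wfdb
code: smooth_long_axis and smooth_short_axis are hypothetical helpers
written for this note, the input length is assumed to be a multiple of
samps_per_frame, and floor division is an arbitrary rounding choice.

```python
import numpy as np


def smooth_long_axis(expanded, samps_per_frame):
    """Average each frame by striding along the long (time) axis.

    Hypothetical helper: one strided slice per sample position within
    a frame, accumulated in an int64 array.
    """
    n_frames = len(expanded) // samps_per_frame
    total = np.zeros(n_frames, dtype=np.int64)
    for offset in range(samps_per_frame):
        # Every samps_per_frame-th sample, starting at this offset.
        total += expanded[offset::samps_per_frame]
    # Rounding rule is an assumption of this sketch.
    return total // samps_per_frame


def smooth_short_axis(expanded, samps_per_frame):
    """Average each frame by slicing one frame at a time.

    Hypothetical helper: intermediate results go into a Python list,
    and the output keeps the dtype of the input.
    """
    n_frames = len(expanded) // samps_per_frame
    out = []
    for i in range(n_frames):
        frame = expanded[i * samps_per_frame:(i + 1) * samps_per_frame]
        out.append(int(frame.sum()) // samps_per_frame)
    return np.array(out, dtype=expanded.dtype)


# Three frames of two samples each, stored as 16-bit integers.
expanded = np.array([1, 3, 5, 7, 10, 20], dtype=np.int16)
long_result = smooth_long_axis(expanded, 2)
short_result = smooth_short_axis(expanded, 2)
```

Both helpers give the same values here, but the first loops
samps_per_frame times and returns int64, while the second loops once
per frame and returns the input dtype.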

In order for this to work correctly, Record.smooth_frames must be
called after setting both e_d_signal and samps_per_frame.  In
particular, it must be done after Record._arrange_fields "rearranges"
samps_per_frame according to channels.

On the other hand, _arrange_fields is expected to set checksum and
init_value in different ways depending on whether the result is to be
smoothed.  (This use of checksum and init_value is somewhat dubious.)
Therefore, smooth_frames is now invoked as part of _arrange_fields,
after setting channel-specific metadata and before setting checksum
and init_value.

_arrange_fields should never be invoked other than by rdrecord; it
doesn't make any sense to call this function at other times.  Change
the signature of this function to reflect the fact that it actively
transforms the signal array, and make all arguments mandatory.
diff --git a/wfdb/io/record.py b/wfdb/io/record.py
@@ -668,7 +668,7 @@ def wrsamp(self, expanded=False, write_dir=''):
             self.wr_dats(expanded=expanded, write_dir=write_dir)
 
 
-    def _arrange_fields(self, channels, sampfrom=0, expanded=False):
+    def _arrange_fields(self, channels, sampfrom, smooth_frames):
         """
         Arrange/edit object fields to reflect user channel and/or signal
         range input.
@@ -677,10 +677,11 @@ def _arrange_fields(self, channels, sampfrom=0, expanded=False):
         ----------
         channels : list
             List of channel numbers specified.
-        sampfrom : int, optional
+        sampfrom : int
             Starting sample number read.
-        expanded : bool, optional
-            Whether the record was read in expanded mode.
+        smooth_frames : bool
+            Whether to convert the expanded signal array (e_d_signal) into
+            a smooth signal array (d_signal).
 
         Returns
         -------
@@ -693,18 +694,21 @@ def _arrange_fields(self, channels, sampfrom=0, expanded=False):
             setattr(self, field, [item[c] for c in channels])
 
         # Expanded signals - multiple samples per frame.
-        if expanded:
+        if not smooth_frames:
             # Checksum and init_value to be updated if present
             # unless the whole signal length was input
             if self.sig_len != int(len(self.e_d_signal[0]) / self.samps_per_frame[0]):
-                self.checksum = self.calc_checksum(expanded)
+                self.checksum = self.calc_checksum(True)
                 self.init_value = [s[0] for s in self.e_d_signal]
 
             self.n_sig = len(channels)
             self.sig_len = int(len(self.e_d_signal[0]) / self.samps_per_frame[0])
 
         # MxN numpy array d_signal
         else:
+            self.d_signal = self.smooth_frames('digital')
+            self.e_d_signal = None
+
             # Checksum and init_value to be updated if present
             # unless the whole signal length was input
             if self.sig_len != self.d_signal.shape[0]:
@@ -3517,7 +3521,7 @@ def rdrecord(record_name, sampfrom=0, sampto=None, channels=None,
             no_file = False
             sig_data = None
 
-        signals = _signal._rd_segment(
+        record.e_d_signal = _signal._rd_segment(
             file_name=record.file_name,
             dir_name=dir_name,
             pn_dir=pn_dir,
@@ -3531,35 +3535,29 @@ def rdrecord(record_name, sampfrom=0, sampto=None, channels=None,
             sampfrom=sampfrom,
             sampto=sampto,
             channels=channels,
-            smooth_frames=smooth_frames,
+            smooth_frames=False,
             ignore_skew=ignore_skew,
             no_file=no_file,
             sig_data=sig_data,
             return_res=return_res)
 
         # Only 1 sample/frame, or frames are smoothed. Return uniform numpy array
         if smooth_frames:
-            # Read signals from the associated dat files that contain
-            # wanted channels
-            record.d_signal = signals
-
             # Arrange/edit the object fields to reflect user channel
             # and/or signal range input
             record._arrange_fields(channels=channels, sampfrom=sampfrom,
-                                   expanded=False)
+                                   smooth_frames=True)
 
             if physical:
                 # Perform inplace dac to get physical signal
                 record.dac(expanded=False, return_res=return_res, inplace=True)
 
         # Return each sample of the signals with multiple samples per frame
         else:
-            record.e_d_signal = signals
-
             # Arrange/edit the object fields to reflect user channel
             # and/or signal range input
             record._arrange_fields(channels=channels, sampfrom=sampfrom,
-                                   expanded=True)
+                                   smooth_frames=False)
 
             if physical:
                 # Perform dac to get physical signal