-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Avoid memory peak and useless computations when printing a MaskedArray. #6748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…dArray. Ref numpy#3544. When printing a `MaskedArray`, the whole array is converted to the object dtype, whereas only a few values are printed to screen. So the approach here is to cut the array and keep only a subset that it used for the string conversion. This way the output should not change.
I'm curious about this
Without looking through the whole file, I'm guessing this is to allow printing |
Holy, wow. that is the ugliest thing, but I think that might just solve gh-6723, or maybe it was always used like that? |
Nvm (I know it probably did not make sense anyway to anyone ;)), but I got it the wrong way around, ithought the fill value was the problem, but the array is.... |
Yes, I don't know what are the other possibility to fill the masked values with |
There is a back compatibility problem with getting rid of the singleton |
@charris : ok, done. Is it enough with a comment next to the class variable ? |
# object dtype, extract the corners before the conversion. | ||
for axis in range(self.ndim): | ||
if data.shape[axis] > self._print_width: | ||
ind = np.int(self._print_width / 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do ind = self._print_width // 2
to avoid the integer cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we always use the Python 3 meaning of /
in order to be able to use the same source code for both Python 2 and 3.
Comment looks fine, but there is another nit to pick. |
Good point, thanks ! |
ENH: Avoid memory peak and useless computations when printing a MaskedArray.
Thanks Simon. |
This PR is a great idea! (at least for now) Possibly for a "real" fix (to avoid to conversion to object type) we need a new dtype system where we can override the |
* 'master' of git://github.com/numpy/numpy: (24 commits) BENCH: allow benchmark suite to run on Python 3 TST: test f2py, fallback on f2py2.7 etc., fixes numpy#6718 BUG: link cblas library if cblas is detected BUG/TST: Fix for numpy#6724, make numpy.ma.mvoid consistent with numpy.void BUG/TST: Fix numpy#6760 by correctly describing mask on nested subdtypes BUG: resizing empty array with complex dtype failed DOC: Add changelog for numpy#6734 and numpy#6748. Use integer division to avoid casting to int. Allow to change the maximum width with a class variable. Add some tests for mask creation with mask=True or False. Test that the mask dtype if MaskType before using np.zeros/ones BUG/TST: Fix for numpy#6729 ENH: Avoid memory peak and useless computations when printing a MaskedArray. ENH: Avoid memory peak when creating a MaskedArray with mask=True/False (numpy#6732). BUG: Readd fallback CBLAS detection on linux. TST: Fix travis-ci test for numpy wheels. MAINT: Localize variables only used with relaxed stride checking. BUG: Fix for numpy#6719 MAINT: enable Werror=vla in travis BUG: Include relevant files from numpy/linalg/lapack_lite in sdist. ...
Ref numpy#7621. numpy#6748 added `np.ma.MaskedArray._print_width` which is used to cut a masked array before printing it (to save memory and cpu time during the conversion to the object dtype). But this doesn't work correctly for 1D arrays, for which up to 1000 values can be printed before cutting the array. So this commit adds a new class variable `_print_width_1d` to handle the 1D case separately.
Ref numpy#7621. numpy#6748 added `np.ma.MaskedArray._print_width` which is used to cut a masked array before printing it (to save memory and cpu time during the conversion to the object dtype). But this doesn't work correctly for 1D arrays, for which up to 1000 values can be printed before cutting the array. So this commit adds a new class variable `_print_width_1d` to handle the 1D case separately.
Ref #3544. When printing a
MaskedArray
, the whole array is converted to the object dtype, whereas only a few values are printed to screen.The approach here is to cut the array along each axis and keep only a subset that it used for the string conversion. This way the output should not change. The shape used for the cut (100 values for each axis) was chosen so we have enough values when printing on large screen (and the number of printed values depend on the dtype if I understand correctly), maybe there is a better value to choose (inspired from what ndarray.str does ?).
Maybe there is a better approach, in which case I am happy to improve this PR.