Skip to content

ENH: Avoid memory peak and useless computations when printing a MaskedArray. #6748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 1, 2015

Conversation

saimn
Copy link
Contributor

@saimn saimn commented Nov 30, 2015

Ref #3544. When printing a MaskedArray, the whole array is converted to the object dtype, whereas only a few values are printed to screen.
The approach here is to cut the array along each axis and keep only a subset that it used for the string conversion. This way the output should not change. The shape used for the cut (100 values for each axis) was chosen so we have enough values when printing on large screen (and the number of printed values depend on the dtype if I understand correctly), maybe there is a better value to choose (inspired from what ndarray.str does ?).
Maybe there is a better approach, in which case I am happy to improve this PR.

…dArray.

Ref numpy#3544.

When printing a `MaskedArray`, the whole array is converted to the object dtype,
whereas only a few values are printed to screen. So the approach here is to cut
the array and keep only a subset that it used for the string conversion. This
way the output should not change.
@charris
Copy link
Member

charris commented Dec 1, 2015

I'm curious about this

# convert to object array to make filled work

Without looking through the whole file, I'm guessing this is to allow printing --- for masked values. If we could avoid the object conversion that would be a better solution. Probably more work though...

@seberg
Copy link
Member

seberg commented Dec 1, 2015

Holy, wow. that is the ugliest thing, but I think that might just solve gh-6723, or maybe it was always used like that?

@seberg
Copy link
Member

seberg commented Dec 1, 2015

Nvm (I know it probably did not make sense anyway to anyone ;)), but I got it the wrong way around, ithought the fill value was the problem, but the array is....

@saimn
Copy link
Contributor Author

saimn commented Dec 1, 2015

Without looking through the whole file, I'm guessing this is to allow printing --- for masked values. If we could avoid the object conversion that would be a better solution. Probably more work though...

Yes, I don't know what are the other possibility to fill the masked values with -- without doing it manually.
So it is still very ugly but at least it will not eat all the memory (I was always surprised that it was so long to print big arrays).

@charris
Copy link
Member

charris commented Dec 1, 2015

There is a back compatibility problem with getting rid of the singleton masked_print_option but we could probably do it. I think it should be possible to dispense with it with a bit of cleverness. However, this looks like a good workaround for what we have now. It would be good to use a more descriptive name for nval and make it a class variable with an explanation of what it determines. That way when someone complains that it is the wrong value it will be easy to find and change. Could maybe call it _print_width or some such.

@saimn
Copy link
Contributor Author

saimn commented Dec 1, 2015

@charris : ok, done. Is it enough with a comment next to the class variable ?

# object dtype, extract the corners before the conversion.
for axis in range(self.ndim):
if data.shape[axis] > self._print_width:
ind = np.int(self._print_width / 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do ind = self._print_width // 2 to avoid the integer cast.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we always use the Python 3 meaning of / in order to be able to use the same source code for both Python 2 and 3.

@charris
Copy link
Member

charris commented Dec 1, 2015

Comment looks fine, but there is another nit to pick.

@saimn
Copy link
Contributor Author

saimn commented Dec 1, 2015

Good point, thanks !

charris added a commit that referenced this pull request Dec 1, 2015
ENH: Avoid memory peak and useless computations when printing a MaskedArray.
@charris charris merged commit 11f8092 into numpy:master Dec 1, 2015
@charris
Copy link
Member

charris commented Dec 1, 2015

Thanks Simon.

@saimn saimn deleted the ma-repr-memory branch December 2, 2015 08:40
saimn added a commit to saimn/numpy that referenced this pull request Dec 2, 2015
charris added a commit that referenced this pull request Dec 2, 2015
@ahaldane
Copy link
Member

ahaldane commented Dec 2, 2015

This PR is a great idea! (at least for now)

Possibly for a "real" fix (to avoid to conversion to object type) we need a new dtype system where we can override the __repr__ function of the dtype. Currently this is only possible for the np.void dtype.

colbych added a commit to colbych/numpy that referenced this pull request Dec 6, 2015
* 'master' of git://github.com/numpy/numpy: (24 commits)
  BENCH: allow benchmark suite to run on Python 3
  TST: test f2py, fallback on f2py2.7 etc., fixes numpy#6718
  BUG: link cblas library if cblas is detected
  BUG/TST: Fix for numpy#6724, make numpy.ma.mvoid consistent with numpy.void
  BUG/TST: Fix numpy#6760 by correctly describing mask on nested subdtypes
  BUG: resizing empty array with complex dtype failed
  DOC: Add changelog for numpy#6734 and numpy#6748.
  Use integer division to avoid casting to int.
  Allow to change the maximum width with a class variable.
  Add some tests for mask creation with mask=True or False.
  Test that the mask dtype if MaskType before using np.zeros/ones
  BUG/TST: Fix for numpy#6729
  ENH: Avoid memory peak and useless computations when printing a MaskedArray.
  ENH: Avoid memory peak when creating a MaskedArray with mask=True/False (numpy#6732).
  BUG: Readd fallback CBLAS detection on linux.
  TST: Fix travis-ci test for numpy wheels.
  MAINT: Localize variables only used with relaxed stride checking.
  BUG: Fix for numpy#6719
  MAINT: enable Werror=vla in travis
  BUG: Include relevant files from numpy/linalg/lapack_lite in sdist.
  ...
jaimefrio pushed a commit to jaimefrio/numpy that referenced this pull request Mar 22, 2016
saimn added a commit to saimn/numpy that referenced this pull request May 22, 2016
Ref numpy#7621. numpy#6748 added `np.ma.MaskedArray._print_width` which is used to cut
a masked array before printing it (to save memory and cpu time during the
conversion to the object dtype). But this doesn't work correctly for 1D arrays,
for which up to 1000 values can be printed before cutting the array.

So this commit adds a new class variable `_print_width_1d` to handle the 1D case
separately.
charris pushed a commit to charris/numpy that referenced this pull request May 23, 2016
Ref numpy#7621. numpy#6748 added `np.ma.MaskedArray._print_width` which is used to cut
a masked array before printing it (to save memory and cpu time during the
conversion to the object dtype). But this doesn't work correctly for 1D arrays,
for which up to 1000 values can be printed before cutting the array.

So this commit adds a new class variable `_print_width_1d` to handle the 1D case
separately.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants