Skip to content

Two bugs in colors.BoundaryNorm #4824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 8, 2015

Conversation

fmaussion
Copy link
Contributor

Bug 1: calling BoundaryNorm with a scalar

Short discussion on the ML: http://matplotlib.1069221.n5.nabble.com/Misleading-BoundaryNorm-error-td45970.html
I added some extra tests to the previous test-suite which was quite short.

Bug 2: clipping was ignored

The previous call to np.clip was silently useless (not in place). I don't think that clipping makes much sense in the case of BoundaryNorm but still.
Before merging, I'd like to discuss the point I mention in the code comments: what about clipping the max values? Currently this is inconsistent and I'd like to change it, but I'd like to have your opinion on it.

disclaimer: this is my first PR, I hope to have done everything right. I did not see the "maintenance" branch so I branched from master. I think there are a couple of thing we can do better with these normalizing tools but I need more time to think about it...

ret = ma.array(iret, mask=mask)
if ret.shape == () and not mask:
ret = int(ret) # assume python scalar
if _x.shape == () and not mask:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The not mask part here is probably buggy. I bet we haven't tested this function with any non-trivial masked arrays. Simple masked arrays (all masked, or all unmasked) have scalar mask values, but non-trivial ones have numpy boolean arrays, which would currently emit a warning here, and later raise an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, masked arrays have not been tested yet (in fact, the whole class is not well tested yet). I will have to read about masked arrays first before I can add a few tests. I'll get back to you when it's done

@tacaswell tacaswell modified the milestones: next point release, proposed next point release Jul 30, 2015
@fmaussion
Copy link
Contributor Author

A quick question: what are we supposed to do with non-masked arrays containing NaNs? Normalize will return a NaN:

>>> import matplotlib.colors as mcolors
>>> n = mcolors.Normalize(0, 2)
>>> n([1, np.NaN])
masked_array(data = [ 0.5  nan],
             mask = False,
       fill_value = 1e+20)

Of course, the current version of BoundaryNorm will throw annoying warnings. Am I supposed to mask NaNs automatically?

@WeatherGod
Copy link
Member

I thought ma.asarray masks them automatically?

On Thu, Jul 30, 2015 at 12:10 PM, Fabien Maussion notifications@github.com
wrote:

A quick question: what are we supposed to do with non-masked arrays
containing NaNs? Normalize will return a NaN:

import matplotlib.colors as mcolors>>> n = mcolors.Normalize(0, 2)>>> n([1, np.NaN])
masked_array(data = [ 0.5 nan],
mask = False,
fill_value = 1e+20)

Of course, the current version of BoundaryNorm will throw annoying
warnings. Am I supposed to mask NaNs automatically?


Reply to this email directly or view it on GitHub
#4824 (comment)
.

@fmaussion
Copy link
Contributor Author

Nope. However, fix_invalid seems to do the job:

>>> x = [1., -1, np.nan, np.inf]
>>> np.ma.asarray(x)
masked_array(data = [  1.  -1.  nan  inf],
             mask = False,
       fill_value = 1e+20)

>>> np.ma.fix_invalid(x)
masked_array(data = [1.0 -1.0 -- --],
             mask = [False False  True  True],
       fill_value = 1e+20)

@fmaussion
Copy link
Contributor Author

Sorry to be so picky, but if we expect the following behaviour with a valid scalar:

>>> import matplotlib.colors as mcolors
>>> bn = mcolors.BoundaryNorm([1,2,3], 2)
>>> bn(1.5)
0

what should we expect from the following call?

>>> bn(np.NaN)
???

On my working version this is what I get:

>>> bn(np.NaN)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mowglie/Documents/git/matplotlib/lib/matplotlib/colors.py", line 1291, in __call__
    ret = int(ret[0])  # assume python scalar
  File "/home/mowglie/.pyvirtualenvs/mpl_devel/lib/python3.4/site-packages/numpy/ma/core.py", line 3910, in __int__
    raise MaskError('Cannot convert masked element to a Python int.')
numpy.ma.core.MaskError: Cannot convert masked element to a Python int.

@efiring
Copy link
Member

efiring commented Jul 31, 2015

Nans are presently not supported in the Normalize API. Obviously, this could be changed. It would impose a performance penalty for large arrays. Overall, this is an example of how the mpl API has evolved, without a clear statement of where operations like checking for invalid values should be done, and where they should not. If we do the checking at every level of the API, the performance penalties mount up. Certainly we want to do it at the highest level, like plot and contourf. But Normalize is at a lower level, so we could leave it as-is and just say it is the caller's responsibility to clean up the input.

Whenever it is decided that we need np.masked_invalid or np.fixed_invalid (which I think is a little more general, but slower), it would be good to use the copy kwarg to avoid a copy unless it is really needed. With fixed_invalid it is probably needed in many cases, because fixed_invalid fills the invalid values. Masked_invalid just returns a masked array without changing any underlying data values.

@fmaussion
Copy link
Contributor Author

I see what you mean. I think we shouldn't care about users doing something like BoundaryNorm(np.NaN). The current error message is clear enough.

On my working version I make a copy of the input: _x = ma.fix_invalid(x) but I am ok with changing it to: x = ma.masked_invalid(x). In that case we will change the user's input to a masked array without changing the value. Is that OK to modify user's input that way?

@efiring
Copy link
Member

efiring commented Jul 31, 2015

Even masked_invalid has a copy=True default, so to avoid a copy it
needs to be set to False. In that case we are still not changing the
user's input at all; we are just giving ourselves an object containing
the original user's data together with a mask, and methods that know
what to do with the mask. So long as we don't modify the data values,
there are no side effects.
My larger question, however, is whether any of the Normalize family
are at a level where they should be checking for nan, given that we
do this at the higher API level, and don't want to keep doing it at
each level down the line. Attn: @WeatherGod, @mdboom, @tacaswell.

@fmaussion
Copy link
Contributor Author

Is it common practice in Matplotlib to modify user input? Duck arrays such as Pandas TimeSeries and Xray DataArray will be silently turned into a masked_array...

@efiring
Copy link
Member

efiring commented Aug 1, 2015 via email

@fmaussion
Copy link
Contributor Author

Yes, I understand now. I actually come from IDL, where def f(x): x = x+1 would actually change the value of x. I tend to forgot that in python this is not the case. I just committed a new version with a few more tests for masked arrays and clipping, tell me what you think.

@fmaussion
Copy link
Contributor Author

Sorry about the PEP error. Next time I'll check it prior to the commit. I just opened an issue about a possible enhancement, tell me what you think about it: #4850

@fmaussion
Copy link
Contributor Author

I corrected the PEP error (hopefully) and replaced the call to ma.masked_invalid with a call to self.process_value in order to be consistent with all other Normalize classes. Therefore if we want to change the general behaviour to do more checks, a change to process_value will suffice.

BoundaryNorm is different from all other classes in that in returns integers and not floats. It will produce warnings if run on non-masked NaN values, but I think it's ok like this.

@fmaussion
Copy link
Contributor Author

@efiring, @tacaswell should I do anything else for the PR to be merged/closed? I am away from my mails for the next week but I gess it's not that urgent anyway.

efiring added a commit that referenced this pull request Aug 8, 2015
@efiring efiring merged commit 5d9245d into matplotlib:master Aug 8, 2015
@efiring
Copy link
Member

efiring commented Aug 8, 2015

Thanks very much!

@fmaussion fmaussion deleted the maintenance-colors branch September 7, 2015 07:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants