Two bugs in colors.BoundaryNorm #4824

fmaussion · 2015-07-30T10:37:02Z

Bug 1: calling BoundaryNorm with a scalar

Short discussion on the ML: http://matplotlib.1069221.n5.nabble.com/Misleading-BoundaryNorm-error-td45970.html
I added some extra tests to the previous test-suite which was quite short.

Bug 2: clipping was ignored

The previous call to np.clip was silently useless (not in place). I don't think that clipping makes much sense in the case of BoundaryNorm but still.
Before merging, I'd like to discuss the point I mention in the code comments: what about clipping the max values? Currently this is inconsistent and I'd like to change it, but I'd like to have your opinion on it.

disclaimer: this is my first PR, I hope to have done everything right. I did not see the "maintenance" branch so I branched from master. I think there are a couple of thing we can do better with these normalizing tools but I need more time to think about it...

WeatherGod · 2015-07-30T13:47:09Z

lib/matplotlib/colors.py

        ret = ma.array(iret, mask=mask)
-        if ret.shape == () and not mask:
-            ret = int(ret)  # assume python scalar
+        if _x.shape == () and not mask:


The not mask part here is probably buggy. I bet we haven't tested this function with any non-trivial masked arrays. Simple masked arrays (all masked, or all unmasked) have scalar mask values, but non-trivial ones have numpy boolean arrays, which would currently emit a warning here, and later raise an exception.

Correct, masked arrays have not been tested yet (in fact, the whole class is not well tested yet). I will have to read about masked arrays first before I can add a few tests. I'll get back to you when it's done

fmaussion · 2015-07-30T16:10:04Z

A quick question: what are we supposed to do with non-masked arrays containing NaNs? Normalize will return a NaN:

>>> import matplotlib.colors as mcolors
>>> n = mcolors.Normalize(0, 2)
>>> n([1, np.NaN])
masked_array(data = [ 0.5  nan],
             mask = False,
       fill_value = 1e+20)

Of course, the current version of BoundaryNorm will throw annoying warnings. Am I supposed to mask NaNs automatically?

WeatherGod · 2015-07-30T16:13:45Z

I thought ma.asarray masks them automatically?

On Thu, Jul 30, 2015 at 12:10 PM, Fabien Maussion notifications@github.com
wrote:

A quick question: what are we supposed to do with non-masked arrays
containing NaNs? Normalize will return a NaN:

import matplotlib.colors as mcolors>>> n = mcolors.Normalize(0, 2)>>> n([1, np.NaN])
masked_array(data = [ 0.5 nan],
mask = False,
fill_value = 1e+20)

Of course, the current version of BoundaryNorm will throw annoying
warnings. Am I supposed to mask NaNs automatically?

—
Reply to this email directly or view it on GitHub
#4824 (comment)
.

fmaussion · 2015-07-30T16:23:05Z

Nope. However, fix_invalid seems to do the job:

>>> x = [1., -1, np.nan, np.inf]
>>> np.ma.asarray(x)
masked_array(data = [  1.  -1.  nan  inf],
             mask = False,
       fill_value = 1e+20)

>>> np.ma.fix_invalid(x)
masked_array(data = [1.0 -1.0 -- --],
             mask = [False False  True  True],
       fill_value = 1e+20)

fmaussion · 2015-07-30T16:41:05Z

Sorry to be so picky, but if we expect the following behaviour with a valid scalar:

>>> import matplotlib.colors as mcolors
>>> bn = mcolors.BoundaryNorm([1,2,3], 2)
>>> bn(1.5)
0

what should we expect from the following call?

>>> bn(np.NaN)
???

On my working version this is what I get:

>>> bn(np.NaN)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mowglie/Documents/git/matplotlib/lib/matplotlib/colors.py", line 1291, in __call__
    ret = int(ret[0])  # assume python scalar
  File "/home/mowglie/.pyvirtualenvs/mpl_devel/lib/python3.4/site-packages/numpy/ma/core.py", line 3910, in __int__
    raise MaskError('Cannot convert masked element to a Python int.')
numpy.ma.core.MaskError: Cannot convert masked element to a Python int.

efiring · 2015-07-31T07:45:43Z

Nans are presently not supported in the Normalize API. Obviously, this could be changed. It would impose a performance penalty for large arrays. Overall, this is an example of how the mpl API has evolved, without a clear statement of where operations like checking for invalid values should be done, and where they should not. If we do the checking at every level of the API, the performance penalties mount up. Certainly we want to do it at the highest level, like plot and contourf. But Normalize is at a lower level, so we could leave it as-is and just say it is the caller's responsibility to clean up the input.

Whenever it is decided that we need np.masked_invalid or np.fixed_invalid (which I think is a little more general, but slower), it would be good to use the copy kwarg to avoid a copy unless it is really needed. With fixed_invalid it is probably needed in many cases, because fixed_invalid fills the invalid values. Masked_invalid just returns a masked array without changing any underlying data values.

fmaussion · 2015-07-31T08:26:08Z

I see what you mean. I think we shouldn't care about users doing something like BoundaryNorm(np.NaN). The current error message is clear enough.

On my working version I make a copy of the input: _x = ma.fix_invalid(x) but I am ok with changing it to: x = ma.masked_invalid(x). In that case we will change the user's input to a masked array without changing the value. Is that OK to modify user's input that way?

efiring · 2015-07-31T08:50:54Z

Even masked_invalid has a copy=True default, so to avoid a copy it
needs to be set to False. In that case we are still not changing the
user's input at all; we are just giving ourselves an object containing
the original user's data together with a mask, and methods that know
what to do with the mask. So long as we don't modify the data values,
there are no side effects.
My larger question, however, is whether any of the Normalize family
are at a level where they should be checking for nan, given that we
do this at the higher API level, and don't want to keep doing it at
each level down the line. Attn: @WeatherGod, @mdboom, @tacaswell.

fmaussion · 2015-08-01T08:32:03Z

Is it common practice in Matplotlib to modify user input? Duck arrays such as Pandas TimeSeries and Xray DataArray will be silently turned into a masked_array...

efiring · 2015-08-01T08:58:33Z

Depends on what you mean: we modify user input to homogenize it, transform it, etc. as it goes through the process of ending up in a plot. But the strict policy is to do this in a way that has no side effects for the user. In other words, if the user calls "plot(x)", then we are free to internally convert x to a masked array, but the next time the user uses "x", it will be unchanged. Our internal "x" is no longer the user's "x", because if we will need to change anything that the user's "x" is pointing to in memory, we will make a copy, and change only that. At the same time, we try hard not to copy arrays unless it is necessary, because making extra copies chews up time and RAM. Also, we don't claim to be able to handle any array-like input; it has to be something that np.ma.masked_invalid and similar functions can handle. As I noted earlier, even applying those sorts of homogenization functions is to be done only for the high-level parts of the API, not at every possible level. The higher the level, the more care we need to take to handle various types of user input; at lower levels, the priority shifts toward efficiency. And even at the high level, we document masked array and/or nan support in some, but not all functions.

fmaussion · 2015-08-01T14:06:49Z

Yes, I understand now. I actually come from IDL, where def f(x): x = x+1 would actually change the value of x. I tend to forgot that in python this is not the case. I just committed a new version with a few more tests for masked arrays and clipping, tell me what you think.

fmaussion · 2015-08-01T14:44:45Z

Sorry about the PEP error. Next time I'll check it prior to the commit. I just opened an issue about a possible enhancement, tell me what you think about it: #4850

fmaussion · 2015-08-03T16:12:26Z

I corrected the PEP error (hopefully) and replaced the call to ma.masked_invalid with a call to self.process_value in order to be consistent with all other Normalize classes. Therefore if we want to change the general behaviour to do more checks, a change to process_value will suffice.

BoundaryNorm is different from all other classes in that in returns integers and not floats. It will produce warnings if run on non-masked NaN values, but I think it's ok like this.

fmaussion · 2015-08-08T13:06:11Z

@efiring, @tacaswell should I do anything else for the PR to be merged/closed? I am away from my mails for the next week but I gess it's not that urgent anyway.

Two bugs in colors.BoundaryNorm

efiring · 2015-08-08T13:26:38Z

Thanks very much!

bugs in colors.BoundaryNorm

32e727b

tacaswell added the status: needs review label Jul 30, 2015

WeatherGod reviewed Jul 30, 2015
View reviewed changes

tacaswell modified the milestones: next point release, proposed next point release Jul 30, 2015

added tests for masked array and clipping

7fab129

fmaussion mentioned this pull request Aug 1, 2015

Add an "extend" keyword to colors.BoundaryNorm #4850

Closed

now consistent with other Normalize classes

be12dc7

efiring added a commit that referenced this pull request Aug 8, 2015

Merge pull request #4824 from fmaussion/maintenance-colors

5d9245d

Two bugs in colors.BoundaryNorm

efiring merged commit 5d9245d into matplotlib:master Aug 8, 2015

efiring removed the status: needs review label Aug 8, 2015

fmaussion deleted the maintenance-colors branch September 7, 2015 07:29

Uh oh!

Two bugs in colors.BoundaryNorm #4824

Two bugs in colors.BoundaryNorm #4824

Uh oh!

Conversation

fmaussion commented Jul 30, 2015

Uh oh!

WeatherGod Jul 30, 2015

Choose a reason for hiding this comment

Uh oh!

fmaussion Jul 30, 2015

Choose a reason for hiding this comment

Uh oh!

fmaussion commented Jul 30, 2015

Uh oh!

WeatherGod commented Jul 30, 2015

Uh oh!

fmaussion commented Jul 30, 2015

Uh oh!

fmaussion commented Jul 30, 2015

Uh oh!

efiring commented Jul 31, 2015

Uh oh!

fmaussion commented Jul 31, 2015

Uh oh!

efiring commented Jul 31, 2015

Uh oh!

fmaussion commented Aug 1, 2015

Uh oh!

efiring commented Aug 1, 2015 via email

Uh oh!

fmaussion commented Aug 1, 2015

Uh oh!

fmaussion commented Aug 1, 2015

Uh oh!

fmaussion commented Aug 3, 2015

Uh oh!

fmaussion commented Aug 8, 2015

Uh oh!

efiring commented Aug 8, 2015

Uh oh!

Uh oh!