-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Two bugs in colors.BoundaryNorm #4824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ret = ma.array(iret, mask=mask) | ||
if ret.shape == () and not mask: | ||
ret = int(ret) # assume python scalar | ||
if _x.shape == () and not mask: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The not mask
part here is probably buggy. I bet we haven't tested this function with any non-trivial masked arrays. Simple masked arrays (all masked, or all unmasked) have scalar mask
values, but non-trivial ones have numpy boolean arrays, which would currently emit a warning here, and later raise an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, masked arrays have not been tested yet (in fact, the whole class is not well tested yet). I will have to read about masked arrays first before I can add a few tests. I'll get back to you when it's done
A quick question: what are we supposed to do with non-masked arrays containing NaNs? >>> import matplotlib.colors as mcolors
>>> n = mcolors.Normalize(0, 2)
>>> n([1, np.NaN])
masked_array(data = [ 0.5 nan],
mask = False,
fill_value = 1e+20) Of course, the current version of |
I thought On Thu, Jul 30, 2015 at 12:10 PM, Fabien Maussion notifications@github.com
|
Nope. However, >>> x = [1., -1, np.nan, np.inf]
>>> np.ma.asarray(x)
masked_array(data = [ 1. -1. nan inf],
mask = False,
fill_value = 1e+20)
>>> np.ma.fix_invalid(x)
masked_array(data = [1.0 -1.0 -- --],
mask = [False False True True],
fill_value = 1e+20) |
Sorry to be so picky, but if we expect the following behaviour with a valid scalar: >>> import matplotlib.colors as mcolors
>>> bn = mcolors.BoundaryNorm([1,2,3], 2)
>>> bn(1.5)
0 what should we expect from the following call? >>> bn(np.NaN)
??? On my working version this is what I get: >>> bn(np.NaN)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mowglie/Documents/git/matplotlib/lib/matplotlib/colors.py", line 1291, in __call__
ret = int(ret[0]) # assume python scalar
File "/home/mowglie/.pyvirtualenvs/mpl_devel/lib/python3.4/site-packages/numpy/ma/core.py", line 3910, in __int__
raise MaskError('Cannot convert masked element to a Python int.')
numpy.ma.core.MaskError: Cannot convert masked element to a Python int. |
Nans are presently not supported in the Normalize API. Obviously, this could be changed. It would impose a performance penalty for large arrays. Overall, this is an example of how the mpl API has evolved, without a clear statement of where operations like checking for invalid values should be done, and where they should not. If we do the checking at every level of the API, the performance penalties mount up. Certainly we want to do it at the highest level, like plot and contourf. But Normalize is at a lower level, so we could leave it as-is and just say it is the caller's responsibility to clean up the input. Whenever it is decided that we need np.masked_invalid or np.fixed_invalid (which I think is a little more general, but slower), it would be good to use the copy kwarg to avoid a copy unless it is really needed. With fixed_invalid it is probably needed in many cases, because fixed_invalid fills the invalid values. Masked_invalid just returns a masked array without changing any underlying data values. |
I see what you mean. I think we shouldn't care about users doing something like On my working version I make a copy of the input: |
Even masked_invalid has a copy=True default, so to avoid a copy it |
Is it common practice in Matplotlib to modify user input? Duck arrays such as Pandas TimeSeries and Xray DataArray will be silently turned into a masked_array... |
Depends on what you mean: we modify user input to homogenize it,
transform it, etc. as it goes through the process of ending up in a
plot. But the strict policy is to do this in a way that has no side
effects for the user. In other words, if the user calls "plot(x)",
then we are free to internally convert x to a masked array, but the
next time the user uses "x", it will be unchanged. Our internal "x"
is no longer the user's "x", because if we will need to change
anything that the user's "x" is pointing to in memory, we will make a
copy, and change only that. At the same time, we try hard not to copy
arrays unless it is necessary, because making extra copies chews up
time and RAM.
Also, we don't claim to be able to handle any array-like input; it has
to be something that np.ma.masked_invalid and similar functions can
handle. As I noted earlier, even applying those sorts of
homogenization functions is to be done only for the high-level parts
of the API, not at every possible level. The higher the level, the
more care we need to take to handle various types of user input; at
lower levels, the priority shifts toward efficiency. And even at the
high level, we document masked array and/or nan support in some, but
not all functions.
|
Yes, I understand now. I actually come from IDL, where |
Sorry about the PEP error. Next time I'll check it prior to the commit. I just opened an issue about a possible enhancement, tell me what you think about it: #4850 |
I corrected the PEP error (hopefully) and replaced the call to
|
@efiring, @tacaswell should I do anything else for the PR to be merged/closed? I am away from my mails for the next week but I gess it's not that urgent anyway. |
Two bugs in colors.BoundaryNorm
Thanks very much! |
Bug 1: calling BoundaryNorm with a scalar
Short discussion on the ML: http://matplotlib.1069221.n5.nabble.com/Misleading-BoundaryNorm-error-td45970.html
I added some extra tests to the previous test-suite which was quite short.
Bug 2: clipping was ignored
The previous call to np.clip was silently useless (not in place). I don't think that clipping makes much sense in the case of BoundaryNorm but still.
Before merging, I'd like to discuss the point I mention in the code comments: what about clipping the max values? Currently this is inconsistent and I'd like to change it, but I'd like to have your opinion on it.
disclaimer: this is my first PR, I hope to have done everything right. I did not see the "maintenance" branch so I branched from master. I think there are a couple of thing we can do better with these normalizing tools but I need more time to think about it...