scatter not showing valid x/y points with invalid color #4354

Closed
megies opened this issue Apr 19, 2015 · 12 comments

@megies
Contributor

megies commented Apr 19, 2015

Scatter plots drop points that have valid x/y coordinates but an invalid color value. Since colormaps have a dedicated setting for the color used for invalid data points (used, e.g., in imshow array plots), maybe these points should be plotted after all, in the specified "bad value" color.

Consider this example:

import matplotlib.pyplot as plt
from matplotlib.cm import spring
import numpy as np

spring.set_bad("k", 1)

colors = np.array([1, 3, 9, np.nan])
x = [0, 1, 2, 3]

plt.figure()
plt.imshow(colors.reshape((2, 2)), interpolation="nearest", cmap=spring)

plt.figure()
plt.scatter(x, x, s=1e4, c=colors, cmap=spring)
plt.show()

The right-hand side scatter plot is not showing the x=3, y=3, color=np.nan data point:

[Figure: figure_1]

@tacaswell tacaswell added this to the next point release milestone Apr 19, 2015
@tacaswell
Member

attn @efiring: I think this is related to the code from #4079.

@tacaswell
Member

As a workaround, something like

import matplotlib.pyplot as plt
from matplotlib.cm import spring
import numpy as np

spring.set_bad("k", 1)
spring.set_under('k', 1)

colors = np.array([1, np.nan, 9, np.nan])
x = [0, 1, 2, 3]

plt.figure()
plt.imshow(colors.reshape((2, 2)), interpolation="nearest", cmap=spring)

plt.figure()
sc = plt.scatter(x, x, s=1e4, c=[1, 1, 10, 10], cmap=spring)
sc.set_array(colors)
plt.show()

will work (but I have no idea why the collection is treating nan as under, not as invalid).

@efiring
Member

efiring commented Apr 19, 2015

I think the OP is reporting a bug, or API inconsistency, that has always been present and is easily fixed. I will try that now.

@tacaswell
Member

Sorry, I didn't mean it was caused by that code, just that it was related.

@efiring
Member

efiring commented Apr 19, 2015

The present behavior is explicitly documented in the scatter docstring, so it was deliberate and has been this way for a very long time. The situation is similar to that with pcolor, which also presently does not plot anything for invalid points, while pcolormesh uses the result of set_bad.
If we do change the scatter behavior, then with default handling of bad values in the color array, the outline will still be drawn, but there will be no fill color (unless the user has specified one via set_bad).
I suppose we could add a kwarg to scatter to control whether the legacy behavior or the more API-consistent behavior is used; deprecate the legacy behavior; switch the default; and eventually drop the kwarg. I would prefer to avoid all that (a 3-year process?), but maybe we are stuck with it. What do you think?

@tacaswell
Member

Where is this documented? I see the comment in Notes about masked arrays, but I read that as being actual masked arrays, not arrays with np.nan in them (or I am misunderstanding what 'masked arrays' means in this case).

One plus side of the (somewhat) crazy scheme to strip all of the plotting methods off of Axes is that we can fix APIs more-or-less at will (and just give a long time before we actually remove the methods).

@efiring
Member

efiring commented Apr 19, 2015

@tacaswell, regarding your example with nans in an array to be colormapped: this simply has never been supported, but it would be easy to do so, with some performance cost. Long ago, there was very little support for nans in Numeric, numarray, and early numpy. Therefore I pushed for matplotlib to work with masked arrays, and I still rely on them. More recently, there seems to be a trend toward using nans internally at some points, but it is inconsistent. From the external API standpoint, in many (most?) places we use ma.masked_invalid on incoming arrays so that we can handle any combination of nans and masked arrays; but your example illustrates a case where nothing catches the nans, and the color mapping logic ends up treating them as "under". I haven't looked to see why, because the solution is to add one more call to masked_invalid to ensure the right result.
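The effect of masked_invalid on mixed input can be sketched with NumPy alone. This is only an illustration of the call mentioned above, not the code path matplotlib uses internally; the array values and variable names are made up for the example.

```python
import numpy as np

# A masked array that ALSO contains a nan in an unmasked slot
# (illustrative values, not taken from matplotlib itself).
colors = np.ma.masked_array(
    [1.0, np.nan, 9.0, 0.0],
    mask=[False, False, False, True],
)

# masked_invalid masks every nan/inf entry and keeps the existing mask,
# so downstream code only has to deal with one kind of "bad" value.
cleaned = np.ma.masked_invalid(colors)

# Boolean mask as a full array: True marks entries to treat as invalid.
bad = np.ma.getmaskarray(cleaned)
print(bad)
```

After this call, both the nan and the originally masked entry are flagged, which is what lets a colormap's set_bad color apply to either kind of input.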

@efiring
Member

efiring commented Apr 19, 2015

You are right; the docstring is explicit about masked arrays, and the delete_masked_points utility is not calling masked_invalid first. The OP's basic point remains, however: it would be more consistent to let set_bad control masked (or invalid, if we add the check for nans) color values.
I don't know to what extent our documentation is explicit about nans; I don't think it says much, if anything, about them. The assumption has been that people will use masked arrays--but this is probably a bad assumption, given that masked arrays are not found in Matlab, for example.

@efiring
Member

efiring commented Apr 19, 2015

Correction to my earlier comment: delete_masked_points is checking for nans, so a masked array and an ndarray with nans will yield the same result.
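A quick way to check this, as a sketch that assumes matplotlib.cbook.delete_masked_points is importable in the installed version (the input arrays are illustrative):

```python
import numpy as np
from matplotlib import cbook

x = np.arange(4.0)

# Same "bad" last point, expressed once as nan and once as a mask.
c_nan = np.array([1.0, 3.0, 9.0, np.nan])
c_masked = np.ma.masked_array([1.0, 3.0, 9.0, 0.0],
                              mask=[False, False, False, True])

# delete_masked_points drops every index that is masked or non-finite
# in any of its array arguments.
x_from_nan, c_from_nan = cbook.delete_masked_points(x, c_nan)
x_from_mask, c_from_mask = cbook.delete_masked_points(x, c_masked)

print(x_from_nan, c_from_nan)
print(x_from_mask, c_from_mask)
```

In both cases the fourth point is removed before plotting, which is why the scatter in the original report shows only three markers.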

@efiring
Member

efiring commented Apr 19, 2015

Correction to another of my comments: the more recent assumption actually was that for the most part we could transparently handle masked array input and arrays with nans identically, but it was OK to leave the documentation consistently talking about masked arrays.

@tacaswell tacaswell modified the milestones: proposed next point release, next point release Jul 31, 2015
@tacaswell
Member

punted as not a blocker for 1.5.

@tacaswell tacaswell modified the milestones: 2.1 (next point release), 2.2 (next next feature release) Oct 3, 2017
QiCuiHub pushed a commit to QiCuiHub/matplotlib that referenced this issue Mar 16, 2018
QiCuiHub pushed a commit to QiCuiHub/matplotlib that referenced this issue Mar 16, 2018
QiCuiHub pushed a commit to QiCuiHub/matplotlib that referenced this issue Mar 16, 2018
efiring added a commit to efiring/matplotlib that referenced this issue Oct 6, 2018
anntzer pushed a commit to efiring/matplotlib that referenced this issue Nov 13, 2018
anntzer pushed a commit to efiring/matplotlib that referenced this issue Nov 14, 2018
anntzer pushed a commit to efiring/matplotlib that referenced this issue Dec 12, 2018
slundberg added a commit to shap/shap that referenced this issue Jan 1, 2019
Works around a bug in matplotlib when coloring with NaN values in a scatter plot matplotlib/matplotlib#4354
@ImportanceOfBeingErnest
Member

Closing, as being fixed by #12422.

I.e., from matplotlib 3.1 on, you can use plt.scatter(..., plotnonfinite=True) to have masked or invalid points included and colored according to the set_bad color of the colormap.
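Applied to the example from the original report, this could look roughly like the sketch below. It assumes matplotlib 3.1 or later. Copying the colormap and using the Agg backend are choices made for this sketch, not part of the fix itself.

```python
import copy

import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Work on a copy so the registered "spring" colormap is left untouched.
cmap = copy.copy(plt.get_cmap("spring"))
cmap.set_bad("k", 1)  # color used for masked or non-finite values

colors = np.array([1, 3, 9, np.nan])
x = [0, 1, 2, 3]

fig, ax = plt.subplots()
# plotnonfinite=True keeps the point whose color value is nan.
sc = ax.scatter(x, x, s=1e4, c=colors, cmap=cmap, plotnonfinite=True)

# RGBA values the collection assigns to each entry of the color array.
rgba = sc.to_rgba(sc.get_array())
fig.canvas.draw()
```

Here the row of rgba for the nan entry should be the set_bad color (opaque black in this sketch) rather than the point being dropped.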
