Improper handling of color in set_color when data contains NaN. #9891

Closed
amicitas opened this issue Nov 30, 2017 · 9 comments
Labels: status: inactive, topic: color/color & colormaps


@amicitas

Bug report

Bug summary

When a scatter plot is made with data that contains np.nan (NaN), and then a color array is given using set_color, the wrong colors get mapped to the data points.

Code for reproduction

from matplotlib import pyplot
from matplotlib.colors import Normalize
from matplotlib import cm
import numpy as np

x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.0, 1.0, 50)
norm = Normalize(0.0, 1.0)
colormap = cm.ScalarMappable(norm, 'gnuplot')
colors = colormap.to_rgba(y)

# Expected colors when x and y do not contain any np.nan.
# Success.
plot = pyplot.scatter(x, y, color=colors)
pyplot.show()

# Add some nan values.  
# If the color array is given during the initial creation of the
# scatter plot, then everything works as expected.
# Success.
x[10:40] = np.nan
plot = pyplot.scatter(x, y, color=colors)
pyplot.show()

# Again add some nan values.  
# If the color array is given as a call to set_color
# then the colors no longer match the data.
# Failure.
x[10:40] = np.nan
plot = pyplot.scatter(x, y)
plot.set_color(colors)
pyplot.show()

Actual outcome

Top: Expected colors when x and y do not contain any np.nan.

Middle: Add some nan values. If the color array is given during the initial creation of the scatter plot, then everything works as expected.

Bottom: Again add some nan values. If the color array is given as a call to set_color then the colors no longer match the data.

[Screenshot, 2017-11-30: the top, middle, and bottom scatter plots described above]

Expected outcome

The colors corresponding to the data points should be displayed, even when data points contain NaN and set_color is used.

Matplotlib version

  • Operating system: OSX
  • Matplotlib version: 2.0.2
  • Matplotlib backend: module://ipykernel.pylab.backend_inline & MacOSX
  • Python version: 3.6.3
  • Jupyter Notebook version: 5.2.0
  • Other libraries: None
@jklymak
Member

jklymak commented Nov 30, 2017

Thanks for the clear report. The PathCollection object doesn't keep track of NaN data, so the set_color method only acts on existing markers. It's debatable which is the correct behaviour, but you could just as easily see someone complain if it were set up the way you'd like it. There are two workarounds: either set the colours at draw time (your method 1), or edit the color array before sending it to set_color.
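
The second workaround could look roughly like the sketch below. It reuses the data from the report and assumes that NaN can appear only in x or y. Later comments in this thread report that the behaviour changed in newer releases, so the sketch does not assume that NaN points were dropped; it checks how many points the returned collection actually holds before deciding whether to filter the colors. The Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
from matplotlib import cm, pyplot
from matplotlib.colors import Normalize

x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.0, 1.0, 50)
x[10:40] = np.nan

colormap = cm.ScalarMappable(Normalize(0.0, 1.0), 'gnuplot')
colors = colormap.to_rgba(y)  # one RGBA row per original data point

# Boolean mask of the points that can actually be drawn.
valid = np.isfinite(x) & np.isfinite(y)

plot = pyplot.scatter(x, y)

# Assumption: if the collection holds only the valid points, the NaN points
# were dropped and the color array must be filtered the same way. Otherwise
# the collection kept the original length and the full array already lines up.
if len(plot.get_offsets()) == valid.sum():
    plot.set_color(colors[valid])
else:
    plot.set_color(colors)
```

Either branch leaves one color per point held by the collection, which is the property the report's third case was missing.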

@tacaswell
Member

I have a vague memory of a fix for this going in for 2.1.

@jklymak
Member

jklymak commented Nov 30, 2017

Sorry, should have said I reproduced in master.

@efiring
Member

efiring commented Nov 30, 2017

To elaborate: the strategy used by scatter is to filter out all invalid data at the start, regardless of which input variable has the invalid or masked value. Then this filtered data is used to make a PathCollection which is returned. Because the invalid values can come from any combination of the input variables, including position as well as color value or size, it would be awkward to try to maintain the original data shape in the returned PathCollection. It might be possible, but I'm not convinced it would be worth the added complexity.
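
As a rough illustration of the filtering strategy described here (not Matplotlib's actual implementation), the sketch below combines the validity of every input into one mask and applies it to all of them. The helper name `drop_invalid_points` is hypothetical, and the sketch only handles NaN and infinite values, not masked arrays.

```python
import numpy as np


def drop_invalid_points(*arrays):
    """Hypothetical helper: keep only the rows where every input is finite."""
    arrays = [np.asarray(a, dtype=float) for a in arrays]
    valid = np.ones(arrays[0].shape[0], dtype=bool)
    for a in arrays:
        finite = np.isfinite(a)
        # For 2-D inputs such as RGBA colors, a row counts as valid only
        # if all of its entries are finite.
        if finite.ndim > 1:
            finite = finite.all(axis=1)
        valid &= finite
    return valid, [a[valid] for a in arrays]


x = np.array([0.0, np.nan, 2.0, 3.0])
y = np.array([0.0, 1.0, 2.0, 3.0])
sizes = np.array([10.0, 10.0, np.nan, 10.0])

# The second point is dropped because of x, the third because of sizes.
valid, (xf, yf, sf) = drop_invalid_points(x, y, sizes)
```

After this step the filtered arrays are shorter than the inputs, and unless `valid` is kept around there is no way to map a full-length color array back onto the surviving points.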

@amicitas
Author

amicitas commented Dec 6, 2017

If filtering is done using a variety of input parameters, then I would think it would be even more important to keep track of the invalid data mask, so that data sets (with possible NaN values) could be used naturally with the programmatic API. I think that doing so would greatly help with NaN support, but more generally help make the connection between data sets and plots simpler.

In certain types of scientific analysis NaN is a valid and meaningful floating point value that needs to be retained as part of data sets. Requiring pre-filtering before plotting is of course a reasonable option, but then it would still seem that there should be consistency between the original call to scatter and the API.

@efiring
Member

efiring commented Dec 6, 2017

I agree that the inconsistency you point out is undesirable. The questions are, what are the options for removing the inconsistency, and is the trade-off involved in the best of these worth it?

I think the simplest way to handle this might be to give collections one more optional initialization argument, a mask, which would then have to be applied in appropriate method calls, perhaps depending on an optional kwarg in the method call itself.
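
One way to picture the mask idea is the sketch below. It is only an illustration built on top of the existing API: `MaskedScatter` and its `apply_mask` keyword are hypothetical names, not Matplotlib classes or arguments. The wrapper filters the data itself, remembers the mask, and applies it to full-length color arrays on request.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
from matplotlib import cm, pyplot
from matplotlib.colors import Normalize


class MaskedScatter:
    """Hypothetical wrapper (not a Matplotlib class) that remembers the invalid-data mask."""

    def __init__(self, ax, x, y, **kwargs):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # True where a point is invalid and therefore not drawn.
        self.mask = ~(np.isfinite(x) & np.isfinite(y))
        # Only valid points are passed on, so the collection length is known.
        self.collection = ax.scatter(x[~self.mask], y[~self.mask], **kwargs)

    def set_color(self, colors, apply_mask=True):
        """Set per-point colors; with apply_mask, colors has one row per original point."""
        colors = np.asarray(colors)
        if apply_mask:
            colors = colors[~self.mask]
        self.collection.set_color(colors)


x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.0, 1.0, 50)
x[10:40] = np.nan
colors = cm.ScalarMappable(Normalize(0.0, 1.0), 'gnuplot').to_rgba(y)

fig, ax = pyplot.subplots()
points = MaskedScatter(ax, x, y)
points.set_color(colors)  # full-length colors, filtered by the stored mask
```

Passing `apply_mask=False` would keep the current semantics, where the caller supplies one color per drawn marker.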

A better alternative might be to warn about this inconsistency in the scatter docstring.

@github-actions

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

github-actions bot added the "status: inactive" label on Apr 26, 2023
@greglucas
Contributor

This behaves as expected now, likely after the PR linked above.

QuLogic added this to the v3.1.0 milestone on Apr 26, 2023
@QuLogic
Member

QuLogic commented Apr 26, 2023

There is no PR directly linked above, but yes, it was fixed by #12422.
