Skip to content

scatter could not raise when colors are provided but position data are empty #14113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mwaskom opened this issue May 1, 2019 · 11 comments
Closed
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.
Milestone

Comments

@mwaskom
Copy link

mwaskom commented May 1, 2019

The 3.0.x series added new checks in scatter for equivalence between the length of the x, y vectors and the length of the c vectors.

These changes have broken code in seaborn that sometimes calls scatter with empty position data but non-empty color data, e.g.

import matplotlib.pyplot as plt
f, ax = plt.subplots()
ax.scatter([], [], c=["r", "b", "g"])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda/envs/seaborn-dev/lib/python3.6/site-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4237                     valid_shape = False
-> 4238                     raise ValueError
   4239             except ValueError:

ValueError: 

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-2-41dfae030a6e> in <module>()
      1 import matplotlib.pyplot as plt
      2 f, ax = plt.subplots()
----> 3 ax.scatter([], [], c=["r", "b", "g"])

~/anaconda/envs/seaborn-dev/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1808                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1809                         RuntimeWarning, stacklevel=2)
-> 1810             return func(ax, *args, **kwargs)
   1811 
   1812         inner.__doc__ = _add_data_doc(inner.__doc__,

~/anaconda/envs/seaborn-dev/lib/python3.6/site-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4243                         "acceptable for use with 'x' with size {xs}, "
   4244                         "'y' with size {ys}."
-> 4245                         .format(nc=n_elem, xs=x.size, ys=y.size)
   4246                     )
   4247                 # Both the mapping *and* the RGBA conversion failed: pretty

ValueError: 'c' argument has 3 elements, which is not acceptable for use with 'x' with size 0, 'y' with size 0.

IMO this is an unnecessary exception because it conflicts with another patten in matplotlib, which is passing empty data along with specified style attributes (e.g. to create a proxy artist). I appreciate that the c parameter occupies a fuzzy zone between data and style attribute. But specifically in the case when the position data are empty, no points will have the wrong colors.

@timhoffm
Copy link
Member

timhoffm commented May 1, 2019

Thanks for the notification. What is the purpose of specifying 3 colors? ax.scatter([], [], c=["r"]) works.

@mwaskom
Copy link
Author

mwaskom commented May 1, 2019

One particular context (in the linked issue) is following a principle in seaborn where categorical plots represent "empty" categories if they are specified but not present in (a subset of) the data.

@mwaskom
Copy link
Author

mwaskom commented May 1, 2019

Another use case is that seaborn often calls the plotting function it's going to use with empty data to hook into the internal matplotlib cycler while still honoring the user's kwargs. But this new change triggers an exception when that happens in the scatterplot function and the user has passed vector to c.

@timhoffm
Copy link
Member

timhoffm commented May 1, 2019

Can you please provide one or two seaborn code examples so that I can see how that is used in real life and track it down?

@mwaskom
Copy link
Author

mwaskom commented May 1, 2019

@efiring
Copy link
Member

efiring commented May 1, 2019

Requiring that all of the color and size handling be done when there is no position data seems like a stretch too far to be required of mpl. Since you are making a dummy plot, extracting info from it, and discarding it, and since you know how many values you are providing in your "c" argument, I suggest that you simply provide dummy x and y arrays to match.

@mwaskom
Copy link
Author

mwaskom commented May 1, 2019

OK fair enough, I don't have time to argue about this. I predicted that this would be a waste of time, and it was.

@mwaskom mwaskom closed this as completed May 1, 2019
@timhoffm
Copy link
Member

timhoffm commented May 2, 2019

Shouldn‘t we for now take the burden of creating suitable nan-valued x/y for this special case to prevent broken seaborn versions out in the field? While empty data is IMO not a valid use case, the only way to work around this for the end user is to install an older version of matplotlib; and that‘s also not what we want.

That functionality would be deprecated in 3.2 and removed eventually, but it would give time to roll out a new seaborn version that uses dummy data itself.

Reopening and marking as release-critical for 3.1, because if we want to do this, we should bring it still in 3.1. @tacaswell I think it‘s up to you to decide.

@timhoffm timhoffm reopened this May 2, 2019
@timhoffm timhoffm added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label May 2, 2019
@timhoffm timhoffm added this to the v3.1.0 milestone May 2, 2019
@efiring
Copy link
Member

efiring commented May 2, 2019

Nan-valued? Do you mean to put a block of code in the beginning that checks for empty x and y but non-empty c, and then makes nan-filled x and y with lengths matching the first dimension of c? Looking at the seaborn code that @mwaskom pointed to, I think it might have to also check for non-empty s and do the same thing. And effectively, facecolor and facecolors and color also have to be checked, so probably the block has to be a way down in the _parse_scatter_color_args function, which then has to return an additional argument to propagate out the needed replacement of x and y.

Do you really want to try to jam all this into 3.1, with tests and deprecation warnings? I don't.

@timhoffm
Copy link
Member

timhoffm commented May 2, 2019

I assume that it should be fairly straigt forward to add a block

if x.size == 0:
    # specially handle c and s with empty x, y for dummy Artists  #14113
    if s is not None:
        x = np.full_like(s, np.nan)
        y = np.full_like(s, np.nan)
    elif c is not None:
        x = np.full_like(c, np.nan)
        y = np.full_like(c, np.nan)

at

Untested, but I don't expect it to be much more than this.
I don't think we have to cope with kwargs other than c and s. @mwaskom can you confirm that handling c and s are sufficient?

Yes, I would want to jam this into 3.1. The scope of the change is fairly limited. I wouldn't be worried much that it breaks something else (of course needs tests). And it solves a real world issue out there. Deprecation would just start with 3.2, so that a new seaborn can be out before 3.2 is released.

But of course, that's all for @tacaswell to finally decide.

@jklymak
Copy link
Member

jklymak commented May 2, 2019

My vote is w/ @efiring on this. I completely empathize with folks being busy, but this seems like a work around that seaborn happened to be making use of rather than something we actually want, and its in their court to fix it. Note that this is a change introduced in 3.0; if this was in 3.1rc I'd feel a bit differently.

I think we need to strongly encourage downstream projects to test off our master branch and squak if we break something so we can start a dialogue well before the release cycle about whether a change is a good idea or not. I think cartopy and yt do this, and its been very useful to catch things we hadn't thought of.

@mwaskom mwaskom closed this as completed May 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.
Projects
None yet
Development

No branches or pull requests

4 participants