Skip to content

scatter() behavior with different size inputs depends on the backend but both variants are inconsistent with the docs #12021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
anntzer opened this issue Sep 5, 2018 · 7 comments
Labels
API: argument checking status: inactive Marked by the “Stale” Github Action

Comments

@anntzer
Copy link
Contributor

anntzer commented Sep 5, 2018

scatter()'s docstring states

    Parameters
    ----------
    x, y : array_like, shape (n, )
        The data positions.
    
    s : scalar or array_like, shape (n, ), optional
        The marker size in points**2.
        Default is ``rcParams['lines.markersize'] ** 2``.
    
    c : color, sequence, or sequence of color, optional, default: 'b'
        The marker color. Possible values:
    
        - A single color format string.
        - A sequence of color specifications of length n.
        - A sequence of n numbers to be mapped to colors using *cmap* and
          *norm*.
        - A 2-D array in which the rows are RGB or RGBA.
    
        Note that *c* should not be a single numeric RGB or RGBA sequence
        because that is indistinguishable from an array of values to be
        colormapped. If you want to specify the same RGB or RGBA value for
        all points, use a 2-D array with a single row.

i.e. s must have the same size as x/y (n) and c too, or c can have size 1 as well. But this is not the case: x, y, c, and s are "cycled over" until covering the longest of them, as can be shown by

scatter([0, 1], [0, 1], [100, 200, 300], alpha=.5)

(plots a second point of size 300 over (0, 0) -- but only for vector backends; for agg, the size=300 point is dropped)
or

scatter([0, 1, 2, 3], [0, 1, 2, 3], c=["r", "g", "b"])

(reuses "r" for the last point).

The difference between vector and raster backends comes from _backend_agg.h using size_t N = std::max(Npaths, Noffsets); (so not taking the size of s (i.e. transforms) into account), whereas the vector backends ultimately rely on backend_bases' _iter_collection_raw_paths, which uses N = max(Npaths, Ntransforms) and then _iter_collection, which has N = max(Npaths, Noffsets) (so all three are taken into account).

Obviously this behavior goes down what semantics we want for draw_path_collection (... having had to figure it out myself for mplcairo), but I think at least that cycling the positions when the sizes or colors are longer is misguided and would be hiding a bug most of the time (indeed, I initially missed the fact that the example in #12010 plotted 4e6 points (at 2e3 positions, each of them overlayed 2e3 times), not 2e3, due to the size of s).

I can imagine wanting to cycle over a shorter c or s, not that I particularly like it as it can also hide bugs.

If we really don't want to change anything, then the docstring should be updated.

mpl master/3.0rc and probably since a long time ago.

@jklymak
Copy link
Member

jklymak commented Sep 5, 2018

crossref #11383

@jklymak
Copy link
Member

jklymak commented Sep 5, 2018

I think c and s should either be singletons or the same size as x, y. Everything else should error. If the user wants to cycle, they can make the appropriate c and s arrays. I don't see anything wrong w/ s being a singleton and c being a vector.

Mysterious cycling souds lie API run amok to me. But if we change it someone will complain....

@anntzer
Copy link
Contributor Author

anntzer commented Sep 5, 2018

I'm going to mark this as RC for the next release (even though the bug is old), because the more I think of it the more I fear that something like the following is waiting to happen (or has already happened): someone plots some data with scatter() with accidentally mismatched sizes, visualizes it, is happy with the result, generates a pdf because that's what the journal submission requires, \includegraphics it and doesn't pay more attention to it, and ends with an incorrect figure. (Well, sure, it was also incorrect at the beginning but the pdf is not only incorrect but also different from what was visually checked.)

(I agree that c and s should just be broadcastable to x and y, which effectively means size 1 or n.)

For me it's the combination of silently dropping/inventing data and of doing so differently in two different backends that makes this RC. Feel free to unlabel if you disagree.

@anntzer anntzer added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Sep 5, 2018
@anntzer anntzer added this to the v3.1 milestone Sep 5, 2018
@timhoffm
Copy link
Member

timhoffm commented Sep 5, 2018

crossref also #11663, which is a good start for better input validation in scatter()

@anntzer anntzer modified the milestones: v3.1.0, v3.2.0 Feb 28, 2019
@anntzer anntzer removed the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Sep 10, 2019
@anntzer anntzer modified the milestones: v3.2.0, v3.3.0 Sep 10, 2019
@anntzer
Copy link
Contributor Author

anntzer commented Sep 10, 2019

@efiring fixed the thing in #13959 for scatter(), but the issue remains for PathCollection -- we may want to put the checks there instead?

@tacaswell tacaswell modified the milestones: v3.3.0, v3.4.0 May 20, 2020
@QuLogic QuLogic modified the milestones: v3.4.0, v3.5.0 Jan 27, 2021
@QuLogic QuLogic modified the milestones: v3.5.0, v3.6.0 Sep 25, 2021
@QuLogic QuLogic modified the milestones: v3.6.0, v3.7.0 Aug 24, 2022
@ksunden ksunden modified the milestones: v3.7.0, v3.7.1 Feb 14, 2023
@QuLogic QuLogic modified the milestones: v3.7.1, future releases Mar 4, 2023
Copy link

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@timhoffm
Copy link
Member

I've extracted the remaining open issue #12021 (comment) into #28833.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API: argument checking status: inactive Marked by the “Stale” Github Action
Projects
None yet
Development

No branches or pull requests

6 participants