scatter() behavior with different size inputs depends on the backend but both variants are inconsistent with the docs #12021

Closed
@anntzer

scatter()'s docstring states

    Parameters
    ----------
    x, y : array_like, shape (n, )
        The data positions.
    
    s : scalar or array_like, shape (n, ), optional
        The marker size in points**2.
        Default is ``rcParams['lines.markersize'] ** 2``.
    
    c : color, sequence, or sequence of color, optional, default: 'b'
        The marker color. Possible values:
    
        - A single color format string.
        - A sequence of color specifications of length n.
        - A sequence of n numbers to be mapped to colors using *cmap* and
          *norm*.
        - A 2-D array in which the rows are RGB or RGBA.
    
        Note that *c* should not be a single numeric RGB or RGBA sequence
        because that is indistinguishable from an array of values to be
        colormapped. If you want to specify the same RGB or RGBA value for
        all points, use a 2-D array with a single row.

i.e. s must be a scalar or have the same size as x/y (n), and c must have size n too or specify a single color. But this is not what happens: x, y, c, and s are "cycled over" until the longest of them is covered, as can be shown by

    scatter([0, 1], [0, 1], [100, 200, 300], alpha=.5)

(plots a second point of size 300 over (0, 0) -- but only for vector backends; for agg, the size=300 point is dropped)
or

    scatter([0, 1, 2, 3], [0, 1, 2, 3], c=["r", "g", "b"])

(reuses "r" for the last point).
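To make the "cycled over" behavior concrete, here is a minimal pure-Python model of it. It is only a sketch of the semantics described above, not matplotlib's code; `cycle_to_longest` is a hypothetical helper name, and it assumes every input is a non-empty sequence.

```python
def cycle_to_longest(*seqs):
    """Pair up sequences by cycling each one until the longest is covered.

    Hypothetical helper: models the cycling described in this issue,
    assuming every sequence is non-empty.
    """
    n = max(len(seq) for seq in seqs)
    # Index each sequence modulo its own length so shorter ones wrap around.
    return [tuple(seq[i % len(seq)] for seq in seqs) for i in range(n)]


# First example: two positions but three sizes, so (0, 0) is drawn twice.
points = cycle_to_longest([0, 1], [0, 1], [100, 200, 300])
print(points)

# Second example: four positions but three colors, so "r" is reused.
colored = cycle_to_longest([0, 1, 2, 3], [0, 1, 2, 3], ["r", "g", "b"])
print(colored)
```

In this model the third tuple of `points` sits at (0, 0) again with size 300, which corresponds to the extra point drawn by the vector backends.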

The difference between vector and raster backends comes from _backend_agg.h using size_t N = std::max(Npaths, Noffsets); (so not taking the size of s (i.e. transforms) into account), whereas the vector backends ultimately rely on backend_bases' _iter_collection_raw_paths, which uses N = max(Npaths, Ntransforms) and then _iter_collection, which has N = max(Npaths, Noffsets) (so all three are taken into account).
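The two item counts can be compared side by side with a small sketch. The function names are hypothetical, and the mapping of scatter() onto the collection inputs (one marker path, one transform per entry of s, one offset per position) is an assumption made for illustration; only the `max(...)` expressions come from the code quoted above.

```python
def n_items_agg(n_paths, n_offsets, n_transforms):
    """Item count per the quoted _backend_agg.h expression."""
    # n_transforms is accepted for symmetry but deliberately ignored,
    # which is the discrepancy described above.
    return max(n_paths, n_offsets)


def n_items_vector(n_paths, n_offsets, n_transforms):
    """Item count per the two quoted backend_bases steps."""
    n_raw = max(n_paths, n_transforms)  # _iter_collection_raw_paths
    return max(n_raw, n_offsets)        # _iter_collection


# Assumed mapping for scatter([0, 1], [0, 1], [100, 200, 300]):
# 1 marker path, 2 offsets (positions), 3 transforms (sizes).
agg_count = n_items_agg(1, 2, 3)
vector_count = n_items_vector(1, 2, 3)
print(agg_count, vector_count)
```

Under these assumptions the agg count is 2 and the vector count is 3, matching the dropped size=300 point on agg.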

Obviously this behavior comes down to what semantics we want for draw_path_collection (... having had to figure it out myself for mplcairo), but I think at least that cycling the positions when the sizes or colors are longer is misguided and would hide a bug most of the time (indeed, I initially missed the fact that the example in #12010 plotted 4e6 points (at 2e3 positions, each of them overlaid 2e3 times), not 2e3, due to the size of s).

I can imagine wanting to cycle over a shorter c or s, though I don't particularly like it, as it can also hide bugs.
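For comparison, a strict check that enforces the documented shape of s (scalar or length n) might look like the sketch below. This is one possible approach, not what matplotlib does; `check_marker_sizes` is a hypothetical name, and c is left out because its accepted forms (color strings, RGB(A) rows, numbers to colormap) need more careful handling.

```python
from numbers import Real


def check_marker_sizes(x, y, s=None):
    """Raise ValueError unless s is None, a scalar, or has len(x) entries.

    Hypothetical helper illustrating the docstring's contract for s.
    """
    if len(x) != len(y):
        raise ValueError(f"x and y must have the same size, got {len(x)} and {len(y)}")
    if s is None or isinstance(s, Real):
        return  # default size or a single scalar size applies to all points
    if len(s) != len(x):
        raise ValueError(f"s must be a scalar or have size {len(x)}, got {len(s)}")


check_marker_sizes([0, 1], [0, 1], 20)          # scalar size: accepted
check_marker_sizes([0, 1], [0, 1], [100, 200])  # matching size: accepted
try:
    check_marker_sizes([0, 1], [0, 1], [100, 200, 300])
except ValueError as exc:
    print(exc)  # the mismatched example from this issue is rejected
```

Such a check rejects both a longer and a shorter s; allowing a shorter s to cycle would need a looser condition.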

If we really don't want to change anything, then the docstring should be updated.

Observed on mpl master/3.0rc, and probably present for a long time.
