
Scatter plots are very slow when using multiple colors #9053

Open
@jzwinck


This program plots 3 million dots with random colors:

import matplotlib.pyplot as plt
import numpy as np

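N = 3000000  # number of points
x = np.random.random(N)
y = np.random.random(N)
c = np.random.random((N, 3))  # one random RGB color per point

# s=1 is the marker size, c the per-point colors, '.' the marker style
plt.scatter(x, y, s=1, c=c, marker='.')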
plt.show()

The initial display is very slow. Even more problematic: zooming is very slow. If you set c to None it will use a single color for all points and it will be fast, with zooming taking about 1 second, vs about 20 seconds with multiple colors.

If you zoom until only a few points are visible, the single-color plot will respond instantly, but the multi-color one will still take 20 seconds. It's as if all 3 million colors are being slowly remapped every time--even for points which can't be seen.
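To put numbers on the difference without an interactive window, the redraw can be timed off-screen. The following is a minimal sketch, assuming the Agg backend and a smaller point count than the report so that it finishes quickly; `time_draw` is a hypothetical helper, not part of Matplotlib, and the timings will vary by machine and Matplotlib version.

```python
import time

import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np


def time_draw(colors, n=100_000, seed=0):
    """Return the seconds spent on one full render of an n-point scatter.

    colors="multi" gives every point its own random RGB color;
    any other value passes c=None, so all points share one color.
    """
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    c = rng.random((n, 3)) if colors == "multi" else None

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=1, c=c, marker=".")

    start = time.perf_counter()
    fig.canvas.draw()  # force a full render, roughly what a zoom triggers
    elapsed = time.perf_counter() - start

    plt.close(fig)
    return elapsed


single = time_draw("single")
multi = time_draw("multi")
print(f"single color: {single:.3f} s, per-point colors: {multi:.3f} s")
```

Raising `n` toward the 3 million points of the report should make any gap between the two cases easier to see, at the cost of a longer run.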

I would expect multi-color scatter plots to be only marginally slower than single-color ones. A 10x slowdown or worse makes me want to disable colors, but then I can't visualize my data properly.

#2156 (four years ago) was aimed at scatter plot performance but seems to have neglected the multi-color case, which as @ChrisBeaumont pointed out is a main use case for scatter(): #2156 (comment)

"Unfortunately, the biggest speedup in this PR (the blitting) essentially replicates what plot can do already. The compelling functionality of scatter is, IMO, the ability to map color and/or size onto data. I can envision two "medium"-hanging fruit optimizations, that might push this kind of functionality into the 10^5-6 points range [...]"

My real data has more points but only 12 distinct colors, so I'd be happy with a speedup even if it only applies when there are, say, up to 50 distinct colors. I also use a Colormap in my real application, but again I only take a few distinct values from the map (whereas in the example above, every point has a unique color).
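For the few-distinct-colors case described above, one possible user-side workaround (not a fix inside Matplotlib) is to group the points by color and draw each group with a single-color call. The sketch below assumes `c` is an (N, 3) RGB array with only a handful of distinct rows; `plot_by_color` is a hypothetical helper name. It uses `Axes.plot` with no connecting line, which may render faster than a per-point-colored `scatter`, though that should be measured on the real data.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for the demo
import matplotlib.pyplot as plt
import numpy as np


def plot_by_color(ax, x, y, c, markersize=1):
    """Draw one single-color Line2D per distinct RGB row of c.

    Returns the number of distinct colors found.
    """
    colors, inverse = np.unique(c, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # the shape of inverse has varied across NumPy versions
    for i, color in enumerate(colors):
        mask = inverse == i
        ax.plot(x[mask], y[mask], linestyle="none", marker=".",
                markersize=markersize, color=tuple(color))
    return len(colors)


rng = np.random.default_rng(0)
n = 100_000
palette = rng.random((12, 3))        # 12 distinct RGB colors, as in the report
c = palette[rng.integers(0, 12, n)]  # each point picks one palette entry
x, y = rng.random(n), rng.random(n)

fig, ax = plt.subplots()
n_groups = plot_by_color(ax, x, y, c)
fig.canvas.draw()
print(n_groups, "groups drawn")
```

Marker sizes are only approximately comparable: `scatter` takes `s` as an area in points squared, while `plot` takes `markersize` in points. Points are also drawn group by group, so the overlap order differs from a single `scatter` call.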

I'm using Matplotlib 2.0.2, NumPy 1.12.1, and Python 3.5.3 on 64-bit Linux with 128 GB of RAM.
