Scatter plots are very slow when using multiple colors #9053
Comments
In the case where everything is one color and one marker size, I think we take a shortcut and fall back to the code path used for … If you have on the order of 50 colors and are not using the continuous size variability of …
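The comment above is cut off, but the workaround discussed in this thread (drawing each color as a separate single-color call, which the reporter later confirms is much faster but changes the Z order) can be sketched as follows. This is an illustration, not the exact code that was suggested: `plot_by_color` is a hypothetical helper, the 50-color threshold comes from the comment, and using `Axes.plot` with markers only for each group is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_by_color(x, y, colors, ax=None, max_colors=50, markersize=2.0):
    """Draw markers with one single-color call per distinct color (hypothetical helper).

    `colors` is an (N, 3) or (N, 4) array of RGB(A) rows in [0, 1].
    With more than `max_colors` distinct rows, fall back to one scatter() call.
    Returns the list of artists added to the Axes.
    """
    ax = plt.gca() if ax is None else ax
    x, y = np.asarray(x), np.asarray(y)
    colors = np.asarray(colors, dtype=float)
    # palette: the distinct color rows; labels: index into palette for each point
    palette, labels = np.unique(colors, axis=0, return_inverse=True)
    labels = labels.ravel()  # the inverse's shape differs across NumPy versions
    if len(palette) > max_colors:
        # scatter's `s` is in points**2, plot's `markersize` is in points
        return [ax.scatter(x, y, c=colors, s=markersize ** 2)]
    artists = []
    for k, rgba in enumerate(palette):
        mask = labels == k
        # linestyle="none" draws markers only, so each group is one Line2D
        (line,) = ax.plot(x[mask], y[mask], linestyle="none", marker="o",
                          markersize=markersize, color=rgba)
        artists.append(line)
    return artists
```

Because points are drawn group by group, a point of a later color always sits on top of earlier colors, unlike a single `scatter()` call, which draws in data order.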
@mdboom suggests this would not be easy to do with Agg (#2156 (comment)). I don't know Agg's internals, but this issue is solved by https://github.com/anntzer/mpl_cairo (... if you can manage to install it :-))
@tacaswell In the typical case where all markers are the same size, there is no need to render markers that cannot be inside the viewport. Also, when zooming in, there is no need to render markers that were previously not visible (zooming in can only eliminate markers from the viewport, not introduce them). I tried your suggestion of calling …
It is indeed much faster, though it produces a different Z order (before zooming in, all markers appear to be the same color, because they were drawn in color order). I can live with the Z order difference since it provides so much better performance. Thank you for the suggestion. Now that we know how to do it, what do you think about making Matplotlib enable this efficient behavior in a more obvious way? For example, the API could take a new type of color map: …

The hypothetical … One could also imagine transparently accelerating the existing API by first checking whether the number of distinct values in …
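The culling argument in the comment above (equal-size markers outside the view need not be drawn, and zooming in can only remove markers) can be sketched with plain NumPy. This does not describe what Matplotlib's Agg backend actually does; `visible_mask` and `cull_for_zoom` are hypothetical helpers, and the margins are assumed to be the marker's half-size already converted to data units.

```python
import numpy as np


def visible_mask(x, y, xlim, ylim, margin_x=0.0, margin_y=0.0):
    """Boolean mask of points whose marker could overlap the view rectangle.

    The margins pad the limits by the marker half-size in data units, so a
    marker centred just outside the view but poking into it is kept. This is
    only valid when every marker has the same size.
    """
    x0, x1 = sorted(xlim)
    y0, y1 = sorted(ylim)
    return ((x >= x0 - margin_x) & (x <= x1 + margin_x)
            & (y >= y0 - margin_y) & (y <= y1 + margin_y))


def cull_for_zoom(x, y, prev_mask, xlim, ylim, margin_x=0.0, margin_y=0.0):
    """Re-cull after zooming in, testing only points that were visible before.

    Zooming in shrinks the view, and a fixed-size marker's half-size in data
    units shrinks too, so a point culled before stays culled.
    """
    idx = np.flatnonzero(prev_mask)
    keep = visible_mask(x[idx], y[idx], xlim, ylim, margin_x, margin_y)
    mask = np.zeros_like(prev_mask)
    mask[idx[keep]] = True
    return mask
```

Each successive zoom-in then only tests the points that survived the previous one, so deep zooms touch far fewer than the full 3 million points.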
I would quite strongly oppose adding an API that provides an approximate behavior to work around a less-than-optimal implementation.
@anntzer I too would prefer that the existing API simply be made more efficient. Right now, its performance is quite terrible for one of its most common use cases.
This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!
Generating the stamp in monochrome (or rather, alpha-only) and coloring it on demand should definitely help; in mplcairo this is implemented in PatternCache::mask and ultimately relies on cairo_mask (https://www.cairographics.org/manual/cairo-cairo-t.html#cairo-mask) to color the alpha mask.
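The alpha-mask idea can be illustrated with NumPy in place of cairo: the marker is rasterized once as a coverage mask, and each point only tints and composites that mask, so the cost per point does not grow with the number of distinct colors. `disc_mask` and `stamp_points` are hypothetical names; the mask is hard-edged (no antialiasing), and this is not mplcairo's PatternCache::mask implementation.

```python
import numpy as np


def disc_mask(radius):
    """Rasterize a filled circle once, as an alpha-only (coverage) stamp."""
    size = 2 * radius + 1
    yy, xx = np.mgrid[:size, :size] - radius
    return (xx**2 + yy**2 <= radius**2).astype(float)


def stamp_points(canvas, rows, cols, colors, mask):
    """Composite one tinted copy of `mask` per point onto a float RGB canvas.

    Uses source-over blending: out = alpha * color + (1 - alpha) * dst.
    """
    r = mask.shape[0] // 2
    h, w, _ = canvas.shape
    alpha = mask[..., None]  # (size, size, 1) so it broadcasts against RGB
    for row, col, rgb in zip(rows, cols, colors):
        r0, c0 = row - r, col - r
        # clip the stamp to the canvas edges
        top, left = max(r0, 0), max(c0, 0)
        bottom = min(r0 + mask.shape[0], h)
        right = min(c0 + mask.shape[1], w)
        if top >= bottom or left >= right:
            continue  # stamp lies entirely off-canvas
        a = alpha[top - r0:bottom - r0, left - c0:right - c0]
        region = canvas[top:bottom, left:right]  # a view into canvas
        region[...] = a * np.asarray(rgb) + (1 - a) * region
    return canvas
```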
This program plots 3 million dots with random colors:
The initial display is very slow. Even more problematic, zooming is very slow. If you set `c` to `None`, all points get a single color and it is fast: zooming takes about 1 second, vs. about 20 seconds with multiple colors.

If you zoom until only a few points are visible, the single-color plot responds instantly, but the multi-color one still takes 20 seconds. It's as if all 3 million colors are being slowly remapped every time, even for points which can't be seen.
I would expect multi-color scatter plots to be only marginally slower than single-color ones. A 10x slowdown or worse makes me want to disable colors, but then I can't visualize my data properly.
#2156 (four years ago) was aimed at scatter plot performance but seems to have neglected the multi-color case, which, as @ChrisBeaumont pointed out, is a main use case for `scatter()`: #2156 (comment)

My real data has more points but only 12 distinct colors, so I'd be happy with a speedup even if it only applies when there are, say, up to 50 distinct colors. I also use `Colormap` in my real application, but again I only take a few distinct choices from the map (whereas in the example above, every point has a unique color).

I'm using Matplotlib 2.0.2, NumPy 1.12.1, and Python 3.5.3 on 64-bit Linux with 128 GB of RAM.
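For data like this, with a dozen colors picked from a colormap, one option is to keep the colors as integer labels into a small palette sampled from the colormap. The number of distinct colors is then known up front, and the labels can drive one draw call per palette entry. A sketch, assuming Matplotlib 3.5 or later for the `matplotlib.colormaps` registry (much newer than the 2.0.2 reported above); `discrete_palette` and `colors_from_labels` are hypothetical helpers.

```python
import matplotlib
import numpy as np


def discrete_palette(cmap_name, n_colors):
    """Sample a registered colormap at n_colors evenly spaced points -> (n_colors, 4) RGBA."""
    cmap = matplotlib.colormaps[cmap_name]  # registry lookup, Matplotlib >= 3.5
    return cmap(np.linspace(0.0, 1.0, n_colors))


def colors_from_labels(labels, palette):
    """Map integer class labels to RGBA rows; at most len(palette) distinct colors."""
    return palette[np.asarray(labels)]
```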