[Bug]: Creating sub-plots is much slower than Plotly #26162
Comments
Are you triggering draws for each of these figure calls? It's hard to compare if no draw is being made. I'd also run all of this outside of VS Code etc.
I was inspired to do a small profiling run with a 10 x 10 subplot grid, and on my computer creating the
Interesting to note is that although only 2 x 100 Axis objects were created
An obvious improvement may be to pass an optional argument to
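A profiling run like the one described above could be reproduced roughly as follows. This is only a sketch, not the code that was actually used: the `make_grid` helper is a hypothetical name, and the figure is built through `matplotlib.figure.Figure` directly so that no GUI backend is involved.

```python
import cProfile
import io
import pstats

from matplotlib.figure import Figure


def make_grid(rows=10, cols=10):
    """Hypothetical helper: create an empty rows x cols grid of sub-plots."""
    fig = Figure()
    fig.subplots(rows, cols)
    return fig


# Profile only the creation of the sub-plots; nothing is drawn or saved.
profiler = cProfile.Profile()
profiler.enable()
make_grid()
profiler.disable()

# Show the ten entries with the largest cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
print(out.getvalue())
```

Sorting by cumulative time makes it easier to see which high-level calls dominate when many Axes are created.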
I think when we looked into these sorts of things in the past, it boiled down to the transform stack. @anntzer has some ideas to improve, but I'm not sure how much work they are.
@jklymak The code above does not draw anything. You are right that it could be that Matplotlib calculates a lot of stuff "up-front" while Plotly defers a lot of its computations until we actually draw something. The following repeats the tests above, but this time with random line-drawings as well, and the option to save the image to an IO-stream. Note that this requires
Test 1: No Saving
This looks similar to the original test, but it could be that Plotly still doesn't actually draw anything, while Matplotlib does. Note that before each of these tests I have made a full reset of the Jupyter kernel, and only ran the code above.
Test 2: Saving as SVG
For smaller numbers of sub-plots Plotly looks to be faster, while it looks like Matplotlib starts to be faster for larger numbers of sub-plots.
Test 3: Saving as PNG
Again Plotly is faster for smaller numbers of sub-plots, but now Matplotlib is even faster for larger numbers of sub-plots. Apparently Plotly is slower at saving PNG compared to SVG-files, while Matplotlib apparently takes roughly the same time for PNG and SVG-files.
Comments
When doing an "end-to-end" comparison of the whole plotting process, the differences between Matplotlib and Plotly are much smaller. Hopefully there are still some gains to be had from optimizing Matplotlib as @oscargus pointed out (thanks!) because 1.8 seconds to generate 3x10 sub-plots is quite a long time for real-time applications.
Also note that I only started using Plotly yesterday. And although I've been using Matplotlib for many years, I would still consider myself a "noob" there as well :-) So whether these are fair apples-to-apples comparisons, I don't know. Perhaps we need to set options for the file-savers to make it completely fair. Please experiment with the code above, if you have an idea for making a more fair comparison. Thanks!
EDIT x 3: I keep writing Pyplot instead of Plotly :-)
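For anyone who wants to experiment, the Matplotlib side of such an end-to-end timing (create the sub-plots, draw a line in each, optionally save to an in-memory stream) can be sketched roughly as below. This is not the code used for the tests above: the `time_subplots` helper, the fixed line data, and the figure size are assumptions made for illustration, and the Plotly side is left out.

```python
import io
import time

from matplotlib.figure import Figure  # avoids pyplot's global figure registry


def time_subplots(rows, cols, fmt=None, sharex=False):
    """Hypothetical helper: seconds to create, fill and optionally save a grid."""
    start = time.perf_counter()
    fig = Figure(figsize=(2 * cols, 2 * rows))
    axs = fig.subplots(rows, cols, sharex=sharex, squeeze=False)
    for ax in axs.flat:
        ax.plot([0, 1, 2, 3], [0, 1, 0, 1])  # placeholder data
    if fmt is not None:
        # Saving forces a full draw, so deferred work is included in the timing.
        fig.savefig(io.BytesIO(), format=fmt)
    return time.perf_counter() - start


for fmt in (None, "svg", "png"):
    print(fmt, round(time_subplots(3, 10, fmt=fmt), 3))
```

Passing `sharex="col"` to the same helper gives a rough way to check the effect of shared x-axes on the timing.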
Phew, that is a relief that we are not ridiculously slower. Of course getting faster would be great, and indeed some of what we do seems lower level than Plotly, so we could conceivably be faster. For real-time applications you maybe don't need to clear the whole figure and start again. You should be able to add and remove artists relatively cheaply. https://matplotlib.org/devdocs/users/explain/artists/performance.html and https://matplotlib.org/devdocs/users/explain/animations/blitting.html#sphx-glr-users-explain-animations-blitting-py may give you some ideas.
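The idea of keeping the figure and updating its artists, instead of rebuilding all sub-plots, might look roughly like this. It is only a sketch under assumptions: the `update` helper and the placeholder data are made up for illustration, and it only applies when the same process keeps the figure alive between updates.

```python
from matplotlib.figure import Figure

# Pay the cost of creating the sub-plots once.
fig = Figure()
axs = fig.subplots(2, 2, squeeze=False)
lines = [ax.plot([0, 1, 2], [0, 0, 0])[0] for ax in axs.flat]


def update(new_ys):
    """Hypothetical helper: swap in new y-data without recreating any Axes."""
    for ax, line, ys in zip(axs.flat, lines, new_ys):
        line.set_ydata(ys)
        ax.relim()           # recompute the data limits from the artists
        ax.autoscale_view()  # rescale the view to the new limits


update([[1, 2, 3]] * 4)
```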
Sorry for startling you with the unfair performance comparison! :-) Thanks to everyone for jumping on this so quickly! I am very grateful for that! I also took a peek at your PR, and I didn't understand any of it, but it's an impressive amount of changes you have made in a very short amount of time! :-)
Test 4: Saving as PNG + sharex=True
This is an additional test for sharing the x-axis in the columns of sub-plots, which was found to cause a slow-down in #26150 which is also confirmed here, and where @tacaswell may have some ideas for a solution. Compared to Test 3 above, it seems that Plotly benefits and actually gets faster when sharing the x-axis, while Matplotlib suffers a slow-down.
Changes to the code above:
@jklymak Thanks for the suggestion regarding blitting. However, I don't actually redraw the figures over and over - at least not in a traditional sense. The web-app is still under development, otherwise I could have shown you. But briefly explained, the user inputs various data as you would on any web-site, then they click a "process" button so the data gets sent to the web-server, which is state-less so it only gets the data the user just input and whatever data it needs to read from a database. Then it generates various plots using Matplotlib, and shows the results as SVG in a web-page that is sent back to the user's web-browser. So perhaps "low-latency" plotting is a better description for what I need than "real-time" plotting, which may imply repeated and fast updating of a figure.
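The state-less flow described above (receive data, build the plots, return SVG) can be sketched as below. This is not the actual web-app code: the `render_svg` function, its input format, and the figure size are assumptions for illustration. It uses `matplotlib.figure.Figure` directly rather than pyplot, so no figures are kept in global state between requests.

```python
import io

from matplotlib.figure import Figure


def render_svg(series):
    """Hypothetical request handler body: one sub-plot per data series, as SVG text."""
    fig = Figure(figsize=(6, 2 * len(series)))
    axs = fig.subplots(len(series), 1, squeeze=False)
    for ax, ys in zip(axs.flat, series):
        ax.plot(ys)
    buf = io.BytesIO()
    fig.savefig(buf, format="svg")
    return buf.getvalue().decode("utf-8")


svg_text = render_svg([[1, 3, 2], [2, 1, 4]])
print(len(svg_text))
```

The returned string can be embedded in the response page; the time spent inside such a function is the latency the user sees.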
I'll agree that it would be nice if we were faster. OTOH, if you have to wait 3s for 100 plots, I can't help but think that is a small fraction of the time you will need to look at them all and understand what they are telling you. If it were me, I'd put effort into data reduction techniques.
It would be easier to understand my use-case if I could show you what I am doing with Matplotlib, but unfortunately I can't right now. A total of 3 seconds latency would not be a problem - but one of my "features" takes 10 seconds to run from when the user clicks the button until they see the results, so the user-experience is quite sluggish, and it's 90% Matplotlib because of the issues we've been discussing in the past few days. Data-reduction is unfortunately not possible. Please consider it this way: You have over 1 million installs of Matplotlib per day. That's a tremendous number of users! If you can save 50% of the runtime through code optimizations, such as the ones in this thread, then it's not only a massive amount of runtime that is saved for all users, which could make them more productive, it would also save electricity for server-farms using Matplotlib, and it would also make Matplotlib usable for more time-critical applications, which might help invent and develop all new kinds of tools. So making Matplotlib run faster would be far more impactful in the world than just my personal needs.
Of course making Matplotlib run faster would be nice. However, this is a hard issue. The original architecture was not designed with several tens of subplots in mind. Optimizing performance while keeping full backward-compatibility is very challenging. I estimate that it would need some hundred hours of focussed time from a developer familiar with the Matplotlib codebase. Unfortunately, the intersection of people who would be able to do this and who have the capacity is nearly zero. If you are interested, I can link a couple of issues and PRs on that topic. It's not that we have not looked into this, but there is no low-hanging fruit left here.
If a fraction of the people who used Matplotlib contributed $5 a year we could hire an RSE or two to work on problems like this.
You guys are doing a valiant effort on Matplotlib! Regarding funding of open source in general, the real problem is all the corporations and universities who use the software but don't contribute either manpower or money to its development. I lack polite words to describe that, and I think an effort should be made to "guilt" them into helping. I have been working well over a year on my current project. It's just me without any funding. It is quite possible that it will be a failure and my time has been completely wasted. But if it is even modestly successful, then at the very top of my wish-list is to donate funds to improve the performance of Matplotlib. But this could easily be a year into the future, if at all. I have done open source R&D for ... I don't know ... 15 years maybe, without being paid for it, and at tremendous expense to myself. My "famous" TensorFlow tutorials probably took me nearly 10-12 months to make, because there was very little information available at the time, which is of course why I made them. Hundreds of thousands of people learned AI from those. I probably made $2500 in donations from that - with $1000 coming from a single wealthy person. The rest of my R&D I just did on my own, and shared the results with everyone. So I will help you financially when I get the chance, but for now I can't. But I still hope that you'll find some of these code optimizations worth your time and effort, as they will probably benefit many people. It looks like you managed to improve this issue already, so I'm excited to see the result!
Bug summary
Creating sub-plots in Matplotlib is typically 4-12x slower than Plotly. This is not a bug per se, but a serious performance issue for time-critical applications such as interactive web-apps. There are several closed GitHub issues about the slowness of creating sub-plots that go back 6-7 years, but it's still a problem.
Code for reproduction
Actual outcome
In my actual application with 3 columns and 10 rows, the time-usage for Matplotlib is consistently around 1.8 seconds, but for some reason it is only around 0.5 seconds in these tests.
This plot shows the individual time-usage for Matplotlib and Plotly, where the x-axis is the total number of sub-plots (cols * rows):
Note the jagged lines for the Matplotlib time-usage. We could average several runs to make the lines smoother, but the trend is clear. It is actually quite strange that the time changes so much from run to run.
This plot shows the relative time-usage (Matplotlib time / Plotly time):
Expected outcome
I would like it to run like this - minus the crashes, please.
Additional information
Thanks again for making Matplotlib! I don't want to sound ungrateful or too demanding, as this is my second GitHub issue in a few days relating to the performance of using many sub-plots in Matplotlib. But these issues are major bottlenecks in my application that take around 90% of the runtime. I also wonder if perhaps the issues are related. (See #26150)
Is there a technical reason that Plotly is so much faster than Matplotlib when it comes to having sub-plots?
I imagine that Matplotlib has been made by many different people over a long period of time, so perhaps it is getting hard to understand what the code is doing sometimes?
Plotly runs very fast and is easy to use, but I have already made everything in Matplotlib, and I'm not even sure Plotly has all the features I need to customize the plots. So I'm hoping it would be possible to improve the speed of Matplotlib when using sub-plots.
Thanks!
Operating system
Kubuntu 22
Matplotlib Version
3.7.1
Matplotlib Backend
module://matplotlib_inline.backend_inline
Python version
3.9.12
Jupyter version
6.4.12 (through VSCode)
Installation
pip