Rasterization creates multiple bitmap elements and large file sizes #17149
Comments
This seems reasonable - you could, during draw, split the artists to draw into two groups and do the vector and rasterized draws all at once. Then you would know how large your bitmap is. The problem though is with zorder/draw order if you have some vector objects mixed in the draw order with the rasterized objects. Another problem, I bet, is going to be animation and blitting.
Also, I have been looking at making some image comparison tests to help me through this. I saw #16447, does that mean I should refrain from adding any comparison tests, or just keep them in a separate commit? Is it possible to point multiple tests at the same baseline image? For example the 2 functions in my previous comment should give identical output. I can put the same baseline image name into 2 tests, which sort of works, but it is also the name used for the result image and the diff, so those get overwritten.
You can use the figures_equal decorator, but a few image tests are OK if needed. Your approach seems fine and this would be a worthwhile change I think, but again, you may need to think about blitting for animations?
In vector output it is possible to flag artists to be rasterized. In many cases with multiple rasterized objects there can be significant file size savings by combining the rendered bitmaps into a single bitmap. This is achieved by splitting some of the work of the renderer's stop_rasterizing() out into a flush_rasterizing() function. This gets called for a non-rasterized artist, and does the flushing only if needed. It must also be called after every element in the Figure has been drawn to finalize any remaining rasterized elements. This fixes matplotlib#17149
Here is a first pass at this. It only really touches The tracks example is now down to 243 KB. The check_figures_equal() decorator was just what I needed. So no new baseline images. I'd still need to do the doc side of this.
All the examples in the animation directory work, apart from double_pendulum_sgskip.py, which also fails on master with
#17160 wasn't really fixing this issue, I don't think
Right, it was only fixing the bug in the last comment.
In vector output it is possible to flag artists to be rasterized. In many cases with multiple rasterized objects there can be significant file size savings by combining the rendered bitmaps into a single bitmap. This is achieved by moving the depth tracking logic from the start_rasterizing() and stop_rasterizing() functions into the allow_rasterization() wrapper. This allows delaying the call to stop_rasterizing() until we are about to draw a non-rasterized artist. stop_rasterizing() must also be called from the finalize() method. This fixes matplotlib#17149
In vector output it is possible to flag artists to be rasterized. In many cases with multiple rasterized objects there can be significant file size savings by combining the rendered bitmaps into a single bitmap. This is achieved by moving the depth tracking logic from the start_rasterizing() and stop_rasterizing() functions into the allow_rasterization() wrapper. This allows delaying the call to stop_rasterizing() until we are about to draw a non-rasterized artist. The outer draw method, i.e. in Figure, must be wrapped with finalize_rasterization() to ensure that rasterization is completed. Figure.suppressComposite can be used to prevent merging. This fixes matplotlib#17149
In vector output it is possible to flag artists to be rasterized. In many cases with multiple rasterized objects there can be significant file size savings by combining the rendered bitmaps into a single bitmap. This is achieved by moving the depth tracking logic from the start_rasterizing() and stop_rasterizing() functions into the allow_rasterization() wrapper. This allows delaying the call to stop_rasterizing() until we are about to draw a non-rasterized artist. The outer draw method, i.e. in Figure, must be wrapped with _finalize_rasterization() to ensure that rasterization is completed. Figure.suppressComposite can be used to prevent merging. This fixes matplotlib#17149
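The design described in these commit messages can be illustrated with a toy model. This is only a sketch and not matplotlib's actual code: `ToyRenderer`, `Line` and `Figure` are hypothetical stand-ins, and the two decorators merely mimic the described behaviour (depth tracking in the wrapper, deferring `stop_rasterizing()` until a non-rasterized artist is about to be drawn, and a final flush around the outer draw).

```python
import functools


class ToyRenderer:
    """Hypothetical stand-in for a vector renderer; records what it would emit."""

    def __init__(self):
        self.output = []          # emitted elements, in order
        self._rasterizing = False
        self._raster_depth = 0    # nesting depth of rasterized artists
        self._pending = []        # artists drawn into the current bitmap

    def start_rasterizing(self):
        self._rasterizing = True
        self._pending = []

    def stop_rasterizing(self):
        # Emit one bitmap containing everything drawn since start_rasterizing().
        self.output.append(("image", tuple(self._pending)))
        self._rasterizing = False
        self._pending = []

    def draw_path(self, name):
        if self._rasterizing:
            self._pending.append(name)
        else:
            self.output.append(("path", name))


def allow_rasterization(draw):
    """Toy wrapper: start lazily, and stop only before a vector artist."""
    @functools.wraps(draw)
    def wrapper(artist, renderer):
        if artist.rasterized:
            if renderer._raster_depth == 0 and not renderer._rasterizing:
                renderer.start_rasterizing()
            renderer._raster_depth += 1
        elif renderer._raster_depth == 0 and renderer._rasterizing:
            # About to draw a vector artist: flush the pending bitmap first.
            renderer.stop_rasterizing()
        try:
            return draw(artist, renderer)
        finally:
            if artist.rasterized:
                renderer._raster_depth -= 1
    return wrapper


def finalize_rasterization(draw):
    """Toy wrapper for the outermost draw: flush whatever is still pending."""
    @functools.wraps(draw)
    def wrapper(artist, renderer):
        result = draw(artist, renderer)
        if renderer._rasterizing:
            renderer.stop_rasterizing()
        return result
    return wrapper


class Line:
    def __init__(self, name, rasterized=False):
        self.name = name
        self.rasterized = rasterized

    @allow_rasterization
    def draw(self, renderer):
        renderer.draw_path(self.name)


class Figure:
    rasterized = False

    def __init__(self, children):
        self.children = children

    @finalize_rasterization
    @allow_rasterization
    def draw(self, renderer):
        for child in self.children:
            child.draw(renderer)


renderer = ToyRenderer()
Figure([Line("a", True), Line("b", True), Line("axis"), Line("c", True)]).draw(renderer)
print(renderer.output)
# [('image', ('a', 'b')), ('path', 'axis'), ('image', ('c',))]
```

In this model the two consecutive rasterized lines share one bitmap, the vector artist forces a flush so that draw order is preserved, and the trailing rasterized line is only emitted because the outer draw is wrapped with the finalizing decorator.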
Bug summary
I often work with plots that have a large number of lines, for example showing the trajectories of a large number of particles in a physics simulation. For publication it is good for the axis to be in vector format, but keeping all the tracks as vector can create large file sizes.
Rasterization of some elements in the plot can solve this. Matplotlib offers the `rasterized` kwarg to `plot()`, or `Artist.set_rasterized()`. But for this use case it results in a separate bitmap element for each line, making the file size even larger.

I'm not the first person to have issues related to this: #13718
Code for reproduction
For example, this creates a 2.6MB file:
`grep "<image" tracks1.svg | wc -l` shows that there are 400 image elements.

Calling plot just once with all the tracks makes no difference.
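The reproduction script itself is not preserved in this text. A minimal sketch of a comparable script is given here, under assumptions: the random-walk data, the function name `make_tracks_svg`, and its parameters are all hypothetical and not taken from the original report. It draws many rasterized lines, saves to SVG in memory, and counts the `<image` elements in the same spirit as the `grep` check.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; writing SVG does not need a GUI
import matplotlib.pyplot as plt


def make_tracks_svg(n_tracks=400, n_steps=50, seed=0):
    """Return the SVG text of a figure with n_tracks rasterized lines.

    The tracks are random walks, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    fig, ax = plt.subplots()
    for _ in range(n_tracks):
        xy = rng.normal(size=(n_steps, 2)).cumsum(axis=0)
        # Each line is flagged for rasterization individually.
        ax.plot(xy[:, 0], xy[:, 1], lw=0.5, rasterized=True)
    buf = io.BytesIO()
    fig.savefig(buf, format="svg")
    plt.close(fig)
    return buf.getvalue().decode("utf-8")


svg = make_tracks_svg()
print(svg.count("<image"), "image elements,", len(svg) // 1024, "KiB")
```

The numbers printed depend on the matplotlib version. The issue reports one image element per line on the version tested at the time, while a version that includes the merging change discussed in the comments would be expected to report far fewer.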
RFC
This happens due to the way `start_rasterizing()` and `stop_rasterizing()` get called from the `allow_rasterization()` wrapper. If `rasterized` is set on the Line2D objects, then start and stop are called around each Line2D.draw(). `stop_rasterizing()` renders out to bitmap and calls `_renderer.draw_image()`. The only way to stay in rasterizing mode is `set_rasterized()` on the parent artist of the lines, which would be the axes, but that means that the axes labels and everything else get rasterized.

I have thought of a few solutions, but I would like some feedback before starting.
I think an ideal solution would be automatic for the user. matplotlib would just merge the bitmaps in an optimal way. This must not affect draw order, so it would need to be smart about zorder. It might avoid merging two non-overlapping bitmaps if that increases the total bitmap area.
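The draw-order constraint can be made concrete with a small sketch. Assuming the simplest merging rule (only rasterized artists that are adjacent in draw order may share a bitmap), the number of bitmaps equals the number of runs of consecutive rasterized artists. The function `count_bitmaps` and its `(zorder, rasterized)` tuple representation are hypothetical, chosen only for illustration.

```python
from itertools import groupby


def count_bitmaps(artists):
    """Count bitmaps needed when only draw-order neighbours may be merged.

    artists: iterable of (zorder, rasterized) pairs in insertion order.
    """
    # Stable sort by zorder, so ties keep their insertion order.
    draw_order = sorted(artists, key=lambda a: a[0])
    # Each run of consecutive rasterized artists becomes one bitmap.
    return sum(1 for is_raster, _ in groupby(draw_order, key=lambda a: a[1])
               if is_raster)


# 400 rasterized tracks at the same zorder, with vector artists above them.
tracks = [(2, True)] * 400 + [(3, False), (3, False)]
print(count_bitmaps(tracks))       # 1

# A vector artist sandwiched between rasterized ones splits the bitmap.
interleaved = [(1, True), (2, False), (3, True)]
print(count_bitmaps(interleaved))  # 2
```

Merging across an intervening vector artist would need the smarter analysis mentioned above (for example checking overlap), which this sketch deliberately leaves out.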
A possible implementation would be changes to the `allow_rasterization` wrapper so that the rasterization does not stop between consecutive rasterized artists.

A more manual approach is to create an object that can contain multiple artists, and therefore keep the renderer in rasterization mode for drawing those artists. `Collections` don't fit this role as they have some limitations, e.g. `LineCollection` does not have markers. I also looked at `Container` but that seems designed for specific use cases. So this could be `ArtistGroup` or `ArtistCollection`. It would derive from Artist, and its `draw()` method would iterate through its children and draw them.

I have a minimal prototype, that lets me do:
This results in a 304 kB file with a single `<image>` element (compared with 2.6 MB with original code). Increasing `N` makes the file size difference even larger.

If this is a good approach it could maybe be used inside the `plot()` and similar commands, so that lines drawn with a single plot would end up in a single rasterization (when requested) with no further user input.

Matplotlib version
Note, I have been testing with the SVG backend, because it's easier to see what is going on in the output files. Judging from file size and loading times, all this is true in the PDF backend too.