Rasterization creates multiple bitmap elements and large file sizes #17149

Closed
samtygier opened this issue Apr 15, 2020 · 8 comments · Fixed by #17159 or #17160

@samtygier
Contributor

samtygier commented Apr 15, 2020

Bug summary

I often work with plots that have a large number of lines, for example showing the trajectories of a large number of particles in a physics simulation. For publication it is good for the axes to be in vector format, but keeping all the tracks as vector elements can create large file sizes.

Rasterization of some elements in the plot can solve this. Matplotlib offers the rasterized kwarg to plot(), or Artist.set_rasterized(). But for this use case it results in a separate bitmap element for each line, making the file size even larger.

I'm not the first person to have issues related to this; see #13718.

Code for reproduction

For example, this creates a 2.6MB file:

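import numpy as np
import matplotlib.pyplot as plt

N = 20
rast = True

s = np.arange(100)
tracks = []

# Build N*N sine tracks with varying amplitude and phase
for a in np.logspace(0.001, 0.5, N):
	for phase in np.linspace(0.1, np.pi, N):
		tracks.append(a*np.sin(s*0.05 + phase))

# One plot() call per track, each flagged for rasterization
for t in tracks:
	plt.plot(s, t, "-k", alpha=0.5, lw=0.5, rasterized=rast)
plt.savefig("tracks1.svg")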

grep "<image" tracks1.svg | wc -l shows that there are 400 image elements.
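
The same count can be taken from Python. This is only a sketch: count_image_elements is a hypothetical helper, not part of Matplotlib, and it assumes the input is well-formed SVG using the standard SVG namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def count_image_elements(svg_data):
    """Count <image> elements in an SVG document (str or bytes)."""
    root = ET.fromstring(svg_data)
    return sum(1 for _ in root.iter(SVG_NS + "image"))


# Tiny hand-written SVG with two image elements, one of them nested.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<g><image width="1" height="1"/></g>'
    '<image width="2" height="2"/>'
    '</svg>'
)
print(count_image_elements(sample))
```

For a saved figure, the file contents can be passed in as bytes, e.g. count_image_elements(open("tracks1.svg", "rb").read()).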

Calling plot just once with all the tracks makes no difference.

tracks_na = np.array(tracks).T
plt.plot(s, tracks_na, "-k", alpha=0.5, lw=0.5, rasterized=rast)

RFC

This happens due to the way start_rasterizing() and stop_rasterizing() get called from the allow_rasterization() wrapper. If rasterized is set on the Line2D objects, then start and stop are called around each Line2D.draw(). stop_rasterizing() renders out to a bitmap and calls _renderer.draw_image(). The only way to stay in rasterizing mode is to call set_rasterized() on the parent artist of the lines, which would be the axes, but that means that the axes labels and everything else get rasterized.

I have thought of a few solutions, but I would like some feedback before starting.

I think an ideal solution would be automatic for the user. Matplotlib would just merge the bitmaps in an optimal way. This must not affect the draw order, so it would need to be smart about zorder. It might avoid merging 2 non-overlapping bitmaps if that increases the total area of bitmap.

A possible implementation would be changes to the allow_rasterization wrapper so that the rasterization does not stop between consecutive rasterized artists.

A more manual approach is to create an object that can contain multiple artists, and therefore keep the renderer in rasterization mode while drawing those artists. Collections don't fit this role as they have some limitations, e.g. LineCollection does not have markers. I also looked at Container, but that seems designed for specific use cases. So this could be ArtistGroup or ArtistCollection. It would derive from Artist, and its draw() method would iterate through its children and draw them.

I have a minimal prototype, that lets me do:

ag = ArtistGroup()
for t in tracks:
	ag.add_child(matplotlib.lines.Line2D(s, t, color="k", linestyle="-", alpha=0.5, lw=0.5))
ag.set_rasterized(True)
plt.gca().add_artist(ag)
plt.savefig("tracks3.svg")

This results in a 304 kB file with a single <image> element (compared with 2.6 MB with original code). Increasing N makes the file size difference even larger.
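
The prototype class itself is not shown in this thread. The following is only a guess at what such a container might look like, not the author's code. The class body, the _group_children attribute and the lazy setup inside draw() are all assumptions, and it relies on allow_rasterization keeping the renderer in raster mode for the whole of the group's draw().

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless; savefig still writes SVG when format="svg"
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.artist import Artist, allow_rasterization
from matplotlib.lines import Line2D


class ArtistGroup(Artist):
    """Hypothetical container that draws all children in one draw() call."""

    def __init__(self):
        super().__init__()
        self._group_children = []

    def add_child(self, artist):
        self._group_children.append(artist)

    @allow_rasterization
    def draw(self, renderer):
        if not self.get_visible():
            return
        for child in self._group_children:
            # Lazily hand the group's axes, figure and transform to the child.
            if child.axes is None:
                child.axes = self.axes
            if child.figure is None:
                child.set_figure(self.figure)
            if not child.is_transform_set():
                child.set_transform(self.get_transform())
            child.draw(renderer)
        self.stale = False


s = np.arange(100)
fig, ax = plt.subplots()
ag = ArtistGroup()
for phase in np.linspace(0.1, np.pi, 20):
    ag.add_child(Line2D(s, np.sin(s * 0.05 + phase),
                        color="k", linestyle="-", alpha=0.5, lw=0.5))
ag.set_rasterized(True)
ax.add_artist(ag)
# add_artist() does not update the data limits, so set them by hand.
ax.set_xlim(0, 99)
ax.set_ylim(-1.1, 1.1)

buf = io.BytesIO()
fig.savefig(buf, format="svg")
n_images = buf.getvalue().count(b"<image")
print(n_images)
```

Because the children are drawn inside the group's own rasterized draw(), the renderer should leave raster mode only once, after the last child. The sketch does not clip the children to the axes and does not take part in autoscaling.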

If this is a good approach it could maybe be used inside the plot() and similar commands, so that lines drawn with a single plot would end up in a single rasterization (when requested) with no further user input.

Matplotlib version

  • Operating system: Linux
  • Matplotlib version: 3.0.3 and git master
  • Matplotlib backend: svg
  • Python version: 3.7.6

Note, I have been testing with the SVG backend, because it's easier to see what is going on in the output files. Judging from file size and loading times, all of this is true in the PDF backend too.

@samtygier samtygier changed the title Rasterization creates multiple bitmaps and large file sizes Rasterization creates multiple bitmap elements and large file sizes Apr 15, 2020
@jklymak
Member

jklymak commented Apr 15, 2020

This seems reasonable - you could, during draw, split the artists to draw into two groups and do the vector and rasterized draws all at once. Then you would know how large your bitmap is. The problem though is with zorder/draw order if you have some vector objects mixed in the draw order with the rasterized objects.

Another problem, I bet, is going to be animation and blitting.

@samtygier
Contributor Author

samtygier commented Apr 15, 2020

So that is the first option: making changes in the draw() routines so that rasterization can be combined where appropriate, without needing any extra work from the user.

Yes, it needs to not modify the draw ordering in ways that change the output. For example:

import numpy as np

t = np.arange(0, 100) * 2.3
x = np.cos(t)
y = np.sin(t)

def do_plot_1(ax):
	ax.plot(x,y, "-", c="r", rasterized=True)
	ax.plot(x+1,y, "-", c="b", rasterized=False)
	ax.plot(x+2,y, "-", c="g", rasterized=True)
	ax.plot(x+3,y, "-", c="m", rasterized=True)

def do_plot_2(ax):
	ax.plot(x,y, "-", c="r", rasterized=True, zorder=1)
	ax.plot(x+2,y, "-", c="g", rasterized=True, zorder=3)
	ax.plot(x+3,y, "-", c="m", rasterized=True, zorder=4)
	ax.plot(x+1,y, "-", c="b", rasterized=False, zorder=2)

Both need to give the same output, so it would only be possible to combine green and magenta elements.
overlap1
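
The constraint can be sketched as a grouping over the draw order. This is a toy model, not Matplotlib code: raster_groups is a hypothetical helper, artists are plain tuples, and the other artists in an Axes (axis, spines, text) are ignored. It assumes a stable sort by zorder, so ties keep the order in which artists were added.

```python
from itertools import groupby


def raster_groups(artists):
    """Split artists into consecutive runs that could share one bitmap.

    `artists` is a list of (label, zorder, rasterized) tuples in the order
    they were added. Returns a list of (rasterized, [labels]) runs.
    """
    ordered = sorted(artists, key=lambda a: a[1])  # stable sort by zorder
    return [
        (is_raster, [label for label, _, _ in run])
        for is_raster, run in groupby(ordered, key=lambda a: a[2])
    ]


# Same artists as do_plot_1 (default zorder taken as 2) and do_plot_2.
plot_1 = [("r", 2, True), ("b", 2, False), ("g", 2, True), ("m", 2, True)]
plot_2 = [("r", 1, True), ("g", 3, True), ("m", 4, True), ("b", 2, False)]

print(raster_groups(plot_1))
print(raster_groups(plot_2))
```

In both cases the vector blue line sits between red and green in the draw order, so only green and magenta end up in the same run.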

@samtygier
Contributor Author

Also, I have been looking at making some image comparison tests to help me through this. I saw #16447; does that mean I should refrain from adding any comparison tests, or just keep them in a separate commit?

Is it possible to point multiple tests at the same baseline image? For example, the 2 functions in my previous comment should give identical output. I can put the same baseline image name into 2 tests, which sort of works, but it is also the name used for the result image and the diff, so those get overwritten.

@jklymak
Member

jklymak commented Apr 15, 2020

You can use the figures_equal decorator, but a few image tests are OK if needed.

Your approach seems fine and this would be a worthwhile change I think, but again, you may need to think about blitting for animations?

@jklymak jklymak added this to the v3.4.0 milestone Apr 16, 2020
samtygier added a commit to samtygier/matplotlib that referenced this issue Apr 16, 2020
In vector output it is possible to flag artists to be rasterized. In
many cases with multiple rasterized objects there can be significant
file size savings by combining the rendered bitmaps into a single
bitmap.

This is achieved by splitting some of the work of the renderer's
stop_rasterizing() out into a flush_rasterizing() function. This gets
called for a non-rasterized artist, and does the flushing only if
needed. It must also be called after every element in the Figure has
been drawn to finalize any remaining rasterized elements.

This fixes matplotlib#17149
@samtygier
Contributor Author

Here is a first pass at this. It only really touches start_rasterizing() and stop_rasterizing() and adds flush_rasterizing(). It passes the existing test suite, and adds a couple of tests. I have not looked at animations yet.

The tracks example is now down to 243 KB.

The check_figures_equal() decorator was just what I needed. So no new baseline images.

I'd still need to do the doc side of this.

@samtygier
Contributor Author

All the examples in the animation directory work, apart from double_pendulum_sgskip.py, which also fails on master with _tkinter.TclError: bad argument error. Is there anything else you can suggest I test?

@jklymak
Member

jklymak commented Apr 17, 2020

#17160 wasn't really fixing this issue, I don't think

@QuLogic
Member

QuLogic commented Apr 17, 2020

Right, it was only fixing the bug in the last comment.

samtygier added a commit to samtygier/matplotlib that referenced this issue Apr 21, 2020
In vector output it is possible to flag artists to be rasterized. In
many cases with multiple rasterized objects there can be significant
file size savings by combining the rendered bitmaps into a single
bitmap.

This is achieved by splitting some of the work of the renderer's
stop_rasterizing() out into a flush_rasterizing() function. This gets
called for a non-rasterized artist, and does the flushing only if
needed. It must also be called after every element in the Figure has
been drawn to finalize any remaining rasterized elements.

This fixes matplotlib#17149
samtygier added a commit to samtygier/matplotlib that referenced this issue Apr 25, 2020
In vector output it is possible to flag artists to be rasterized. In
many cases with multiple rasterized objects there can be significant
file size savings by combining the rendered bitmaps into a single
bitmap.

This is achieved by moving the depth tracking logic from
start_rasterizing() and stop_rasterizing() functions into the
allow_rasterization() wrapper. This allows delaying the call to
stop_rasterizing() until we are about to draw a non-rasterized artist.
stop_rasterizing() must also be called from the finalize() method.

This fixes matplotlib#17149
samtygier added a commit to samtygier/matplotlib that referenced this issue May 3, 2020
In vector output it is possible to flag artists to be rasterized. In
many cases with multiple rasterized objects there can be significant
file size savings by combining the rendered bitmaps into a single
bitmap.

This is achieved by moving the depth tracking logic from
start_rasterizing() and stop_rasterizing() functions into the
allow_rasterization() wrapper. This allows delaying the call to
stop_rasterizing() until we are about to draw a non-rasterized artist.
The outer draw method, i.e. in Figure, must be wrapped with
finalize_rasterization() to ensure that rasterization is completed.

Figure.suppressComposite can be used to prevent merging.

This fixes matplotlib#17149
samtygier added a commit to samtygier/matplotlib that referenced this issue May 3, 2020
In vector output it is possible to flag artists to be rasterized. In
many cases with multiple rasterized objects there can be significant
file size savings by combining the rendered bitmaps into a single
bitmap.

This is achieved by moving the depth tracking logic from
start_rasterizing() and stop_rasterizing() functions into the
allow_rasterization() wrapper. This allows delaying the call to
stop_rasterizing() until we are about to draw a non-rasterized artist.
The outer draw method, i.e. in Figure, must be wrapped with
_finalize_rasterization() to ensure that rasterization is completed.

Figure.suppressComposite can be used to prevent merging.

This fixes matplotlib#17149
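
The mechanism described in the commit message can be illustrated with a toy model. This is not Matplotlib's actual code: ToyRenderer, ToyArtist, ToyFigure and the two toy_ decorators are hypothetical stand-ins, and the renderer only records what it would emit. The sketch assumes that depth tracking lives in the per-artist wrapper, that the pending bitmap is flushed just before a non-rasterized artist is drawn, and that the outer draw flushes whatever is left.

```python
import functools


class ToyRenderer:
    """Stand-in for a mixed-mode renderer; records what would be emitted."""

    def __init__(self):
        self.output = []      # emitted elements, in order
        self._pending = []    # artists drawn into the current bitmap
        self._rasterizing = False
        self._raster_depth = 0

    def start_rasterizing(self):
        self._pending = []

    def stop_rasterizing(self):
        self.output.append(("image", tuple(self._pending)))
        self._pending = []

    def draw(self, name):
        if self._rasterizing:
            self._pending.append(name)
        else:
            self.output.append(("vector", name))


def toy_allow_rasterization(draw):
    @functools.wraps(draw)
    def wrapper(artist, renderer):
        if artist.rasterized:
            if renderer._raster_depth == 0 and not renderer._rasterizing:
                renderer.start_rasterizing()
                renderer._rasterizing = True
            renderer._raster_depth += 1
        elif renderer._raster_depth == 0 and renderer._rasterizing:
            # About to draw a vector artist: flush the pending bitmap first.
            renderer.stop_rasterizing()
            renderer._rasterizing = False
        try:
            return draw(artist, renderer)
        finally:
            if artist.rasterized:
                renderer._raster_depth -= 1
    return wrapper


def toy_finalize_rasterization(draw):
    @functools.wraps(draw)
    def wrapper(artist, renderer):
        result = draw(artist, renderer)
        if renderer._rasterizing:
            # Nothing vector followed the last raster run, so flush it here.
            renderer.stop_rasterizing()
            renderer._rasterizing = False
        return result
    return wrapper


class ToyArtist:
    def __init__(self, name, rasterized):
        self.name, self.rasterized = name, rasterized

    @toy_allow_rasterization
    def draw(self, renderer):
        renderer.draw(self.name)


class ToyFigure:
    def __init__(self, artists):
        self.artists = artists

    @toy_finalize_rasterization
    def draw(self, renderer):
        for artist in self.artists:
            artist.draw(renderer)


renderer = ToyRenderer()
ToyFigure([ToyArtist("r", True), ToyArtist("b", False),
           ToyArtist("g", True), ToyArtist("m", True)]).draw(renderer)
print(renderer.output)
```

With the red, blue, green, magenta ordering from earlier in the thread, the toy renderer emits one image for red, the blue vector line, and one combined image for green and magenta.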