Rasterization creates multiple bitmap elements and large file sizes #17149

@samtygier

Bug summary

I often work with plots that have a large number of lines, for example showing the trajectories of a large number of particles in a physics simulation. For publication it is good for the axes to be in vector format, but keeping all the tracks as vectors can create large file sizes.

Rasterization of some elements in the plot can solve this. Matplotlib offers the rasterized kwarg to plot(), or Artist.set_rasterized(). But for this use case it results in a separate bitmap element for each line, making the file size even larger.

I'm not the first person to have issues related to this; see #13718.

Code for reproduction

For example, this creates a 2.6MB file:

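import numpy as np
import matplotlib.pyplot as plt

N = 20
rast = True

s = np.arange(100)
tracks = []

# N amplitudes x N phase offsets gives N*N = 400 tracks
for a in np.logspace(0.001, 0.5, N):
    for phase in np.linspace(0.1, np.pi, N):
        tracks.append(a*np.sin(s*0.05 + phase))

# one plot() call, and therefore one Line2D, per track
for t in tracks:
    plt.plot(s, t, "-k", alpha=0.5, lw=0.5, rasterized=rast)
plt.savefig("tracks1.svg")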

grep "<image" tracks1.svg | wc -l shows that there are 400 image elements.

Calling plot just once with all the tracks makes no difference.

tracks_na = np.array(tracks).T
plt.plot(s, tracks_na, "-k", alpha=0.5, lw=0.5, rasterized=rast)

RFC

This happens because of the way start_rasterizing() and stop_rasterizing() get called from the allow_rasterization() wrapper. If rasterized is set on the Line2D objects, then start and stop are called around each Line2D.draw(). stop_rasterizing() renders out to a bitmap and calls _renderer.draw_image(). The only way to stay in rasterizing mode is to call set_rasterized() on the parent artist of the lines, which would be the axes, but that means that the axes labels and everything else get rasterized too.
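
To make the mechanism concrete, here is a toy model of the wrapper behaviour described above. It is not matplotlib's code: ToyRenderer, ToyArtist, toy_allow_rasterization and count_images are made-up names, and the "renderer" only counts how many bitmaps would be emitted.

```python
import functools


class ToyRenderer:
    """Stand-in for a mixed-mode renderer; it only counts emitted bitmaps."""

    def __init__(self):
        self.raster_depth = 0
        self.images = 0

    def start_rasterizing(self):
        pass  # a real renderer would switch to a raster canvas here

    def stop_rasterizing(self):
        self.images += 1  # a real renderer would call draw_image() here


def toy_allow_rasterization(draw):
    """Simplified imitation of the wrapper: start/stop around each draw()."""
    @functools.wraps(draw)
    def wrapper(artist, renderer):
        if artist.rasterized:
            if renderer.raster_depth == 0:
                renderer.start_rasterizing()
            renderer.raster_depth += 1
        try:
            return draw(artist, renderer)
        finally:
            if artist.rasterized:
                renderer.raster_depth -= 1
                if renderer.raster_depth == 0:
                    renderer.stop_rasterizing()
    return wrapper


class ToyArtist:
    def __init__(self, rasterized=False, children=()):
        self.rasterized = rasterized
        self.children = list(children)

    @toy_allow_rasterization
    def draw(self, renderer):
        for child in self.children:
            child.draw(renderer)


def count_images(root):
    renderer = ToyRenderer()
    root.draw(renderer)
    return renderer.images


# 400 rasterized lines under a vector parent: one bitmap per line
per_line = ToyArtist(children=[ToyArtist(rasterized=True) for _ in range(400)])
# rasterized parent: a single bitmap, but everything in it is rasterized
whole_axes = ToyArtist(rasterized=True, children=[ToyArtist() for _ in range(400)])
```

In this model count_images(per_line) matches the 400 image elements seen in the SVG, while count_images(whole_axes) gives a single bitmap at the cost of rasterizing the whole parent.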

I have thought of a few solutions, but I would like some feedback before starting.

I think an ideal solution would be automatic for the user. matplotlib would just merge the bitmaps in an optimal way. This must not affect draw order, so it would need to be smart about zorder. It might avoid merging 2 non-overlapping bitmaps if that increases the total area of bitmap.

A possible implementation would be changes to the allow_rasterization wrapper so that the rasterization does not stop between consecutive rasterized artists.
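
As a rough illustration of what "not stopping between consecutive rasterized artists" would mean for the output, the sketch below groups a draw-ordered list of rasterized flags into runs. The function names are hypothetical and this only models the bitmap count, not the actual wrapper change.

```python
from itertools import groupby


def plan_raster_runs(flags):
    """Group draw-ordered rasterized flags into (is_raster, count) runs.

    Only neighbours in draw order are merged, so zorder is preserved.
    """
    return [(is_raster, len(list(run))) for is_raster, run in groupby(flags)]


def count_bitmaps(flags):
    """Number of bitmaps emitted if each rasterized run becomes one image."""
    return sum(1 for is_raster, _ in plan_raster_runs(flags) if is_raster)
```

For the reproduction case, count_bitmaps([True] * 400) gives a single bitmap. A vector artist sitting between rasterized ones in draw order still splits them, e.g. [True, True, False, True] gives two bitmaps.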

A more manual approach is to create an object that can contain multiple artists, and therefore keep the renderer in rasterization mode while drawing those artists. Collections don't fit this role as they have some limitations, e.g. LineCollection does not have markers. I also looked at Container, but that seems designed for specific use cases. So this could be ArtistGroup or ArtistCollection. It would derive from Artist, and its draw() method would iterate through its children and draw them.

I have a minimal prototype, that lets me do:

ag = ArtistGroup()
for t in tracks:
	ag.add_child(matplotlib.lines.Line2D(s, t, color="k", linestyle="-", alpha=0.5, lw=0.5))
ag.set_rasterized(True)
plt.gca().add_artist(ag)
plt.savefig("tracks3.svg")

This results in a 304 kB file with a single <image> element (compared with 2.6 MB with original code). Increasing N makes the file size difference even larger.
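
The prototype itself is not included in this issue. For discussion, here is a minimal sketch of how such a class might look. Everything in it is an assumption rather than the actual prototype: the _members attribute, the way axes, transform and clip path are handed down to the children at draw time, and the demo_svg() helper. Note that add_artist() does not update the data limits, so the sketch sets them by hand.

```python
import io

import numpy as np
from matplotlib.artist import Artist, allow_rasterization
from matplotlib.figure import Figure
from matplotlib.lines import Line2D


class ArtistGroup(Artist):
    """Hypothetical container that draws all of its children in one pass."""

    def __init__(self):
        super().__init__()
        self._members = []  # assumed name, chosen for this sketch

    def add_child(self, artist):
        self._members.append(artist)

    def get_children(self):
        return list(self._members)

    @allow_rasterization
    def draw(self, renderer):
        if not self.get_visible():
            return
        for child in self._members:
            # Children never pass through Axes.add_artist(), so hand down
            # the axes, data transform and clip path here (an assumption).
            if child.axes is None and self.axes is not None:
                child.axes = self.axes
                if not child.is_transform_set():
                    child.set_transform(self.axes.transData)
                child.set_clip_path(self.axes.patch)
            child.draw(renderer)


def demo_svg(n=5):
    """Render n*n tracks through one rasterized group; return the SVG text."""
    s = np.arange(100)
    fig = Figure()
    ax = fig.subplots()
    group = ArtistGroup()
    for a in np.logspace(0.001, 0.5, n):
        for phase in np.linspace(0.1, np.pi, n):
            y = a * np.sin(s * 0.05 + phase)
            group.add_child(Line2D(s, y, color="k", alpha=0.5, lw=0.5))
    group.set_rasterized(True)
    ax.add_artist(group)
    ax.set_xlim(0, 99)  # add_artist() does not autoscale
    ax.set_ylim(-4, 4)
    buf = io.StringIO()
    fig.savefig(buf, format="svg")
    return buf.getvalue()
```

Counting "<image" in the string returned by demo_svg() is a quick way to check how many bitmaps end up in the file.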

If this is a good approach it could maybe be used inside the plot() and similar commands, so that lines drawn with a single plot would end up in a single rasterization (when requested) with no further user input.

Matplotlib version

  • Operating system: Linux
  • Matplotlib version: 3.0.3 and git master
  • Matplotlib backend: svg
  • Python version: 3.7.6

Note, I have been testing with the SVG backend, because it's easier to see what is going on in the output files. Judging from file size and loading times, all this is true in the PDF backend too.
