Issue with the way rasterization works #13718

Closed
pharshalp opened this issue Mar 20, 2019 · 10 comments
Labels
Community support Users in need of help.

Comments

@pharshalp
Contributor

pharshalp commented Mar 20, 2019

Bug report

Bug summary
Rasterizing the marker collection results in a large file size as compared to rasterizing the whole axes. See the example below:

Code for reproduction

import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(100, 100)

fig1, ax1 = plt.subplots()
ax1.plot(data, 'o', rasterized=True)
fig1.savefig('rasterizing_markers.pdf')

fig2, ax2 = plt.subplots()
ax2.set_rasterized(True)
ax2.plot(data, 'o')
fig2.savefig('rasterizing_axes.pdf')

Actual outcome
rasterizing_axes.pdf is around 123 KB and rasterizing_markers.pdf is around 447 KB! (the only difference in those two files is that rasterizing_markers.pdf has spines, ticks and tick labels in vector format)

Expected outcome
Expecting that both files will have similar size. I am sure that the difference is not because of the embedded fonts (If I change data = np.random.rand(100, 100) to data = np.random.rand(10, 10), the file size for rasterizing_markers.pdf is 23 KB so that rules out the role of embedded fonts in explaining the ~300 KB difference).

Matplotlib version

  • Operating system: MacOS
  • Matplotlib version: 3.0.3
  • Matplotlib backend (print(matplotlib.get_backend())):
  • Python version: 3.7
  • Jupyter version (if applicable):
  • Other libraries:
@ImportanceOfBeingErnest
Member

ImportanceOfBeingErnest commented Mar 20, 2019

ax1.plot(np.random.rand(100, 100), 'o', rasterized=True) creates 100 "lines", and hence 100 individual rasterized images. If you instead flatten the array so that only a single "line" is created,
ax1.plot(np.random.rand(100, 100).flatten(), 'o', rasterized=True), the file size drops sharply (in my case from 800 kB to 20 kB).

Usually one would plot such bunches of plots via collections. Is there a special use case that requires you to plot that many lines?
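The difference between the two calls can be checked with a short script. This is a minimal sketch, not part of the original report: the helper `pdf_size` is a hypothetical name introduced here, and the PDFs are written to memory rather than to disk. To keep the plotted points identical, the single-line version passes explicit x values, because `plot` uses the row index as x for a 2-D array.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np


def pdf_size(fig):
    """Return the size in bytes of the figure saved as an in-memory PDF."""
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf")
    return buf.getbuffer().nbytes


rng = np.random.default_rng(0)
data = rng.random((100, 100))

# 2-D input: plot() creates one Line2D per column, each flagged as rasterized.
fig_many, ax_many = plt.subplots()
lines_many = ax_many.plot(data, "o", rasterized=True)

# Same points as a single Line2D: repeat each row index once per column
# so that x lines up with the row-major order of data.ravel().
x = np.repeat(np.arange(data.shape[0]), data.shape[1])
fig_one, ax_one = plt.subplots()
lines_one = ax_one.plot(x, data.ravel(), "o", rasterized=True)

print(len(lines_many), "lines:", pdf_size(fig_many), "bytes")
print(len(lines_one), "line:", pdf_size(fig_one), "bytes")
```

The single-line version draws every marker in one color, whereas the 2-D call cycles through colors per column. The exact byte counts will vary with the Matplotlib version and settings.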

@pharshalp
Contributor Author

In my use case I am actually not using ax.plot. I am using a scatter plot (with ax.scatter) to show 5000 points leading to a similar file size issue.

@ImportanceOfBeingErnest
Member

Do you have a minimal example for how scatter is used in that case? ax1.scatter(*np.random.rand(2, 5000), marker='o', rasterized=True) would create a 37kb file (compared to a 64kb file in case the axes is rasterized) for me, so it doesn't look like a problem at all.

@pharshalp
Contributor Author

pharshalp commented Mar 20, 2019

Here is an example which closely resembles my use case:

import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['figure.dpi'] = 200

data = np.random.rand(6, 832)
y = np.array([100, 200, 300, 400, 500, 600])

fig1, ax1 = plt.subplots()
for ind in range(data.shape[1]):
    ax1.scatter(data[:, ind], y, marker='o', rasterized=True)
fig1.savefig('rasterizing_markers.pdf')

fig2, ax2 = plt.subplots()
ax2.set_rasterized(True)
for ind in range(data.shape[1]):
    ax2.scatter(data[:, ind], y, marker='o')
fig2.savefig('rasterizing_axes.pdf')

I think this faces the same issue you described earlier: it creates 832 rasterized images, resulting in a large file size.

I tried a workaround: starting with ax2.set_rasterized(True), then calling ax2.xaxis.set_rasterized(False) etc. (before savefig but after the loop containing ax2.scatter) to disable rasterization on the x/y axes and spines. This approach did not work.
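One alternative that was not raised in this thread is `Axes.set_rasterization_zorder`, which asks the vector backends to rasterize every artist whose zorder is below a threshold while leaving the rest of the Axes (spines, ticks, tick labels) as vector output. The following is a minimal sketch under that assumption. The `zorder=-1` value and the threshold of 0 are arbitrary choices for illustration, and whether this actually reduces the file size in a given Matplotlib version is worth measuring.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).random((6, 832))
y = np.array([100, 200, 300, 400, 500, 600])

fig, ax = plt.subplots()
for ind in range(data.shape[1]):
    # zorder=-1 (arbitrary) places the markers below the threshold set next.
    ax.scatter(data[:, ind], y, marker="o", zorder=-1)

# Artists with zorder < 0 are rasterized on vector output; the axis
# objects and spines keep their default (higher) zorder and stay vector.
ax.set_rasterization_zorder(0)

buf = io.BytesIO()
fig.savefig(buf, format="pdf")
print(buf.getbuffer().nbytes, "bytes")
```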

@jklymak
Member

jklymak commented Mar 20, 2019

Did you try:

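import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(6, 832)
y = np.array([100, 200, 300, 400, 500, 600])
# Tile y as a column so it has shape (6, 832) and y[i] pairs with data[i, :]
y = np.tile(y[:, np.newaxis], (1, 832))

fig1, ax1 = plt.subplots()
# A single scatter call creates one collection, hence one rasterized image
ax1.scatter(data, y, marker='o', rasterized=True)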
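When collapsing a loop of scatter calls into one call, it is worth checking that each x value still pairs with the same y value as in the loop, since `scatter` flattens both arrays in row-major order. The sketch below is an illustration rather than code from the thread: `sort_rows`, `loop_pairs` and `single_pairs` are hypothetical names, and `np.broadcast_to` is used as one possible way to expand `y` to the shape of `data`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).random((6, 832))
y = np.array([100, 200, 300, 400, 500, 600])

# Points as the loop would plot them: column `ind` of data against y.
loop_pairs = np.concatenate(
    [np.column_stack([data[:, ind], y]) for ind in range(data.shape[1])]
)

# Broadcast y along the columns so that y_full[i, j] == y[i].
y_full = np.broadcast_to(y[:, np.newaxis], data.shape)
single_pairs = np.column_stack([data.ravel(), y_full.ravel()])


def sort_rows(pairs):
    """Sort (x, y) rows by x, then y, so two point sets can be compared."""
    return pairs[np.lexsort((pairs[:, 1], pairs[:, 0]))]


same_points = np.array_equal(sort_rows(loop_pairs), sort_rows(single_pairs))
print("same points:", same_points)

fig, ax = plt.subplots()
ax.scatter(data.ravel(), y_full.ravel(), marker="o", rasterized=True)
print(len(ax.collections), "collection")
```

With random data the pairing makes no visible difference, but with real measurements a mismatched tiling would silently plot each x value against the wrong y value.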

@jklymak jklymak added the Community support Users in need of help. label Mar 20, 2019
@pharshalp
Contributor Author

pharshalp commented Mar 20, 2019

Thanks! It worked and makes sense. Should I close the issue?

@ImportanceOfBeingErnest
Member

Well, it depends on whether you consider this issue to be solved or if there is still something that matplotlib can or should do?

@pharshalp
Contributor Author

The only issue that remains (it came up while I was trying out approaches to solve the current issue) is: why doesn't the following work?

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_rasterized(True)
ax.plot([1,2,3])  # plot/scatter/whatever plotting method is needed

fig.canvas.draw()  # to trigger tick label formatters
for xlab in ax.get_xticklabels():
    xlab.set_rasterized(False)

fig.savefig('file.pdf')

The xtick labels still end up being rasterized!

@tacaswell
Member

In the last case I suspect (without digging into the code to check) that it is due to the way the draw tree works. At the top level we draw the figure by recursively walking the artist tree depth-first. The vector backends have code paths to temporarily switch to rasterizing an Artist (and implicitly all of its children), but no way to switch back to vector output for those children. The tick labels are children of the axis objects, which are children of the Axes object.
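Given that explanation, one approach that avoids needing to switch back is to leave the Axes itself unrasterized and flag only the data artists. This is a minimal sketch, assuming the data artists of interest live in `ax.lines`, `ax.collections` and `ax.images`. The name `data_artists` is introduced here for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3])  # plot/scatter/whatever plotting method is needed

# Flag only the data artists. The Axes, its axis objects and their tick
# labels are never switched to raster mode, so they stay vector.
data_artists = list(ax.lines) + list(ax.collections) + list(ax.images)
for artist in data_artists:
    artist.set_rasterized(True)

buf = io.BytesIO()
fig.savefig(buf, format="pdf")
print(len(data_artists), "rasterized artist(s),", buf.getbuffer().nbytes, "bytes")
```

Each flagged artist is still rasterized separately, so with many artists the file size concern from earlier in the thread applies.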

@pharshalp
Contributor Author

Thanks everyone!


4 participants