Skip to content

Merge consecutive rasterizations #17159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 14, 2020
Merged

Conversation

samtygier
Copy link
Contributor

@samtygier samtygier commented Apr 16, 2020

PR Summary

In vector output it is possible to flag artists to be rasterized. In many cases with multiple rasterized objects there can be significant file size savings by combining the rendered bitmaps into a single bitmap.

This achieved by splitting some of the work of the renderers stop_rasterizing() out into a flush_rasterizing() function. This gets called when for a non-rasterized artist, and does the flushing only if needed. It must also be called after every element in the Figure has been drawn to finalize any remaining rasterized elements.

This fixes #17149

Docs and further testing still TODO

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@jklymak jklymak marked this pull request as draft April 16, 2020 18:40
@jklymak jklymak added this to the v3.3.0 milestone Apr 16, 2020
@anntzer
Copy link
Contributor

anntzer commented Apr 16, 2020

In general this seems fine, but please describe in the docs what a backend implementer needs to do (if anything) when implementing (if they need to do so) flush_rasterizing. (In mplcairo https://github.com/matplotlib/mplcairo/blob/master/lib/mplcairo/base.py#L149 I found that I could just reuse the agg filter machinery and use a do-nothing filter an rasterizing end; what happens now?)

@samtygier
Copy link
Contributor Author

I've run the run-mpl-test-suite.py --tolerance=inf from mplcairo (git master). This PR does not break any existing tests (I get 2 failures that I also see with matplotlib master).

The new new test test_backend_svg.py::test_count_bitmaps fails, because the PR does not effect how mplcairo handles rasterization, i.e. the test expects the 5 raster objects to be merged to a single <image> tag in the svg file but finds 5 <image> tags.

If the backend plugs into MixedModeRenderer then everything should just work. If not then they can do nothing, and they just wont get the benefit of bitmap merging.

To support it without MixedModeRenderer they should move any finalisation from stop_rasterizing() to flush_rasterizing(), and have some protection so that the flushing process is not called when there have not been any previous rasterization. In MixedModeRenderer there is self._rasterizing which keeps track of if we are inside of any raster elements, and I added self._rasterizing_unflushed to keep track of if there is there to merger.

@anntzer
Copy link
Contributor

anntzer commented Apr 17, 2020

Thanks for the explanation. I wonder if it would be possible to achieve this without even needing a new API, by simply not calling stop_rasterizing at the end of draw(), but rather storing a _prev_artist_was_rasterized somewhere, and then when the next artist comes, if that flag is set, and this next artist is also rasterized, we simply don't call start_rasterizing(), and if the next artist isn't rasterized, we make the call to stop_rasterizing() that was missing.

@samtygier
Copy link
Contributor Author

The start_rasterizing() and stop_rasterizing() already have slightly misleading names (and comments) because they take into account the nesting depth, so if the axes is set to rasterize then all its children will be. So when you get to an element with rasterized set to false, you would still want rasterization on if the parent is rasterized. Maybe enter end exit, or push and pop would be better names.

Another way of doing this would be to pass a rasterized flag through the nested draw calls. But this could be a more invasive change.

@samtygier samtygier force-pushed the fix-17149 branch 2 times, most recently from a37c819 to 7aac9be Compare April 22, 2020 07:08
@samtygier samtygier marked this pull request as ready for review April 22, 2020 08:13
@samtygier samtygier changed the title [WIP] Merge consecutive rasterizations Merge consecutive rasterizations Apr 22, 2020
@samtygier
Copy link
Contributor Author

I've added a changes entry, and improved the comments on effected functions.

Do I need to add an API change doc? I don't think this should break anything that already exists. Possible exception is that the tests now check for merging, which the external backend mplcairo do not do.

@samtygier
Copy link
Contributor Author

Also I spotted #2816 . Currently user specified group ID are not correctly stored in the SVG for rasterized elements. In the case of merged elements, its less clear how that should work. Maybe once a fix for that lands, I could add a patch to suppress merging if the user has set a gid.

@jklymak
Copy link
Member

jklymak commented Apr 22, 2020

This should definitely get an API note because the output files are changed.

I guess a possible issue is someone may have been counting on separate rasters for some reason?

@anntzer
Copy link
Contributor

anntzer commented Apr 22, 2020

To reiterate my question above: would something along the lines of the following work (untested, but hopefully you get the idea)?

diff --git i/lib/matplotlib/artist.py w/lib/matplotlib/artist.py
index 41e890352..c3c746dad 100644
--- i/lib/matplotlib/artist.py
+++ w/lib/matplotlib/artist.py
@@ -33,8 +33,14 @@ def allow_rasterization(draw):
     @wraps(draw)
     def draw_wrapper(artist, renderer, *args, **kwargs):
         try:
-            if artist.get_rasterized():
-                renderer.start_rasterizing()
+            if renderer._owes_stop_rasterizing:
+                if artist.get_rasterized():
+                    pass  # start_r cancels previously owed stop_r.
+                else:
+                    renderer.stop_rasterizing()
+            else:
+                if artist.get_rasterized():
+                    renderer.start_rasterizing()
             if artist.get_agg_filter() is not None:
                 renderer.start_filter()
 
@@ -43,7 +49,7 @@ def allow_rasterization(draw):
             if artist.get_agg_filter() is not None:
                 renderer.stop_filter(artist.get_agg_filter())
             if artist.get_rasterized():
-                renderer.stop_rasterizing()
+                renderer._owes_stop_rasterizing = True
 
     draw_wrapper._supports_rasterization = True
     return draw_wrapper
diff --git i/lib/matplotlib/figure.py w/lib/matplotlib/figure.py
index 8f31d9f69..db1062a3e 100644
--- i/lib/matplotlib/figure.py
+++ w/lib/matplotlib/figure.py
@@ -1693,6 +1693,7 @@ default: 'top'
     def draw(self, renderer):
         # docstring inherited
         self._cachedRenderer = renderer
+        renderer._owes_stop_rasterizing = False
 
         # draw the figure bounding box, perhaps none for white figure
         if not self.get_visible():
@@ -1740,6 +1741,9 @@ default: 'top'
         finally:
             self.stale = False
 
+        if renderer._owes_stop_rasterizing:
+            renderer.stop_rasterizing()
+
         self.canvas.draw_event(renderer)
 
     def draw_artist(self, a):

Basically, don't emit stop_rasterizing() immediately, but just record that you "owe" one to the renderer, but only emit it if you know the next artist isn't going to restart rasterizing immediately after.

@tacaswell
Copy link
Member

We also need an API change note that there is a new required API on the renders.

I see the utility of this, but am a bit concerned about growing the API and how brittle this seems. I'm mixed (but mostly ok) about the sensitive dependence on draw order. We do have a deterministic draw order, but something just feels off about it (I agree that is very subjective).

I like @anntzer 's proposal because it does not extend the API and we have to muck with Figure.draw either way.

We already have some prior-art in the library of pre-compositing

def _draw_list_compositing_images(
renderer, parent, artists, suppress_composite=None):
"""
Draw a sorted list of artists, compositing images into a single
image where possible.
For internal Matplotlib use only: It is here to reduce duplication
between `Figure.draw` and `Axes.draw`, but otherwise should not be
generally useful.
"""
which squashes images together. Following that pattern of pre-processing the artists lists I think could get this with no API changes or extra accounting (but require changing the draw methods of Figure and Axes.

def _group_rasters(renderer, list_of_artists):
    raster_group = []

    def flush():
        if raster_group:
            renderer.start_rasterizing()
            try:
                for _ in raster_group:
                    _.draw(renderer)
            finally:
                renderer.stop_rasterizing()
                del raster_group[:]

    for a in list_of_artists:
        if a.get_rasterized():
            raster_group.append(a)
            continue
        else:
            flush()
            a.draw(renderer)
    flush()

@tacaswell tacaswell modified the milestones: v3.3.0, v3.4.0 Apr 22, 2020
@tacaswell
Copy link
Member

I pushed this to 3.4 as I am not sure we can get this settled in the next week, but if it is ready no problem with it going in for 3.3.

@samtygier
Copy link
Contributor Author

To reiterate my question above: would something along the lines of the following work (untested, but hopefully you get the idea)?

Basically, don't emit stop_rasterizing() immediately, but just record that you "owe" one to the renderer, but only emit it if you know the next artist isn't going to restart rasterizing immediately after.

That does seem a bit neater. My original thoughts were that if the self._rasterizing_unflushed (or _owes_stop_rasterizing) was in the renderer then the logic should live there too. But maybe having it in allow_rasterization() works as well. Need to check that it works across nested artists.

We already have some prior-art in the library of pre-compositing

So this proposal would be to move the logic out of draw_wrapper and get rid of allow_rasterization. Then do the merging in the draw calls in Figure and Axes. To allow merging between different nested artist Figure could flatten the whole hierarchy of Artists into a single list.

There are a few draw() functions in the library that are not wrapped, but its not clear to me why.

@samtygier samtygier marked this pull request as draft April 22, 2020 20:40
@samtygier
Copy link
Contributor Author

Just spotted an issue with the current pull. We can't call renderer.flush_rasterizing() in Figure.draw(), because it is also wrapped by allow_rasterization. Instead MixedModeRenderer() would need a finalize to do the flushing.

@samtygier
Copy link
Contributor Author

I'm struggling to get the _owes_stop_rasterizing approach to work.

It needs to prevent stop_rasterizing() being called before every non raster artist that follows a raster one. So reset to renderer._owes_stop_rasterizing = False after calling renderer.stop_rasterizing().

If there are nested raster artists, then start_rasterizing() can be called multiple times before _owes_stop_rasterizing gets set true. Previous change means stop_rasterizing() only gets called once. stop_rasterizing() only flushes after a matched number of starts and stops, so output never flushed.

I'll have a think if this can be solved by moving the depth tracking logic out into allow_rasterization as well.

The original code keep starts and stops matched by the symmetry around draw. My implementation preserves this by moving the flushing to a draw_image() into its own function, that can be called often and only does work when needed.

@anntzer
Copy link
Contributor

anntzer commented Apr 23, 2020

Moving out depth tracking seems good to me. (There is only one Figure class but multiple backends, not all of which (e.g., mplcairo) rely on MixedModeRenderer, so moving out depth tracking avoids having to reimplement that logic in each backend.)

@samtygier
Copy link
Contributor Author

Latest version doing the raster depth tracking in allow_rasterization. I think this is a simplification of the original code, because the start and stop functions now actually do what they say without needing internal checks.

I've added an API change note.

I tried again with mplcairo, but it seems that GraphicsContextRendererCairo is not picking up the new attributes from RendererBase.__init__(). I guess this would need fixing in the cpp class? I have noted what is required in the API change note.

@samtygier samtygier marked this pull request as ready for review April 26, 2020 16:10
@anntzer
Copy link
Contributor

anntzer commented Apr 26, 2020

That's fine, I can handle the fix in mplcairo (which is necessary due to not so nice things done in the constructor).
I think the only question left is whether we want this behavior to be togglable. I personally think it's fine to just not bother until someone actually asks for non-merged rasterizations.

@samtygier
Copy link
Contributor Author

That's fine, I can handle the fix in mplcairo (which is necessary due to not so nice things done in the constructor).

Glad that's fixable. I guess you'd want this held of until 3.4 to avoid compatibility issues in the short term.

I think the only question left is whether we want this behavior to be togglable. I personally think it's fine to just not bother until someone actually asks for non-merged rasterizations.

The only reason to turn it off that I have thought of is if you want to attach javascript to elements in the SVG output. I think that is currently hard to do when rasterization is on because it effects the id attribute, #2816 . One solution could be that if the gid is set for an element then it individually rasterzed and gets the id passed through.

Another option is to use the Figure.suppressComposite attribute.

@anntzer
Copy link
Contributor

anntzer commented Apr 26, 2020

Suppressing grouping both for figures with suppressComposite = True and for artists with a set gid certainly seems reasonable.
I can fix mplcairo whenever, that shouldn't hold up this PR; if it's ready for 3.3 that's good for me.

@@ -34,7 +34,17 @@ def allow_rasterization(draw):
def draw_wrapper(artist, renderer, *args, **kwargs):
try:
if artist.get_rasterized():
renderer.start_rasterizing()
if renderer._raster_depth == 0 and not renderer._rasterizing:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ok to try and monkey-patch these attributes on if they don't exist.

It is a little bit dirty, but greatly reduces the chances we break anyone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really work if the renderer class has no __dict__ (given that renderer classes can easily be defined in extension modules this is not just a hypothetical).
I think saying "you have to inherit from RendererBase" (or are responsible for tracking changes upstream) is the reasonable way to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new requirement for those attributes is noted in the API changes. Should I add the text about inheriting from RenderBase there? or somewhere else? I would think that is it implicit that if a 3rd party replaces a class without inheriting from the original that they need to track upstream changes.

In #17159 (comment) you say that mplcairo can handle the change.

@tacaswell
Copy link
Member

I think this is close, thank you @samtygier !

I'm not willing to block merging over either of my comments, but would like them to at least be considered and intentionally dismissed ;)

@samtygier
Copy link
Contributor Author

New version. Staying with the monkey patched _raster_depth and _rasterizing seems to work, and to be easy to handle in mplcairo.

I've moved the finalisation into a new decorator finalize_rasterization that wraps Figure.draw. So no need for to add a finalize method to the renderer.

It now uses suppressComposite to prevent merging. Doing it by gid does not make much sense until #2816 is fixed, which I'll have a go at.

Copy link
Contributor

@anntzer anntzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one API that needs to be private (I think), otherwise looks good.

samtygier added 3 commits May 3, 2020 17:15
In vector output it is possible to flag artists to be rasterized. In
many cases with multiple rasterized objects there can be significant
file size savings by combining the rendered bitmaps into a single
bitmap.

This is achieved by moving the depth tracking logic from
start_rasterizing() and stop_rasterizing() functions into the
allow_rasterization() wrapper. This allows delaying the call to
stop_rasterizing() until we are about to draw an non rasterized artist.
The outer draw method, i.e. in Figure must be wraped with
_finalize_rasterization() to ensure the that rasterization is completed.

Figure.suppressComposite can be used to prevent merging.

This fixes matplotlib#17149
Test that rasterization does not significantly change output.

Tests that partial rasterization does not effect draw ordering.

Tests that right number of <image> elements appear in SVG output, i.e.
bitmaps are merged when appropriate.
Document user and api changes.
@samtygier
Copy link
Contributor Author

I don't understand why the codecov/project/tests check is failing. Most of the changes it lists seem unrelated to this PR.

@jklymak
Copy link
Member

jklymak commented Jun 4, 2020

@tacaswell, as far as I can see this can be merged, but you still had some thoughts....

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems great to me.

@jklymak
Copy link
Member

jklymak commented Jul 14, 2020

I'll merge - if @tacaswell has further thoughts we can open a new PR. There is no need for this to languish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rasterization creates multiple bitmap elements and large file sizes
4 participants