Skip to content

Agg: Remove 16-bit limits #28904

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 8, 2024
Merged

Conversation

QuLogic
Copy link
Member

@QuLogic QuLogic commented Sep 28, 2024

PR summary

When rendering objects, Agg rasterizes them into scan line objects (an x/y point, horizontal length, and colour), and the renderer class writes those to the pixels in the final buffer. Though we have determined that Agg buffers cannot be larger than 2**16, the scan line classes that we use contain 16-bit signed integers internally, cutting off positive values at half the maximum.

Since the renderer uses 32-bit integers, this can cause odd behaviour where any horizontal span that starts before 2**15 (such as a horizontal spine) is rendered correctly even if it cross that point, but those that start after (such as a vertical spine or any portion of an angled line) end up clipped. For example, see how the spines and lines break in #28893.

image

A similar problem occurs for resampled images, which also uses Agg scanlines internally. See the breakage in #26368 for an example.

image

The example in that issue also contains horizontal spines that are wider than 2**15, which also exhibit strange behaviour.

image

By moving to 32-bit scan lines, positions and lengths of the lines will no longer be clipped, and this fixes rendering on very large figures.

Now, this does mean that internal bookkeeping will use double the memory. I have not yet benchmarked the memory usage to see what effect that would have. But I have implemented this two ways to show how it might work:

  1. In _image_resample.h, the larger scan lines are used if the output image is >=2**15 in either dimension.
  2. In _backend_agg.h, the larger scan lines are always used. While this is unfortunately the more common case, it's also the harder one to conditionalize as the type is in the class, and the template is used everywhere.

That being said, having the scan lines be conditional means that double the code is emitted instead. For the Agg backend, it increases by 4096 bytes (after stripping debug info), whereas for the image extension, it increased by 65536 bytes vs no increase for just using the larger one.

However, given the breakage in #23826, it may be necessary to always use the 32-bit scan lines.

Fixes #23826
Fixes #26368
Fixes #28893

PR checklist

@tacaswell
Copy link
Member

Even if we double the book keeping cost, I would expect it to be small in absolute terms? Our output is small (on the scale of the data our users feed us, on the scale of a small of the footprint of a typical modern program and available memory on a typical machine).

@timhoffm
Copy link
Member

... and given AGG is basically unchanged since 2006 and typical RAM sizes back then were about 512MB, I believe we can afford a worst-case factor of 2 increase now after 20 years.

@QuLogic
Copy link
Member Author

QuLogic commented Oct 1, 2024

I ran tests with pytest-memray (starting with #28856 to avoid the leak), and 99% of the tests didn't change peak memory usage. About 16 were within 50K, 3 more within 100K, and 17 over 1M bytes difference (both above and below the original). Running those 17 all over again, a lot were close to zero difference the second time over. I spot checked the remaining ones and the difference was in NumPy random state somewhere. So I think almost all the differences in peak memory usage were just global state and/or due to caching from other tests running earlier depending on parallel job distribution.

@@ -699,8 +702,8 @@ static void get_filter(const resample_params_t &params,
}


template<typename color_type>
void resample(
template<typename color_type, bool is_large = false>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be a better name?

Suggested change
template<typename color_type, bool is_large = false>
template<typename color_type, bool use_32bit = false>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's not really been much change to memory usage, I'm going to drop the template change and just always use the larger scan lines. That'll save us some code space, and template complications.

@tacaswell tacaswell added this to the v3.10.0 milestone Oct 1, 2024
When rendering objects, Agg rasterizes them into scan line objects (an
x/y point, horizontal length, and colour), and the renderer class writes
those to the pixels in the final buffer. Though we have determined that
Agg buffers cannot be larger than 2**16, the scan line classes that we
use contain 16-bit _signed_ integers internally, cutting off positive
values at half the maximum.

Since the renderer uses 32-bit integers, this can cause odd behaviour
where any horizontal span that _starts_ before 2**15 (such as a
horizontal spine) is rendered correctly even if it cross that point,
but those that start after (such as a vertical spine or any portion of
an angled line) end up clipped. For example, see how the spines and
lines break in matplotlib#28893.

A similar problem occurs for resampled images, which also uses Agg
scanlines internally. See the breakage in matplotlib#26368 for an example. The
example in that issue also contains horizontal spines that are wider
than 2**15, which also exhibit strange behaviour.

By moving to 32-bit scan lines, positions and lengths of the lines will
no longer be clipped, and this fixes rendering on very large figures.

Fixes matplotlib#23826
Fixes matplotlib#26368
Fixes matplotlib#28893
For very large Figures with a single Axes, the horizontal spines may be
long enough to need splitting into separate lines. For large enough
coordinates, the `x1+x2` calculation may overflow, flipping the sign
bit, causing the coordinates to become negative. This causes an infinite
loop as some internal condition is never met.

By dividing `x1`/`x2` by 2 first, we avoid the overflow, and can
calculate the split point correctly.
@QuLogic QuLogic force-pushed the remove-16bit-limits branch from ff86c19 to ace1cc2 Compare October 2, 2024 08:07
@QuLogic QuLogic changed the title Agg: Use 32-bit scan line classes Agg: Remove 16-bit limits Oct 2, 2024
@QuLogic
Copy link
Member Author

QuLogic commented Oct 2, 2024

After some additional checks, I was able to determine that the image dimension limit is also outdated, and we can expand it to match Agg's coordinates at 2**23 pixels. Also after some searching, I believe I've essentially re-created #3451 which was closed due to the CXX removal.

With these changes, I am able to run:

import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0, 3000, 0.01)
s1 = np.sin(2 * np.pi * 20 * t)

fig, ax = plt.subplots(figsize=(83886.07, .5), layout='constrained')
ax.plo(t, s1)
ax.set(xlim=(0, 2), xticks=[], yticks=[])
ax.grid(True)

fig.savefig('lines.png')

and produce an 8388607x50-pixel (i.e., 2**23-1) image with lines that are not broken in any visible way.

Shortening the example from #26368:

import numpy as np
import matplotlib.pyplot as plt

image = np.zeros((100000, 800, 3))
linspace = np.linspace(0, 1, 800)
image[:, :, 0] = 1 - linspace  # Red channel
image[:, :, 2] = linspace  # Blue channel

plt.figure(figsize=(10000, 0.5), dpi=100)
ax = plt.axes([0, 0, 1, 1], frameon=False)
ax.set_xticks([])
ax.set_yticks([])

ax.imshow(image, aspect="auto")

plt.savefig("gradient_image.png", pad_inches=0)

which produces a 1Mx50-pixel image of a gradient. Unfortunately, the resampling step takes about 16s per 100000 pixels, or, nearly 3 minutes for 1M pixels and 5-7GiB of RAM. So I did not test out the full 8M pixel image.

I also read issues that mentioned the 32768-pixel limit as being something about sqrt(max int), i.e., total number of pixels below max int, though I'm not sure why. I believe we have changed most multiplications to upcast to AGG_INT64, and (2**23-1)**2 should be well within that. With RGBA, it's also approximately 256 TiB, so I have neither the RAM nor the time to test that.

@QuLogic QuLogic marked this pull request as ready for review October 2, 2024 08:27
With 32-bit scan lines, we are able to take advantage of the full 32-bit
range of coordinates. However, as noted in
`extern/agg24-svn/include/agg_rasterizer_scanline_aa.h`, Agg uses
24.8-bit fixed point coordinates. With one bit taken for the sign, the
maximum coordinate is now 2**23.
@QuLogic QuLogic force-pushed the remove-16bit-limits branch from ace1cc2 to 9259989 Compare October 2, 2024 08:32
@tacaswell
Copy link
Member

Does this need a rebase?

@QuLogic
Copy link
Member Author

QuLogic commented Oct 8, 2024

It could to fix the CI, but from other PRs, it seems that macOS/homebrew broke again for other reasons.

@greglucas greglucas merged commit 9b33fbd into matplotlib:main Oct 8, 2024
38 of 44 checks passed
@QuLogic QuLogic deleted the remove-16bit-limits branch October 8, 2024 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants