TST: Calculate RMS and diff image in C++ #29102

QuLogic · 2024-11-08T09:26:26Z

PR summary

The current implementation is not slow, but uses a lot of memory per image.

In compare_images, we have:

- one actual and one expected image as uint8 (2× image)
- both converted to int16 (though the original is thrown away) (4×)

which adds up to 4× the image allocated in this function.

Then it calls calculate_rms, which has:

- a difference between them as int16 (2×)
- the difference cast to 64-bit float (8×)
- the square of the difference as 64-bit float (though possibly the original difference was thrown away) (8×)

which at its peak has 16× the image allocated in parallel.
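
To make the accounting above concrete, here is a minimal NumPy sketch of the steps as described. It is an illustration under assumptions, not the actual matplotlib code: `rms_with_temporaries` is a hypothetical name, and the multipliers are just each array's `nbytes` divided by the size of one uint8 image.

```python
import numpy as np


def rms_with_temporaries(expected_u8, actual_u8):
    """Hypothetical sketch of the steps listed above; returns (rms, sizes)."""
    base = expected_u8.nbytes  # size of one uint8 image, the "1×" unit
    # Widen to int16 so the subtraction cannot wrap around.
    expected = expected_u8.astype(np.int16)
    actual = actual_u8.astype(np.int16)
    diff = expected - actual             # int16 array
    diff_f = diff.astype(np.float64)     # 64-bit float copy
    squared = diff_f ** 2                # another 64-bit float array
    rms = float(np.sqrt(squared.mean()))
    # Size of each temporary as a multiple of one uint8 image.
    sizes = {
        'int16 inputs': (expected.nbytes + actual.nbytes) // base,
        'int16 diff': diff.nbytes // base,
        'float64 diff': diff_f.nbytes // base,
        'float64 squared': squared.nbytes // base,
    }
    return rms, sizes
```

Which of these arrays are alive at the same moment depends on when references are dropped, so the sketch only reports individual sizes rather than a peak.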

If the RMS is over the desired tolerance, then save_diff_image is called, which:

- loads the actual and expected images again as uint8 (2× image)
- converts both to 64-bit float (throwing away the original) (16×)
- calculates the difference (8×)
- calculates the absolute value (8×)
- multiplies that by 10 (in-place, so no allocation)
- clips to 0-255 (8×)
- casts to uint8 (1×)

which at peak uses 32× the image.
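
The diff-image steps above can be sketched the same way. Again, this is a hedged illustration of the listed operations, with `diff_image_with_temporaries` as a hypothetical name; it is not copied from `save_diff_image`, and it skips the file loading and saving.

```python
import numpy as np


def diff_image_with_temporaries(expected_u8, actual_u8):
    """Hypothetical sketch of the diff-image steps listed above."""
    expected = expected_u8.astype(np.float64)  # 8× one uint8 image
    actual = actual_u8.astype(np.float64)      # 8× one uint8 image
    diff = expected - actual                   # new float64 array
    abs_diff = np.abs(diff)                    # another float64 array
    abs_diff *= 10                             # in-place, no allocation
    clipped = np.clip(abs_diff, 0, 255)        # another float64 array
    return clipped.astype(np.uint8)            # final uint8 result
```

Every line except the in-place multiply creates a full-size array, which is where the large multiples come from.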

So at their peak, compare_images→calculate_rms will have 20× the image allocated, and then compare_images→save_diff_image will have 36× the image allocated. This is generally not a problem, but on resource-constrained platforms like WASM, it can sometimes run out of memory just in calculate_rms.

This implementation in C++ always allocates the diff image, even when not needed, but doesn't have all the temporaries, so it's a maximum of 3× the image size (plus a few scalar temporaries).
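
For comparison, the idea of avoiding full-size temporaries can be sketched in Python by working one row at a time. This is only an approximation of the approach under stated assumptions, not the C++ code in this PR: `rms_and_diff_single_pass` is a hypothetical name, and the 10× scaling and 0-255 clipping are carried over from the Python steps described earlier.

```python
import numpy as np


def rms_and_diff_single_pass(expected_u8, actual_u8):
    """Hypothetical row-by-row sketch: only the uint8 diff image is
    allocated at full size; the per-row temporaries are small."""
    if expected_u8.shape != actual_u8.shape:
        raise ValueError('image shapes must match')
    diff_image = np.empty_like(expected_u8)  # always allocated
    total = 0.0                              # running sum of squares
    for row in range(expected_u8.shape[0]):
        # Per-row temporary; int32 holds squares up to 255**2 safely.
        d = expected_u8[row].astype(np.int32) - actual_u8[row]
        total += float(np.sum(d * d, dtype=np.int64))
        # Scale the absolute difference by 10 and cap it at 255.
        diff_image[row] = np.minimum(np.abs(d) * 10, 255)
    rms = (total / expected_u8.size) ** 0.5
    return rms, diff_image
```

With the two uint8 inputs plus the uint8 diff, the full-size allocations add up to 3× the image, matching the figure quoted above. The sketch does not handle empty images.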

PR checklist

[n/a] "closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
[n/a] Plotting related features are demonstrated in an example
[n/a] New Features and API Changes are noted with a directive and release note
[n/a] Documentation complies with general and docstring guidelines

QuLogic · 2025-06-04T22:10:29Z

So I no longer have any memory-based skips on the PR adding WASM, but maybe we still want to do this to save memory in general?

oscargus · 2025-06-05T10:28:06Z

This seems to make sense!

Should we also use this in compare_rms? Or deprecate that?

QuLogic · 2025-06-18T05:21:42Z

Put together a small benchmark:

import timeit
import tracemalloc

import numpy as np
from PIL import Image

for N in [100, 555, 1000, 2000]:
    # Write two different random N×N RGB images to compare.
    for name in ['expected', 'actual']:
        image = (np.random.random((N, N, 3)) * 255).astype(np.uint8)
        Image.fromarray(image).save(f'{name}{N}.png')
        del image

    # Trace allocations only while the comparison runs.
    tracemalloc.start()
    timer = timeit.Timer(f'compare_images("expected{N}.png", "actual{N}.png", 0)',
                         setup='from matplotlib.testing.compare import compare_images')
    # autorange() returns (number of loops, total time in seconds).
    loops, total = timer.autorange()
    # Print the size, seconds per call, and peak traced memory in bytes.
    print(N, total / loops, tracemalloc.get_traced_memory()[1])
    tracemalloc.stop()

which prints the runtime per call and the peak memory for image sizes up to 2000x2000. I think our largest test image is probably around 1800x900, and 555x555 has approximately the same number of pixels as 640x480 (the default figure size).
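
As a quick check of that size equivalence, the pixel counts work out as follows (plain arithmetic, nothing matplotlib-specific):

```python
# Compare pixel counts of the square benchmark size and the default figure.
square = 555 * 555        # 308025 pixels
default = 640 * 480       # 307200 pixels
ratio = square / default  # just over 1, within 0.3% of the default
```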

Time-wise, this probably doesn't work out to a lot: maybe about 80% of the previous runtime at the default figure size, and maybe 55-60% at the larger end.

But memory-wise, we're at 10% of the previous peak at the default figure size, and even less for larger figures.

story645

Not merging b/c you keep pushing, but you're welcome to merge when you're done tweaking. The memory improvements look awesome!

QuLogic · 2025-06-19T18:47:23Z

That was just fixing stubtest; it should be good now.

QuLogic added the topic: testing and Performance labels Nov 8, 2024

github-actions bot added the topic: images label Nov 8, 2024

QuLogic mentioned this pull request Nov 8, 2024

Add wasm CI #29093

github-actions bot added the status: needs rebase label Jan 4, 2025

oscargus approved these changes Jun 5, 2025

story645 reviewed Jun 13, 2025

QuLogic force-pushed the cpp-rms branch from 62a96ff to 3262fc8 Compare June 18, 2025 00:09

github-actions bot removed the status: needs rebase label Jun 18, 2025

story645 reviewed Jun 18, 2025

QuLogic force-pushed the cpp-rms branch from 3262fc8 to 80aa299 Compare June 18, 2025 20:23

QuLogic force-pushed the cpp-rms branch from 80aa299 to b13e31a Compare June 18, 2025 20:44

story645 approved these changes Jun 18, 2025

story645 merged commit e325459 into matplotlib:main Jun 19, 2025
39 of 41 checks passed

QuLogic deleted the cpp-rms branch June 19, 2025 19:01
