TST: Calculate RMS and diff image in C++ #29102

Open: 1 commit to be merged into base: main

QuLogic (Member) commented Nov 8, 2024

PR summary

The current implementation is not slow, but uses a lot of memory per image.

In compare_images, we have:

  • one actual and one expected image as uint8 (2× image)
  • both converted to int16 (though the originals are thrown away) (4×)

which adds up to 4× the image allocated in this function.

Then it calls calculate_rms, which has:

  • a difference between them as int16 (2×)
  • the difference cast to 64-bit float (8×)
  • the square of the difference as 64-bit float (though possibly the original difference was thrown away) (8×)

which at its peak has 16× the image allocated in parallel.

If the RMS is over the desired tolerance, then save_diff_image is called, which:

  • loads the actual and expected images again as uint8 (2× image)
  • converts both to 64-bit float (throwing away the original) (16×)
  • calculates the difference (8×)
  • calculates the absolute value (8×)
  • multiplies that by 10 (in-place, so no allocation)
  • clips to 0-255 (8×)
  • casts to uint8 (1×)

which at peak uses 32× the image.

So at their peak, compare_images→calculate_rms will have 20× the image allocated, and then compare_images→save_diff_image will have 36× the image allocated. This is generally not a problem, but on resource-constrained platforms like WASM, it can sometimes run out of memory just in calculate_rms.
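
To put those multiples in concrete terms, here is a small sketch that turns them into byte counts. The 1000×1000 RGB image size is a hypothetical value chosen only for illustration, and the constant and function names are not from the PR; the multiples themselves are the ones listed above.

```cpp
#include <cstddef>

// Hypothetical image size, chosen only for illustration.
constexpr std::size_t kWidth = 1000;
constexpr std::size_t kHeight = 1000;
constexpr std::size_t kChannels = 3;  // RGB, one uint8 per channel
constexpr std::size_t kImageBytes = kWidth * kHeight * kChannels;

// Multiples of the uint8 image size, taken from the summary above.
constexpr std::size_t kCompareImages = 4;      // two int16 copies
constexpr std::size_t kCalculateRmsPeak = 16;  // peak inside calculate_rms
constexpr std::size_t kSaveDiffImagePeak = 32; // peak inside save_diff_image

// Peak bytes while a callee runs with the caller's arrays still alive.
constexpr std::size_t peak_bytes(std::size_t caller, std::size_t callee) {
    return (caller + callee) * kImageBytes;
}

static_assert(peak_bytes(kCompareImages, kCalculateRmsPeak) == 60000000,
              "20x of a 3,000,000-byte image");
static_assert(peak_bytes(kCompareImages, kSaveDiffImagePeak) == 108000000,
              "36x of a 3,000,000-byte image");
```

Under these assumptions a 3 MB image needs roughly 60 MB in the RMS path and 108 MB in the diff-image path.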

This implementation in C++ always allocates the diff image, even when not needed, but doesn't have all the temporaries, so it's a maximum of 3× the image size (plus a few scalar temporaries).
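
The 3× figure can be illustrated with a minimal single-pass sketch. This is not the PR's code: flat `std::vector` buffers stand in for the pybind11 arrays, the `RmsResult` type and `rms_and_diff` name are hypothetical, and alpha-channel handling is left out. It only shows that the RMS and the diff image can be produced with one extra image-sized buffer and a scalar accumulator.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical result type; not the PR's actual return value.
struct RmsResult {
    double rms;
    std::vector<std::uint8_t> diff;  // the third image-sized buffer
};

// Sketch only: computes the RMS and the amplified diff in one pass.
RmsResult rms_and_diff(const std::vector<std::uint8_t>& expected,
                       const std::vector<std::uint8_t>& actual) {
    if (expected.size() != actual.size() || expected.empty()) {
        throw std::invalid_argument(
            "images must be non-empty and equally sized");
    }
    RmsResult result{0.0, std::vector<std::uint8_t>(expected.size())};
    double total = 0.0;  // scalar accumulator instead of a float64 array
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const double d = static_cast<double>(expected[i]) -
                         static_cast<double>(actual[i]);
        total += d * d;
        // Multiply by 10 and clip to 0-255, as save_diff_image does.
        // abs() already guarantees the lower bound of 0.
        result.diff[i] = static_cast<std::uint8_t>(
            std::min(std::abs(d) * 10.0, 255.0));
    }
    result.rms = std::sqrt(total / static_cast<double>(expected.size()));
    return result;
}
```

For example, comparing `{3, 0, 0, 0}` against `{0, 4, 0, 0}` gives a mean squared difference of 6.25, so an RMS of 2.5, with diff values 30, 40, 0, 0.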

QuLogic (Member Author) commented Jun 4, 2025

So I no longer have any memory-based skips on the PR adding WASM, but maybe we still want to do this to save memory in general?

oscargus (Member) commented Jun 5, 2025

This seems to make sense!

Should we also use this in compare_rms? Or deprecate that?

Comment on lines 400 to +401
  PNG via the `.converter` dictionary. The underlying RMS is calculated
- with the `.calculate_rms` function.
+ in a similar way to the `.calculate_rms` function.
Member

I think what's important here is how these methods differ? (what's the takeaway supposed to be here?)

QuLogic (Member Author) commented Jun 13, 2025

The only reason I wrote it vaguely is that I didn't want people to think they could monkeypatch calculate_rms and expect the image_comparison decorator / compare_images to use it. The algorithm is otherwise the same, I think.

Comment on lines +210 to +228
    if (expected_image.ndim() != 3) {
        auto exceptions = py::module_::import("matplotlib.testing.exceptions");
        auto ImageComparisonFailure = exceptions.attr("ImageComparisonFailure");
        py::set_error(
            ImageComparisonFailure,
            "Expected image must be 3-dimensional, but is {ndim}-dimensional"_s.format(
                "ndim"_a=expected_image.ndim()));
        throw py::error_already_set();
    }

    if (actual_image.ndim() != 3) {
        auto exceptions = py::module_::import("matplotlib.testing.exceptions");
        auto ImageComparisonFailure = exceptions.attr("ImageComparisonFailure");
        py::set_error(
            ImageComparisonFailure,
            "Actual image must be 3-dimensional, but is {ndim}-dimensional"_s.format(
                "ndim"_a=actual_image.ndim()));
        throw py::error_already_set();
    }
Member

Can this be done in a loop since it's the same test/error message?
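
One possible shape for that, as a sketch rather than a proposed patch: a single helper that takes the label and the dimension count, called for both images. The `require_3d` and `check_images` names are hypothetical, and `std::runtime_error` stands in for the `ImageComparisonFailure` that the PR raises through pybind11, so the sketch compiles without pybind11.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper. std::runtime_error is a stand-in for the
// ImageComparisonFailure raised via py::set_error in the PR.
void require_3d(const std::string& label, long ndim) {
    if (ndim != 3) {
        throw std::runtime_error(
            label + " image must be 3-dimensional, but is " +
            std::to_string(ndim) + "-dimensional");
    }
}

// Runs the same check over both images. The ndim values are passed
// in directly to keep the sketch independent of pybind11 arrays.
void check_images(long expected_ndim, long actual_ndim) {
    struct Named {
        const char* label;
        long ndim;
    };
    const Named images[] = {{"Expected", expected_ndim},
                            {"Actual", actual_ndim}};
    for (const auto& image : images) {
        require_3d(image.label, image.ndim);
    }
}
```

With this layout the message text lives in one place, and only the label differs between the two checks.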

    for (auto i = 0; i < height; i++) {
        for (auto j = 0; j < width; j++) {
            for (auto k = 0; k < depth; k++) {
                auto pixel_diff = static_cast<double>(expected(i, j, k)) -
Member

is it more efficient to case the individual values than the whole array?

Member Author

I'm not sure what you mean here? There's no condition.

Member

Typo, meant cast
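
In memory terms, the two options in that question can be sketched as follows. This is an illustration under stated assumptions, not a benchmark and not the PR's code: both function names are hypothetical, both inputs are assumed to have the same length, and whether one variant is also faster would need measuring.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Variant A: cast the whole arrays first. Each temporary holds one
// double per uint8 sample, so it is 8x the size of its source image.
double sum_sq_whole_array(const std::vector<std::uint8_t>& expected,
                          const std::vector<std::uint8_t>& actual) {
    const std::vector<double> e(expected.begin(), expected.end());
    const std::vector<double> a(actual.begin(), actual.end());
    double total = 0.0;
    for (std::size_t i = 0; i < e.size(); ++i) {
        total += (e[i] - a[i]) * (e[i] - a[i]);
    }
    return total;
}

// Variant B: cast each value as it is read. Only scalars are live,
// so no image-sized temporary is allocated.
double sum_sq_per_element(const std::vector<std::uint8_t>& expected,
                          const std::vector<std::uint8_t>& actual) {
    double total = 0.0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const double d = static_cast<double>(expected[i]) -
                         static_cast<double>(actual[i]);
        total += d * d;
    }
    return total;
}
```

Both variants return the same sum of squares; the difference is that Variant A holds two extra float64 arrays while it runs.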


    if (k != 3) { // Hard-code a fully solid alpha channel by omitting it.
        diff(i, j, k) = static_cast<unsigned char>(std::clamp(
            abs(pixel_diff) * 10, // Expand differences in luminance domain.
Member

why are you only doing this for rgba?

QuLogic (Member Author) commented Jun 13, 2025

The alpha channel is ignored, just as with compare_images.
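
A per-pixel sketch of that behaviour, under assumptions: the `diff_pixel` name and pointer-based signature are hypothetical, and the 3-channel diff buffer is one reading of the "by omitting it" code comment (an image saved without an alpha channel is fully opaque). `std::min` is used here instead of the excerpt's `std::clamp` because the absolute value already bounds the result below by 0.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Writes one RGB diff pixel from two input pixels that have `depth`
// channels each (3 for RGB, 4 for RGBA). Index 3, the alpha channel,
// is skipped, so alpha differences never reach the diff image.
void diff_pixel(const std::uint8_t* expected, const std::uint8_t* actual,
                int depth, std::uint8_t* diff_rgb) {
    for (int k = 0; k < depth; ++k) {
        if (k == 3) {
            continue;  // ignore alpha
        }
        const double pixel_diff = static_cast<double>(expected[k]) -
                                  static_cast<double>(actual[k]);
        // Amplify by 10, then clip to the uint8 range.
        diff_rgb[k] = static_cast<std::uint8_t>(
            std::min(std::abs(pixel_diff) * 10.0, 255.0));
    }
}
```

For instance, an expected pixel of (10, 0, 5, 255) against an actual pixel of (12, 255, 5, 0) yields the diff pixel (20, 255, 0), regardless of the alpha mismatch.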

3 participants