Skip to content

Throw errors when indexing into empty array_view objects #5247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

jkseppan
Copy link
Member

A lower-level complement to #5246, raises an exception when indexing into an array_view that includes no data. Should catch issues like #5185 while not adding a big performance penalty in the normal case of nonempty arrays, but of course doesn't catch out-of-bounds errors in that case. I didn't measure the performance impact, but running the tests felt subjectively as fast as usual.

Should catch issues like matplotlib#5185 while not adding a big perf penalty
to mark empty array views
@jkseppan
Copy link
Member Author

The tests are failing because of actual problems in count_bboxes_overlapping_bbox, for which there are some suggested fixes in other PRs.

@mdboom
Copy link
Member

mdboom commented Oct 15, 2015

I think if the array is declared as empty, then size() should return 0. That should fix the issues with count_bboxes_overlapping_bbox, I think. Otherwise the onus is still on every function to determine whether an array is empty or not (by calling empty), rather than just using size.

@jkseppan
Copy link
Member Author

The way you formulated your comment makes me think you misread the code: I didn't change the definition of the empty() method, I just set m_data to NULL in cases where the array contains no data. I think you (or someone else?) commented on some other issue that it would be closer to the C++ STL convention for size() to return the length of the first axis. I think defining size() to be zero when there is really no data is probably reasonable even if it is strictly speaking incorrect for arrays that contain a nonzero number of zero-size items.

@mdboom
Copy link
Member

mdboom commented Oct 15, 2015

@jkseppan: I agree with all that. Sorry I wasn't clear. By "declare as empty" I meant "setting m_data to NULL" (like you guessed).

@jkseppan
Copy link
Member Author

About performance: I figured the following would be a good microbenchmark, since from_color_array in _image.h does a lot of indexing. The results I got on my system are 706 ms for 333136b (the version of master this was branched off of), 629 ms for 8698b5e (the latest commit on this branch). That is, it looks like this was 10% faster. Can anyone replicate this?

import matplotlib.image as mimg
import numpy as np
data = np.ones((5000, 5000, 3), dtype=np.uint8)
%timeit mimg.fromarray(data)

I'd argue that it's plausible that the checks should add very little delay, since the code has to fetch self->m_data from memory in any case, and the comparison to NULL is cheap especially with the compiler hint about which result is more likely. Since the indexing functions are inline and m_data is not modified, an optimizing compiler may even be able to hoist the check out of the loop and do it only once. But I can't really explain why the checks should make the code faster.

@mdboom
Copy link
Member

mdboom commented Oct 15, 2015

On a Mac 10.10, Python 3.4.3, I get 475ms on master, 883ms on this branch. I did have to make sure to do a clean build to get these results, as the changes are in header files. (At first, I forgot to clean and got results within 10% as you describe).

@jkseppan
Copy link
Member Author

Right. I still couldn't replicate that, but it seems that properly cleaning build products is nontrivial. In any case, I made PR #5264 with just the size() fix.

@jkseppan
Copy link
Member Author

Closing in favor of #5274.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants