Skip to content

BUG: Don't fail when lexsorting some empty arrays. #14240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 22, 2019

Conversation

batterseapower
Copy link
Contributor

np.lexsort would fail on empty arrays with non-standard strides as a result of the changes in #13691. While fixing this I tried to also solve a couple of other problems/leaks that could be triggered in low-memory situations.

Fixes #14228

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good, also nice additional error path cleanups. I noticed that there are some npy_free_cache calls below. I think it would actually make sense to endorse that and use npy_alloc_cache, but mixing the two smells like an error.

@batterseapower
Copy link
Contributor Author

Thanks, looks good, also nice additional error path cleanups. I noticed that there are some npy_free_cache calls below. I think it would actually make sense to endorse that and use npy_alloc_cache, but mixing the two smells like an error.

I also thought this was a bit suspicious. I've switched them to use PyDataMem_FREE just like the deallocations in the happy path already do.

@seberg seberg self-requested a review August 9, 2019 18:57
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine, just small things to fix up. I am still a bit torn about the special case, it might be that the sorting code is not safe for 1 (or even 0) elements in some cases, although this function likely is. If we can just remove the whole part that would be nicer, since it is not speed relevant for sure.

assert xs.strides == (8,)
assert np.lexsort((xs,)).shape[0] == 0 # Works

xs.strides = (16,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for now, so I will not press to change it. I do not like the stride manipulation as such though. What would be safest may be to use xs = np.array([1, 2, 3], dtype='i8') and then multiple slicing as in xs[:0] and xs[::2][:0] (you can assert the strides after that).

Or maybe, just for the x.strides = ... part, do x = xs[::2] instead (it should work) and just assert in case the behaviour ever changes.

Copy link
Contributor Author

@batterseapower batterseapower Aug 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong with changing the strides like this? Numpy isn't going to change semantics such that this would break because it would break too much user code.

I think this way of doing it is easier to read than your suggestion because you can literally read off what the strides of the new xs will be, which is important information for reasoning about how the lexsort function will execute. With your suggestion the strides, which are what you really care about, are only updated in an indirect way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, yes, we should, and eventually probably will deprecate such usage, because that usage is broken in many ways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that an empty array should rightfully be able to have any strides at all set on it, but maybe this breaks something elsewhere in numpy?

Anyway, I don't really fancy changing this if you aren't going to require it, because I do think it is the clearest presentation of the test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the other approach is to use np.lib.stride_tricks.as_strided, but OK. Whenever we do the deprecation we can just fixup the test. Of course it is safe for empty arrays or arrays with a single element, but that is about the only case where it is sane.

@charris
Copy link
Member

charris commented Aug 19, 2019

@seberg Happy with this?

@charris charris added this to the 1.17.1 release milestone Aug 19, 2019
@seberg seberg self-requested a review August 19, 2019 23:14
@charris charris changed the title BUG: don't fail when lexsorting some empty arrays (closes #14228) BUG: don't fail when lexsorting some empty arrays. Aug 20, 2019
@seberg
Copy link
Member

seberg commented Aug 22, 2019

Added a commit to prefer npy_cache_alloc, since that is also what is used everywhere else in the sorting code (and it was inconsistent before after all). Will merge soon if tests pass.

@seberg seberg force-pushed the lexsort branch 2 times, most recently from 5c42b99 to 00069b0 Compare August 22, 2019 16:15
@charris charris changed the title BUG: don't fail when lexsorting some empty arrays. BUG: Don't fail when lexsorting some empty arrays. Aug 22, 2019
@charris
Copy link
Member

charris commented Aug 22, 2019

The Shippable failure looks spurious, restarted.

@seberg
Copy link
Member

seberg commented Aug 22, 2019

OK, added a guard on swaps as well instead, it got a bit ugly (with thread safety changes) for backporting, merging.

I would just merge it as is. Seems the shippable failed again?

@charris
Copy link
Member

charris commented Aug 22, 2019

close/reopen.

@charris charris closed this Aug 22, 2019
@charris charris reopened this Aug 22, 2019
@charris
Copy link
Member

charris commented Aug 22, 2019

Close/reopen to retrigger shippable. I think the error is spurious, we had a streak of the same a while back, but it is nice when everything passes.

@charris charris closed this Aug 22, 2019
@charris charris reopened this Aug 22, 2019
@seberg
Copy link
Member

seberg commented Aug 22, 2019

OK, let me put this in. Thanks @batterseapower.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Aug 23, 2019
@charris charris removed this from the 1.17.1 release milestone Aug 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: np.lexsort MemoryError on empty array with non-standard strides
4 participants