-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Reset flags when Axes are removed. Array might now be 1D, or removed axe... #420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…axes of size 1 destroy contiguousity.
I guess the fail is unrelated...? Should I add a commit here in the same branch if I am unsure if it makes sense? I would suggest maybe:
Which as far as I see, will retain contiguousity if arrays are C-contiguous for:
It will work also for F-contiguous arrays, however reduce.c will always destroy F-contiguousity if they have a size one dimension (keepdims does not matter). This is because the reduce result users PyArray_CreateSortedStridePerm which special cases one sized dimensions. I am not sure if this creation of unnecessary uncontiguous (and 0 stride) arrays is an issue or not. |
@certik, in case you wonder, I think the commits fix the invalid flags issue. The rest I wrote here has nothing to do with invalid flags, just wondering if the behaviour shouldn't be improved. |
The patch as it stands seems pretty straightforward; hard to see how it could be harmful. As far as the strides discussion goes, wouldn't it make more sense to just fix the contiguity-checking code so that it can actually recognize that these contiguous arrays are contiguous? There's nothing wrong with having a 0 stride (or any stride value for that matter) on a length 1 dimension. |
@njsmith I tried that and it created errors and segfaults, though maybe I did something wrong, but it seemed like there maybe some code that expects the strides to add up correctly even when they are size 1-dims. Alternatively maybe it corrupts code if ndim > 1 arrays can be both C and F contiguous. So I think someone who knows this strides stuff would have to check if ignoring 1-sized dims for contiguousity is possible and/or where there would need to be changes to make it possible. |
@seberg or someone else: I've managed to confuse myself about the status of this PR, and I guess I'm not the only one since it's been sitting idle for so long now :-). Is it ready to commit? Does it close any outstanding bugs? And is something like the seberg/cflags branch necessary to close any outstanding bugs, or is it just a proof of concept for a possible cleanup? |
@njsmith this is ready to commit IMO, it fixes issue #387 and also flags being wrong with The cflags stuff is just a proof of concept of being able to allow such things as a 1x100x1 array to be both C and F contiguous (even though its not 1D) this has nothing to do with bugs, its more or less unrelated (and I think something that should likely wait for a more major release, since maybe someone like even numpy internals might make assumptios that would be broken by it). |
Okay, great, I'll merge it. On the cflags stuff: I'd encourage you to go ahead and propose it, it looks like a simple bug-fix to me. C and F contiguous are what they are, our flags should represent that accurately. It's possible it might break something somewhere, but every change risks that, and the end result would just be that we get a chance to discover and clean up some broken code. (If the tests pass, then that dramatically limits the scope of the possible breakage.) And realistically there is no such thing as a "more major release", numpy 2.0 is not going to happen any time soon, or possibly ever. |
Reset flags when Axes are removed. Array might now be 1D, or removed axe...
@njsmith sorry, but can you give a small pointer here? The cflags stuff needs a bit cleanup, and then looking through numpy code to fix that the more general flags are actually used everywhere. Otherwise it seems to work, just might take a bit to change things everywhere carefully. So, after cleaning up, I will just do a PR so everyone can comment and decide if this is a good idea or not? I am also a bit confused about bug fixes vs. real changes. I would now just do PR for both and if they conflict each other simply fix it later, or is there better way? |
@seberg: Yes, I'm suggesting that you do a PR with the flag calculation fixes (together with any other changes you discover are necessary). I'm not sure what you're asking in your second paragraph. PR's are the right way to submit any kind of change, whether it's to fix a bug or to clean up some ugly code or to add a fancy new feature... whatever. The ideal PR is minimal and self-contained. "Minimal" means that if you have two unrelated changes, they should go in two PRs. "Self-contained" basically means that each PR should always leave master in a releasable state -- so we don't want one PR for the new code and a different one for the tests, or one PR that breaks something and a second PR that fixes it again. Maybe that helps? If you have several conceptually unrelated changes that overlap and will create git conflicts, then there are a few options, you can submit them one at a time, or submit them all at once to let people start reviewing them and then fix up the conflicts later. Any of these can work, git makes it pretty easy to sort things out after the fact. |
This is related to Issue #387, and fixes half of it, the other half is that the strides are still 0 if keepdims=True is used, which means that the flags are unnecessarily False. For this, the strides have to be replaced if they are 0 (similar to what happens in shape.c -> PyArray_Newshape, with the code also related to Issue #380). I will probably add commits here related to this as well, or maybe someone who knows reduction.c well can see if those 0 strides cannot be set to something better easily?
One test for it would be (not sure where to put it, in test_regression?):