
BUG: Minimize iterator size in gufunc allocation of output arrays #4458


Closed
jaimefrio wants to merge 1 commit

Conversation

jaimefrio
Member

Closes #4442 (or at least mitigates it).

Prior to this PR, the iterator created by a gufunc had all core dimensions
of all output arguments appended to the broadcast non-core dimensions.
When working with gufuncs with multiple return arguments, this could lead
to "iterator is too large" errors with moderately sized inputs.

This PR changes that mechanism: each distinct core dimension is now
appended only as many times as its maximum number of occurrences in any
single output argument's signature.

As an example, take #4442, which was failing in a call to SVD with
signature `(m,n)->(m,n),(n),(n,n)`. The iterator was created with shape
`(m, n, n, n, n)` (all output arguments' core dimensions), and the
`op_axes` for each of the three output parameters were set to
`[0, 1, -1, -1, -1]`, `[-1, -1, 0, -1, -1]` and `[-1, -1, -1, 0, 1]`.
After this PR, the iterator in the same case has shape `(m, n, n)`, and
the `op_axes` are `[0, 1, -1]`, `[-1, 0, -1]` and `[-1, 0, 1]`.

The only concern is that a function with e.g. signature `(a,b)->(b,a)`
would previously have had its output created with an iterator of shape
`(b, a)` and `op_axes` set to `[0, 1]`; now it will be created with shape
`(a, b)` and `op_axes` set to `[1, 0]`. I am not sure whether that will
result in a different memory layout of the allocated array.
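To make the counting rule concrete, here is a minimal Python sketch — purely illustrative, not the actual C implementation in this PR; the helper name `reduced_core_dims` is made up — of how the reduced list of appended core dimensions can be derived from the output signatures:

```python
from collections import Counter

def reduced_core_dims(output_sigs):
    # For each dimension label, find the maximum number of times it
    # occurs within any single output signature...
    max_counts = Counter()
    for sig in output_sigs:
        for name, count in Counter(sig).items():
            max_counts[name] = max(max_counts[name], count)
    # ...and append the label exactly that many times.
    dims = []
    for name, count in max_counts.items():
        dims.extend([name] * count)
    return dims

# Output core dimensions of the SVD signature (m,n)->(m,n),(n),(n,n):
outputs = [("m", "n"), ("n",), ("n", "n")]
print(reduced_core_dims(outputs))
# ['m', 'n', 'n']  -- instead of the old ['m', 'n', 'n', 'n', 'n']
```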

@jaimefrio
Member Author

That seems to be the failure that always pops up these days. I don't think it has anything to do with my changes.

@seberg
Member

seberg commented Mar 7, 2014

Yes, debug is broken on Travis currently. I didn't read the details, but do I understand it correctly that before, a dimension was added for every output core dimension, while now you only add it as many times as the maximum number of equivalent ones occurring in a single output (I bet the sentence doesn't make sense, but I think you probably know what I mean ;)).

A possible hack would be to add a flag to the iterator, something like NPY_ITER_INTERNAL_DELAY_SIZE_CHECK. I have a pull request open about removing empty axes (ignoring the exact change there); we could recalculate the iterator size in that step. Adding such a flag feels like a hack, but it would remove the problem completely, at least for now...

@seberg
Member

seberg commented Mar 7, 2014

I just checked the linalg gufuncs... and it appears to me that SVD is the only one that should have this problem. But calculating the SVD of a large matrix doesn't seem that absurd to me, so I think we still might want to do an NPY_ITER_INTERNAL_DELAY_SIZE_CHECK hack or similar. Of course, with this fix it should be good enough on 64-bit systems (before, I am not even sure about that)...

@jaimefrio
Member Author

Yes, I think you understand the idea; I am not sure my description is well-formed English either...

My biggest concern with something like NPY_ITER_INTERNAL_DELAY_SIZE_CHECK comes from my lack of knowledge of the code: I wouldn't know when to trigger the size check, and I don't think it is a trivial question. It seems that you would have to have a flag in the NpyIter object, something like is_size_checked, and have many different functions check it, running whatever code is now run at instantiation time if the flag is not set (and then setting it). I don't know the iterator code well enough to figure out which functions can complete with a non-size-checked iterator, nor how much of the set-up of the iterator would have to be delayed.

Of course, the fact that I do not know how to go about it does not make it a bad idea! And hack or not, it would solve the issue entirely. For SVD the iterator currently has size m*n^4, and I am reducing it to m*n^2. So for a square matrix on a 32-bit system, this basically increases the maximum side from 2^(31/5) ≈ 73 to 2^(31/3) ≈ 1290. On a 64-bit system the values would be 6208 and 2097152, respectively. So yes, this PR is not good enough for 32-bit systems, and something else needs to be done there, but it would reduce the number of angry users.
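A quick back-of-the-envelope script to verify those limits — just illustrating the arithmetic above; `max_side` is a made-up helper, and the assumption is that the iterator size must stay below the signed 32- or 64-bit integer limit:

```python
def max_side(bits, power):
    """Largest square-matrix side n with n**power below 2**bits."""
    n = int((2 ** bits) ** (1.0 / power))
    while (n + 1) ** power < 2 ** bits:   # correct float rounding down
        n += 1
    while n ** power >= 2 ** bits:        # correct float rounding up
        n -= 1
    return n

for bits in (31, 63):
    old = max_side(bits, 5)   # old iterator shape (m, n, n, n, n)
    new = max_side(bits, 3)   # new iterator shape (m, n, n)
    print("%d-bit: %d -> %d" % (bits + 1, old, new))
# 32-bit: 73 -> 1290
# 64-bit: 6208 -> 2097151 (~2**21)
```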

@seberg
Member

seberg commented Mar 7, 2014

Heh, I had missed the longer explanation ;), it is clear...

Well, there is a RemoveAxis step, which can fully recalculate the size (see also gh-3861 for the place, though the size calculation should be stolen from the code that sets the current error). So if we add an NPY_ITER_INTERNAL_NO_SIZE_CHECK or so, then instead of giving an error in the first check we can just set the size to -1, and (no matter what) set it to -1 in the RemoveAxis step if it is too large. The gufunc machinery will then have to check for a size of -1 manually, since for broadcast operands it can still happen. (Which is basically what NO_SIZE_CHECK means, after all :))

Just adding the INTERNAL because I am not sure I like it enough to consider it public API...

@jaimefrio
Member Author

Do you really need a new flag? I have tried to follow the use of NIT_ITERSIZE and NIT_ITEREND through the iterator construction, and I think you could have a special path whenever NPY_ITER_MULTI_INDEX is set and NPY_ITFLAG_BUFFER and NPY_ITFLAG_HASINDEX are not. This ensures that the size of the iterator is not required until you either try to get the iteration function with NpyIter_GetIterNext, remove an axis from the iterator with NpyIter_RemoveAxis, or enable an external loop with NpyIter_EnableExternalLoop.

The idea would be to move the size calculation (and the check to see if NPY_ITFLAG_ONEITERATION can be set) to a separate function. If the above flags have the right values, set NIT_ITERSIZE to -1 and delay the proper computation of the size until either NpyIter_GetIterNext or NpyIter_EnableExternalLoop gets called. NpyIter_RemoveAxis would also have to be changed to not try to update the size if it is -1.
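As a purely illustrative Python analogue of that plan — this is not the NumPy C code; `LazyIter`, `MAX_ITERSIZE`, and the method names are made up to mirror NpyIter_RemoveAxis/NpyIter_GetIterNext — the size stays a -1 sentinel at construction time and is only validated once iteration actually starts:

```python
from functools import reduce
from operator import mul

MAX_ITERSIZE = 2 ** 31 - 1   # stand-in for the platform's npy_intp limit

class LazyIter:
    """Toy iterator that delays its size check, mirroring the proposal."""

    def __init__(self, shape):
        self.shape = list(shape)
        self.size = -1                    # sentinel: size not yet validated

    def remove_axis(self, axis):
        # Analogous to NpyIter_RemoveAxis: shrinking the shape may make a
        # previously too-large size valid, so no check is performed here.
        self.shape.pop(axis)

    def get_iter_next(self):
        # Analogous to NpyIter_GetIterNext: the delayed check runs here,
        # where an error return is allowed.
        if self.size == -1:
            size = reduce(mul, self.shape, 1)
            if size > MAX_ITERSIZE:
                raise ValueError("iterator is too large")
            self.size = size
        return iter(range(self.size))

# A shape that is too large as constructed, but fine after RemoveAxis:
it = LazyIter((73, 1290, 1290, 1290, 1290))
it.remove_axis(4)
it.remove_axis(3)
next(it.get_iter_next())   # size is only checked now; no error raised
```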

@seberg
Member

seberg commented Mar 7, 2014

Great point. I forgot about GetIterNext, and I think that really is allowed to have an error return, so that sounds good. We can delay the check to there, at least while RemoveAxis might still be called. That sounds like it should be pretty simple, though I am not sure how ExternalLoop uses the size; I bet it just recalculates it...

@juliantaylor
Contributor

I didn't follow everything, but as a minimal fix, wouldn't it be possible to bump the itersize variable to 64 bits on all platforms?

@seberg
Member

seberg commented Mar 7, 2014

@juliantaylor, it might be nice anyway for some einsums/reductions. The itersize is exposed, so I am not sure about compatibility (only through function access, so if we don't mind bogus results for things that currently don't work...). The underlying issue would only be mitigated, so if we assume nobody does SVDs on huge arrays, we are fine...

Otherwise, we may have to swallow the bitter pill and actually consider backporting the larger change (it should not be huge, but...).

@jaimefrio do you have time to implement your suggestion (as simple as possible)? Otherwise I will probably do it tomorrow or in two days.

@jaimefrio
Member Author

I thought it was going to be easier than it is; I have a not-yet-working implementation halfway through. I'll try to complete it tonight or over the weekend.

But even if I do get it to work, I am going to need very close supervision on this one: I am learning the iterator internals one compiler error at a time, so I have very little confidence that I am not overlooking some minor (or major) detail. I'll open a new PR as soon as I get it to work. When I do, feel free to grab it and complete it yourself if you think that is going to be faster/safer.

@charris
Member

charris commented Mar 7, 2014

> one compiler error at a time

:-)

@seberg
Member

seberg commented Mar 7, 2014

Since I was surprised that it should be difficult, I had a stab at it. The result is here: seberg@d7ba7e7

I may have missed something, but overall I would guess that it can't fail worse than current 1.8, which lacks any check ;). Is there anything bad with this solution, Jaime?

@jaimefrio
Member Author

@charris it is a much more accurate description of my workflow than I'd like to admit :__(

@seberg Ah, really nice! Your approach is much cleaner, more elegant, more concise, and overall just plain better than the mess I was trying to put together. The iterator internals are adult material, and kids like myself shouldn't be playing with them!

At first I thought you might not be handling the setting of NPY_ITFLAG_ONEITERATION, but I see that it is handled, at least partially, by the code already in NpyIter_RemoveAxis. I don't fully understand the logic of when that flag has to be set, but it seems that NpyIter_RemoveAxis (see here) does fewer checks than NpyIter_AdvancedNew (see here); I am not sure if we may be missing a chance for optimization there. But it also seems that it isn't going to be any worse than what we currently have for gufuncs.

@charris
Member

charris commented Mar 10, 2014

Where do we sit with this? It would be good to have a solution for 1.8.1, either @jaimefrio or @seberg.

@jaimefrio
Member Author

@seberg's other PR is the proper solution; this PR is just delaying the inevitable, and someone will eventually come complaining again if this is all we do. The only concern is how to properly test the changes to the iterator. I think it is relatively safe, but I don't trust my judgement on this too much. So ideally let's do the other; I am keeping this one open as a backup in case something nasty comes up in the other one.

@charris
Member

charris commented Mar 12, 2014

@jaimefrio @seberg Sebastian's version has been merged, do you want to close this one?

@jaimefrio
Member Author

Yes, probably not worth the extra code complication. After the changes to the iterator, it would only help push back errors like this one:

>>> np.linalg.svd(np.random.rand(*((1,)*28 + (3, 3))))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\WinPython-32bit-np-dev\python-2.7.5\lib\site-packages\numpy\linalg\linalg.py", line 1327, in svd
    u, s, vt = gufunc(a, signature=signature, extobj=extobj)
ValueError: too many dimensions for generalized ufunc svd_n_f

so that it would accept a couple more dimensions. Does not seem like a pressing need...

@jaimefrio jaimefrio closed this Mar 12, 2014
@jaimefrio jaimefrio deleted the gufunc-iter-alloc branch March 12, 2014 19:25