-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Bug in numpy.copy for views of structured arrays in NumPy 1.16.x #13299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I agree this is unexpected! Certainly does not happen for regular arrays where a view leads to gaps. cc @ahaldane - it may mean that in |
Hi @dmbelov , The issue here actually isn't in the What is tripping you up is that In the new docs (visible here) we discuss this change. In your case, I think the best solution is to use the function >>> import numpy.lib.recfunctions as rf
>>> x_i = np.array([(1, 2, 3)] * 10, dtype=[(n, 'f8') for n in ['a', 'b', 'c']])
>>> print(rf.repack_fields(x_i[['a', 'c']]).dtype)
[('a', '<f8'), ('c', '<f8')] note that |
@ahaldane - I now see that in a way this is not a regression, but I wonder still if the current behaviour is desired: it does feel like the copy should be maximally compact, just like for a strided view and not dissimilar to the fact that copy also changes the order (well, at least |
The main reason to keep the padding, in my mind, is for consistency.
If |
I do not understand the reason for keeping padding: one expects that the command |
I think your point about phrasing it about a copy behaving the same as the original is helpful, but I'm not sure I agree with the conclusion. Certainly, two dtype holding the same fields but with different offsets are not considered equal. On the other hand, two arrays with those dtypes and the same numbers are considered equal, and to me it seems more logical that that is what counts: it would seem to me the contract is that the content of the arrays that is actually visible should be the thing that is preserved, i.e., any way you access an array element you get the identical answer. I don't quite see why the dtype itself has to be identical, just like I cannot expect Also, if it does matter to preserve holes, then shouldn't be what is in them be preserved? Currently, that is not the case:
I'm not sure how that is going to help any code! Anyway, I'm not trying to conclude I'm right here, just that there is I think a valid argument for packing copies. Also I'm not sure what is the bigger risk for regressions: that copies of arrays created with dtypes with holes no longer have holes, or that, because of the change to a view, copies of parts now do have holes. Certainly, @dmbelov ran into the second problem... p.s. If we do not change the behaviour, it would be good to update the documentation of |
comment on
|
Again, I think the copy is a red herring here. It is the new indexing behavior that creates padding, not the copy operation. The copy operation never has, and still does not, remove any padding. For instance, Furthermore, I really think we should stick to the principle that a copy of an array behaves exactly like the array itself. Consider the following: def mysave(filename, arr):
if arr.itemsize <= 16:
raise Exception("not supported by my file format")
my_save_impl(filename, arr)
a = np.array([1], dtype={'names': ['f0'], 'formats': ['i4'], 'offsets':[16]}).
mysave(a) # works
mysave(a.copy()) # wouldn't work if copy removes padding
x_i = np.array([(1, 2, 3)] * 10, dtype=[(n, 'f8') for n in ['a', 'b', 'c']])
b = x_i[['a', 'c']]
mysave(b) # works in 1.16
mysave(b.copy()) # wouldn't work if copy removes padding I think it would be very strange if a function failed on a copy of an array when it worked on the array itself. If we remove padding, this example shows that would happen. |
Let's unpack this line. It is doing two things in one line: Indexing, and then a copy operation. It is equivalent to doing:
So you are not making a copy of |
Point taken about Still, your example about a routine that might break by changing the behaviour of Also, I think the comparison that @dmbelov should have been looking for is something like I do think the behaviour @dmbelov expected is something logical to support - a common use case would be opening a very large file, maybe in memory mapped mode, and just getting out the relevant information, copying to be sure it works faster and/or to make sure one can write to it without destroying the original (for this example, the fact that Anyway, perhaps this is something to bring up on the mailing list? I think the question of what the contract is that one gives the user is one that could use some broader input. |
That's a good point, the strides aren't preserved by I also agree your memmap'd file example is something that would be nice to support. One option is to add a
Also, there is an other issue/PR where we discuss |
comment on @ahaldane: |
I understand the reasons why we do not want to change behavior of
What do you think? |
@dmbelov - I think to normally have a view is really preferable in most situations, and much more consistent with how we do slicing - it is unlikely we go back to that! But as I hope is clear, you have brought up a good point about how There is a variant on your proposal that I wondered about: Anyway, I think the mailing list is a good idea here; I'll try to post tomorrow. |
Your proposal is connected to #10417, and seems quite elegant to avoid the copy issue. A real issue with that change though, besides that many people prefer views by default, is not the technical side but the ambiguous backcompat situation here. As noted in the linked email, the view change was planned since numpy 1.7 with FutureWarnings since then, so arguably the copy behavior you depend on was explicitly unsupported. (Interestingly, I think by doing an extra copy your code may have avoided the FutureWarning, though I'm not sure if your copy was in preexisting code or if you introduced it now to fix a field packing issue). In addition to all the issues brought up previously, we now have a new backcompat issue, that this change has already been publicly released in numpy 1.16. In any case its worth discussing strategies like modifying copy. Also, I'm not at all opposed to changing the behavior again if it turns out to be the right thing to do to help backcompat. |
This buggy behavior of list slicing is inconsistent with assumptions in the function Example:
This is very bad that code in numpy release is inconsistent. Do you think it is a good idea to revert the change with list view back to its original until all inconsistencies and errors are fixed? |
That is indeed a bug, thank you for finding it. A few observations: I'm still not yet convinced we should revert the copy->view changes, but that is always on the table. The example you found seems a little bit unusual/contrived, is an old bug that now is somewhat more likely to turn up, and I think we we can still fix it for numpy 1.16.4. |
Didn't bother with the whole history here: But I changed copies, structured copies do return packed structs now. |
Sorry nvm, type promotion does, copy doesn't use that though to normalize... However, the above bug looks like a duplicate anyway |
The following change
multi-field views return a view instead of a copy
in NumPy 1.16 breaks reasonable behavior of the functioncopy
: it does not return a packed array. This bug incopy
causes a lot of problems, because it copes much more data than expected.Below is an example that shows the problem.
Reproducing code example:
Numpy/Python version information:
('1.16.2', '2.7.15 |Anaconda custom (64-bit)| (default, Oct 10 2018, 21:32:13) \n[GCC 7.3.0]')
The text was updated successfully, but these errors were encountered: