ENH,API: Store exported buffer info on the array #16938
Conversation
The very real alternative to this would be to just implement the dealloc/free slot of the buffer interface. That will break some weirder usage, but maybe it's much less than it used to be (plus scalars may have been part of the issue before). So maybe more a major-release try (I just don't know how much it would affect).
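For reference, the "dealloc/free slot of the buffer interface" is the release half of Python's buffer protocol. A minimal sketch of that alternative, with hypothetical type and function names (this is not NumPy's actual code):

```c
#include <Python.h>

/* Free the per-export bookkeeping that the getbuffer implementation
 * stashed in view->internal, instead of keeping it in a global dict. */
static void
myarray_releasebuffer(PyObject *self, Py_buffer *view)
{
    (void)self;
    PyMem_Free(view->internal);
    view->internal = NULL;
}

/* Wired into the type through tp_as_buffer: */
static PyBufferProcs myarray_as_buffer = {
    NULL,                  /* bf_getbuffer: the existing export code */
    myarray_releasebuffer, /* bf_releasebuffer */
};
```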
In general I am +1 on moving the information to be local.
(force-pushed from 8c18128 to 0ba9870)
In the last triage discussion we decided to push ahead with adding a field to the struct. It is still marked "draft" and has a merge conflict.
I didn't fix the merge conflict yet because I was hoping to do gh-17295 first.
(force-pushed from f9e81e2 to 85fb053)
@mattip OK, just marking as ready, here are the things we need to consider:
EDIT: OK, decided to at least not tag the default NULL, so that the allocation is unchanged (it can be initialized to NULL). @scoder as a test, I tried to do the dirty thing and subclass NumPy in Cython using:

This goes down in flames, checking
(force-pushed from 85fb053 to e14b63b)
Hmm, PyPy crashes, I guess the

EDIT: Changed to not tag NULL. I can add a test for floating the error in the buffer export path, once we decide we want to give it a shot (and whether to tag the pointer at all).
(force-pushed from 00ae1bf to fcdff4c)
@seberg is this ready for final review?
It's ready for review. I will see if I can add a test for the "error" message (only in the buffer export path). EDIT: Today!
(force-pushed from fcdff4c to 34281f3)
@mattip the only real remaining issue that I can think of is whether we have to do something about the fact that
(force-pushed from 34281f3 to f90cbd2)
```c
#define NPY_SIZEOF_PYARRAYOBJECT (PyArray_Type.tp_basicsize)
```

since the size should not be considered a compile-time constant.
Suggested change:

```diff
-since the size should not be considered a compile time constant.
+Making this a compile-time constant may lead to bugs, but it is part of the public headers.
```
How about:

> This was previously a compile-time constant, which could lead to bugs when running with a different NumPy version.

One thing we could do is add a new

```c
#define NPY_SIZEOF_PYARRAYOBJECT_STATIC sizeof(PyArrayObject_fields)
```

so that it is easier to replace the safe version with the unsafe one where necessary (it would be the user's job to check this at import time). But as I don't know anyone actually using it, I am somewhat OK with forcing users into `sizeof(PyArrayObject_fields)` using the deprecated API.
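To make the "check at import time" part concrete, here is a minimal sketch of what a downstream extension could do, assuming the suggested `NPY_SIZEOF_PYARRAYOBJECT_STATIC` idea; the function name is hypothetical:

```c
#include <Python.h>
#include <numpy/arrayobject.h>

/* Hypothetical import-time check: refuse to load if the running numpy's
 * ndarray struct is larger than the one this extension was compiled
 * against (i.e. static allocations of the struct would be too small). */
static int
check_ndarray_size(void)
{
    if (PyArray_Type.tp_basicsize > (Py_ssize_t)sizeof(PyArrayObject_fields)) {
        PyErr_SetString(PyExc_ImportError,
                        "extension built against an older numpy: the runtime "
                        "ndarray struct is larger than the compiled-in size");
        return -1;
    }
    return 0;
}
```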
I think any attempt at using this is doomed to failure. Can we deprecate the macro, with a message that it is the wrong thing to check?
We can add a deprecated compiler marker, I think? That would probably make sense. Should we still keep it set to `PyArray_Type.tp_basicsize`? Even if it is weird and probably useless in almost all cases, it will at least break loudly if used for a static allocation.

I found it used once, for a static allocation of the form `struct subclass {char super[NPY_SIZEOF_PYARRAYOBJECT]; int my_field;}` (expanded in the sketch below), which will fail to compile with this change, and compile to a non-future-proof version without it.
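Spelled out, that found pattern looks roughly like this (hypothetical downstream code):

```c
#include <numpy/arrayobject.h>

/* A C "subclass" that bakes the base struct's size in at compile time.
 * If the runtime PyArrayObject ever grows, writes to the base object
 * would overrun `super` and corrupt `my_field`. With the macro redefined
 * to PyArray_Type.tp_basicsize, the array length is no longer a constant
 * expression, so this declaration fails to compile, loudly. */
struct subclass {
    char super[NPY_SIZEOF_PYARRAYOBJECT];
    int my_field;
};
```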
I think we must disallow its use. These are exactly the use cases we want to fail loudly with the new code, since they depend on something staying the same size when it doesn't.
@mattip I agree. Did you form a clear opinion on what to do here? No `NPY_SIZEOF_PYARRAYOBJECT_STATIC` sounds good to me. But I need to make a call on what to do specifically for this PR:

- In addition to making it a run-time constant (if that even matters for anyone), we add a deprecated tag in C?
- We just remove it (i.e. make it an `#error`)... the `(PyArray_Type.tp_basicsize)` definition probably doesn't have much point then?
- Only deprecate it, which makes it easier to compile code that is not future-proof without realizing it.

I am OK with either of the first two options; the `#error` might be nice, since it allows us to write a few words about what is going on in a very clear way that can't get lost.

EDIT: Oh, I guess you can't put an `#error` inside a `#define`, although I suppose you can make it expand to invalid syntax, which is less nice. Or maybe there is something equivalent to `#error` as a function... EDIT2: Or probably, we might as well just delete it in that case and update the documentation somewhere with info that can be found when googling.
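For reference, a sketch of the two mechanisms alluded to here (illustrative only, and GCC/Clang-specific in the first case):

```c
/* Option A: #error cannot be produced by a macro expansion, but the C99
 * _Pragma operator can carry an equivalent diagnostic that fires at
 * every use site (GCC/Clang): */
#define NPY_SIZEOF_PYARRAYOBJECT \
    _Pragma("GCC error \"NPY_SIZEOF_PYARRAYOBJECT is not a compile-time constant\"")

#undef NPY_SIZEOF_PYARRAYOBJECT

/* Option B (portable): expand to an undeclared identifier whose name
 * points at the explanation, so any use fails to compile: */
#define NPY_SIZEOF_PYARRAYOBJECT \
    NPY_SIZEOF_PYARRAYOBJECT_was_removed_see_the_numpy_release_notes
```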
We could try to do some complicated static-assert thing, or just remove it and document the removal in the release notes. I think removing it is the easiest and will get the desired response if anyone is actually using it.
OK, the last commit just deletes this `#define` and adds some info to the `PyArrayObject` structure documentation.

Not sure where we stand with putting this in (or rather, when); seems like not for 1.20 most likely, so I will have to update the text slightly again before finally merging.

Ping all.
There's a clear error when you do this, I believe (at least if the ABI version is incremented, or the API version, I forget). Maybe that's the part we're missing for 1.20.0? EDIT:
The ABI version causes a hard error on any mismatch, and would break forward compatibility as well as backward compatibility. I.e., it would definitely require a "2.0" release and major churn downstream. There is this chunk of code in

But, I just checked, and Chuck did increment the API version in 1.20 (which is the same as the
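For context, a simplified paraphrase of those import-time checks (the actual generated code in NumPy's headers differs in detail):

```c
#include <Python.h>
#include <numpy/arrayobject.h>

/* Roughly what _import_array() does: the ABI version must match exactly
 * (a hard error in both directions), while the API/feature version only
 * errors when the running numpy is older than the one built against. */
static int
check_numpy_versions(void)
{
    if (NPY_VERSION != PyArray_GetNDArrayCVersion()) {
        PyErr_Format(PyExc_RuntimeError,
                     "module compiled against ABI version 0x%x but this "
                     "version of numpy is 0x%x",
                     (unsigned int)NPY_VERSION,
                     (unsigned int)PyArray_GetNDArrayCVersion());
        return -1;
    }
    if (NPY_FEATURE_VERSION > PyArray_GetNDArrayCFeatureVersion()) {
        PyErr_Format(PyExc_RuntimeError,
                     "module compiled against API version 0x%x but this "
                     "version of numpy is 0x%x",
                     (unsigned int)NPY_FEATURE_VERSION,
                     (unsigned int)PyArray_GetNDArrayCFeatureVersion());
        return -1;
    }
    return 0;
}
```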
Thank you for the hint to
Manually setting to numpy 1.14.6 instead breaks for newer Python versions (line 719): I feel like I'm in DLL hell... EDIT: I am currently retrying with numpy 1.16.3 in the hope that this works for all currently tested versions. Given the security vulnerability known for versions older than 1.16.3, I feel fine breaking out-of-the-box wheel support if that helps users move away from known and exposed vulnerabilities.
But this is the build we are talking about, not the runtime. The "vulnerability" (which is debatable) is for runtime only.
Yes, oldest-supported-numpy is for Python 3.5 and up, although it should not break 3.6. Are you sure you need to be creating new wheels for Python 2.7, 3.4, 3.5? Do you see users demanding new releases of your package for those versions?
... if so, please open an issue in oldest-supported-numpy so it can support the Python ecosystem better.
@ChristophWWagner you can use different versions to build against for different Python versions, to solve your py34 issue. See the
FYI, that alert was more or less nonsense, and isn't relevant for a build-time issue.
Thank you all for your help and comments on how to resolve this issue. I have now introduced version-specific minimal requirements, and this seems to resolve the issue. Of course you are right that run-time security issues do not play a role here; I have reverted the previous wrong assessment in my setup. Thanks for pointing this out!

Unfortunately we have some mainframe systems that still run Python 2 code. That's why we still keep the wheels in. As some of you were asking how to communicate this issue better, I would suggest reworking the error message. The first time I saw this, I thought it was a bug within the package, and (as a user) I thought there was nothing I could do here. Only after seeing this message repeat as a pattern did I start investigating, and found out that there is a way of arriving at an inconsistent dependency state, and could start to resolve that. Maybe pointing to this idea could already help communicate the underlying issue. Again, thanks for all your help!
Out of curiosity, and aside from the build-time issue: why is this security alert bogus?
Because it's equivalent to saying "Python has a vulnerability because

This comment does a good job of conveying that: #12759 (comment)
* Add pyproject.toml to specify build-time requirements. With this, I believe the numpy import hack in the setup.py is no longer necessary, and it may fix installation issues like #63
* Use numpy<=1.20.0 to build C extensions. NumPy ABI issue: numpy/numpy#16938. Ref: JP article (https://zenn.dev/ymd_h/articles/934a90e1468a05)
This is still causing problems:
That's a pretty popular SO issue. Looking at the comments, we knew this was quite a risky change; it looks like we should learn this lesson and not do this kind of thing again unless we bump the major version.
I have only one version of numpy on my machine and I still get this error.
I am not convinced by that issue. If I read it correctly, the user was building locally using

The alternative is to never change NumPy structs used by Cython, which is also painful. Just to expand a bit: the problem is due to

Then Cython hard-codes the size of the new (larger) struct and raises an error when seeing the smaller one (a sketch of that check follows below). The other direction (building with 1.19, running with 1.21) works, although it may warn.
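A sketch of the kind of size check Cython's generated module-init code performs (heavily simplified; the real generated code and message differ):

```c
#include <Python.h>

/* Cython records the size of the struct it saw at build time and
 * compares it with the runtime type's tp_basicsize; a smaller runtime
 * struct is reported as a likely binary incompatibility. */
static int
check_type_size(PyTypeObject *t, Py_ssize_t build_time_size)
{
    if (t->tp_basicsize < build_time_size) {
        PyErr_Format(PyExc_ValueError,
                     "%s size changed, may indicate binary incompatibility",
                     t->tp_name);
        return -1;
    }
    return 0;
}
```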
We can add things, just not remove them, right? And we actually do that semi-regularly?

Here's another example: piskvorky/gensim#3085 (comment). It's really hard to figure out what is actually caused by this PR and what may be user error, or an error of another project building wheels a certain way. But the bug reports seem compatible with the actual impact we expect (best summarized by @eric-wieser's comment here). And given this was a "it's not that important, but let's see what y'all think" kind of change anyway, my impression is that we shouldn't have merged it.
Looking at that example, once they figured out the problem, there is this message:

so it seems like this PR has the side-effect of helping projects clean up their builds. :)

We never remove fields. And if we reject the approach in this PR, neither will we be able to add fields. We do not do that regularly; this was the first attempt to change

We wanted to see what would happen if we changed the size: this is a canary for the further proposed changes to structs that appear in the Cython API (
My conclusion is that each time we do something like this (the C API change in 1.19.2 comes to mind as well), we get multiple projects linking to the PR in question claiming we broke something, which we did; it's just not clear whether that is actually the cause of their problem or not. Doing more of this seems unhealthy, at least without better diagnostics/safeguards.

This one (from @seberg's comment above) we should get to the bottom of. If the problem is indeed just "using a package that was built against a newer numpy with an older numpy at runtime", then our error should say that on import. If we continue to get:

from Cython, that'll be super unhelpful for everyone.
Just to clarify: I only have one version of numpy on my machine, which I installed myself using setup.py. The problem here is h5py, which uses a numpy egg file (version 1.17) while building. This new egg file is present within the h5py installation. That is the reason for the error on my machine. Please see this comment: h5py/h5py#1955 (comment)
@winash12 Since that issue is closed I don't want to rehash it here. The question here is "can we give a better error when a package is built against a newer numpy but uses an older numpy at runtime".
Do I read the latest discussion right if I say: real progress here would be trying to make the Cython error clearer, or even seeing if we can get Cython to call

EDIT: Which means someone should probably open a Cython issue?
Yes, that sounds right. If we could just make it say "Installed numpy version is older than the numpy version that this package was built against" or something like that, that would make it a lot less painful to debug issues that are potentially due to PRs like this one.
This speeds up array deallocation and buffer exports, since it removes the need for global dictionary lookups. It also somewhat simplifies the logic. The main advantage is possibly less the speedup itself (which is not large compared to most things that happen in the lifetime of an array), but rather that no unnecessary work is done for short-lived arrays, which never export a buffer.
The downside of this approach is that the ABI changes for anyone
who would be subclassing ndarray in C.
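In terms of the struct, the change amounts to something like the following; field and type names here are illustrative, not the ones actually used in the PR:

```c
#include <Python.h>

/* Per-export bookkeeping that previously lived in a module-level dict
 * keyed by the exporting array: */
typedef struct {
    Py_ssize_t *shape_and_strides;  /* backs Py_buffer.shape/.strides */
    char *format;                   /* backs Py_buffer.format */
} buffer_info_sketch;

/* The array struct grows by one trailing pointer. It stays NULL for
 * arrays that never export a buffer (so short-lived arrays do no extra
 * work) and is freed in the array's dealloc, replacing the dict lookups
 * on every export and deallocation. */
typedef struct {
    PyObject_HEAD
    char *data;          /* stand-ins for the existing */
    int nd;              /* PyArrayObject_fields members */
    void *_buffer_info;  /* new field */
} array_fields_sketch;
```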
I am not super invested in this, but I do feel it's worth considering. This requires breaking the ABI slightly (so it would need release notes), because it extends the array struct (and void, but that is much less likely to be used). Marking as draft since there are probably some smaller moving parts. But since I was looking at the buffer interface today, I thought I would wrap this up and show it at least.