
ENH,API: Store exported buffer info on the array #16938


Merged
merged 11 commits on Nov 26, 2020

Conversation


@seberg seberg commented Jul 23, 2020

This speeds up array deallocation and buffer exports, since it
removes the need for global dictionary lookups. It also somewhat
simplifies the logic. The main advantage is possibly less the
speedup itself (which is not large compared to most things that
happen in the lifetime of an array), but rather that no unnecessary
work is done for short-lived arrays, which never export a buffer.

The downside of this approach is that the ABI changes for anyone
who would be subclassing ndarray in C.


I am not super invested in this, but I do feel it's worth considering. This requires breaking the ABI slightly (so would need release notes), because it extends the array struct (and void, but that is much less likely to be used). Marking as draft since there are probably some smaller moving parts. But since I was looking at the buffer interface today, I thought I would wrap this up and show it at least.
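
Roughly, the idea is to replace the global (module-level) dictionary mapping exported arrays to their buffer-info allocations with a private field on the array struct itself. A minimal sketch of the concept; the field name and exact placement are illustrative assumptions, not the literal diff:

#include <Python.h>
#include <numpy/ndarraytypes.h>

/* Mirrors PyArrayObject_fields, renamed here to stay self-contained. */
typedef struct {
    PyObject_HEAD
    char *data;
    int nd;
    npy_intp *dimensions;
    npy_intp *strides;
    PyObject *base;
    PyArray_Descr *descr;
    int flags;
    PyObject *weakreflist;
    /* New: per-array storage for the info allocated on buffer export
     * (shape/strides/format strings).  The array's dealloc can free this
     * directly, so no global dictionary lookup is needed. */
    void *_buffer_info;
} example_array_fields;

Deallocation then only has to free the field if it was ever set, which is also why short-lived arrays that never export a buffer pay essentially nothing.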


seberg commented Jul 24, 2020

The very real alternative to this would be to just implement the dealloc/free slot of the buffer interface. That will break some weirder usage, but maybe it's much less than it used to be (plus scalars may have been part of the issue before). So maybe more of a major-release attempt (I just don't know how much it would affect).
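
For context, a minimal sketch of what that alternative (implementing the release slot of the buffer protocol) could look like; this is an assumption about the shape of such a change, not code from this PR:

#include <Python.h>
#include <stdlib.h>

/* Hypothetical: free the per-export metadata when the consumer releases
 * the buffer, instead of keeping it alive until array deallocation. */
static void
array_releasebuffer(PyObject *self, Py_buffer *view)
{
    /* Assumes bf_getbuffer stashed its allocated shape/strides/format
     * storage in view->internal. */
    if (view->internal != NULL) {
        free(view->internal);
        view->internal = NULL;
    }
}

/* Wired up through the type's buffer slots, e.g.
 *   static PyBufferProcs array_as_buffer = {
 *       (getbufferproc)array_getbuffer,
 *       (releasebufferproc)array_releasebuffer,
 *   };
 * As noted above, defining this slot can break some older or weirder
 * consumers, which is the trade-off being weighed. */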


mattip commented Jul 24, 2020

In general I am +1 on moving the information to be local.


mattip commented Nov 2, 2020

In the last triage discussion we decided to push ahead with adding a field to the struct. It is still marked "draft" and has a merge conflict.


seberg commented Nov 2, 2020

I didn't fix the merge conflict yet because I was hoping to do gh-17295 first.

@seberg seberg force-pushed the simplify-buffers branch 3 times, most recently from f9e81e2 to 85fb053 on November 6, 2020 22:20
@seberg seberg changed the title from "ENH: Store exported buffer info on the array" to "ENH,API: Store exported buffer info on the array" on Nov 6, 2020
@seberg seberg marked this pull request as ready for review November 6, 2020 22:22

seberg commented Nov 6, 2020

@mattip OK, just marking as ready; here are the things we need to consider:

  1. I currently tag the pointers (by adding 2); the reason is that this gives a good probability of raising an error (at least often), instead of just crashing hard (or maybe missing the incompatibility entirely). A rough sketch of the tagging idea is included below, after this list.
  2. I also tag the pointer for the void scalar inside tp_alloc, which feels a bit weird. We already allocate the store space in gentype_alloc which is probably just as dirty, though. I am pretty certain now that this is OK, but maybe worth a quick thought by someone else.
  3. As noted in the release note, NPY_SIZEOF_PYARRAYOBJECT is no longer a compile-time constant. The only project I have yet found that might be hit by this is here: https://github.com/patmiller/bignumpy/blob/master/bignumpy.c (I opened an issue there, but it seems inactive). That project would not compile successfully on a new NumPy and would require fairly simple adaptation. It could make sense to explain how to do it, but I am still a bit unsure if we actually want anyone subclassing from C...

EDIT: OK, decided to at least not tag the default NULL, so that the allocation is unchanged (can initialize to NULL).
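
A rough sketch of the tagging idea from item 1; the names, tag handling, and alignment assumption are illustrative, not the PR's exact code:

#include <Python.h>
#include <stdint.h>

#define BUFFER_INFO_TAG 2

/* The stored pointer is offset by 2.  A properly aligned allocation has its
 * low bits clear, so anything whose low bits are not exactly the tag most
 * likely means some other code wrote to this struct offset (an ABI/layout
 * mismatch), and we can raise instead of crashing hard. */
static void *
tag_buffer_info(void *info)
{
    if (info == NULL) {
        /* NULL stays untagged so tp_alloc's zero-initialization is valid. */
        return NULL;
    }
    return (void *)((uintptr_t)info + BUFFER_INFO_TAG);
}

static int
untag_buffer_info(void *stored, void **info)
{
    if (stored == NULL) {
        *info = NULL;
        return 0;
    }
    if (((uintptr_t)stored & 7) != BUFFER_INFO_TAG) {
        PyErr_SetString(PyExc_RuntimeError,
                "found an untagged buffer-info pointer; the array struct "
                "layout probably does not match this NumPy version");
        return -1;
    }
    *info = (void *)((uintptr_t)stored - BUFFER_INFO_TAG);
    return 0;
}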

@scoder as a test, I tried to do the dirty thing and subclass NumPy in cython using:

cimport numpy as np

cdef class MyArr(np.ndarray):
    pass

This goes down in flames: checking gc.is_tracked(MyArr(3)) shows that Cython sets the garbage-collection flag and adds the instance to tracking, which I assume causes the failure in the end, but I am not sure. It probably doesn't really matter; it would mainly be interesting if there is a simple way to make this work that I am missing.
(Overall, Cython users should be happy with this, since it shaves a bit of overhead off typical Cython + Python calls that use typed memoryviews.)


seberg commented Nov 7, 2020

Hmm, PyPy crashes; I guess the tp_alloc part is likely just wrong. I could just stop tagging the NULL pointer (or remove all the tagging); that way initializing everything to 0 (which is what tp_alloc should do) is definitely the right thing.

EDIT: Changed to not tag NULL. I can add a test for raising the error in the buffer-export path once we decide we want to give it a shot (and whether to tag the pointer at all).


mattip commented Nov 12, 2020

@seberg is this ready for final review?


seberg commented Nov 12, 2020

It's ready for review; I will see if I can add a test for the "error" message (only the buffer-export path). EDIT: Today!


seberg commented Nov 12, 2020

@mattip the only real remaining issue I can think of is whether we have to do something about the fact that NPY_SIZEOF_PYARRAYOBJECT is not a constant anymore (documentation, or otherwise).


#define NPY_SIZEOF_PYARRAYOBJECT (PyArray_Type.tp_basicsize)

since the size should not be considered a compile time constant.
Member

Suggested change
since the size should not be considered a compile time constant.
Making this a compile-time constant may lead to bugs, but it is part of the public headers.

Member Author

@seberg seberg Nov 12, 2020

How about:

This was previously a compile-time constant, which could lead to bugs when running in a different NumPy version.

One thing we could do is add a new

#define NPY_SIZEOF_PYARRAYOBJECT_STATIC sizeof(PyArrayObject_fields)

so that it is easier to replace the safe version with the unsafe one where necessary (it would be the user's job to check this at import time). But as I don't know anyone actually using it, I am somewhat OK with forcing users into sizeof(PyArrayObject_fields) using the deprecated API.
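
To illustrate the "check at import time" part, a hedged sketch of what such a check in a C subclass could look like, assuming it keeps using the compile-time sizeof(PyArrayObject_fields); this is user-side code, not something the PR adds:

#include <Python.h>
#include <numpy/arrayobject.h>

/* Call after import_array(); PyArray_Type is only usable once the numpy
 * C-API table has been loaded. */
static int
check_base_struct_size(void)
{
    if ((Py_ssize_t)sizeof(PyArrayObject_fields) < PyArray_Type.tp_basicsize) {
        /* The running NumPy's array struct is larger than the one this
         * module was compiled against, so an embedded base struct would be
         * too small; refuse to continue. */
        PyErr_SetString(PyExc_RuntimeError,
                "extension compiled against an older NumPy whose "
                "PyArrayObject is smaller than the one loaded at runtime; "
                "rebuild against the installed NumPy");
        return -1;
    }
    return 0;
}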

Member

I think any attempt at using this is doomed to failure. Can we deprecate the macro, with a message that it is the wrong thing to check?

Member Author

We can add a deprecated compiler marker I think? That would probably make sense. Should we still keep it set to PyArray_Type.tp_basicsize? Even if it is weird and probably useless in almost all cases, it will at least break loudly if used for a static allocation.

I found it used once, for a static allocation of the form struct subclass { char super[NPY_SIZEOF_PYARRAYOBJECT]; int my_field; };, which will fail to compile with this change, and compiles to a non-future-proof version without it.

Member

I think we must disallow its use. These are exactly the use cases we want to fail loudly with the new code, since they depend on something staying the same size when it doesn't.

Member Author

@seberg seberg Nov 20, 2020

@mattip I agree. Did you form a clear opinion on what to do here? No NPY_SIZEOF_PYARRAYOBJECT_STATIC sounds good to me, but I need to make a call on what to do specifically for this PR.

  • In addition to making it a run-time constant (if that even matters for anyone), we add a deprecated tag in C?
  • We just remove it (i.e. make it an #error)... The (PyArray_Type.tp_basicsize) version probably doesn't have much point?
  • Only deprecate it, which makes it easier to compile code that is not future-proof without realizing it.

I am OK with either of the first two options; the #error might be nice, since it allows us to write a few words about what is going on in a very clear way that can't get lost.

EDIT: Oh, I guess you can't put an #error inside a #define, although I suppose you can put some invalid syntax, which is less nice. Or maybe there is something equivalent to #error as a function... EDIT2: Or, probably, we might as well just delete it in that case and add info to the documentation somewhere that can be found when googling.
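
One hedged sketch of the "invalid syntax" idea: expand the macro to an undeclared identifier whose name carries the explanation, so any remaining use fails to compile with a reasonably readable message (hypothetical; the PR ultimately just removed the macro):

/* Any use now fails with "undeclared identifier", and the identifier name
 * itself explains why and what to use instead. */
#define NPY_SIZEOF_PYARRAYOBJECT \
    NPY_SIZEOF_PYARRAYOBJECT_is_not_a_compile_time_constant_use_PyArray_Type_tp_basicsize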

Member

we could try some complicated static-assert thing, or just remove it and document the removal in the release notes. I think removing it is easiest and will get the desired response if anyone is actually using it.

Member Author

OK, the last commit just deletes this #define and adds some info to the PyArrayObject structure documentation.

Not sure where we stand on putting this in (or rather when); it seems like not for 1.20 most likely, so I will have to update the text slightly again before finally merging.

@seberg seberg removed the 25 - WIP label Nov 18, 2020
@seberg seberg added this to the 1.20.0 release milestone Nov 18, 2020
@mattip mattip self-requested a review November 18, 2020 19:37

charris commented Nov 21, 2020

Ping all.


rgommers commented Mar 3, 2021

It would be great to give a better error in those cases though (something that makes the solution very obvious).

There's a clear error when you do this, I believe (at least if the ABI version is incremented, or the API one, I forgot); maybe that's the part we're missing for 1.20.0?

EDIT: NPY_ABI_VERSION / NPY_API_VERSION, I believe


seberg commented Mar 3, 2021

The ABI version causes a hard error on any mismatch and would cause forward compatibility to be broken as well as backward compatibility. I.e. it would definitely require a "2.0" release and major churn downstream.

There is this chunk of code in _import_array:

  if (NPY_FEATURE_VERSION > PyArray_GetNDArrayCFeatureVersion()) {
      PyErr_Format(PyExc_RuntimeError, "module compiled against "\
             "API version 0x%%x but this version of numpy is 0x%%x", \
             (int) NPY_FEATURE_VERSION, (int) PyArray_GetNDArrayCFeatureVersion());
      return -1;
  }

But, I just checked and Chuck did increment the API version in 1.20 (which is the same as NPY_FEATURE_VERSION), so I am a bit confused why our error is not raised. Does Cython manage to check this based on the Python-side import numpy before it actually calls the C-side _import_array()?


ChristophWWagner commented Mar 3, 2021

Well, the build is successful and fine, but not compatible with older NumPy versions... It would be great to give a better error in those cases though (something that makes the solution very obvious).

I think one thing that might make things convenient for you is to use the scipy meta-package: https://github.com/scipy/oldest-supported-numpy, which explicitly picks the smallest NumPy version (I think for a given supported Python version). If you build with that, you should be completely fine.

I am not very versed with building, so am a bit unsure how much hassle all of this ends up being...

Thank you for the hint to oldest-supported-numpy. Unfortunately there are two issues with it:

Manually setting to numpy 1.14.6 instead breaks for newer python versions (line 719):
https://www.travis-ci.org/github/EMS-TU-Ilmenau/fastmat/jobs/761330130

I feel like I'm in DLL hell...

EDIT: I am currently retrying with numpy 1.16.3 in the hope that this works for all currently tested versions. Due to the security vulnerability known for versions older than 1.16.3, I feel fine breaking out-of-the-box wheel support if that helps users move away from known and exposed vulnerabilities.


mattip commented Mar 3, 2021

snyk issues vulnerability alerts for numpy 1.14.5 and 1.14.6

But this is the build we are talking about, not the runtime. The "vulnerability" (which is debatable) is for runtime only.

It will just break the ci test builds for python 3.4, python 3.6 and python 2.7

Yes, oldest-supported-numpy is for Python 3.5 and up, although it should not break 3.6. Are you sure you need to be creating new wheels for Python 2.7, 3.4, 3.5? Do you see users demanding new releases of your package for those versions?


mattip commented Mar 3, 2021

... if so, please open an issue in oldest-supported-numpy so it can support the python ecosystem better.


rgommers commented Mar 3, 2021

@ChristophWWagner you can use different versions to build against for different Python versions, to solve your py34 issue. See the setup.cfg of oldest-supported-numpy.

FYI, that alert was more or less nonsense, and isn't relevant for a build-time issue.

@ChristophWWagner

Thank you all for your help and comments on how to resolve this issue. I have now introduced version-specific minimal requirements and this seems to resolve the issue.

Of course you are right that run-time security issues do not play a role here; I have reverted the previous wrong assessment in my setup. Thanks for pointing this out!

oldest-supported-numpy caused a version to be selected that was not supported by another package, which led to a lengthy dependency-resolution process in pip. To reduce CI build load I decided to specify the oldest combinations for which wheels exist. This should do the job.

Unfortunately we have some mainframe systems that still run Python 2 code; that's why we still keep those wheels in.

As some of you were asking how to communicate this issue better, I would suggest reworking the error message. The first time I saw this I thought it was a bug within the package, and (as a user) that there was nothing I could do here. Only after seeing the message repeat as a pattern did I start investigating, and I found out that there is a way of arriving at an inconsistent dependency state, which I could then start to resolve. Maybe pointing to this idea could already help communicate the underlying issue.

Again, thanks for all your help!

@ChristophWWagner

@ChristophWWagner you can use different versions to build against for different Python versions, to solve your py34 issue. See the setup.cfg of oldest-supported-numpy.

FYI, that alert was more or less nonsense, and isn't relevant for a build-time issue.

out of curiosity, and aside from the build-time issue: why is this security alert bogus?


eric-wieser commented Mar 4, 2021

out of curiosity and aside from the buid-time issue: why is this security alert bogus?

Because it's equivalent to saying "Python has a vulnerability because pickle.load allows arbitrary code execution".
The true statement is that "code that uses np.load unsafely is vulnerable", not "np.load is unsafe", just as "code that uses pickle.load unsafely is vulnerable"

This comment does a good job of conveying that: #12759 (comment)

r9y9 added a commit to r9y9/Python-Wrapper-for-World-Vocoder that referenced this pull request May 24, 2021
JeremyCCHsu pushed a commit to JeremyCCHsu/Python-Wrapper-for-World-Vocoder that referenced this pull request May 25, 2021
* Add pyproject.toml to specify build-time requirements

With this, I believe numpy import hack in the setup.py is no longer
necessary and it may fix installation issues like
#63

* Use numpy<=1.20.0 to build C extensions

NumPy ABI issue: numpy/numpy#16938

Ref: JP article (https://zenn.dev/ymd_h/articles/934a90e1468a05)
@rgommers

This is still causing problems:

That's a pretty popular SO issue. Looking at the comments, we knew this was quite a risky change - looks like we should learn this lesson and not do this kind of thing again unless we bump the major version.

@winash12

I have only one version of numpy on my machine and I still get this error.


mattip commented Aug 31, 2021

looks like we should learn this lesson and not do this kind of thing again unless we bump the major version.

I am not convinced by that issue. If I read it correctly, the user was building locally using python setup.py install with a newer NumPy, but then ran their code with an older NumPy. Building locally with python setup.py avoids all the attempts by a build system to pin required package versions.

The alternative is to never change NumPy structs used by cython, which is also painful.

Just to expand a bit: the problem is due to

  • the user having a version of numpy in their environment before this change (say 1.19)
  • installing a package that somehow builds using cython and a newer version of NumPy (say 1.20)

Then Cython encodes the size of the new (larger) struct and raises an error when it sees the smaller one. The other direction (building with 1.19, running with 1.21) works, although it may warn.
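
For illustration, a simplified sketch of that size check; the real logic lives in Cython's generated module-initialization helpers, so the names and exact message here are approximations:

#include <Python.h>

static int
check_ndarray_size(PyTypeObject *runtime_type, Py_ssize_t compiled_size)
{
    Py_ssize_t runtime_size = runtime_type->tp_basicsize;

    if (compiled_size > runtime_size) {
        /* Built against a newer (larger) PyArrayObject but running on an
         * older NumPy: the trailing fields would lie past the allocated
         * object, so this is an error. */
        PyErr_Format(PyExc_ValueError,
                "numpy.ndarray size changed, may indicate binary "
                "incompatibility. Expected %zd from C header, got %zd "
                "from PyObject", compiled_size, runtime_size);
        return -1;
    }
    /* compiled_size <= runtime_size (built against an older NumPy, running
     * on a newer one) is tolerated, possibly with a warning. */
    return 0;
}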

@rgommers

The alternative is to never change NumPy structs used by cython, which is also painful.

We can add things, just not remove them, right? And we actually do that semi-regularly?

Here's another example: piskvorky/gensim#3085 (comment). It's really hard to figure out what is actually caused by this PR and what may be user error or an error by another project building wheels a certain way. But the bug reports seem compatible with the actual impact we expect (best summarized by @eric-wieser's comment here). And given this was an "it's not that important, but let's see what y'all think" kind of change anyway, my impression is that we shouldn't have merged it.


mattip commented Aug 31, 2021

Looking at that example, once they figured out the problem, there is this message:

Using the oldest-supported-numpy as build dependency worked, and the wheels got built on the first try. I don't think that's ever happened before ;)

so it seems like this PR has the side-effect of helping projects clean up their builds. :)

We never remove fields, and if we reject the approach in this PR we will not be able to add fields either. We do not do that regularly; this was the first attempt to change PyArrayObject in many years.

my impression is that we shouldn't have merged it

We wanted to see what would happen if we changed the size: this is a canary for the further proposed changes to structs that appear in the cython API (PyArrayObject and PyArray_Descr are the biggest but there are more). If we cannot change structs then we cannot accept NEP 49 (allocator strategies) and the dtype refactor will have to be much more limited. We did discuss this many times at the developer and triage meetings before we decided to go ahead with it.


rgommers commented Sep 1, 2021

We wanted to see what would happen if we changed the size: this is a canary for the further proposed changes

My conclusion is that each time we do something like this (the C API change in 1.19.2 comes to mind as well), we get multiple projects linking to the PR in question claiming we broke something - which we did; it's just not clear whether that is actually the cause of their problem. Doing more of this seems unhealthy, at least without better diagnostics / safeguards.

But, I just checked and Chuck did increment the API version in 1.20 (which is the same as the NPY_FEATURE_VERSION), so I am a bit confused why our error is not raised. Does cython manage to check this based on the Python side import numpy before it actually calls the C-side _import_array()?

This one (from @seberg's comment above) we should get to the bottom of. If the problem is indeed just "using a package that was built against a newer numpy with an older numpy at runtime", then our error should say that on import. If we continue to get:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility

from Cython, that'll be super unhelpful for everyone.


winash12 commented Sep 1, 2021

looks like we should learn this lesson and not do this kind of thing again unless we bump the major version.

I am not convinced by that issue. If I read it correctly, the user was building locally using python setup.py install with a newer NumPy, but then ran their code with an older NumPy. Building locally with python setup.py avoids all the attempts by a build system to pin required package versions.

The alternative is to never change NumPy structs used by cython, which is also painful.

Just to expand a bit: the problem is due to

  • the user having a version of numpy in their environment before this change (say 1.19)
  • installing a package that somehow builds using cython and a newer version of NumPy (say 1.20)

Then cython codes the size of the new (larger) struct, and raises an error when seeing the smaller one. The other direction (building with 1.19, running with 1.21) works, although it may warn.

Just to clarify: I only have one version of numpy on my machine, which I installed myself using setup.py. The problem here is h5py, which uses a numpy egg (version 1.17) while building. This egg is present within the h5py installation, and that is the reason for the error on my machine. Please see this comment - h5py/h5py#1955 (comment)


mattip commented Sep 1, 2021

@winash12 Since that issue is closed I don't want to rehash it here. The question here is "can we give a better error when a package is built against a newer numpy but uses an older numpy at runtime".


seberg commented Sep 7, 2021

Do I read the latest discussion right if I say: real progress here would be trying to make the Cython error clearer, or even seeing if we can get Cython to call import_array() first (which probably already gives a better warning/error)?

EDIT: Which means someone should probably open a cython issue?


rgommers commented Sep 7, 2021

Yes, that sounds right. If we could just make it say "Installed numpy version is older than the numpy version that this package was built against" or something like that, that would make it a lot less painful to debug issues that are potentially due to PRs like this one.

Labels: 01 - Enhancement, component: numpy._core, triage review (Issue/PR to be discussed at the next triage meeting)