ENH,API: Store exported buffer info on the array #16938
Conversation
The very real alternative to this would be to just implement the dealloc/free slot of the buffer interface. That will break some weirder usage, but maybe it's much less than it used to be (plus scalars may have been part of the issue before). So maybe more a major-release try (I just don't know how much it would affect).
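For reference, the "dealloc/free slot of the buffer interface" is the release half of Python's buffer protocol. A minimal sketch of that alternative, with hypothetical type and function names (this is not NumPy's actual code):

```c
#include <Python.h>

/* Free the per-export bookkeeping that the getbuffer implementation
 * stashed in view->internal, instead of keeping it in a global dict. */
static void
myarray_releasebuffer(PyObject *self, Py_buffer *view)
{
    (void)self;
    PyMem_Free(view->internal);
    view->internal = NULL;
}

/* Wired into the type through tp_as_buffer: */
static PyBufferProcs myarray_as_buffer = {
    NULL,                  /* bf_getbuffer: the existing export code */
    myarray_releasebuffer, /* bf_releasebuffer */
};
```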
In general I am +1 on moving the information to be local.
(force-pushed from 8c18128 to 0ba9870)
In the last triage discussion we decided to push ahead with adding a field to the struct. It is still marked "draft" and has a merge conflict.
I didn't fix the merge conflict yet because I was hoping to do gh-17295 first.
(force-pushed from f9e81e2 to 85fb053)
@mattip OK, just marking as ready, here are the things we need to consider:
EDIT: OK, decided to at least not tag the default NULL, so that the allocation is unchanged (it can be initialized to NULL). @scoder as a test, I tried to do the dirty thing and subclass NumPy in Cython using:

This goes down in flames, checking
(force-pushed from 85fb053 to e14b63b)
Hmm, PyPy crashes, I guess the

EDIT: Changed to not tag NULL. I can add a test for floating the error in the buffer export path, once we decide we want to give it a shot (and whether to tag the pointer at all).
(force-pushed from 00ae1bf to fcdff4c)
@seberg is this ready for final review?
It's ready for review. I will see if I can add a test for the "error" message (only in the buffer export path). EDIT: Today!
(force-pushed from fcdff4c to 34281f3)
@mattip the only real remaining issue that I can think of is whether we have to do something about the fact that
(force-pushed from 34281f3 to f90cbd2)
```c
#define NPY_SIZEOF_PYARRAYOBJECT (PyArray_Type.tp_basicsize)
```

since the size should not be considered a compile-time constant.
Suggested change:

```diff
-since the size should not be considered a compile time constant.
+Making this a compile-time constant may lead to bugs, but it is part of the public headers.
```
How about:

> This was previously a compile-time constant, which could lead to bugs when running with a different NumPy version.

One thing we could do is add a new

```c
#define NPY_SIZEOF_PYARRAYOBJECT_STATIC sizeof(PyArrayObject_fields)
```

so that it is easier to replace the safe version with the unsafe one where necessary (it would be the user's job to check this at import time). But as I don't know anyone actually using it, I am somewhat OK with forcing users into `sizeof(PyArrayObject_fields)` using the deprecated API.
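To make the "check at import time" part concrete, here is a minimal sketch of what a downstream extension could do, assuming the suggested `NPY_SIZEOF_PYARRAYOBJECT_STATIC` idea; the function name is hypothetical:

```c
#include <Python.h>
#include <numpy/arrayobject.h>

/* Hypothetical import-time check: refuse to load if the running numpy's
 * ndarray struct is larger than the one this extension was compiled
 * against (i.e. static allocations of the struct would be too small). */
static int
check_ndarray_size(void)
{
    if (PyArray_Type.tp_basicsize > (Py_ssize_t)sizeof(PyArrayObject_fields)) {
        PyErr_SetString(PyExc_ImportError,
                        "extension built against an older numpy: the runtime "
                        "ndarray struct is larger than the compiled-in size");
        return -1;
    }
    return 0;
}
```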
I think any attempt at using this is doomed to failure. Can we deprecate the macro, with a message that it is the wrong thing to check?
We can add a deprecated compiler marker, I think? That would probably make sense. Should we still keep it set to `PyArray_Type.tp_basicsize`? Even if it is weird and probably useless in almost all cases, it will at least break loudly if used for a static allocation.

I found it used once, for a static allocation of the form `struct subclass {char super[NPY_SIZEOF_PYARRAYOBJECT]; int my_field;}` (expanded in the sketch below), which will fail to compile with this change, and compile to a non-future-proof version without it.
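Spelled out, that found pattern looks roughly like this (hypothetical downstream code):

```c
#include <numpy/arrayobject.h>

/* A C "subclass" that bakes the base struct's size in at compile time.
 * If the runtime PyArrayObject ever grows, writes to the base object
 * would overrun `super` and corrupt `my_field`. With the macro redefined
 * to PyArray_Type.tp_basicsize, the array length is no longer a constant
 * expression, so this declaration fails to compile, loudly. */
struct subclass {
    char super[NPY_SIZEOF_PYARRAYOBJECT];
    int my_field;
};
```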
I think we must disallow its use. These are exactly the use cases we want to fail loudly with the new code, since they depend on something staying the same size when it doesn't.
@mattip I agree. Did you form a clear opinion on what to do here? No `NPY_SIZEOF_PYARRAYOBJECT_STATIC` sounds good to me. But I need to make a call on what to do specifically for this PR:

- In addition to making it a run-time constant (if that even matters for anyone), we add a deprecated tag in C?
- We just remove it (i.e. make it an `#error`)... the `(PyArray_Type.tp_basicsize)` definition probably doesn't have much point then?
- Only deprecate it, which makes it easier to compile code that is not future-proof without realizing it.

I am OK with either of the first two options; the `#error` might be nice, since it allows us to write a few words about what is going on in a very clear way that can't get lost.

EDIT: Oh, I guess you can't put an `#error` inside a `#define`, although I suppose you can make it expand to invalid syntax, which is less nice. Or maybe there is something equivalent to `#error` as a function... EDIT2: Or probably, we might as well just delete it in that case and update the documentation somewhere with info that can be found when googling.
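For reference, a sketch of the two mechanisms alluded to here (illustrative only, and GCC/Clang-specific in the first case):

```c
/* Option A: #error cannot be produced by a macro expansion, but the C99
 * _Pragma operator can carry an equivalent diagnostic that fires at
 * every use site (GCC/Clang): */
#define NPY_SIZEOF_PYARRAYOBJECT \
    _Pragma("GCC error \"NPY_SIZEOF_PYARRAYOBJECT is not a compile-time constant\"")

#undef NPY_SIZEOF_PYARRAYOBJECT

/* Option B (portable): expand to an undeclared identifier whose name
 * points at the explanation, so any use fails to compile: */
#define NPY_SIZEOF_PYARRAYOBJECT \
    NPY_SIZEOF_PYARRAYOBJECT_was_removed_see_the_numpy_release_notes
```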
We could try to do some complicated static-assert thing, or just remove it and document the removal in the release notes. I think removing it is the easiest and will get the desired response if anyone is actually using it.
OK, the last commit just deletes this `#define` and adds some info to the `PyArrayObject` structure documentation.

Not sure where we stand with putting this in (or rather, when); seems like not for 1.20 most likely, so I will have to update the text slightly again before finally merging.

Ping all.
There's a clear error when you do this, I believe (at least if the ABI version is incremented, or the API version, I forget). Maybe that's the part we're missing for 1.20.0? EDIT:
The ABI version causes a hard error on any mismatch, and would break forward compatibility as well as backward compatibility. I.e., it would definitely require a "2.0" release and major churn downstream. There is this chunk of code in

But, I just checked, and Chuck did increment the API version in 1.20 (which is the same as the
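For context, a simplified paraphrase of those import-time checks (the actual generated code in NumPy's headers differs in detail):

```c
#include <Python.h>
#include <numpy/arrayobject.h>

/* Roughly what _import_array() does: the ABI version must match exactly
 * (a hard error in both directions), while the API/feature version only
 * errors when the running numpy is older than the one built against. */
static int
check_numpy_versions(void)
{
    if (NPY_VERSION != PyArray_GetNDArrayCVersion()) {
        PyErr_Format(PyExc_RuntimeError,
                     "module compiled against ABI version 0x%x but this "
                     "version of numpy is 0x%x",
                     (unsigned int)NPY_VERSION,
                     (unsigned int)PyArray_GetNDArrayCVersion());
        return -1;
    }
    if (NPY_FEATURE_VERSION > PyArray_GetNDArrayCFeatureVersion()) {
        PyErr_Format(PyExc_RuntimeError,
                     "module compiled against API version 0x%x but this "
                     "version of numpy is 0x%x",
                     (unsigned int)NPY_FEATURE_VERSION,
                     (unsigned int)PyArray_GetNDArrayCFeatureVersion());
        return -1;
    }
    return 0;
}
```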
Thank you for the hint to
Manually setting to numpy 1.14.6 instead breaks for newer Python versions (line 719): I feel like I'm in DLL hell... EDIT: I am currently retrying with numpy 1.16.3 in the hope that this works for all currently tested versions. Given the security vulnerability known for versions older than 1.16.3, I feel fine breaking out-of-the-box wheel support if that helps users move away from known and exposed vulnerabilities.
But this is the build we are talking about, not the runtime. The "vulnerability" (which is debatable) is for runtime only.
Yes, oldest-supported-numpy is for Python 3.5 and up, although it should not break 3.6. Are you sure you need to be creating new wheels for Python 2.7, 3.4, 3.5? Do you see users demanding new releases of your package for those versions?
... if so, please open an issue in oldest-supported-numpy so it can support the Python ecosystem better.
@ChristophWWagner you can use different versions to build against for different Python versions, to solve your py34 issue. See the
FYI, that alert was more or less nonsense, and isn't relevant for a build-time issue.
Thank you all for your help and comments on how to resolve this issue. I have now introduced version-specific minimal requirements, and this seems to resolve the issue. Of course you are right that run-time security issues do not play a role here; I have reverted the previous wrong assessment in my setup. Thanks for pointing this out!

Unfortunately we have some mainframe systems that still run Python 2 code. That's why we still keep the wheels in. As some of you were asking how to communicate this issue better, I would suggest reworking the error message. The first time I saw this, I thought it was a bug within the package, and (as a user) I thought there was nothing I could do here. Only after seeing this message repeat as a pattern did I start investigating, and found out that there is a way of arriving at an inconsistent dependency state, and could start to resolve that. Maybe pointing to this idea could already help communicate the underlying issue. Again, thanks for all your help!
Out of curiosity, and aside from the build-time issue: why is this security alert bogus?
Because it's equivalent to saying "Python has a vulnerability because

This comment does a good job of conveying that: #12759 (comment)
* Add pyproject.toml to specify build-time requirements. With this, I believe the numpy import hack in the setup.py is no longer necessary, and it may fix installation issues like #63
* Use numpy<=1.20.0 to build C extensions. NumPy ABI issue: numpy/numpy#16938. Ref: JP article (https://zenn.dev/ymd_h/articles/934a90e1468a05)
This is still causing problems:
That's a pretty popular SO issue. Looking at the comments, we knew this was quite a risky change; it looks like we should learn this lesson and not do this kind of thing again unless we bump the major version.
I have only one version of numpy on my machine and I still get this error.
I am not convinced by that issue. If I read it correctly, the user was building locally using

The alternative is to never change NumPy structs used by Cython, which is also painful. Just to expand a bit: the problem is due to

Then Cython hard-codes the size of the new (larger) struct and raises an error when seeing the smaller one (a sketch of that check follows below). The other direction (building with 1.19, running with 1.21) works, although it may warn.
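A sketch of the kind of size check Cython's generated module-init code performs (heavily simplified; the real generated code and message differ):

```c
#include <Python.h>

/* Cython records the size of the struct it saw at build time and
 * compares it with the runtime type's tp_basicsize; a smaller runtime
 * struct is reported as a likely binary incompatibility. */
static int
check_type_size(PyTypeObject *t, Py_ssize_t build_time_size)
{
    if (t->tp_basicsize < build_time_size) {
        PyErr_Format(PyExc_ValueError,
                     "%s size changed, may indicate binary incompatibility",
                     t->tp_name);
        return -1;
    }
    return 0;
}
```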
We can add things, just not remove them, right? And we actually do that semi-regularly?

Here's another example: piskvorky/gensim#3085 (comment). It's really hard to figure out what is actually caused by this PR and what may be user error, or an error of another project building wheels a certain way. But the bug reports seem compatible with the actual impact we expect (best summarized by @eric-wieser's comment here). And given this was a "it's not that important, but let's see what y'all think" kind of change anyway, my impression is that we shouldn't have merged it.
Looking at that example, once they figured out the problem, there is this message:

so it seems like this PR has the side-effect of helping projects clean up their builds. :)

We never remove fields. And if we reject the approach in this PR, neither will we be able to add fields. We do not do that regularly; this was the first attempt to change

We wanted to see what would happen if we changed the size: this is a canary for the further proposed changes to structs that appear in the Cython API (
My conclusion is that each time we do something like this (the C API change in 1.19.2 comes to mind as well), we get multiple projects linking to the PR in question claiming we broke something, which we did; it's just not clear whether that is actually the cause of their problem or not. Doing more of this seems unhealthy, at least without better diagnostics/safeguards.

This one (from @seberg's comment above) we should get to the bottom of. If the problem is indeed just "using a package that was built against a newer numpy with an older numpy at runtime", then our error should say that on import. If we continue to get:

from Cython, that'll be super unhelpful for everyone.
Just to clarify: I only have one version of numpy on my machine, which I installed myself using setup.py. The problem here is h5py, which uses a numpy egg file (version 1.17) while building. This new egg file is present within the h5py installation. That is the reason for the error on my machine. Please see this comment: h5py/h5py#1955 (comment)
@winash12 Since that issue is closed I don't want to rehash it here. The question here is "can we give a better error when a package is built against a newer numpy but uses an older numpy at runtime".
Do I read the latest discussion right if I say: real progress here would be trying to make the Cython error clearer, or even seeing if we can get Cython to call

EDIT: Which means someone should probably open a Cython issue?
Yes, that sounds right. If we could just make it say "Installed numpy version is older than the numpy version that this package was built against" or something like that, that would make it a lot less painful to debug issues that are potentially due to PRs like this one.
This speeds up array deallocation and buffer exports, since it removes the need for global dictionary lookups. It also somewhat simplifies the logic. The main advantage is possibly less the speedup itself (which is not large compared to most things that happen in the lifetime of an array), but rather that no unnecessary work is done for short-lived arrays, which never export a buffer.
The downside of this approach is that the ABI changes for anyone
who would be subclassing ndarray in C.
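In terms of the struct, the change amounts to something like the following; field and type names here are illustrative, not the ones actually used in the PR:

```c
#include <Python.h>

/* Per-export bookkeeping that previously lived in a module-level dict
 * keyed by the exporting array: */
typedef struct {
    Py_ssize_t *shape_and_strides;  /* backs Py_buffer.shape/.strides */
    char *format;                   /* backs Py_buffer.format */
} buffer_info_sketch;

/* The array struct grows by one trailing pointer. It stays NULL for
 * arrays that never export a buffer (so short-lived arrays do no extra
 * work) and is freed in the array's dealloc, replacing the dict lookups
 * on every export and deallocation. */
typedef struct {
    PyObject_HEAD
    char *data;          /* stand-ins for the existing */
    int nd;              /* PyArrayObject_fields members */
    void *_buffer_info;  /* new field */
} array_fields_sketch;
```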
I am not super invested in this, but I do feel it's worth considering. This requires breaking the ABI slightly (so it would need release notes), because it extends the array struct (and void, but that is much less likely to be used). Marking as draft since there are probably some smaller moving parts. But since I was looking at the buffer interface today, I thought I would wrap this up and show it at least.