ENH: Configurable allocator #17582
Conversation
    @@ -384,7 +384,8 @@ array_data_set(PyArrayObject *self, PyObject *op)
        }
        if (PyArray_FLAGS(self) & NPY_ARRAY_OWNDATA) {
            PyArray_XDECREF(self);
            PyDataMem_FREE(PyArray_DATA(self));
This should have been `npy_free_cache`.
I think this raises a possible risk in these changes: as I understand it, it didn't matter if the `npy_*_cache` functions were used for the allocation and not the free, or vice versa; everything would behave correctly, it would just affect the cache capacity.
With the changes in this PR, either:
- We make the same promise to user hooks: things you allocate may be freed by someone else, or vice versa, or
- We need to audit every free in our code (and likely accept bug reports for a couple of releases for segfaults due to ones we miss).
I suppose as long as our default allocator is happy to accept raw mallocs and frees, this won't actually result in a regression.
    @@ -1991,7 +1991,8 @@ array_setstate(PyArrayObject *self, PyObject *args)
        }

        if ((PyArray_FLAGS(self) & NPY_ARRAY_OWNDATA)) {
            PyDataMem_FREE(PyArray_DATA(self));
The memory handling in this file ignored `npy_*_cache`.
Eh, github showed my comment twice, but it was only there once...
After lots of debugging (first with valgrind, which only showed me that size 1 (or 0) allocations were added to the cache as larger ones), and annoyingly slow and careful stepping through with gdb:
At this point the size of the array has already been modified, so freeing (and adding to the cache) just doesn't work unless you calculate the size earlier (and maybe move the free to an earlier point). This function is a bit weird; maybe it makes sense to NULL the data right away? It's likely still buggy, but at least we can't run into a double-free then.
numpy/core/tests/test_mem_policy.py (outdated)

    # so this "if" will be false, preventing infinite recursion
    #
    # if needed, debug this by setting extra_argv=['-vvx']
    np.core.test(verbose=0, extra_argv=[])
This call will run all the `np.core` tests with the new memory routines. Since the new routines allocate but then return a pointer shifted by 64 bytes, if the calls are mismatched glibc will segfault when `free`ing the wrong address.
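As an aside, here is a minimal sketch of such a shifted-pointer allocator. The names and the handler-like signatures are hypothetical and are not the PR's actual test code; the point is only that a free that bypasses the matching wrapper operates on the wrong address and fails loudly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pad every allocation by 64 bytes and return an offset pointer. */
static void *shift_malloc(void *ctx, size_t size)
{
    char *real = malloc(size + 64);
    if (real == NULL) {
        return NULL;
    }
    /* stamp a marker into the padding so shift_free can sanity-check */
    strncpy(real, "originally allocated", 64);
    return real + 64;
}

static void shift_free(void *ctx, void *ptr, size_t size)
{
    if (ptr == NULL) {
        return;
    }
    /* only the matching wrapper knows to step back 64 bytes */
    char *real = (char *)ptr - 64;
    assert(strncmp(real, "originally allocated", 20) == 0);
    free(real);
}
```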
    .. code-block:: c

        typedef struct {
It would be good to reference the builtin `PyMemAllocatorEx` somewhere in this section and compare the two (https://docs.python.org/3/c-api/memory.html#customize-memory-allocators).
Will add. The original `npy_*_cache` set of functions uses the `PyDataMem_*` functions, which it seems preceded these more sophisticated interfaces and directly used `malloc`/`calloc`/`free`.
Documented. The Python ones do not use our non-documented-but-public `PyDataMem_EventHookFunc` callbacks. So I could see a path where we deprecate `PyDataMem_EventHookFunc` and move to the Python memory management strategies, although that would mean:
- if someone by chance implemented a `PyDataMem_EventHookFunc` callback, it would no longer work.
- in order to override data allocations, a user would also have to override other, unrelated memory allocations.
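For reference, the builtin CPython hook being compared against is a plain struct of function pointers plus a context; it has no name field, and allocators are installed per domain with `PyMem_SetAllocator`:

```c
/* CPython's customization hook, as defined by CPython itself; reproduced
 * here only for comparison with the NumPy-specific handler in this PR. */
#include <stddef.h>

typedef struct {
    void *ctx;                                               /* user context */
    void *(*malloc)(void *ctx, size_t size);
    void *(*calloc)(void *ctx, size_t nelem, size_t elsize);
    void *(*realloc)(void *ctx, void *ptr, size_t new_size);
    void  (*free)(void *ctx, void *ptr);                     /* note: no size argument */
} PyMemAllocatorEx;

/* It is installed per domain, e.g.:
 *     PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &my_allocator);
 */
```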
Force-pushed from f71fd5b to 811b9e4: rebased and force-pushed to clear the merge conflict.
A fairly quick pass, looks nice! And the addition of a test-run helper that builds test modules from strings is probably overdue :).
In addition to the inline comments, I think I would like an `NPY_USE_DEBUG_ALLOCATOR=1` test run, so that we run the full test suite with an allocator that would blow up if there are incompatibilities or code paths that were forgotten. I guess all that would be needed is to always allocate 32 or so extra bytes and store the shifted pointer (it probably must remain aligned, or the test suite will go down in flames).
I don't know the needs of those who would like to use such an allocator well, though, so it would be interesting to know whether we should anticipate the API growing.
Moving the pointer is what the test I wrote does; either it is not documented well enough or I didn't understand the comment.
Sorry @mattip, I have not read the tests carefully yet. My main point was that it would be nice to run the full test suite with a custom allocator set, and I don't think you are doing that yet?
Correct. I am only running the tests under `np.core`.
@mattip I was thinking we could run a single CI job with it? That would also test it thoroughly in the future without really adding much time to the whole process.
Thanks Matti. Just downloaded the version and took a look... here are my thoughts.
This is what I currently see.
I would try to plumb this structure for the future. I would put in a struct version high/low, and I would also include FEATURE bits for the memory allocation. I imagine this structure will change in the future as new features become possible. For small vs. large allocations it would be nice to know what the cutoff sizes are, and maybe they become settable in the future. Another feature might be GPU awareness or NUMA-node awareness. Then if each allocator chains to the next (see the next pointer below), that would be a nice feature. So the struct might end up looking something like this...
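(The snippet originally attached to this comment is not preserved in this copy. The following is a hypothetical sketch reconstructed only from the features described above: version fields, feature bits, small/large cutoffs, and a `next` pointer for chaining. Field names are illustrative, not the PR's API.)

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the kind of extensible, chainable handler described above. */
typedef struct mem_handler_sketch {
    struct mem_handler_sketch *next;   /* chain to the previously installed handler */
    char     name[64];                 /* who installed this handler */
    uint16_t version_high;
    uint16_t version_low;
    uint32_t feature_bits;             /* e.g. NUMA-aware, GPU-aware, large pages */
    size_t   small_cutoff;             /* requests at or below this may be cached */
    size_t   large_cutoff;             /* requests above this take a different path */
    void *(*malloc)(void *ctx, size_t size);
    void *(*calloc)(void *ctx, size_t nelem, size_t elsize);
    void *(*realloc)(void *ctx, void *ptr, size_t new_size);
    void  (*free)(void *ctx, void *ptr, size_t size);
    void  *ctx;                        /* opaque per-handler state */
} mem_handler_sketch;
```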
@tdimitri thanks for the review. Yes, this is the first time this struct is being exposed. I am concerned with "feature creep" when adding bits that describe aspects of the functions.
I agree that any memory handler has to follow some rules, like chaining to the previous handler and filling in any NULL function pointers with the previous value, as you point out. In both the Linux and NT kernels it is common to chain, using the first member of the struct to do so; there are macros in both kernels for this. I think this is a good practice and one I have used myself.

As for feature creep, the flip side is no or few new features? How often is this really going to change and force a stack recompile? Information like this (and more) can be returned in an info call.

Per byte alignment: let's say the current allocator already does 32-byte alignment and I have AVX2 -- then there is no need to change. But then we detect AVX-512 capability on another computer, and now I need to replace the handler with one doing 64-byte alignment. What happens in the future when a version 1.0 memory handler calls SetHandler against a version 2.0 library? That might fail, which is why I thought it could fail.

For riptable, we can also flush the cache or return statistics on the cache (for instance, we return the hit/miss ratio). We also have a timeout for the garbage collector that can be changed. In numpy, code can be written to flush the cache, but there is no mechanism for it.

Another possible addition to the struct is a reserved field. And if memory handlers are chained, we might need a way to remove a "name" from the chain (or move it to the front).
Sounds like there is enough reason that we may want to grow the struct/feature list in the future. So it probably does make sense to make the struct opaque and maybe version it.
Any solution that provides 1) `PyDataMem_GetHandler`, 2) the ability to chain and to walk the chain, 3) the ability to remove a handler from the chain, and 4) an indication of which version the memory handler is (so that, I suppose, too old a version can be rejected in the future), I am OK with and will write code to use. If it were up to me, I would not make the struct opaque. Thank you.
Could you give a concrete use case for walking the chain of methods? What problem does this solve?
If there are only two possible memory handlers at any point -- the new one and the default one -- then chaining might not matter. But I believe there can be more than two, since any module can try to set the memory handler, especially because arrays can stick around for the life of the process and only the memory handler that allocated the data can properly delete it.

Per walking the chain: aside from general debugging to know who has hooked what (which I prefer be exposed and not hidden), I could walk the chain, get to the last member, or search for the "default" handler name. Then I would make note of the default alloc handler. When a small memory request came in, I might call the default handler, since I know it already has a cache for small memory < 1024 bytes in size; in general, any time I wanted to "punt" to the default routine I could. Or there could be two useful memory handlers in the chain -- perhaps one that is NUMA aware, or one that on Windows can allocate large pages (numpy does not currently support this, but maybe a module in the future does). I might search for and call that memory handler for large objects (see Windows large page support).

From my experience, it is common to be able to call the next hook (see CallNextHookEx), and here is an example of chaining signal handlers in Linux (chain signal handler). But what if the previous memory handler is removed? Then what was returned by SetMemoryHandler() (the previous one) is now gone, but I still have a pointer to it.
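Purely as an illustration of the chaining idea being argued for here (hypothetical types and names; nothing like this exists in NumPy), walking a chain of handlers to find one by name might look like:

```c
#include <stddef.h>
#include <string.h>

/* The link is the first member, so a pointer to the link can be cast back
 * to the containing handler -- the kernel pattern mentioned earlier. */
typedef struct handler_node {
    struct handler_node *next;
    const char *name;
    void *(*malloc_fn)(void *ctx, size_t size);
    void *ctx;
} handler_node;

static handler_node *chain_head = NULL;

/* Install a handler at the front of the chain, remembering the previous head. */
static void install_handler(handler_node *h)
{
    h->next = chain_head;
    chain_head = h;
}

/* Walk the chain to find a handler by name, e.g. the "default" one,
 * so a custom handler can "punt" small requests to it. */
static handler_node *find_handler(const char *name)
{
    for (handler_node *p = chain_head; p != NULL; p = p->next) {
        if (strcmp(p->name, name) == 0) {
            return p;
        }
    }
    return NULL;
}
```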
It seems this is complicated enough that a short NEP is needed.
Just curious about the status of this PR?
Force-pushed from e81eff4 to 442b0e1.
Sorry, a random question that I've been thinking about: would it be possible to use this with shared memory?
Thanks Matti and Elias, great work keeping up with all the API request changes and introducing the context var!

As far as I am aware the public API is de facto settled and this has gotten some pretty thorough review. The last real concern I had was the OWNDATA "transfer", which is addressed now. I think at this point further progress on the PR would give diminishing returns, and it seems stable enough to ship.

@numpy/numpy-developers If anyone has API concerns or finds any fixup requests, please do not hesitate to note them; I expect we still have some wiggle room (but personally would be surprised about any serious change). But the next release is also soon ;).

Notes, to move to a new issue:
- Release note is missing?
- I would like a bit of "hand holding" for the `arr->flags |= NPY_OWNDATA` user. I.e. tell users in some place (maybe the flag documentation?), linked from the warning: please use a `PyCapsule` as the array base (this is simple, but you have to know about `PyCapsule`). (I could look into that myself; a minimal sketch of the pattern follows this comment.)
- `PyCapsule_GetPointer` technically can set an error and may need the GIL? I am not worried enough about this to delay, but the functions look like they are meant to be callable without the GIL (I doubt we use it, and I doubt it really can error/need the GIL, but maybe we can make it a bit nicer anyway).
- The `void` functions use (normally) small allocations: I think it would make more sense to just use the Python allocators rather than a (possibly slow) aligned allocator.

Let's try this 🎉, thanks again!
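A minimal sketch of the `PyCapsule`-as-base pattern referred to above, assuming a buffer obtained from `malloc`; the capsule name and the `wrap_buffer` helper are made up for illustration:

```c
#include <Python.h>
#include <stdlib.h>
#include <numpy/arrayobject.h>

/* Destructor: called when the capsule (the array's base) is garbage collected. */
static void my_capsule_free(PyObject *capsule)
{
    void *ptr = PyCapsule_GetPointer(capsule, "my_data");
    if (ptr != NULL) {
        free(ptr);
    }
}

/* Wrap an externally malloc'ed buffer without setting NPY_ARRAY_OWNDATA:
 * the capsule owns the memory instead (assumes import_array() ran at init). */
static PyObject *wrap_buffer(void *data, npy_intp n)
{
    PyObject *arr = PyArray_SimpleNewFromData(1, &n, NPY_UINT8, data);
    if (arr == NULL) {
        return NULL;
    }
    PyObject *capsule = PyCapsule_New(data, "my_data", my_capsule_free);
    if (capsule == NULL) {
        Py_DECREF(arr);
        return NULL;
    }
    /* PyArray_SetBaseObject steals the reference to capsule, even on failure. */
    if (PyArray_SetBaseObject((PyArrayObject *)arr, capsule) < 0) {
        Py_DECREF(arr);
        return NULL;
    }
    return arr;
}
```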
Not sure I understand the question, but maybe I don't know the shared-memory use case well enough. This allows you to customize allocations, so I suppose you could do something like the following:
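(The snippet originally posted here is not preserved in this copy. The following is a hypothetical sketch of the idea: an allocator backed by a POSIX shared-memory segment, with made-up names and only loosely following the handler's function shapes.)

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hand out mappings from a POSIX shared-memory segment so another process
 * could map the same data. A single fixed segment name is used only to keep
 * the sketch short; a real allocator would need a unique name per allocation. */
static void *shm_alloc(void *ctx, size_t size)
{
    int fd = shm_open("/numpy_shared_pool", O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after closing the descriptor */
    return (p == MAP_FAILED) ? NULL : p;
}

static void shm_free(void *ctx, void *ptr, size_t size)
{
    munmap(ptr, size);
    /* shm_unlink("/numpy_shared_pool") would remove the segment itself */
}
```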
But does that help?!
Great work @mattip and everyone! Looking forward to the evolution of this allocator! @eliaskoromilas's numpy-allocator is great in that we have a Python-only playground, but I hope we can mature this support soon so that the Python allocator API can be made part of NumPy.
Quite possibly, yes. I am still thinking through how this might work. Being able to plug this into NumPy would be pretty useful, though.
Fix astropy#12324: Fix the warnings emitted by Numpy when transferring ownership by setting ``NPY_ARRAY_OWNDATA``:

    RuntimeWarning: Trying to dealloc data, but a memory policy is not set.
    If you take ownership of the data, you must set a base owning the data (e.g. a PyCapsule).

The warning was added in [1], and is now visible only when setting the NUMPY_WARN_IF_NO_MEM_POLICY environment variable [2]. For more details and a fix see [3].

[1] numpy/numpy#17582
[2] numpy/numpy#20200
[3] https://github.com/numpy/numpy/blob/main/doc/source/reference/c-api/data_memory.rst#what-happens-when-deallocating-if-there-is-no-policy-set
Fixes gh-17467. Adds a public struct `PyDataMem_Handler` to hold memory manipulation routines, and two new API functions: `PyDataMem_SetHandler` to replace the current routines with the new ones, and `PyDataMem_GetHandlerName` to get the string name of the current routines (either globally or for a specific ndarray object). This also changes the size of the `ndarray` object to hold the `PyDataMem_Handler` active when it was created, so subsequent actions on its data memory will remain consistent.

Tests and documentation are included. Along the way, I found some places in the code where the current policy is inconsistent (all data memory handling should have gone through `npy_*_cache`, not `PyDataMem_*`), so even if this is rejected it might improve the cache handling.

The `PyDataMem_Handler` has fields to override `memcpy`; these are currently not implemented: `memcpy` in the code base is untouched. I think this PR is invasive enough as-is; if desired, `memcpy` can be handled in a follow-up PR.

Once the general idea undergoes some review, this should hit the mailing list.