-
Notifications
You must be signed in to change notification settings - Fork 748
1838 Use Python buffer protocol when converting Python objects to .NET arrays #2372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If a python object is converted to a managed array and it exposes the buffer protocol the buffer can be copied directly to the managed array. For now this works only if the requested array element type is blittable.
Currently not handled by the buffer copying mechanism anyway
/// </summary> | ||
private static bool ToArray(BorrowedReference value, Type obType, out object? result, bool setError) | ||
{ | ||
Type elementType = obType.GetElementType(); | ||
result = null; | ||
|
||
// structs can also be blittable but for the standard usecases this suffices | ||
var isBlittable = elementType.IsPrimitive && elementType != typeof(char) && elementType != typeof(bool); | ||
if (isBlittable && Runtime.PyObject_GetBuffer(value, out var view, (int)PyBUF.CONTIG_RO) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both directions need a check that source and target element types match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be ok to match the typecode of the .NET type to the corresponding format string of the buffer, i.e. roughly like so:
var compatible = false;
var format = buf.format;
switch (tc)
{
case TypeCode.Boolean:
compatible = format == "?";
break;
case TypeCode.Byte:
compatible = format == "b";
break;
//etc...
default:
break;
This would require #2371 to be fixed so I can actually retrieve the format field.
/// </summary> | ||
private static bool ToArray(BorrowedReference value, Type obType, out object? result, bool setError) | ||
{ | ||
Type elementType = obType.GetElementType(); | ||
result = null; | ||
|
||
// structs can also be blittable but for the standard usecases this suffices | ||
var isBlittable = elementType.IsPrimitive && elementType != typeof(char) && elementType != typeof(bool); | ||
if (isBlittable && Runtime.PyObject_GetBuffer(value, out var view, (int)PyBUF.CONTIG_RO) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it is safe to assume getting buffer has the same semantics as using iterator for an arbitrary type that supports GetBuffer.
Imho, this could be made the default behavior for converting a fixed set of standard types, but for the others it might be better to expose some kind of explicit call instead of conversion. E.g. pyObjectInst.ReadBuffer(destinationArray)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this caught by the primitive blitable check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could move the whole thing to after the if (IterObject.IsNull())
check to make sure it's an iterable? Other than that as @filmor said I already allow only primitive blittable types as array element types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue is not about element types, but about container types. Does Python guarantee, that GetBuffer()[0], GetBuffer()[1], etc is the same sequence as .iter[0], .iter[1], etc for every type that implements GetBuffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that was the point of the buffer protocol :). I'll read up more on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chickenservice the concern is about breaking changes, especially within Python.NET 3.x, more especially within Python.NET 3.0.x.
If there's a difference in result, we should delay the change of behavior to 4.x, and in 3.x only add new functions.
IMHO, we can only immediately apply the speedups for the types that we know have the same behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lostmsu that makes sense. The tests cover bytearray, array.array and ndarray. From my experience these behave according to what you described and are the canonical implementations anyway. Any concerns? Are there other types you have in mind? Should we limit ndim == 1 for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the tests do not cover multidim or the behavior in multidim is different, we should limit ndim to 1.
But again, I strongly suggest to add an explicit new method to PyObject
that does the right thing with buffers, and only works with buffer protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I misunderstood your suggestion, to clarify:
For upcoming minor/patch release expose this as a dedicated method and remove it from ToArray
completely as breaking changes cannot be ruled out completely.
For the next major release reevaluate whether this can/should be done implicitly in ToArray
and accept breaking changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chickenservice yep, that's exactly right.
My related attempts / investigation on ways of NumPy <> C# arrays interop:
cc @myrthel |
Closing as the requested changes were not implemented. |
What does this implement/fix? Explain your changes.
If a python object exposes the buffer protocol then copy the buffer directly to a .NET array. This works for (blittable) single- or multidimensional arrays. I added tests for numpy arrays, bytearrays and python arrays to check whether this change works across different buffer protocol implementations.
My implementation currently has the following caveats:
ToArray
previously didn't handle, so we weaken a precondition of the converter. However efficient copying of multidimensional arrays is highly desirable in my opinion.Buffer.MemoryCopy
which is marked as not CLS-compliant. If this is not acceptable I will of course find another way.PyBuffer
if you want.Does this close any currently open issues?
1838 Use Python buffer protocol when converting Python objects to .NET arrays
Any other comments?
The test with bytearray to .NET byte array is currently failing for NET 6.0 due to the way the Py_Buffer struct is being marshalled. I opened a separate issue #2371 to address this.
Checklist
Check all those that are applicable and complete.
AUTHORS
CHANGELOG