Skip to content

Reimpl Buffer Protocol and memoryview support ndarray with shape, stride and suboffset #3340

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Nov 13, 2021

Conversation

qingshi163
Copy link
Contributor

There was a misunderstanding about the code for buffer protocol I wrote at beginning, so the related pr #3029 and #3043 finally broke the memoryview(the test did not show). When I review there pr I also just broken the mind, really sorry about that.
But in the pr I reimplement the protocol with more efficient way(thanks for #3121), this should works better.

Comment on lines 1325 to 1335
#[derive(Debug)]
#[pyclass(noattr, module = false, name = "_buffered_raw_buffer")]
#[derive(Debug, PyValue)]
struct BufferedRawBuffer {
data: PyMutex<Vec<u8>>,
range: Range<usize>,
}
impl BufferInternal for BufferedRawBuffer {
fn obj_bytes(&self) -> BorrowedValue<[u8]> {
BorrowedValue::map(self.data.lock().into(), |data| &data[self.range.clone()])
}
#[pyimpl(with(AsBuffer))]
impl BufferedRawBuffer {}

fn obj_bytes_mut(&self) -> BorrowedValueMut<[u8]> {
BorrowedValueMut::map(self.data.lock().into(), |data| {
&mut data[self.range.clone()]
})
}
static BUFFERED_RAW_BUFFER_METHODS: BufferMethods = BufferMethods {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to implement BufferedRawBuffer as an internal class so we can downcast from pyobjectref when the buffer methods is called. but I am getting panic for 'static type has not been initialized', anything missing here?
@youknowone

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, I didn't catch I was mentioned here. You added pyclass to BufferredRawBuffer but with noattr. I think a class is initialized only when it is a pyattr. In this case, you can manually initailize it in make_modue. Look for noattr in vm/src/stdlib/sys.rs. There are a few examples about it.
Probably we'd better to fix pymodule macro to initialize them regardless of noattr

Comment on lines 12 to 17
// GUARANTEE: called only if the buffer option is contiguous
pub contiguous: Option<fn(&PyObjectRef) -> BorrowedValue<[u8]>>,
// GUARANTEE: called only if the buffer option is contiguous
pub contiguous_mut: Option<fn(&PyObjectRef) -> BorrowedValueMut<[u8]>>,
// collect bytes to buf when buffer option is not contiguous
pub collect_bytes: Option<fn(&PyObjectRef, buf: &mut Vec<u8>)>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three function should return the bytes that available in the current class. compare to obj_bytes and obj_bytes_mut that should return the whole bytes for the top class.

@@ -167,6 +168,7 @@ impl TypeZoo {
none_type: singletons::PyNone::init_bare_type().clone(),
not_implemented_type: singletons::PyNotImplemented::init_bare_type().clone(),
generic_alias_type: genericalias::PyGenericAlias::init_bare_type().clone(),
vec_buffer_type: VecBuffer::init_bare_type().clone(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had put the type here, it works, but I don't know does it has better solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't need to use it through vm.ctx.types.vec_buffer_type, you don't need to add a member here. static types will be saved for each of their static_class.
(In other word, if you need to use it like that, this is the right place.)

@youknowone
Copy link
Member

How CPython handle it? Are those python classes there too?

@qingshi163
Copy link
Contributor Author

How CPython handle it? Are those python classes there too?

CPython is using void* with buffer options for the buffer data, we cannot do like that because we may have the lock before the data, so we have to visit data via function that base on each class.

Secondly, CPython manage the exports by manually call PyBuffer_Release that will call the bf_releasebuffer slots, we are manage it more safer way by wrap it with PyBuffer.

As we are using static functions for what buffer need, our appoarch shouldn't have a lot of overhead.

For contiguous, contiguous_mut and collect_bytes, I think there is a way to rid out it by move some logic from memoryview to PyBuffer, I will work on it push to here.

For VecBuffer, CPython don't have it because it use void pointer, we need type to retrive the data from the python object so I use VecBuffer to wrap a Vec to support as a buffer.

@youknowone
Copy link
Member

For the polymorphism using void* cases, It sounds more like trait object like &dyn T or *dyn T than a python class to me. Did you also checked this option? (It could be totally wrong. I didn't look in the details.)

@qingshi163
Copy link
Contributor Author

For the polymorphism using void* cases, It sounds more like trait object like &dyn T or *dyn T than a python class to me. Did you also checked this option? (It could be totally wrong. I didn't look in the details.)

The original design for void* case is to use dyn T(dyn BufferInternal).
The issue for dyn T is,

  • first, PyBuffer own an obj as PyObjectRef, if we are going to have another field for dyn T, we doubled the rc cost.
  • second, for us able to get buffer via python object, we are implement BufferInternal for PyRef<T: PyValue>, this is another rc. so finally the field type will be like PyRc<PyRef<dyn BufferInternal>>

so I decide to use static functions and downcast the obj to visit the class. the only issue for it is when we need buffer a rust vec, we need a wrap object that implement PyValue so we can able to downcast it. but I think that is only const cost when initialize and zero cost on the runtime.

@qingshi163 qingshi163 changed the title [WIP] Fix buffer protocol and memoryview [WIP] Reimpl Buffer Protocol and memoryview support ndarray with shape, stride and suboffset Nov 3, 2021
@@ -81,6 +81,7 @@ pub struct TypeZoo {
pub none_type: PyTypeRef,
pub not_implemented_type: PyTypeRef,
pub generic_alias_type: PyTypeRef,
pub vec_buffer_type: PyTypeRef,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youknowone can you help me to move it away?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attached a commit for this

@qingshi163 qingshi163 changed the title [WIP] Reimpl Buffer Protocol and memoryview support ndarray with shape, stride and suboffset Reimpl Buffer Protocol and memoryview support ndarray with shape, stride and suboffset Nov 5, 2021
@qingshi163 qingshi163 marked this pull request as ready for review November 5, 2021 07:12
@qingshi163 qingshi163 requested a review from youknowone November 5, 2021 07:12
}
}

pub struct SaturatedSliceIterator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust iterators are typically named Iter (mostly because Iterator is a trait name)

Suggested change
pub struct SaturatedSliceIterator {
pub struct SaturatedSliceIter {

Comment on lines 326 to 328
pub index: isize,
pub step: isize,
pub len: usize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fields of iterator, but is it pub?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it should not be pub.

self.internal.obj_bytes().to_vec()
/// # Safety
/// assume the buffer is contiguous
pub unsafe fn contiguous(&self) -> BorrowedValue<[u8]> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are going to name this under rust convention,

Suggested change
pub unsafe fn contiguous(&self) -> BorrowedValue<[u8]> {
pub unsafe fn contiguous_unchecked(&self) -> BorrowedValue<[u8]> {

self.obj_bytes_mut()
}

pub fn collect(&self, buf: &mut Vec<u8>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect typically returns the collection itself, but this is extending existing vector.

Suggested change
pub fn collect(&self, buf: &mut Vec<u8>) {
pub fn append_to(&self, buf: &mut Vec<u8>) {


/// this function do not check the bound
/// panic if indices.len() != ndim
pub fn get_position_fast(&self, indices: &[usize]) -> isize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the name get_ is not coming from CPython, I want to suggest to remove it because this function is not perform getter or getitem. Just fast_position like fast_getitem?

@qingshi163
Copy link
Contributor Author

#3244

@qingshi163 qingshi163 requested a review from youknowone November 8, 2021 19:24
Comment on lines 96 to 106
let format_spec = Self::parse_format(&buffer.desc.format, vm)?;
let desc = buffer.desc.clone();

let mut zelf = PyMemoryView {
buffer: ManuallyDrop::new(buffer),
released: AtomicCell::new(false),
start: range.start * itemsize,
stop: range.end * itemsize,
step: 1,
start: 0,
format_spec,
desc,
hash: OnceCell::new(),
}
.validate())
}

fn as_contiguous(&self) -> Option<BorrowedValue<[u8]>> {
if !self.buffer.options.contiguous {
return None;
}
Some(self.obj_bytes())
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these lines looks same as from_buffer. then will let mut zelf = Self::from_buffer(buffer, vm) works?


fn retain(&self) {}
}
fn collect(&self, buf: &mut Vec<u8>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is another collect not returning buffer

@@ -123,17 +147,293 @@ impl TryFromBorrowedObject for PyBuffer {
// but it is not supported by Rust
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment looks outdated


#[pyimpl(flags(BASETYPE), with(Constructor))]
impl VecBuffer {
pub fn new(data: Vec<u8>) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From is typically implemented for conversion

@qingshi163
Copy link
Contributor Author

We should have the _test_buffer for multi-dimention buffer test. but I think this pr need to be merge now to fix the existing issue.

@qingshi163 qingshi163 merged commit 742ea0c into RustPython:main Nov 13, 2021
@qingshi163 qingshi163 deleted the fix-buffer branch July 14, 2023 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants