Skip to content

Implement Buffer Protocol #2226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Oct 10, 2020
Merged

Conversation

qingshi163
Copy link
Contributor

I add a slot for implement the buffer protocol, more or less like what CPython doing. I am not sure is the right way for RustPython. Also I am confusing about the slots seem do not inherit from the base? So how we control the behavour of the subclass?

@youknowone
Copy link
Member

Slots is a data field of type (PyClass). For now, we typically iterate mro to find proper field of slots.

use crate::{bytesinner::try_as_bytes, pyobject::IntoPyObject};
use crossbeam_utils::atomic::AtomicCell;

pub trait BufferProtocol: Debug + Sync + Send {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of Send + Sync here, you probably want to use PyThreadingConstraint from pyobject

@youknowone
Copy link
Member

related to #2125 and #2195, right?

@qingshi163
Copy link
Contributor Author

qingshi163 commented Sep 24, 2020

#2195 I have done it, I will push out when I passed the memoryview unittest.
#2125 I don't think it works with buffer, we have to do something inside objbytes.
@youknowone

@youknowone
Copy link
Member

youknowone commented Sep 25, 2020

@qingshi163 I attached a commit to change Box to plain fn c0ecc35 . Because this is using read-only slot unlike other ones, It needed different storing method. I couldn't catch it on the previous comments.

vm/src/slots.rs Outdated
@@ -376,3 +381,14 @@ impl PyComparisonOp {
}
}
}

#[pyimpl]
pub trait Bufferable: PyValue {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original name AsBuffer you used was better. "hashable object" is a term in python(or cpython?) but bufferable is not.

Though "comparable object" is also nothing in python. @coolreader18 any idea about naming?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think AsBuffer is good, "buffer" isn't a verb so it doesn't really make sense as Verbable and there's precedence for AsNoun as a trait name in Rust, e.g. AsRawFd, AsRef

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AsBuffer is good, but the trait function name I changed it to get_buffer as same as CPython, also because it did return a new BufferProtocol. so we should go back to AsBuffer or something like GetBuffer? what we perfer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think we can keep it as AsBuffer

Copy link
Member

@youknowone youknowone Sep 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like the implementation trait name can be Buffer and the desriptor trait name can be BufferProtocol. Like,

  • AsBuffer(Bufferable) -> BufferProtocol
  • BufferProtocol -> Buffer

Also I personally prefer to reuse CPython terms if there is no special reason not to do. Because we are practically read and port CPython code everyday. So always +1 for following CPython naming.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, since a trait is inherently a "protocol", so there's no need to specify that stuff that a buffer can do is a protocol; it's redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good!

Comment on lines +31 to +65

fn as_contiguous(&self) -> Option<BorrowedValue<[u8]>> {
let options = self.get_options();
if !options.contiguous {
return None;
}
Some(self.obj_bytes())
}

fn as_contiguous_mut(&self) -> Option<BorrowedValueMut<[u8]>> {
let options = self.get_options();
if !options.contiguous {
return None;
}
Some(self.obj_bytes_mut())
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the consumer should call as_contiguous if they want just a plain buffer. Should we also need to add a function here for consuming a non-contiguous buffer like a sliced memoryview? Like a iter() return the unpacked object?

Comment on lines 24 to 48
pub trait BufferProtocol: Debug + PyThreadingConstraint {
// TODO: return reference to avoid copy
fn get_options(&self) -> BufferOptions;
fn obj_bytes(&self) -> BorrowedValue<[u8]>;
fn obj_bytes_mut(&self) -> BorrowedValueMut<[u8]>;
fn release(&self);
fn is_resizable(&self) -> bool;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed as_bytes to obj_bytes because it should always return the full memory range for the original object. as_bytes may easier to be miss using.

Comment on lines 57 to 68
impl FormatCode {
fn unit_size(&self) -> usize {
// XXX: size of l L q Q is platform depended?
match self.code {
'x' | 'c' | 'b' | 'B' | '?' | 's' | 'p' => 1,
'h' | 'H' => 2,
'i' | 'l' | 'I' | 'L' | 'f' => 4,
'q' | 'Q' | 'd' => 8,
'i' | 'I' | 'f' => 4,
'l' | 'L' | 'q' | 'Q' | 'd' => 8,
'n' | 'N' | 'P' => std::mem::size_of::<usize>(),
c => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to change the unit size for 'l' and 'L' to 8, because that is how array.array doing. I don't know is any compatible problem, but in my machine CPython 'l' and 'L' is 8 bytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it architecture dependent value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    /* Integers */
    case 'h':
        intsize = sizeof(short);
        is_signed = 1;
        break;
    case 'H':
        intsize = sizeof(short);
        is_signed = 0;
        break;
    case 'i':
        intsize = sizeof(int);
        is_signed = 1;
        break;
    case 'I':
        intsize = sizeof(int);
        is_signed = 0;
        break;
    case 'l':
        intsize = sizeof(long);
        is_signed = 1;
        break;
    case 'L':
        intsize = sizeof(long);
        is_signed = 0;
        break;
    case 'q':
        intsize = sizeof(long long);
        is_signed = 1;
        break;
    case 'Q':
        intsize = sizeof(long long);
        is_signed = 0;
        break;

This is from CPython, should we put it somewhere together? because we have to make sure array.array and struct have the same size for the type.

Copy link
Member

@youknowone youknowone Sep 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For common platforms, using std::mem::size_of::<isize> or cfg target_pointer_width work. It fits for common platforms but not perfect. using libc always will fit as it defined in CPython. Try std::mem::size_of::<libc::c_int>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do it in another PR and try unify the reflect from format code to size and the type.

Comment on lines 168 to 169
drop(s.1);
read_rwlock(s.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
drop(s.1);
read_rwlock(s.0)
s.1

Self::Ref(r) => BorrowedValue::Ref(f(r)),
Self::MuLock(m) => BorrowedValue::MappedMuLock(PyMutexGuard::map(m, |x| unsafe {
#[allow(mutable_transmutes, clippy::transmute_ptr_to_ptr)]
std::mem::transmute(f(x))
Copy link
Member

@coolreader18 coolreader18 Sep 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't sound -- somebody could match to get the mutex guard out of the enum, and then treat it as mutable. I ran into the same problem, and I'm not sure exactly what the solution is. Maybe try something in cell? PyImmutableMappedMutexGuard or something?

Copy link
Member

@coolreader18 coolreader18 Sep 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could try making that later today if you don't feel confident about it; I think it would be like { data: *const T, raw: &'a parking_lot::RawMutex, _marker: PhantomData<(&'a T, RawMutex::GuardMarker)> }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can fix that will be perfect, I have no clue to get it soundless..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coolreader18 will you merge #2239 before or after the pr?

@qingshi163 qingshi163 force-pushed the buffer_protocol branch 2 times, most recently from 3403be7 to 9e5032f Compare October 1, 2020 09:39
@coolreader18
Copy link
Member

Oooh, here's an idea: Buffer could require PyObjectPayload as a supertrait, and then get_buffer() could return a PyObjectRc<dyn Buffer>, and then the Buffer and the obj in the memoryview could be the same thing.

@qingshi163
Copy link
Contributor Author

But the obj and the buffer in the memoryview could be different, the obj should always point to the original data source and the buffer is the controller shows how can we visit the data. That is how I implement it as the memoryview build from a memoryview the obj is cloned from original object.

@qingshi163 qingshi163 marked this pull request as ready for review October 1, 2020 18:45
@qingshi163 qingshi163 changed the title [WIP] Implement Buffer Protocol Implement Buffer Protocol Oct 1, 2020
@qingshi163
Copy link
Contributor Author

I want to know if it is ok to merge, it is not fully completed but works.

@coolreader18
Copy link
Member

Oh, I suppose that's true. Nevermind

@@ -114,11 +112,13 @@ def setitem(key, value):
self.assertRaises(TypeError, setitem, "a", b"a")
# Not implemented: multidimensional slices
slices = (slice(0,1,1), slice(0,1,2))
self.assertRaises(NotImplementedError, setitem, slices, b"a")
# TODO: RUSTPYTHON
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this PR enabled a lot of parts from this test, but unless it is fully resolved, I prefer to keep the whole test as expectedFailure before merge. this is only trackable by code, but expectedFailure is trackable by test result. Once the feature is fully done, it will be detected if it is expectedFailiure but not if it is commented out.

@@ -6,7 +6,7 @@
a = memoryview(obj)
assert a.obj == obj

assert a[2:3] == b"c"
# assert a[2:3] == b"c"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intended?

l @ PyList => l.to_byte_inner(vm),
// TODO: PyTyple
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyTuple?

@qingshi163 qingshi163 force-pushed the buffer_protocol branch 2 times, most recently from 6c72e69 to 9c3d97b Compare October 2, 2020 13:27
Comment on lines 263 to 271
op: PyComparisonOp,
vm: &VirtualMachine,
) -> PyComparisonValue {
// TODO: bytes can compare with any object implemented buffer protocol
// but not memoryview, and not equal if compare with unicode str(PyStr)
PyComparisonValue::from_option(
try_bytes_like(vm, other, |other| {
op.eval_ord(self.elements.as_slice().cmp(other))
Copy link
Contributor Author

@qingshi163 qingshi163 Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have different behavour between (memoryview op bytes) and (bytes op memoryview), Is any solution better than check the type here?
@coolreader18

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does memory view not implement the buffer protocol? I think that would maybe fix the issue(?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But memoryview is implemented buffer protocol, we will have much more issue if not so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, that's strange, I think that's probably fine for now.

Comment on lines 49 to 54
// TODO: generic way from &[PyObjectRef]
l @ PyList => l.to_byte_inner(vm),
t @ PyTuple => t.to_bytes_inner(vm),
obj => {
let iter = vm.get_method_or_type_error(obj.clone(), "__iter__", || {
format!("a bytes-like object is required, not {}", obj.class())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could have a trait for the objects that can be borrowed as &[PyObjectRef], so we can have a generic and efficiency way to iter.

@qingshi163
Copy link
Contributor Author

What that random ci fail comes from?

@qingshi163 qingshi163 force-pushed the buffer_protocol branch 2 times, most recently from dcca305 to 4e08961 Compare October 7, 2020 15:36
@coolreader18
Copy link
Member

@qingshi163 I rebased this to master with the new mutex/BorrowedValue::map implementations; it put me as a committer for all the commits but hopefully you can just rebase it yourself and it would get rid of that.

@qingshi163
Copy link
Contributor Author

@coolreader18 Thanks, I think I need review for merge now.

Copy link
Member

@coolreader18 coolreader18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this overall looks really good!! I'm fine with merging it and fixing leftover issues later, since this is so big already. @youknowone what do you think?

@coolreader18
Copy link
Member

And don't worry about that CI failure; I can fix that when I merge.

@youknowone
Copy link
Member

oops, after a few days of resting, I totally forgot to merge this first before other PRs. I agree to merge this and fix other stuffs later. Let me fix the conflicts.

@youknowone
Copy link
Member

youknowone commented Oct 10, 2020

I fixed a few commit messages because fisrt two 'fix test' commits are not fixing test but disabling it.
I also reordered commits a bit to merge "update memoryview eq"s to be single commit and also "Implement buffer protocol" and "Rename to AsBuffer". The contents of them are not changed except for conflict resolution.

@youknowone youknowone merged commit 54cfdf2 into RustPython:master Oct 10, 2020
@qingshi163 qingshi163 deleted the buffer_protocol branch December 7, 2020 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants