-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Implement Buffer Protocol #2226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Slots is a data field of type (PyClass). For now, we typically iterate mro to find proper field of slots. |
vm/src/obj/objmemory.rs
Outdated
use crate::{bytesinner::try_as_bytes, pyobject::IntoPyObject}; | ||
use crossbeam_utils::atomic::AtomicCell; | ||
|
||
pub trait BufferProtocol: Debug + Sync + Send { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of Send + Sync
here, you probably want to use PyThreadingConstraint
from pyobject
#2195 I have done it, I will push out when I passed the memoryview unittest. |
7a3ae26
to
df78720
Compare
@qingshi163 I attached a commit to change Box to plain fn c0ecc35 . Because this is using read-only slot unlike other ones, It needed different storing method. I couldn't catch it on the previous comments. |
vm/src/slots.rs
Outdated
@@ -376,3 +381,14 @@ impl PyComparisonOp { | |||
} | |||
} | |||
} | |||
|
|||
#[pyimpl] | |||
pub trait Bufferable: PyValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original name AsBuffer
you used was better. "hashable object" is a term in python(or cpython?) but bufferable is not.
Though "comparable object" is also nothing in python. @coolreader18 any idea about naming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think AsBuffer is good, "buffer" isn't a verb so it doesn't really make sense as Verbable
and there's precedence for AsNoun
as a trait name in Rust, e.g. AsRawFd
, AsRef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AsBuffer is good, but the trait function name I changed it to get_buffer as same as CPython, also because it did return a new BufferProtocol. so we should go back to AsBuffer or something like GetBuffer? what we perfer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think we can keep it as AsBuffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like the implementation trait name can be Buffer
and the desriptor trait name can be BufferProtocol
. Like,
- AsBuffer(Bufferable) -> BufferProtocol
- BufferProtocol -> Buffer
Also I personally prefer to reuse CPython terms if there is no special reason not to do. Because we are practically read and port CPython code everyday. So always +1 for following CPython naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, since a trait is inherently a "protocol", so there's no need to specify that stuff that a buffer can do is a protocol; it's redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good!
c0ecc35
to
e769804
Compare
|
||
fn as_contiguous(&self) -> Option<BorrowedValue<[u8]>> { | ||
let options = self.get_options(); | ||
if !options.contiguous { | ||
return None; | ||
} | ||
Some(self.obj_bytes()) | ||
} | ||
|
||
fn as_contiguous_mut(&self) -> Option<BorrowedValueMut<[u8]>> { | ||
let options = self.get_options(); | ||
if !options.contiguous { | ||
return None; | ||
} | ||
Some(self.obj_bytes_mut()) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the consumer should call as_contiguous if they want just a plain buffer. Should we also need to add a function here for consuming a non-contiguous buffer like a sliced memoryview? Like a iter() return the unpacked object?
vm/src/obj/objmemory.rs
Outdated
pub trait BufferProtocol: Debug + PyThreadingConstraint { | ||
// TODO: return reference to avoid copy | ||
fn get_options(&self) -> BufferOptions; | ||
fn obj_bytes(&self) -> BorrowedValue<[u8]>; | ||
fn obj_bytes_mut(&self) -> BorrowedValueMut<[u8]>; | ||
fn release(&self); | ||
fn is_resizable(&self) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed as_bytes to obj_bytes because it should always return the full memory range for the original object. as_bytes may easier to be miss using.
vm/src/stdlib/pystruct.rs
Outdated
impl FormatCode { | ||
fn unit_size(&self) -> usize { | ||
// XXX: size of l L q Q is platform depended? | ||
match self.code { | ||
'x' | 'c' | 'b' | 'B' | '?' | 's' | 'p' => 1, | ||
'h' | 'H' => 2, | ||
'i' | 'l' | 'I' | 'L' | 'f' => 4, | ||
'q' | 'Q' | 'd' => 8, | ||
'i' | 'I' | 'f' => 4, | ||
'l' | 'L' | 'q' | 'Q' | 'd' => 8, | ||
'n' | 'N' | 'P' => std::mem::size_of::<usize>(), | ||
c => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to change the unit size for 'l' and 'L' to 8, because that is how array.array doing. I don't know is any compatible problem, but in my machine CPython 'l' and 'L' is 8 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it architecture dependent value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* Integers */
case 'h':
intsize = sizeof(short);
is_signed = 1;
break;
case 'H':
intsize = sizeof(short);
is_signed = 0;
break;
case 'i':
intsize = sizeof(int);
is_signed = 1;
break;
case 'I':
intsize = sizeof(int);
is_signed = 0;
break;
case 'l':
intsize = sizeof(long);
is_signed = 1;
break;
case 'L':
intsize = sizeof(long);
is_signed = 0;
break;
case 'q':
intsize = sizeof(long long);
is_signed = 1;
break;
case 'Q':
intsize = sizeof(long long);
is_signed = 0;
break;
This is from CPython, should we put it somewhere together? because we have to make sure array.array and struct have the same size for the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For common platforms, using std::mem::size_of::<isize>
or cfg target_pointer_width work. It fits for common platforms but not perfect. using libc always will fit as it defined in CPython. Try std::mem::size_of::<libc::c_int>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do it in another PR and try unify the reflect from format code to size and the type.
e769804
to
258ffcb
Compare
common/src/cell.rs
Outdated
drop(s.1); | ||
read_rwlock(s.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop(s.1); | |
read_rwlock(s.0) | |
s.1 |
common/src/borrow.rs
Outdated
Self::Ref(r) => BorrowedValue::Ref(f(r)), | ||
Self::MuLock(m) => BorrowedValue::MappedMuLock(PyMutexGuard::map(m, |x| unsafe { | ||
#[allow(mutable_transmutes, clippy::transmute_ptr_to_ptr)] | ||
std::mem::transmute(f(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't sound -- somebody could match to get the mutex guard out of the enum, and then treat it as mutable. I ran into the same problem, and I'm not sure exactly what the solution is. Maybe try something in cell? PyImmutableMappedMutexGuard or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could try making that later today if you don't feel confident about it; I think it would be like { data: *const T, raw: &'a parking_lot::RawMutex, _marker: PhantomData<(&'a T, RawMutex::GuardMarker)> }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can fix that will be perfect, I have no clue to get it soundless..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coolreader18 will you merge #2239 before or after the pr?
3403be7
to
9e5032f
Compare
Oooh, here's an idea: |
But the obj and the buffer in the memoryview could be different, the obj should always point to the original data source and the buffer is the controller shows how can we visit the data. That is how I implement it as the memoryview build from a memoryview the obj is cloned from original object. |
I want to know if it is ok to merge, it is not fully completed but works. |
Oh, I suppose that's true. Nevermind |
Lib/test/test_memoryview.py
Outdated
@@ -114,11 +112,13 @@ def setitem(key, value): | |||
self.assertRaises(TypeError, setitem, "a", b"a") | |||
# Not implemented: multidimensional slices | |||
slices = (slice(0,1,1), slice(0,1,2)) | |||
self.assertRaises(NotImplementedError, setitem, slices, b"a") | |||
# TODO: RUSTPYTHON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this PR enabled a lot of parts from this test, but unless it is fully resolved, I prefer to keep the whole test as expectedFailure before merge. this is only trackable by code, but expectedFailure is trackable by test result. Once the feature is fully done, it will be detected if it is expectedFailiure but not if it is commented out.
extra_tests/snippets/memoryview.py
Outdated
@@ -6,7 +6,7 @@ | |||
a = memoryview(obj) | |||
assert a.obj == obj | |||
|
|||
assert a[2:3] == b"c" | |||
# assert a[2:3] == b"c" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this intended?
vm/src/bytesinner.rs
Outdated
l @ PyList => l.to_byte_inner(vm), | ||
// TODO: PyTyple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyTuple?
6c72e69
to
9c3d97b
Compare
op: PyComparisonOp, | ||
vm: &VirtualMachine, | ||
) -> PyComparisonValue { | ||
// TODO: bytes can compare with any object implemented buffer protocol | ||
// but not memoryview, and not equal if compare with unicode str(PyStr) | ||
PyComparisonValue::from_option( | ||
try_bytes_like(vm, other, |other| { | ||
op.eval_ord(self.elements.as_slice().cmp(other)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we have different behavour between (memoryview op bytes) and (bytes op memoryview), Is any solution better than check the type here?
@coolreader18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does memory view not implement the buffer protocol? I think that would maybe fix the issue(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But memoryview is implemented buffer protocol, we will have much more issue if not so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, that's strange, I think that's probably fine for now.
vm/src/bytesinner.rs
Outdated
// TODO: generic way from &[PyObjectRef] | ||
l @ PyList => l.to_byte_inner(vm), | ||
t @ PyTuple => t.to_bytes_inner(vm), | ||
obj => { | ||
let iter = vm.get_method_or_type_error(obj.clone(), "__iter__", || { | ||
format!("a bytes-like object is required, not {}", obj.class()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could have a trait for the objects that can be borrowed as &[PyObjectRef], so we can have a generic and efficiency way to iter.
9c3d97b
to
604bd05
Compare
What that random ci fail comes from? |
dcca305
to
4e08961
Compare
4e08961
to
2ed49bc
Compare
@qingshi163 I rebased this to master with the new mutex/ |
@coolreader18 Thanks, I think I need review for merge now. |
07959c2
to
896590c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this overall looks really good!! I'm fine with merging it and fixing leftover issues later, since this is so big already. @youknowone what do you think?
And don't worry about that CI failure; I can fix that when I merge. |
oops, after a few days of resting, I totally forgot to merge this first before other PRs. I agree to merge this and fix other stuffs later. Let me fix the conflicts. |
896590c
to
8a0b2d9
Compare
8a0b2d9
to
88f5466
Compare
I fixed a few commit messages because fisrt two 'fix test' commits are not fixing test but disabling it. |
I add a slot for implement the buffer protocol, more or less like what CPython doing. I am not sure is the right way for RustPython. Also I am confusing about the slots seem do not inherit from the base? So how we control the behavour of the subclass?