-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Marshaling support for ints, floats, strs, lists, dicts, tuples #3506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
vm/src/stdlib/marshal.rs
Outdated
@@ -30,25 +78,108 @@ mod decl { | |||
Ok(()) | |||
} | |||
|
|||
/// Read the next 8 bytes of a slice, convert to usize | |||
/// Has side effect of increasing position pointer | |||
fn eat_usize(bytes: &[u8], position: &mut usize) -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errors could be stemming from this function. Not all architectures have a usize of 8 bytes (could be 2, 4, 8). So instead of reading usize, I am going to switch marshaling to use u32 despite architecture, then convert back to usize. So the only possible error is an element or list being longer than 2^32 bytes, in which case we send an error that it can't be marshaled
vm/src/protocol/buffer.rs
Outdated
@@ -11,7 +11,7 @@ use crate::{ | |||
sliceable::wrap_index, | |||
types::{Constructor, Unconstructible}, | |||
PyObject, PyObjectPayload, PyObjectRef, PyObjectView, PyObjectWrap, PyRef, PyResult, | |||
TryFromBorrowedObject, TypeProtocol, VirtualMachine, | |||
TryFromBorrowedObject, TypeProtocol, VirtualMachine, PyValue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cargo fmt issues stemming from this file
Hi, I am sorry for late review, but I think I will take a few more vacations so it might be delayed a few more days, I am sorry! We don't need data compatibility with CPython, because I have no idea for mac/ubuntu test failures for now. |
I added support for dicts and tuples as well. I'll stop adding features until after a review. Thanks and I hope to hear back about this PR soon! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really sorry for late review. To make a little excuse, recently I was horribly occupied by my main job.
The core approach looks good. I have a few suggestions about Rust and the project conventions.
I think you brought test_marshal.py
from CPython. Could you revise your commit message to mark the CPython version you brougt it from? here is a nice example: https://github.com/RustPython/RustPython/pull/3474/commits
I left other suggestions on the codes.
pub(crate) fn from_entries(entries: DictContentType) -> Self { | ||
Self { entries } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are encapsulating DictContentType
from outside. If we can avoid to expose this type as more as possible, I would like it.
Could you check if you can refactor PyDict::merge_object
a little bit to expose merge from iterator part? it starts from line 108.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working on this, everything else is updated. Thanks for all the feedback btw!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great! please let me know if you meet any blocker
vm/src/stdlib/marshal.rs
Outdated
let pytuple_items: Vec<PyObjectRef> = pytuple.fast_getitems(); | ||
let mut tuple_bytes = dump_list(pytuple_items.iter(), vm)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let pytuple_items: Vec<PyObjectRef> = pytuple.fast_getitems(); | |
let mut tuple_bytes = dump_list(pytuple_items.iter(), vm)?; | |
let mut tuple_bytes = dump_list(pytuple.as_slice().iter(), vm)?; |
vm/src/stdlib/marshal.rs
Outdated
let dict = DictContentType::default(); | ||
for elem in iterable { | ||
let items = match_class!(match elem.clone() { | ||
pytuple @ PyTuple => pytuple.fast_getitems(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pytuple @ PyTuple => pytuple.fast_getitems(), | |
pytuple @ PyTuple => pytuple.as_slice().to_vec(), |
vm/src/builtins/tuple.rs
Outdated
pub(crate) fn fast_getitems(&self) -> Vec<PyObjectRef> { | ||
(*self.elements.clone()).to_vec() | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub(crate) fn fast_getitems(&self) -> Vec<PyObjectRef> { | |
(*self.elements.clone()).to_vec() | |
} |
this function is almost covered by as_slice()
vm/src/builtins/tuple.rs
Outdated
pub(crate) fn from_elements(elements: Box<[PyObjectRef]>) -> Self { | ||
Self { elements } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyTuple::new_ref(elements, vm)
is same as PyTuple::from_elements(elements).into_pyobject(vm)
vm/src/stdlib/marshal.rs
Outdated
let mut byte_list = size_to_bytes(pyobjs.len(), vm)?.to_vec(); | ||
// For each element, dump into binary, then add its length and value. | ||
for element in pyobjs { | ||
let element_bytes: PyBytes = dumps(element.clone(), vm)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reuse dumps
for this, I think we need another function not to create PyBytes
.
How about renaming dumps
to _dumps
by changing it to returns Vec<u8>
and adding a new dumps
as a thin wrapper? Then calling _dumps
will be simpler than this one.
vm/src/stdlib/marshal.rs
Outdated
} | ||
pystr @ PyStr => { | ||
let mut str_bytes = pystr.as_str().as_bytes().to_vec(); | ||
str_bytes.insert(0, STR_BYTE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than insert(0, ...)
, swapping the order will prevent huge memcpy in bad cases. same for list and tuples.
vm/src/stdlib/marshal.rs
Outdated
}) | ||
.collect(); | ||
// Converts list of tuples to list, dump into binary | ||
let mut dict_bytes = dumps(elements.into_pyobject(vm), vm)?.deref().to_vec(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a list, can dump_list be reused?
vm/src/stdlib/marshal.rs
Outdated
let length_as_u32 = | ||
u32::from_le_bytes(match bytes[*position..(*position + 4)].try_into() { | ||
Ok(length_as_u32) => length_as_u32, | ||
Err(_) => { | ||
return Err( | ||
vm.new_buffer_error("Could not read u32 size from byte array".to_owned()) | ||
) | ||
} | ||
}); | ||
*position += 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing slice itself is a preferred way to having a fixed buffer and position. (function signature does not match to this code for now)
let length_as_u32 = | |
u32::from_le_bytes(match bytes[*position..(*position + 4)].try_into() { | |
Ok(length_as_u32) => length_as_u32, | |
Err(_) => { | |
return Err( | |
vm.new_buffer_error("Could not read u32 size from byte array".to_owned()) | |
) | |
} | |
}); | |
*position += 4; | |
let (len_bytes, other_bytes) = bytes.split_at(4); | |
let len = u32::from_le_bytes(len_bytes.try_into().map_err(|_| vm.new_buffer_error("Could not read u32 size from byte array".to_owned())?); | |
bytes = other_bytes |
vm/src/stdlib/marshal.rs
Outdated
Ok(PyCode { | ||
code: vm.map_codeobj(code), | ||
}) | ||
match buf[0] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buf.split_first()
will be helpful
Could you also please rebase the code? Since broken CI is recently fixed, it will be green after rebase. |
Most of project using git including us prefer patches(PR) never use merge but only rebases. Could you try rebase if possible? When |
879043e
to
feb3b31
Compare
I tried to rebase some of the earlier commits by squashing each of them into the first. However, it looks like my Any tips haha? Should I create an entirely new PR, not sure if it's too far gone. |
The history looks twisted a bit, but I think running You don't need to make another PR, because one PR is always bound to one branch. You can fix it by force push whenever you want. |
I just use |
@jakearmendariz I rebased it and pushed it to your github branch |
Sorry I have been so slow at responding and making changes, took a lot of hard classes this quarter and midterms have been killer. A couple of notes, Also, I couldn't refactor I think that addresses all of the PR comments, everything else should have been fixed. Thanks again for the help! |
There's no real need to worry about that. A similar situation exists for |
For |
Done! |
I thought about that, but the snippets are run by both CPython and RustPython, and I'm guessing there's a chance that the snippet would fail when CPython runs it. |
Yep, the snippets failed the tests because of marshal.py (CPython runs it) |
Go ahead and move it back, (optionally) adding a note explaining that RustPython's version of |
I don't think putting original tests to |
Updated! No byte-str comparison, passes test for RustPython and CPython |
@jakearmendariz thank you for long time effort. there is a error for
|
Changed to send an EOF error when an empty byte string is passed in! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great! thank you very much!
My work on #3458. Marshaling still needs support for tuples, dicts, sets, etc. But I wanted to post a PR for some feedback (and possibly an early merge).
How It Works
When marshaled, each buffer is prefixed with a byte indicating the type of data. Then while decoding we can check the first byte to see which datatype the buffer holds.
Lists are more complicated because they have different types, and can be recursive (lists inside of lists). Each list has a prefix of its size, and then each element's data is stored directly after another 8 bytes indicating its size.
Resulting in
[LIST_BYTE, <bytes for size>, <size><element>,<size><element>...]
Already there are some improvement areas. Most notably, each list/element size is stored as usize (u64) and that can probably be changed to a u32.
Testing
I added a python file for testing. It passes for RustPython, marshaling/unmarshaling primitives and lists.
However, CPython marshals their data differently. I am not sure if they are supposed to match, however, I thought it was best to optimize the code to fit the RustPython Datastructures and not worry about how CPython handles marshaling.
Closing
Please let me know if there are any problems or more idiomatic ways of writing this. I've never written rust code outside of personal projects so I expect some issues.
Thanks!