-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Implement __reduce__ and __reduce_ex__ for array #3064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
vm/src/stdlib/array.rs
Outdated
vm.ctx.new_str( | ||
char::from_u32(self.0 as u32) | ||
.unwrap_or_default() | ||
.to_string(), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the convert safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WideChar can contain invalid utf-8 character like surrogate. So I don't think so. To be safe, It must be decoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we assuming the data in the array is decoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure. I guess this is corresponding to WChar unicode object in CPython - which is deprecated same as this type array.
Even if it is regarded as a decoded string, because this is array, it still can contain surrogate character, which is not a valid utf8 string.
So my last comment was wrong. Regardless it is decoded or not, it can contains invalid utf8 character.
vm/src/stdlib/array.rs
Outdated
vm.ctx.new_str( | ||
char::from_u32(self.0 as u32) | ||
.unwrap_or_default() | ||
.to_string(), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WideChar can contain invalid utf-8 character like surrogate. So I don't think so. To be safe, It must be decoded.
f6dd539
to
c289397
Compare
@youknowone Can you check what went wrong in windows tests and fix it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really sorry. I don't have an accessible windows machine for now and until early october.
can anyone help this?
extra_tests/snippets/stdlib_array.py
Outdated
u = array('u', test_str) | ||
assert u.__reduce_ex__(1)[1][1] == list(test_str) | ||
assert str(loads(dumps(u, 1))) == f"array('u', '{test_str}')" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert str(loads(dumps(u, 1))) == f"array('u', '{test_str}')" | |
assert str(loads(dumps(u, 1))) == f"array('u', '{test_str}')", str(loads(dumps(u, 1))) |
then it will shows the value on assertion failure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(jedi mind trick wave) That was not the line you're looking for…
@coolreader18 might be able to diagnose the Windows problem |
@qingshi163 could you rebase this PR? I will check windows problem during this week |
c289397
to
a0db9f7
Compare
let s = Self::_wchar_bytes_to_string(array.get_bytes(), array.itemsize(), vm)?; | ||
s.chars().map(|x| x.into_pyobject(vm)).collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In windows, I tried this code with CPython:
test_str = '🌉abc🌐def🌉🌐'
u = array('u', test_str)
print(u.__reduce_ex__(1))
then
File ".\extra_tests\snippets\stdlib_array.py", line 104, in <module>
assert u.__reduce_ex__(1)[1][1] == list(test_str), (u.__reduce_ex__(1)[1][1], list(test_str))
AssertionError: (['\ud83c', '\udf09', 'a', 'b', 'c', '\ud83c', '\udf10', 'd', 'e', 'f', '\ud83c', '\udf09', '\ud83c', '\udf10'], ['🌉', 'a', 'b', 'c', '🌐', 'd', 'e', 'f', '🌉', '🌐'])
The values are wchar encoded in CPython but doesn't look like that here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean CPython did not treat it as same in linux macos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it seems. I think that is not exactly the platform-specific variant but sizeof(wchar_t)
is different by platforms. Looks very fragile in point of view of compatibility, no wonder why it is deprecated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind if you want to just skip testing for 16bit character environments.
ee6fa13
to
be2bf61
Compare
9af4998
to
ea69dc5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the long time effort!
No description provided.