Skip to content

Implement __reduce__ and __reduce_ex__ for array #3064

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 11, 2021

Conversation

qingshi163
Copy link
Contributor

No description provided.

Comment on lines 564 to 576
vm.ctx.new_str(
char::from_u32(self.0 as u32)
.unwrap_or_default()
.to_string(),
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the convert safe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WideChar can contain invalid utf-8 character like surrogate. So I don't think so. To be safe, It must be decoded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we assuming the data in the array is decoded?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. I guess this is corresponding to WChar unicode object in CPython - which is deprecated same as this type array.
Even if it is regarded as a decoded string, because this is array, it still can contain surrogate character, which is not a valid utf8 string.
So my last comment was wrong. Regardless it is decoded or not, it can contains invalid utf8 character.

Comment on lines 564 to 576
vm.ctx.new_str(
char::from_u32(self.0 as u32)
.unwrap_or_default()
.to_string(),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WideChar can contain invalid utf-8 character like surrogate. So I don't think so. To be safe, It must be decoded.

@qingshi163 qingshi163 force-pushed the array-pickle branch 2 times, most recently from f6dd539 to c289397 Compare September 16, 2021 13:37
@qingshi163
Copy link
Contributor Author

@youknowone Can you check what went wrong in windows tests and fix it?

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really sorry. I don't have an accessible windows machine for now and until early october.

can anyone help this?

u = array('u', test_str)
assert u.__reduce_ex__(1)[1][1] == list(test_str)
assert str(loads(dumps(u, 1))) == f"array('u', '{test_str}')"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert str(loads(dumps(u, 1))) == f"array('u', '{test_str}')"
assert str(loads(dumps(u, 1))) == f"array('u', '{test_str}')", str(loads(dumps(u, 1)))

then it will shows the value on assertion failure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fanninpm
Copy link
Contributor

@coolreader18 might be able to diagnose the Windows problem

@youknowone
Copy link
Member

@qingshi163 could you rebase this PR? I will check windows problem during this week

Comment on lines +1146 to +1149
let s = Self::_wchar_bytes_to_string(array.get_bytes(), array.itemsize(), vm)?;
s.chars().map(|x| x.into_pyobject(vm)).collect()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In windows, I tried this code with CPython:

test_str = '🌉abc🌐def🌉🌐'
u = array('u', test_str)
print(u.__reduce_ex__(1))

then

  File ".\extra_tests\snippets\stdlib_array.py", line 104, in <module>
    assert u.__reduce_ex__(1)[1][1] == list(test_str), (u.__reduce_ex__(1)[1][1], list(test_str))
AssertionError: (['\ud83c', '\udf09', 'a', 'b', 'c', '\ud83c', '\udf10', 'd', 'e', 'f', '\ud83c', '\udf09', '\ud83c', '\udf10'], ['🌉', 'a', 'b', 'c', '🌐', 'd', 'e', 'f', '🌉', '🌐'])

The values are wchar encoded in CPython but doesn't look like that here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean CPython did not treat it as same in linux macos?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seems. I think that is not exactly the platform-specific variant but sizeof(wchar_t) is different by platforms. Looks very fragile in point of view of compatibility, no wonder why it is deprecated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind if you want to just skip testing for 16bit character environments.

@qingshi163 qingshi163 force-pushed the array-pickle branch 2 times, most recently from ee6fa13 to be2bf61 Compare October 7, 2021 13:41
Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the long time effort!

@youknowone youknowone merged commit b986e6b into RustPython:main Oct 11, 2021
@qingshi163 qingshi163 deleted the array-pickle branch August 16, 2022 07:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants