-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-119609, PEP 756: Add PyUnicode_Export() function #123738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add PyUnicode_Export(), PyUnicode_GetBufferFormat() and PyUnicode_Import() functions to the limited C API.
When you're done making the requested changes, leave the comment: |
I have made the requested changes; please review again. |
Thanks for making the requested changes! @mdboom: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A final nitpick on my side (sorry but I only skimmed through the implementation since I don't have much energy now...).
A bit off-topic, but do we use the PRI*
macros in the code base? I saw that you used the %i
for formatting a uint32_t
value, which usually works, but I wondered whether you prefer using the platform-dependent ones.
A side effect of this change is to add the I had to implement |
Use signed int32_t for the format.
@serhiy-storchaka: I updated the PR to use _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32(), and address your other comments. |
I had to remove the check "last character in a NUL character" in tests, since _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32() don't write such last NUL character. |
That's a security vulnerability waiting to happen. Since the internal buffers do have the terminating NUL, and in most cases we expose those, people will expect the NUL even if we'd explicitly document that it's not guaranteed. IMO, we need to add it. |
@serhiy-storchaka: Sorry, I reverted the "Use _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32()" change to get back the NUL trailing character. |
I'm not sure if we should guarantee that the exported buffer ends with a NUL character. I'm not sure that all Python implementations will be able to provide such guarantee in an efficient way (without having to allocate a temporary buffer for that). |
We should. As long as the API is used from C, exported strings should be NUL-terminated for safety. |
I suggest to continue this discussion at: capi-workgroup/decisions#33 (comment) |
I withdrawn my PEP 756. |
Add PyUnicode_Export(), PyUnicode_GetBufferFormat() and PyUnicode_Import() functions to the limited C API.
📚 Documentation preview 📚: https://cpython-previews--123738.org.readthedocs.build/