gh-119609, PEP 756: Add PyUnicode_Export() function #123738

vstinner · 2024-09-05T15:23:10Z

Add PyUnicode_Export(), PyUnicode_GetBufferFormat() and PyUnicode_Import() functions to the limited C API.

Issue: [C API] PEP 756: Add PyUnicode_Export() and PyUnicode_Import() functions #119609

📚 Documentation preview 📚: https://cpython-previews--123738.org.readthedocs.build/

Doc/c-api/unicode.rst

bedevere-app · 2024-09-05T15:57:39Z

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Doc/c-api/unicode.rst

Objects/unicodeobject.c

vstinner · 2024-09-05T16:56:01Z

I have made the requested changes; please review again.

bedevere-app · 2024-09-05T16:56:05Z

Thanks for making the requested changes!

@mdboom: please review the changes made to this pull request.

vstinner · 2024-09-05T16:56:26Z

@mdboom @picnixz: Thanks for your reviews. I think that I addressed most, if not all, of them :-)

picnixz

A final nitpick on my side (sorry but I only skimmed through the implementation since I don't have much energy now...).

A bit off-topic, but do we use the PRI* macros in the code base? I saw that you used %i to format a uint32_t value, which usually works, but I wondered whether you would prefer the platform-dependent macros.
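For reference, a minimal sketch of the distinction raised here, using only standard C (`<inttypes.h>`). The helper name `format_u32` is hypothetical and not part of the pull request; it only shows that `PRIu32` expands to the conversion specifier matching `uint32_t`, whereas `%i` expects an `int`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from the PR): format a uint32_t portably.
 * PRIu32 expands to the conversion specifier matching uint32_t, so the
 * call stays correct even on platforms where uint32_t is not unsigned int.
 * Returns the number of characters snprintf() would have written. */
int
format_u32(char *buf, size_t size, uint32_t value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "format=%" PRIu32, value);
}
```

With `%i`, a value above `INT_MAX` would be printed as a negative number on common platforms; with `PRIu32` it is printed as the unsigned value.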

Objects/unicodeobject.c

vstinner · 2024-09-05T18:40:40Z

A side effect of this change is to add the __release_buffer__() method to the built-in str type.

I had to implement collections.UserString.__release_buffer__() to fix test_collections (the UserString simply raises NotImplementedError).

Doc/c-api/unicode.rst

Use signed int32_t for the format.

Objects/unicodeobject.c

Doc/c-api/unicode.rst

vstinner · 2024-09-12T10:37:42Z

@serhiy-storchaka: I updated the PR to use _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32(), and address your other comments.

vstinner · 2024-09-12T10:38:27Z

I had to remove the check that the last character is a NUL character from the tests, since _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32() don't write a trailing NUL character.

encukou · 2024-09-12T11:54:57Z

I had to remove the check that the last character is a NUL character from the tests, since _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32() don't write a trailing NUL character.

That's a security vulnerability waiting to happen.

Since the internal buffers do have the terminating NUL, and in most cases we expose those, people will expect the NUL even if we'd explicitly document that it's not guaranteed. IMO, we need to add it.

This reverts commit abf5c58.

vstinner · 2024-09-12T13:44:51Z

@encukou:

Since the internal buffers do have the terminating NUL, and in most cases we expose those, people will expect the NUL even if we'd explicitly document that it's not guaranteed. IMO, we need to add it.

@serhiy-storchaka: Sorry, I reverted the "Use _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32()" change to get the trailing NUL character back.

vstinner · 2024-09-12T13:46:05Z

I'm not sure if we should guarantee that the exported buffer ends with a NUL character. I'm not sure that all Python implementations will be able to provide such a guarantee efficiently (without having to allocate a temporary buffer for that).

encukou · 2024-09-12T14:00:37Z

We should. As long as the API is used from C, exported strings should be NUL-terminated for safety.
Another implementation can add a function like XPyUnicode_Export_Raw; if it becomes popular CPython can adopt it as an alias of PyUnicode_Export.
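To illustrate the safety argument in plain C, independent of the CPython API: a consumer that receives a buffer plus a length and later passes it to a `str*` function reads out of bounds unless a terminating NUL follows the data. Below is a minimal sketch of an export helper that always reserves one extra byte for the terminator. The name `export_ucs1_copy` is hypothetical, and this is not the pull request's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the PR): copy `len` one-byte characters
 * into a fresh buffer and append a terminating NUL, so C code that treats
 * the result as a C string cannot read past the allocation.
 * Returns NULL on overflow or allocation failure; the caller frees it. */
char *
export_ucs1_copy(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (len == SIZE_MAX) {
        return NULL;                /* len + 1 would overflow */
    }
    char *copy = malloc(len + 1);   /* +1 for the trailing NUL */
    if (copy == NULL) {
        return NULL;
    }
    if (len > 0) {
        memcpy(copy, data, len);
    }
    copy[len] = '\0';
    return copy;
}
```

An embedded NUL in the data still makes `strlen()` stop early, so callers should keep using the explicit length; the terminator only bounds accidental over-reads. The extra allocation is also the cost that the concern about other Python implementations refers to.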

vstinner · 2024-09-12T14:16:01Z

We should. As long as the API is used from C, exported strings should be NUL-terminated for safety.

I suggest continuing this discussion at: capi-workgroup/decisions#33 (comment)

Objects/unicodeobject.c

Use "=H" and "=I" formats.
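For readers unfamiliar with these strings: in the `struct`-module syntax used for `Py_buffer.format`, `=` selects native byte order with standard sizes, `H` is an unsigned 16-bit integer and `I` is an unsigned 32-bit integer, which correspond to UCS-2 and UCS-4 code units. Below is a small sketch of how a consumer might map the format string to an item size. The function name `code_unit_size` is hypothetical, and treating `"B"` as the one-byte format is an assumption rather than something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the PR): map a buffer format string to
 * the size in bytes of one code unit. Returns 0 for unknown formats. */
size_t
code_unit_size(const char *format)
{
    assert(format != NULL);
    if (strcmp(format, "B") == 0) {
        return 1;   /* unsigned byte (assumed format for 1-byte units) */
    }
    if (strcmp(format, "=H") == 0) {
        return 2;   /* native byte order, standard-size unsigned short */
    }
    if (strcmp(format, "=I") == 0) {
        return 4;   /* native byte order, standard-size unsigned int */
    }
    return 0;
}
```

Using the `=` prefix pins the item sizes to 2 and 4 bytes regardless of the platform's `short` and `int` widths, which plain `H` and `I` would not guarantee.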

vstinner · 2024-11-05T15:05:46Z

I have withdrawn my PEP 756.

pythongh-119609: Add PyUnicode_Export() function

c84f314

Add PyUnicode_Export(), PyUnicode_GetBufferFormat() and PyUnicode_Import() functions to the limited C API.

vstinner requested review from a team and encukou as code owners September 5, 2024 15:23

bedevere-app bot added the awaiting core review label Sep 5, 2024

mdboom requested changes Sep 5, 2024

Doc/c-api/unicode.rst (outdated, resolved)

Doc/c-api/unicode.rst (outdated, resolved)

Doc/c-api/unicode.rst (outdated, resolved)

bedevere-app bot removed the awaiting core review label Sep 5, 2024

bedevere-app bot added the awaiting changes label Sep 5, 2024

picnixz reviewed Sep 5, 2024

Doc/c-api/unicode.rst (outdated, resolved)

Doc/c-api/unicode.rst (outdated, resolved)

Objects/unicodeobject.c (outdated, resolved)

vstinner added 2 commits September 5, 2024 18:51

Address reviews

d0cdbd1

Exclude from limited C API 3.13 and older

9b33dca

bedevere-app bot added awaiting change review and removed awaiting changes labels Sep 5, 2024

bedevere-app bot requested a review from mdboom September 5, 2024 16:56

vstinner mentioned this pull request Sep 5, 2024

PEP 756: Add PyUnicode_Export() and PyUnicode_Import() to the limited C API capi-workgroup/decisions#33

Closed

picnixz approved these changes Sep 5, 2024

Objects/unicodeobject.c (outdated, resolved)

mdboom approved these changes Sep 5, 2024

bedevere-app bot added awaiting merge and removed awaiting change review labels Sep 5, 2024

Replace PyErr_Format() with PyErr_SetString()

cf1f74a

picnixz reviewed Sep 5, 2024

Objects/unicodeobject.c (outdated, resolved)

Fix test_collections: implement UserString.__release_buffer__()

93d4470

vstinner requested a review from rhettinger as a code owner September 5, 2024 18:34

rhettinger removed their request for review September 5, 2024 20:51

vstinner mentioned this pull request Sep 6, 2024

gh-119609: Add PyUnicode_Export() and PyUnicode_Import() functions #119610

Closed

Add format parameter to PyUnicode_Export()

17ad7b9

encukou reviewed Sep 11, 2024

Doc/c-api/unicode.rst (outdated, resolved)

vstinner added 3 commits September 11, 2024 12:03

Update constants value in the doc

bcb41f3

Remove unicode_releasebuffer(); use bytes instead

44cb702

PyUnicode_Export() returns the format

1809d8d

Use signed int32_t for the format.

serhiy-storchaka reviewed Sep 12, 2024

Objects/unicodeobject.c (resolved)

Objects/unicodeobject.c (outdated, resolved)

Objects/unicodeobject.c (outdated, resolved)

Doc/c-api/unicode.rst (outdated, resolved)

vstinner added 2 commits September 12, 2024 12:34

Fix PyUnicode_Export() signature in doc

6707ef4

Use _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32()

abf5c58

Use signed int in C tests

033fc07

vstinner added 2 commits September 12, 2024 15:41

Update stable_abi: remove PyUnicode_GetBufferFormat()

078dfcf

Revert "Use _PyUnicode_EncodeUTF16() and _PyUnicode_EncodeUTF32()"

79c6d01

This reverts commit abf5c58.

Allow surrogate characters in UTF-8

5479ab2

serhiy-storchaka reviewed Sep 12, 2024

Objects/unicodeobject.c (outdated, resolved)

vstinner added 4 commits September 14, 2024 00:04

Merge branch 'main' into unicode_view

ab2f9b0

Avoid a second copy in the UTF-8 export

f71f230

UCS-4 export: remove one memory copy

492f10a

Update Py_buffer format

b031163

Use "=H" and "=I" formats.

vstinner changed the title from "gh-119609: Add PyUnicode_Export() function" to "gh-119609, PEP 756: Add PyUnicode_Export() function" Sep 17, 2024

bedevere-app bot mentioned this pull request Sep 5, 2024

[C API] PEP 756: Add PyUnicode_Export() and PyUnicode_Import() functions #119609

Closed

vstinner added 2 commits September 23, 2024 17:50

Add PyUnicode_EXPORT_COPY flag

21e6012

doc

3267ce6

vstinner closed this Nov 5, 2024

vstinner deleted the unicode_view branch November 5, 2024 15:05
