Bug report
`wcsxfrm()` produces a sequence of `wchar_t` values that can be compared using `wcscmp()`. There is no promise that the result can be interpreted as text in any way; all you can do with it is compare it, `wchar_t` by `wchar_t`, with another result of `wcsxfrm()`.
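For reference, CPython exposes `wcsxfrm()` as `locale.strxfrm()`, which turns the transformed buffer back into `str` with `PyUnicode_FromWideChar()`. The sketch below shows the only supported use of such keys: comparing them with each other, which should agree with `locale.strcoll()` on the original strings. It assumes a platform where the transformed values happen to survive the conversion; the word list and the `cmp` helper are illustrative only.

```python
import functools
import locale

# Use the collation rules from the environment; fall back to "C"
# if that locale is not installed.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    locale.setlocale(locale.LC_COLLATE, "C")

def cmp(x, y):
    """Three-way comparison returning -1, 0 or 1."""
    return (x > y) - (x < y)

words = ["apple", "Banana", "cherry"]  # illustrative input

# strxfrm() keys carry no textual meaning; they are only compared
# with each other, and that comparison must agree with strcoll().
keys = {w: locale.strxfrm(w) for w in words}
agree = all(
    cmp(keys[a], keys[b]) == cmp(locale.strcoll(a, b), 0)
    for a in words
    for b in words
)

# Sorting by transformed keys should match sorting with strcoll().
by_key = sorted(words, key=locale.strxfrm)
by_coll = sorted(words, key=functools.cmp_to_key(locale.strcoll))
print(agree, by_key == by_coll)
```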
For example, if `wchar_t` is 32-bit, the result can contain values larger than 0x10FFFF, while Python strings can only contain Unicode code points in the range 0 to 0x10FFFF. If `wchar_t` is 16-bit, a surrogate pair should not be interpreted as a single code point with a value larger than 0xFFFF: that breaks the order when the results are compared `wchar_t` by `wchar_t` (a high surrogate such as 0xD800 sorts before 0xE000, but the code point it decodes to, at least 0x10000, sorts after U+E000). `PyUnicode_FromWideChar()` fails in the former case and produces a wrong result in the latter.
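Both failure modes can be reproduced in pure Python with a simplified model. `from_wchar16()` is a hypothetical helper that mimics turning a 16-bit `wchar_t` buffer into `str` (a surrogate pair becomes one code point), and the two keys are made-up values for illustration, not real `wcsxfrm()` output.

```python
def from_wchar16(units):
    """Hypothetical model of converting a 16-bit wchar_t buffer to str:
    a high/low surrogate pair is merged into a single code point."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le", "surrogatepass")

# Made-up transformed keys, compared unit by unit as wcscmp() would.
key1 = [0xD800, 0xDC00]
key2 = [0xE000]

raw_order = key1 < key2  # 0xD800 < 0xE000
# After conversion Python compares code points: U+10000 vs U+E000.
converted_order = from_wchar16(key1) < from_wchar16(key2)
print(raw_order, converted_order)  # True False

# With a 32-bit wchar_t, a value above 0x10FFFF has no str equivalent.
try:
    chr(0x110000)
    out_of_range = False
except ValueError:
    out_of_range = True
print(out_of_range)  # True
```

Comparing the raw units gives one order; comparing the converted strings gives the opposite one.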
#138242 tries to solve this issue. We need to test on exotic platforms (AIX, Solaris) to check if it helps.