locale.strxfrm() may improperly use PyUnicode_FromWideChar() #138247

@serhiy-storchaka

Description

Bug report

wcsxfrm() produces a sequence of wchar_t that can be compared using wcscmp(). There is no promise that the resulting sequence can be interpreted as text in any way; all you can do with it is compare it with another wcsxfrm() result, wchar_t by wchar_t.
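A minimal Python-level sketch of that contract, assuming the always-available "C" collation locale (the word list and helper names are only illustrative): the values returned by `locale.strxfrm()` are used purely as opaque sort keys and are never inspected as text.

```python
import locale

# Assumption: the "C" collation locale, chosen only because it exists everywhere.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["cherry", "apple", "banana"]

# Treat strxfrm() results strictly as opaque keys: compare them, nothing else.
keys = {w: locale.strxfrm(w) for w in words}
sorted_words = sorted(words, key=keys.__getitem__)


def sign(n):
    """Reduce a comparison result to -1, 0 or 1."""
    return (n > 0) - (n < 0)


# Comparing two keys should agree with strcoll() on the original strings.
agrees = all(
    sign(locale.strcoll(a, b)) == (keys[a] > keys[b]) - (keys[a] < keys[b])
    for a in words
    for b in words
)

print(sorted_words, agrees)
```

In other locales and on other platforms the keys may look nothing like the input, which is exactly why they should not be treated as readable strings.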

For example, if wchar_t is 32-bit, the result can contain values larger than 0x10FFFF, but Python strings can only contain Unicode code points in the range 0 to 0x10FFFF. If wchar_t is 16-bit, a surrogate pair should not be interpreted as a single code point with a value larger than 0xFFFF, because that breaks the ordering when the results are compared wchar_t by wchar_t. PyUnicode_FromWideChar() will fail in the former case and produce a wrong result in the latter case.
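Both failure modes can be illustrated in pure Python. This is only a sketch: the two key sequences are made-up values rather than real wcsxfrm() output, and `decode_utf16_units` is a hypothetical helper that imitates how a 16-bit wchar_t buffer would be combined into code points.

```python
def decode_utf16_units(units):
    """Combine surrogate pairs into single code points (hypothetical helper)."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= u < 0xDC00 and has_next and 0xDC00 <= units[i + 1] < 0xE000:
            # High surrogate followed by low surrogate: one code point above 0xFFFF.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out


# Made-up 16-bit "sort keys", not real wcsxfrm() output.
key_a = [0xD800, 0xDC00]
key_b = [0xE000]

# Unit-by-unit comparison, as wcscmp() would do: 0xD800 < 0xE000.
raw_order = key_a < key_b

# After merging the surrogate pair: [0x10000] versus [0xE000], the order flips.
decoded_order = decode_utf16_units(key_a) < decode_utf16_units(key_b)

# 32-bit case: a value above 0x10FFFF cannot be stored in a Python str at all.
try:
    chr(0x110000)
except ValueError:
    out_of_range_rejected = True
else:
    out_of_range_rejected = False

print(raw_order, decoded_order, out_of_range_rejected)
```

Here key_a sorts before key_b when compared unit by unit, but after the surrogate pair is merged it sorts after key_b, so a str built this way no longer preserves the collation order.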

#138242 tries to solve this issue. We need to test on exotic platforms (AIX, Solaris) to check whether it helps.

Metadata

    Labels

    3.13 (bugs and security fixes), 3.14 (bugs and security fixes), 3.15 (new features, bugs and security fixes), OS-unsupported, extension-modules (C modules in the Modules dir), type-bug (An unexpected behavior, bug, or error)

    Projects

    Status

    Done
