gh-60462: Fix locale.strxfrm() on Solaris #138242

serhiy-storchaka · 2025-08-29T12:42:27Z

locale.strxfrm() should interpret the result of wcsxfrm() as a sequence of abstract integers, not as a sequence of Unicode code points, and should not re-encode it with any other scheme that fails to preserve ordering.

Issue: locale.strxfrm() may improperly use PyUnicode_FromWideChar() #138247

Issue: test_local.TestEnUSCollection failures on Solaris 10 #60462

serhiy-storchaka · 2025-08-29T16:35:38Z

!buildbot Solaris

bedevere-bot · 2025-08-29T16:35:42Z

🤖 New build scheduled with the buildbot fleet by @serhiy-storchaka for commit 60a5481 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F138242%2Fmerge

The command will test the builders whose names match the following regular expression: Solaris

The builders matched are:

SPARCv9 Oracle Solaris 11.4 PR

StanFromIreland · 2025-08-29T17:21:04Z

test_locale now passes!

0:10:12 load avg: 17.86 [438/492/7] test_locale passed

kulikjak · 2025-09-01T12:32:56Z

Thanks! I tested the patch on Solaris on both SPARC and Intel, and the tests are happy with it.

That said, I am unsure whether it's correct to split the codes only when they are longer than 16 bits - couldn't that break the ordering?

For example, with the values 0x100FF and 0xF:

0x100FF gets split into 0x1 and 0xFF
0xF remains unchanged

Comparing element by element, 0x1 < 0xF, but that would not be the case without the split.
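The concern can be reproduced with a small sketch. It assumes a hypothetical splitter, naive_split, that breaks codes wider than 16 bits into two halves without adding any marker. This is an illustration of the worry only, not the code from the patch, and Python lists stand in for the wcsxfrm() output because they compare element by element.

```python
def naive_split(codes):
    """Hypothetical splitter: halves wide codes, adds no marker."""
    out = []
    for c in codes:
        if c > 0xFFFF:
            out.append(c >> 16)      # high 16 bits, unmarked
            out.append(c & 0xFFFF)   # low 16 bits
        else:
            out.append(c)            # 16-bit codes stay as they are
    return out


wide = [0x100FF]
narrow = [0xF]

# Before splitting, the wide code sorts after the narrow one.
print(wide > narrow)                            # True
# After the unmarked split, the order is reversed: 0x1 < 0xF.
print(naive_split(wide) < naive_split(narrow))  # True
```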

kulikjak · 2025-09-01T12:33:14Z

BTW, we are using a similar patch on Solaris:
https://github.com/oracle/solaris-userland/blob/master/components/python/python313/patches/24-strxfrm-fix.patch

serhiy-storchaka · 2025-09-01T21:12:20Z

Note the | 0x10000u. 0x100FF gets split into 0x10001 and 0xFF, and 0x10001 is larger than any unchanged value.
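A minimal sketch of the marked split described in this reply, assuming that codes above 0xFFFF are the ones split and that the high half is OR-ed with 0x10000. The function name marked_split is hypothetical, and the real implementation is C code in Modules/_localemodule.c, so this only models the arithmetic.

```python
def marked_split(codes):
    """Sketch: split codes above 0xFFFF, tagging the high half with 0x10000."""
    out = []
    for c in codes:
        if c > 0xFFFF:
            # The marker makes the first half at least 0x10001,
            # which is above every unsplit 16-bit code.
            out.append((c >> 16) | 0x10000)
            out.append(c & 0xFFFF)
        else:
            out.append(c)
    return out


print([hex(x) for x in marked_split([0x100FF])])      # ['0x10001', '0xff']
print(marked_split([0x100FF]) > marked_split([0xF]))  # True
```

Because the first element of a split pair is always larger than any unsplit code, a comparison can tell from that element alone whether it is looking at a one-element or a two-element encoding, so element-by-element comparison keeps the original order.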

serhiy-storchaka · 2025-09-02T06:41:49Z

> BTW, we are using a similar patch on Solaris:

Yes, it is surprisingly similar. You don't need to add 0x10000 if you split every character. My implementation needs this because it leaves 16-bit codes unchanged (this saves memory and time).

More important, PyUnicode_FromWideChar() should not be used here, because it changes order on Solaris.
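To illustrate the trade-off between splitting every code and splitting only the wide ones, here is a rough sketch. Both function names are hypothetical, and the threshold and marker follow the description in this thread rather than the actual C source or the Solaris patch.

```python
def split_all(codes):
    """Sketch: split every code into two 16-bit halves; no marker needed."""
    out = []
    for c in codes:
        out.append(c >> 16)
        out.append(c & 0xFFFF)
    return out


def marked_split(codes):
    """Sketch: split only codes above 0xFFFF, marking the high half."""
    out = []
    for c in codes:
        if c > 0xFFFF:
            out.append((c >> 16) | 0x10000)
            out.append(c & 0xFFFF)
        else:
            out.append(c)
    return out


codes = [0x41, 0x100FF, 0x7]
print(len(split_all(codes)))     # 6: every code doubles in length
print(len(marked_split(codes)))  # 4: only the wide code is expanded
```

When every code is split, all encodings have the same length, so no marker is required. Leaving 16-bit codes unchanged produces a shorter result, which is why the marker is needed to keep the two kinds of element apart.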

kulikjak · 2025-09-02T07:29:35Z

> Note | 0x10000u. 0x100FF gets split into 0x10001 and 0xFF. It is larger than any unchanged value.

Oh, I completely overlooked that | 0x10000u; part - thanks for pointing that out.

> More important, PyUnicode_FromWideChar() should not be used here, because it changes order on Solaris.

That's true. I don't know if it can change the order in our case, but it certainly shouldn't go through the HAVE_NON_UNICODE_WCHAR_T_REPRESENTATION-specific conversion we have there.

Modules/_localemodule.c

vstinner

LGTM

miss-islington-app · 2025-09-03T13:09:12Z

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

miss-islington-app · 2025-09-03T13:09:12Z

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

It should interpret the result of wcsxfrm() as a sequence of abstract integers, not a sequence of Unicode code points or using other encoding scheme that does not preserve ordering. (cherry picked from commit 482fd0c) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

bedevere-app · 2025-09-03T13:09:25Z

GH-138448 is a backport of this pull request to the 3.14 branch.

bedevere-app · 2025-09-03T13:09:29Z

GH-138449 is a backport of this pull request to the 3.13 branch.

This was referenced Aug 29, 2025

Skip tests failing on Solaris #91214

Open

locale.strxfrm() may improperly use PyUnicode_FromWideChar() #138247

Closed

pythongh-138247: Fix locale.strxfrm()

60a5481

serhiy-storchaka force-pushed the locale-strxfrm branch from ea8283a to 60a5481 Compare August 29, 2025 15:49

serhiy-storchaka changed the title ~~Fix locale.strxfrm()~~ gh-138247: Fix locale.strxfrm() Aug 29, 2025

Add a NEWS entry.

58713db

serhiy-storchaka changed the title ~~gh-138247: Fix locale.strxfrm()~~ gh-138247: Fix locale.strxfrm() on Solaris Aug 30, 2025

serhiy-storchaka marked this pull request as ready for review August 30, 2025 07:09

bedevere-app bot added the awaiting core review label Aug 30, 2025

serhiy-storchaka changed the title ~~gh-138247: Fix locale.strxfrm() on Solaris~~ gh-60462: Fix locale.strxfrm() on Solaris Sep 2, 2025

bedevere-app bot mentioned this pull request Sep 2, 2025

test_local.TestEnUSCollection failures on Solaris 10 #60462

Closed

Move to pythongh-60462.

9aa3c9d

serhiy-storchaka requested a review from vstinner September 2, 2025 07:24

kulikjak approved these changes Sep 2, 2025

vstinner reviewed Sep 2, 2025

Modules/_localemodule.c

Ensure that it works for signed wchar_t.

349c3b2

vstinner approved these changes Sep 3, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Sep 3, 2025

serhiy-storchaka merged commit 482fd0c into python:main Sep 3, 2025
49 checks passed

bedevere-app bot removed the awaiting merge label Sep 3, 2025

serhiy-storchaka deleted the locale-strxfrm branch September 3, 2025 12:49

serhiy-storchaka added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Sep 3, 2025

bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Sep 3, 2025

bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Sep 3, 2025
