Skip to content

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Aug 29, 2025

It should interpret the result of wcsxfrm() as a sequence of abstract integers, not a sequence of Unicode code points or using other encoding scheme that does not preserve ordering.

It should interpret the result of wcsxfrm() as a sequence of abstract
integers, not a sequence of Unicode code points or using other encoding
scheme that does not preserve ordering.
@serhiy-storchaka serhiy-storchaka changed the title Fix locale.strxfrm() gh-138247: Fix locale.strxfrm() Aug 29, 2025
@serhiy-storchaka
Copy link
Member Author

!buildbot Solaris

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @serhiy-storchaka for commit 60a5481 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F138242%2Fmerge

The command will test the builders whose names match following regular expression: Solaris

The builders matched are:

  • SPARCv9 Oracle Solaris 11.4 PR

@StanFromIreland
Copy link
Member

test_locale now passes!

0:10:12 load avg: 17.86 [438/492/7] test_locale passed

@serhiy-storchaka serhiy-storchaka changed the title gh-138247: Fix locale.strxfrm() gh-138247: Fix locale.strxfrm() on Solaris Aug 30, 2025
@serhiy-storchaka serhiy-storchaka marked this pull request as ready for review August 30, 2025 07:09
@kulikjak
Copy link
Contributor

kulikjak commented Sep 1, 2025

Thanks! I tested the patch on Solaris on both SPARC and Intel, and the tests are happy with it.

That said, I am unsure whether it's correct to split the codes only when they are longer than 16 bits - couldn't that break the ordering?

for example with values 0x100FF and 0xF

  • 0x100FF gets split into 0x1 and 0xFF
  • 0xF remains unchanged

-> comparing element by element, 0x1 < 0xF, but that would not be the case without the split

@kulikjak
Copy link
Contributor

kulikjak commented Sep 1, 2025

@serhiy-storchaka
Copy link
Member Author

Note | 0x10000u. 0x100FF gets split into 0x10001 and 0xFF. It is larger than any unchanged value.

@serhiy-storchaka
Copy link
Member Author

BTW, we are using similar patch on Solaris:

Yes, it is surprisingly similar. You don't need to add 0x10000 if you split every character. My implementation needs this because it leaves 16-bit codes unchanged (this saves memory and time).

More important, PyUnicode_FromWideChar() should not be used here, because it changes order on Solaris.

@serhiy-storchaka serhiy-storchaka changed the title gh-138247: Fix locale.strxfrm() on Solaris gh-60462: Fix locale.strxfrm() on Solaris Sep 2, 2025
@kulikjak
Copy link
Contributor

kulikjak commented Sep 2, 2025

Note | 0x10000u. 0x100FF gets split into 0x10001 and 0xFF. It is larger than any unchanged value.

Oh, I completely overlooked that | 0x10000u; part - thanks for pointing that out.

More important, PyUnicode_FromWideChar() should not be used here, because it changes order on Solaris.

That's true. I don't know if in can change order in our case, but it certainly shouldn't go through that HAVE_NON_UNICODE_WCHAR_T_REPRESENTATION specific conversion we have there.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@serhiy-storchaka serhiy-storchaka merged commit 482fd0c into python:main Sep 3, 2025
49 checks passed
@serhiy-storchaka serhiy-storchaka deleted the locale-strxfrm branch September 3, 2025 12:49
@serhiy-storchaka serhiy-storchaka added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Sep 3, 2025
@miss-islington-app
Copy link

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

@miss-islington-app
Copy link

Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 3, 2025
It should interpret the result of wcsxfrm() as a sequence of abstract
integers, not a sequence of Unicode code points or using other encoding
scheme that does not preserve ordering.
(cherry picked from commit 482fd0c)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 3, 2025
It should interpret the result of wcsxfrm() as a sequence of abstract
integers, not a sequence of Unicode code points or using other encoding
scheme that does not preserve ordering.
(cherry picked from commit 482fd0c)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Sep 3, 2025

GH-138448 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Sep 3, 2025
@bedevere-app
Copy link

bedevere-app bot commented Sep 3, 2025

GH-138449 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Sep 3, 2025
serhiy-storchaka added a commit that referenced this pull request Sep 3, 2025
It should interpret the result of wcsxfrm() as a sequence of abstract
integers, not a sequence of Unicode code points or using other encoding
scheme that does not preserve ordering.
(cherry picked from commit 482fd0c)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
lkollar pushed a commit to lkollar/cpython that referenced this pull request Sep 9, 2025
It should interpret the result of wcsxfrm() as a sequence of abstract
integers, not a sequence of Unicode code points or using other encoding
scheme that does not preserve ordering.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants