test_alt_digits_nl_langinfo fails for locale uk_UA #133740
Comments
This is weird. ALT_DIGITS should be empty for uk_UA, as for most locales. Most likely this is a platform bug. How old is RHEL 7.6? Is it new or ancient? Does the test pass on other versions? What version of libc? |
Sort of ancient (from 2014) :) This is the version of glibc:
|
Interestingly when I test with this:
I get: ALT_DIGITS for uk_UA: 0 |
Also tried this:
This prints:
and with this:
prints
|
Some investigation with gdb:
|
Looks like this is the problem:
|
Related changes:
|
You should look further in the string since glibc uses NUL separator in ALT_DIGITS value.
|
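To illustrate the point about the separator, here is a minimal sketch of how a NUL-separated list looks to code that treats it as an ordinary C string. The buffer contents are made up for illustration (they are not the real uk_UA data), and `split_nul_list` is a hypothetical helper, not part of Python or glibc.

```python
import ctypes


def split_nul_list(raw: bytes) -> list[bytes]:
    """Split a NUL-separated list that ends at the first empty item."""
    items = []
    for item in raw.split(b"\0"):
        if not item:
            break  # an empty item (two NULs in a row) marks the end of the list
        items.append(item)
    return items


# Hypothetical data: three items separated by NUL, then a terminating empty item.
raw = b"0\x001\x002\x00\x00"
buf = ctypes.create_string_buffer(raw, len(raw))

# .value stops at the first NUL, as strlen() or printf("%s") would in C.
print(buf.value)
# .raw exposes the whole buffer, which is where the remaining items live.
print(split_nul_list(buf.raw))
```

Anything that inspects only the C string therefore sees just the first item, which matches the "0" reported earlier in the thread.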
Yeah, but I think that's the problem: we are passing the 0 directly to decode string, no? Or whom do you mean by "you" here? |
There was a hack that used ALT_DIGITS for month names in the genitive case. It was removed in https://sourceware.org/git/?p=glibc.git;a=commit;h=86530b9fed4466a7c05e20ec4d5fd89b4dc41fa6 . glibc uses a NUL separator instead of ";" for ALT_DIGITS (in violation of POSIX). This is why you get "0". Python has a workaround for this, so it sees the whole value, including non-ASCII month names. Binary locale files for the "uk_UA" locale are created using the "KOI8-U" encoding, as specified in |
|
gives
|
It confirms that ALT_DIGITS is encoded to KOI8-U:
"січня" means January. |
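A minimal sketch of what KOI8-U data looks like to a UTF-8 decoder, using only the month name quoted above. It relies on Python's built-in `koi8_u` codec and does not read any real locale data.

```python
# "січня" (January) encoded with the encoding used by the uk_UA locale files.
encoded = "січня".encode("koi8_u")
print(encoded.hex(" "))

# The bytes round-trip cleanly with the matching codec.
print(encoded.decode("koi8_u"))

# A UTF-8 decoder rejects the same bytes.
try:
    encoded.decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"byte {encoded[exc.start]:#x} at offset {exc.start}: {exc.reason}")
```

In this sketch the first two bytes happen to form a valid UTF-8 sequence, so the decoder only rejects the third byte, 0xde.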
Now, could you please add By the way, is Python running with options or environment variables that force |
No, just regular |
Thank you, @pablogsal. Is the LC_ALL category set? What happens if you set the LC_ALL environment variable to an empty string or to "uk_UA" before running the test? |
No:
When using an empty string it fails the same:
but when using
|
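When comparing runs with different environment settings, it can help to record both the environment variables and the locale that is actually active for each category. A minimal sketch, where `locale_snapshot` is a hypothetical helper name and the set of variables inspected is an arbitrary choice:

```python
import locale
import os


def locale_snapshot() -> dict[str, str]:
    """Return locale-related environment variables and active category locales."""
    snapshot = {}
    for name in ("LC_ALL", "LC_CTYPE", "LC_TIME", "LANG"):
        snapshot[f"env:{name}"] = os.environ.get(name, "<unset>")
    for name in ("LC_CTYPE", "LC_TIME"):
        # Passing None queries the category without changing it.
        snapshot[f"active:{name}"] = locale.setlocale(getattr(locale, name), None)
    return snapshot


for key, value in locale_snapshot().items():
    print(f"{key} = {value}")
```

An unset variable and a variable set to an empty string are reported differently here, which is the distinction being asked about.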
@serhiy-storchaka do you have a preference for how we should address this? Is changing the locale when decoding enough? |
I'm out of ideas. For further research, we need to trace exactly which C functions are called and why We could ignore the issue and skip the test for the uk_UA locale with old glibc, as it is a unique hack removed from glibc years ago. But I am afraid that it may be a sign that we misunderstand something about locales, and that the code can fail on other locales, maybe for other nl_langinfo items, maybe on other platforms. They may simply not be tested. In the long term we should use |
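For reference, the "skip on old glibc" option mentioned above could look roughly like the sketch below. This is not what was eventually merged. `parse_version`, `glibc_older_than`, and the threshold are hypothetical: the thread does not say which glibc release dropped the hack, so the value used here is a placeholder.

```python
import platform
import re
import unittest


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a string such as '2.17' into (2, 17), ignoring trailing suffixes."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def glibc_older_than(threshold: tuple[int, ...]) -> bool:
    """True only when running on glibc with a version below the threshold."""
    lib, version = platform.libc_ver()
    return lib == "glibc" and parse_version(version) < threshold


# Placeholder only: substitute the release that actually removed the hack.
PLACEHOLDER_THRESHOLD = (2, 0)

skip_on_old_glibc = unittest.skipIf(
    glibc_older_than(PLACEHOLDER_THRESHOLD),
    "uk_UA ALT_DIGITS contains month names on old glibc",
)
```

`skip_on_old_glibc` could then decorate the affected test method; on non-glibc platforms `platform.libc_ver()` reports no library name and the test is not skipped.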
I can reproduce the issue on RHEL 7.9 with this script:

    import locale
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
    locale.setlocale(locale.LC_TIME, 'uk_UA')
    alt_digits = locale.nl_langinfo(locale.ALT_DIGITS)
    for item in alt_digits.split(';'):
        print(repr(item))

Error:

    Traceback (most recent call last):
      File "/root/cpython/x.py", line 4, in <module>
        alt_digits = locale.nl_langinfo(locale.ALT_DIGITS)
    UnicodeDecodeError: 'locale' codec can't decode byte 0xde in position 4: decoding error

The problem is the

The following patch fixes the issue:

    diff --git a/Modules/_localemodule.c b/Modules/_localemodule.c
    index ad61839..6f934e3 100644
    --- a/Modules/_localemodule.c
    +++ b/Modules/_localemodule.c
    @@ -692,7 +692,6 @@
                 result = result != NULL ? result : "";
                 char *oldloc = NULL;
                 if (langinfo_constants[i].category != LC_CTYPE
    -                && !is_all_ascii(result)
                     && change_locale(langinfo_constants[i].category, &oldloc) < 0)
                 {
                     return NULL;

Script output with the patch:
|
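A more defensive variant of the reproducer, for machines where the locales may not be installed. This is a sketch: `read_alt_digits` is a hypothetical helper, and it assumes that restoring LC_ALL from the string returned by `setlocale(LC_ALL, None)` works on the platform in use.

```python
import locale


def read_alt_digits(time_locale: str = "uk_UA"):
    """Return the ALT_DIGITS items for the given LC_TIME locale.

    Returns None when the platform or a locale is unavailable, and the
    exception object when decoding fails as in the traceback above.
    """
    if not hasattr(locale, "nl_langinfo") or not hasattr(locale, "ALT_DIGITS"):
        return None  # platform without nl_langinfo(), e.g. Windows
    saved = locale.setlocale(locale.LC_ALL, None)
    try:
        try:
            locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
            locale.setlocale(locale.LC_TIME, time_locale)
        except locale.Error:
            return None  # locale not installed on this machine
        try:
            return locale.nl_langinfo(locale.ALT_DIGITS).split(";")
        except UnicodeDecodeError as exc:
            return exc
    finally:
        # Restore whatever was active before the experiment.
        locale.setlocale(locale.LC_ALL, saved)


print(read_alt_digits())
```

Returning the exception instead of raising keeps the three outcomes (unavailable, decoded, failed to decode) easy to compare across machines.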
Set the LC_CTYPE locale to the LC_TIME locale even if nl_langinfo(ALT_DIGITS) result is ASCII. The result is a list separated by NUL characters and the code only checks the first list item which can be ASCII whereas following items are non-ASCII. Fix test__locale for the uk_UA locale on RHEL 7.
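The reasoning in this commit message can be modelled in a few lines of Python. This is only a sketch of the logic, not the C code in `Modules/_localemodule.c`: `first_item`, `decode_buggy`, and `decode_fixed` are hypothetical names, the buffer is made up, and the two codecs merely stand in for the LC_CTYPE and LC_TIME locales.

```python
CTYPE_ENCODING = "utf-8"   # stands in for LC_CTYPE=en_US.UTF-8
TIME_ENCODING = "koi8_u"   # stands in for LC_TIME=uk_UA


def first_item(raw: bytes) -> bytes:
    """What a check on the C string sees: everything up to the first NUL."""
    return raw.split(b"\0", 1)[0]


def decode_buggy(raw: bytes) -> str:
    # Switch to the LC_TIME encoding only when the first item is non-ASCII.
    encoding = CTYPE_ENCODING if first_item(raw).isascii() else TIME_ENCODING
    return raw.decode(encoding)


def decode_fixed(raw: bytes) -> str:
    # Always decode with the encoding of the category the value belongs to.
    return raw.decode(TIME_ENCODING)


# Hypothetical buffer: an ASCII first item followed by a non-ASCII item.
raw = b"0\x00" + "січня".encode(TIME_ENCODING)

print(decode_fixed(raw).split("\0"))
try:
    decode_buggy(raw)
except UnicodeDecodeError as exc:
    print("buggy path fails:", exc.reason)
```

The ASCII shortcut is safe for single-string items, but not for a list whose later entries are hidden behind the first NUL.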
Thank you for solving this mystery @vstinner. 👏 |
Set the LC_CTYPE locale to the LC_TIME locale even if nl_langinfo(ALT_DIGITS) result is ASCII. The result is a list separated by NUL characters and the code only checks the first list item which can be ASCII whereas following items are non-ASCII. Fix test__locale for the uk_UA locale on RHEL 7. Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Set the LC_CTYPE locale to the LC_TIME locale even if nl_langinfo(ALT_DIGITS) result is ASCII. The result is a list separated by NUL characters and the code only checks the first list item which can be ASCII whereas following items are non-ASCII. Fix test__locale for the uk_UA locale on RHEL 7. (cherry picked from commit 899c7dc) Co-authored-by: Victor Stinner <vstinner@python.org> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…4512) gh-133740: Fix locale.nl_langinfo(ALT_DIGITS) (GH-134468) Set the LC_CTYPE locale to the LC_TIME locale even if nl_langinfo(ALT_DIGITS) result is ASCII. The result is a list separated by NUL characters and the code only checks the first list item which can be ASCII whereas following items are non-ASCII. Fix test__locale for the uk_UA locale on RHEL 7. (cherry picked from commit 899c7dc) Co-authored-by: Victor Stinner <vstinner@python.org> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Thanks @serhiy-storchaka and @vstinner for the fix 🖤 |
Bug report
Bug description:
This is on a Red Hat Enterprise Linux Server release 7.6 (Maipo) machine
CPython versions tested on:
CPython main branch, 3.14
Operating systems tested on:
Linux
Linked PRs