From 60a26f7d45a16046500521dba6f74b738ed6f1b4 Mon Sep 17 00:00:00 2001
From: Serhiy Storchaka
Date: Fri, 1 Aug 2025 21:21:03 +0300
Subject: [PATCH 1/6] gh-87281: Improve documentation for locale.setlocale() and locale.getlocale()

---
 Doc/library/locale.rst | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/Doc/library/locale.rst b/Doc/library/locale.rst
index 426e3a06e1ef11..fb262251bf48b1 100644
--- a/Doc/library/locale.rst
+++ b/Doc/library/locale.rst
@@ -34,12 +34,18 @@ The :mod:`locale` module defines the following exception and functions:

    If *locale* is given and not ``None``, :func:`setlocale` modifies the locale
    setting for the *category*. The available categories are listed in the data
-   description below. *locale* may be a string, or an iterable of two strings
-   (language code and encoding). If it's an iterable, it's converted to a locale
+   description below. *locale* may be a string, or a pair,
+   language code and encoding. If it is a pair, it is converted to a locale
    name using the locale aliasing engine. An empty string specifies the user's
    default settings. If the modification of the locale fails, the exception
    :exc:`Error` is raised. If successful, the new locale setting is returned.

+   The format of the *locale* and the language code strings is platform
+   depended, but the forms ``language[_territory][.encoding][@modifier]``
+   and ``language[_territory]`` respectively are typically accepted on all
+   platforms.
+   The language code and encoding can be ``None``.
+
    If *locale* is omitted or ``None``, the current setting for *category* is
    returned.

@@ -345,22 +351,26 @@ The :mod:`locale` module defines the following exception and functions:
    ``'LANG'``. The GNU gettext search path contains ``'LC_ALL'``,
    ``'LC_CTYPE'``, ``'LANG'`` and ``'LANGUAGE'``, in that order.

-   Except for the code ``'C'``, the language code corresponds to :rfc:`1766`.
-   *language code* and *encoding* may be ``None`` if their values cannot be
+   The format of the language code is platform depended, but on Posix
+   platforms it usually looks like ``language[_territory]``.
+   The language code and encoding may be ``None`` if their values cannot be
    determined.
+   The "C" locale is represented as ``(None, None)``.

    .. deprecated-removed:: 3.11 3.15


 .. function:: getlocale(category=LC_CTYPE)

-   Returns the current setting for the given locale category as sequence containing
-   *language code*, *encoding*. *category* may be one of the :const:`!LC_\*` values
+   Returns the current setting for the given locale category as a tuple containing
+   language code and encoding. *category* may be one of the :const:`!LC_\*` values
    except :const:`LC_ALL`. It defaults to :const:`LC_CTYPE`.

-   Except for the code ``'C'``, the language code corresponds to :rfc:`1766`.
-   *language code* and *encoding* may be ``None`` if their values cannot be
+   The format of the language code is platform depended, but on Posix
+   platforms it usually looks like ``language[_territory]``.
+   The language code and encoding may be ``None`` if their values cannot be
    determined.
+   The "C" locale is represented as ``(None, None)``.


 .. function:: getpreferredencoding(do_setlocale=True)

From b78ca1b7f2cb5167ce9930566172a51eea5c9786 Mon Sep 17 00:00:00 2001
From: Serhiy Storchaka
Date: Fri, 1 Aug 2025 22:47:06 +0300
Subject: [PATCH 2/6] Apply suggestions from code review

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
---
 Doc/library/locale.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Doc/library/locale.rst b/Doc/library/locale.rst
index fb262251bf48b1..d064d7de607cf7 100644
--- a/Doc/library/locale.rst
+++ b/Doc/library/locale.rst
@@ -41,7 +41,7 @@ The :mod:`locale` module defines the following exception and functions:
    :exc:`Error` is raised. If successful, the new locale setting is returned.

    The format of the *locale* and the language code strings is platform
-   depended, but the forms ``language[_territory][.encoding][@modifier]``
+   dependent, but the forms ``language[_territory][.encoding][@modifier]``
    and ``language[_territory]`` respectively are typically accepted on all
    platforms.
    The language code and encoding can be ``None``.
@@ -363,10 +363,10 @@ The :mod:`locale` module defines the following exception and functions:
 .. function:: getlocale(category=LC_CTYPE)

    Returns the current setting for the given locale category as a tuple containing
-   language code and encoding. *category* may be one of the :const:`!LC_\*` values
-   except :const:`LC_ALL`. It defaults to :const:`LC_CTYPE`.
+   the language code and encoding. *category* may be one of the :const:`!LC_\*`
+   values except :const:`LC_ALL`. It defaults to :const:`LC_CTYPE`.

-   The format of the language code is platform depended, but on Posix
+   The format of the language code is platform dependent, but on Posix
    platforms it usually looks like ``language[_territory]``.
    The language code and encoding may be ``None`` if their values cannot be
    determined.

From 10228bcf88eb2b8a6b43f2fb535b34e2bcf330f3 Mon Sep 17 00:00:00 2001
From: Serhiy Storchaka
Date: Mon, 4 Aug 2025 16:02:38 +0300
Subject: [PATCH 3/6] Add a section for locale names.

---
 Doc/library/locale.rst | 74 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 63 insertions(+), 11 deletions(-)

diff --git a/Doc/library/locale.rst b/Doc/library/locale.rst
index d064d7de607cf7..fddd44f01bf8d6 100644
--- a/Doc/library/locale.rst
+++ b/Doc/library/locale.rst
@@ -34,16 +34,15 @@ The :mod:`locale` module defines the following exception and functions:

    If *locale* is given and not ``None``, :func:`setlocale` modifies the locale
    setting for the *category*. The available categories are listed in the data
-   description below. *locale* may be a string, or a pair,
-   language code and encoding.
If it is a pair, it is converted to a locale
-   name using the locale aliasing engine. An empty string specifies the user's
+   description below. *locale* may be a :ref:`string <locale_name>`, or a pair,
+   language code and encoding. An empty string specifies the user's
    default settings. If the modification of the locale fails, the exception
    :exc:`Error` is raised. If successful, the new locale setting is returned.

-   The format of the *locale* and the language code strings is platform
-   dependent, but the forms ``language[_territory][.encoding][@modifier]``
-   and ``language[_territory]`` respectively are typically accepted on all
-   platforms.
+   If *locale* is a pair, it is converted to a locale name using
+   the locale aliasing engine.
+   The language code has the same format as a :ref:`locale name <locale_name>`,
+   but without encoding and ``@``-modifier.
    The language code and encoding can be ``None``.

    If *locale* is omitted or ``None``, the current setting for *category* is
@@ -351,8 +350,8 @@ The :mod:`locale` module defines the following exception and functions:
    ``'LANG'``. The GNU gettext search path contains ``'LC_ALL'``,
    ``'LC_CTYPE'``, ``'LANG'`` and ``'LANGUAGE'``, in that order.

-   The format of the language code is platform depended, but on Posix
-   platforms it usually looks like ``language[_territory]``.
+   The language code has the same format as a :ref:`locale name <locale_name>`,
+   but without encoding and ``@``-modifier.
    The language code and encoding may be ``None`` if their values cannot be
    determined.
    The "C" locale is represented as ``(None, None)``.
@@ -366,8 +365,8 @@ The :mod:`locale` module defines the following exception and functions:
    the language code and encoding. *category* may be one of the :const:`!LC_\*`
    values except :const:`LC_ALL`. It defaults to :const:`LC_CTYPE`.

-   The format of the language code is platform dependent, but on Posix
-   platforms it usually looks like ``language[_territory]``.
+   The language code has the same format as a :ref:`locale name <locale_name>`,
+   but without encoding and ``@``-modifier.
    The language code and encoding may be ``None`` if their values cannot be
    determined.
    The "C" locale is represented as ``(None, None)``.
@@ -625,6 +624,59 @@ whose high bit is set (i.e., non-ASCII bytes) are never converted or considered
 part of a character class such as letter or whitespace.


+.. _locale_name:
+
+Locale names
+------------
+
+The format of the locale name is platform dependent, and the set of supported
+locales can depend on the system configuration.
+
+On Posix platforms, it usually has the format
+
+.. productionlist:: locale_name
+   : language ["_" territory] ["." charset] ["@" modifier]
+
+where *language* is a two- or three-letter language code from `ISO 639`_,
+*territory* is a two-letter country or region code from ISO 3166,
+*charset* is a locale encoding, and *modifier* is a script name,
+a language subtag, a sort order identifier, or other locale modifier
+(e.g. "latin", "valencia", "stroke" and "euro").
+
+On Windows, several formats are supported.
+A subset of `IETF BCP 47`_ tags:
+
+.. productionlist:: locale_name
+   : language ["-" script] ["-" territory] ["." charset]
+   : language ["-" script] "-" territory "-" modifier
+
+where *language* and *territory* has the same meaning as in Posix,
+*script* is a four-letter script code from `ISO 15924`_,
+and *modifier* is a language subtag, a sort order identifier
+or custom modifier (e.g. "valencia", "stroke" or "x-python").
+Both hyphen ("``-``") and underscore ("``_``") separators are supported.
+Only UTF-8 encoding is allowed for BCP 47 tags.
+
+Windows supports also locale names in the format
+
+.. productionlist:: locale_name
+   : language ["_" territory] ["." charset]
+
+where *language* and *territory* are long names, such as "English" and
+"United States", and *charset* is either a code page number (e.g. "1252")
+or UTF-8.
+Only the underscore separator is supported in this format.
+
+The "C" locale is supported on all platforms.
+
+.. _ISO 639: https://www.iso.org/iso-639-language-code
+.. _IETF BCP 47: https://www.rfc-editor.org/info/bcp47
+.. _ISO 15924: https://www.unicode.org/iso15924/
+
+.. https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html#tag_08_02
+.. https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings
+
+
 .. _embedding-locale:

 For extension writers and programs that embed Python

From 53d7bcbc3090344878298d64568c69e73cd85701 Mon Sep 17 00:00:00 2001
From: Serhiy Storchaka
Date: Mon, 4 Aug 2025 17:32:53 +0300
Subject: [PATCH 4/6] Apply suggestions from code review

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
---
 Doc/library/locale.rst | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/Doc/library/locale.rst b/Doc/library/locale.rst
index fddd44f01bf8d6..e1cf291249a8bd 100644
--- a/Doc/library/locale.rst
+++ b/Doc/library/locale.rst
@@ -632,7 +632,7 @@ Locale names
 The format of the locale name is platform dependent, and the set of supported
 locales can depend on the system configuration.

-On Posix platforms, it usually has the format
+On Posix platforms, it usually has the format:

 .. productionlist:: locale_name
    : language ["_" territory] ["." charset] ["@" modifier]
@@ -641,7 +641,7 @@ where *language* is a two- or three-letter language code from `ISO 639`_,
 *territory* is a two-letter country or region code from ISO 3166,
 *charset* is a locale encoding, and *modifier* is a script name,
 a language subtag, a sort order identifier, or other locale modifier
-(e.g. "latin", "valencia", "stroke" and "euro").
+(for example, "latin", "valencia", "stroke" and "euro").

 On Windows, several formats are supported.
 A subset of `IETF BCP 47`_ tags:
@@ -650,11 +650,11 @@ A subset of `IETF BCP 47`_ tags:
    : language ["-" script] ["-" territory] ["."
charset]
    : language ["-" script] "-" territory "-" modifier

-where *language* and *territory* has the same meaning as in Posix,
+where *language* and *territory* have the same meaning as in Posix,
 *script* is a four-letter script code from `ISO 15924`_,
 and *modifier* is a language subtag, a sort order identifier
-or custom modifier (e.g. "valencia", "stroke" or "x-python").
-Both hyphen ("``-``") and underscore ("``_``") separators are supported.
+or custom modifier (for example, "valencia", "stroke" or "x-python").
+Both hyphen (``'-'``) and underscore (``'_'``) separators are supported.
 Only UTF-8 encoding is allowed for BCP 47 tags.

 Windows supports also locale names in the format
@@ -662,8 +662,8 @@ Windows supports also locale names in the format
 .. productionlist:: locale_name
    : language ["_" territory] ["." charset]

-where *language* and *territory* are long names, such as "English" and
-"United States", and *charset* is either a code page number (e.g. "1252")
+where *language* and *territory* are full names, such as "English" and
+"United States", and *charset* is either a code page number (for example, "1252")
 or UTF-8.
 Only the underscore separator is supported in this format.


From 1a16363cf54e86ecb796d2e01de1e4ac0dd98c8a Mon Sep 17 00:00:00 2001
From: Serhiy Storchaka
Date: Mon, 4 Aug 2025 17:41:19 +0300
Subject: [PATCH 5/6] Update Doc/library/locale.rst

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
---
 Doc/library/locale.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Doc/library/locale.rst b/Doc/library/locale.rst
index e1cf291249a8bd..d9692edccff073 100644
--- a/Doc/library/locale.rst
+++ b/Doc/library/locale.rst
@@ -657,7 +657,7 @@ or custom modifier (for example, "valencia", "stroke" or "x-python").
 Both hyphen (``'-'``) and underscore (``'_'``) separators are supported.
 Only UTF-8 encoding is allowed for BCP 47 tags.

-Windows supports also locale names in the format
+Windows also supports locale names in the format:

 .. productionlist:: locale_name
    : language ["_" territory] ["." charset]

From d3aa0bd33e411d18ac0f1d710ed1c4e62827244c Mon Sep 17 00:00:00 2001
From: Serhiy Storchaka
Date: Mon, 4 Aug 2025 18:28:57 +0300
Subject: [PATCH 6/6] Add more links.

---
 Doc/library/locale.rst | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/Doc/library/locale.rst b/Doc/library/locale.rst
index d9692edccff073..d48ea04077f366 100644
--- a/Doc/library/locale.rst
+++ b/Doc/library/locale.rst
@@ -632,18 +632,18 @@ Locale names
 The format of the locale name is platform dependent, and the set of supported
 locales can depend on the system configuration.

-On Posix platforms, it usually has the format:
+On Posix platforms, it usually has the format [1]_:

 .. productionlist:: locale_name
    : language ["_" territory] ["." charset] ["@" modifier]

 where *language* is a two- or three-letter language code from `ISO 639`_,
-*territory* is a two-letter country or region code from ISO 3166,
+*territory* is a two-letter country or region code from `ISO 3166`_,
 *charset* is a locale encoding, and *modifier* is a script name,
 a language subtag, a sort order identifier, or other locale modifier
 (for example, "latin", "valencia", "stroke" and "euro").

-On Windows, several formats are supported.
+On Windows, several formats are supported. [2]_ [3]_
 A subset of `IETF BCP 47`_ tags:

 .. productionlist:: locale_name
@@ -670,11 +670,13 @@ Only the underscore separator is supported in this format.
 The "C" locale is supported on all platforms.

 .. _ISO 639: https://www.iso.org/iso-639-language-code
+.. _ISO 3166: https://www.iso.org/iso-3166-country-codes.html
 .. _IETF BCP 47: https://www.rfc-editor.org/info/bcp47
 .. _ISO 15924: https://www.unicode.org/iso15924/

-.. https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html#tag_08_02
-..
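https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings
+.. [1] `IEEE Std 1003.1-2024; 8.2 Internationalization Variables <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html#tag_08_02>`_
+.. [2] `UCRT Locale names, Languages, and Country/Region strings <https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings>`_
+.. [3] `Locale Names `_


 .. _embedding-locale: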
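
Illustration of the Posix locale name format documented in this series (``language["_" territory]["." charset]["@" modifier]``) and of the "language code" that ``getlocale()`` reports. This is a minimal sketch, not part of the patch: the helper names ``split_posix_locale_name``, ``language_code`` and ``getlocale_in_c_locale`` are hypothetical, and the regular expression assumes a two- or three-letter language and a two-letter territory, as the new section describes. Real systems may accept names outside this pattern.

```python
import locale
import re

# Hypothetical pattern for the Posix form described in the "Locale names"
# section: language ["_" territory] ["." charset] ["@" modifier].
_POSIX_LOCALE_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<charset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?"
)


def split_posix_locale_name(name):
    """Return a dict with the four components, or None if *name* does not fit."""
    match = _POSIX_LOCALE_RE.fullmatch(name)
    return match.groupdict() if match else None


def language_code(name):
    """Return *name* without encoding and @-modifier, or None if it does not fit."""
    parts = split_posix_locale_name(name)
    if parts is None:
        return None
    code = parts["language"]
    if parts["territory"]:
        code += "_" + parts["territory"]
    return code


def getlocale_in_c_locale():
    """Temporarily switch LC_CTYPE to "C" and return what getlocale() reports."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, nothing changes
    try:
        locale.setlocale(locale.LC_CTYPE, "C")  # "C" is supported on all platforms
        return locale.getlocale(locale.LC_CTYPE)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting


if __name__ == "__main__":
    print(split_posix_locale_name("ca_ES.UTF-8@valencia"))
    print(language_code("de_DE.ISO8859-15@euro"))
    print(getlocale_in_c_locale())
```

The parser is kept separate from the ``setlocale()`` call on purpose: whether a syntactically valid name is actually accepted depends on the platform and on which locales are installed, so only the "C" locale is set here.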