Commit 15ab457

gh-87281: Improve documentation for locale.setlocale() and locale.getlocale() (GH-137313)
Add a section explaining the locale name formats.
1 parent b78e9c0 commit 15ab457

1 file changed, 74 additions and 10 deletions

Doc/library/locale.rst

Lines changed: 74 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -34,12 +34,17 @@ The :mod:`locale` module defines the following exception and functions:
3434

3535
If *locale* is given and not ``None``, :func:`setlocale` modifies the locale
3636
setting for the *category*. The available categories are listed in the data
37-
description below. *locale* may be a string, or an iterable of two strings
38-
(language code and encoding). If it's an iterable, it's converted to a locale
39-
name using the locale aliasing engine. An empty string specifies the user's
37+
description below. *locale* may be a :ref:`string <locale_name>`, or a pair,
38+
language code and encoding. An empty string specifies the user's
4039
default settings. If the modification of the locale fails, the exception
4140
:exc:`Error` is raised. If successful, the new locale setting is returned.
4241

42+
If *locale* is a pair, it is converted to a locale name using
43+
the locale aliasing engine.
44+
The language code has the same format as a :ref:`locale name <locale_name>`,
45+
but without encoding and ``@``-modifier.
46+
The language code and encoding can be ``None``.
47+
4348
If *locale* is omitted or ``None``, the current setting for *category* is
4449
returned.
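
The call forms described in this hunk can be sketched as follows. This is a minimal illustration, not part of the commit: the helper name ``demo_setlocale`` is hypothetical, and only the "C" locale is used because it is the one name the new section states is supported on all platforms.

```python
import locale


def demo_setlocale(category=locale.LC_CTYPE):
    """Exercise the call forms of setlocale() and restore the old setting."""
    # With *locale* omitted (or None), setlocale() only queries the setting.
    saved = locale.setlocale(category)
    try:
        # A string is used as a locale name.
        from_string = locale.setlocale(category, "C")
        # A pair is first converted to a locale name by the aliasing engine;
        # both the language code and the encoding may be None.
        from_pair = locale.setlocale(category, (None, None))
        return from_string, from_pair
    finally:
        # Put back whatever was active before the demonstration.
        locale.setlocale(category, saved)
```

Saving and restoring the previous setting keeps the sketch from leaking a locale change into the rest of the process.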
4550

@@ -345,22 +350,26 @@ The :mod:`locale` module defines the following exception and functions:
345350
``'LANG'``. The GNU gettext search path contains ``'LC_ALL'``,
346351
``'LC_CTYPE'``, ``'LANG'`` and ``'LANGUAGE'``, in that order.
347352

348-
Except for the code ``'C'``, the language code corresponds to :rfc:`1766`.
349-
*language code* and *encoding* may be ``None`` if their values cannot be
353+
The language code has the same format as a :ref:`locale name <locale_name>`,
354+
but without encoding and ``@``-modifier.
355+
The language code and encoding may be ``None`` if their values cannot be
350356
determined.
357+
The "C" locale is represented as ``(None, None)``.
351358

352359
.. deprecated-removed:: 3.11 3.15
353360

354361

355362
.. function:: getlocale(category=LC_CTYPE)
356363

357-
Returns the current setting for the given locale category as sequence containing
358-
*language code*, *encoding*. *category* may be one of the :const:`!LC_\*` values
359-
except :const:`LC_ALL`. It defaults to :const:`LC_CTYPE`.
364+
Returns the current setting for the given locale category as a tuple containing
365+
the language code and encoding. *category* may be one of the :const:`!LC_\*`
366+
values except :const:`LC_ALL`. It defaults to :const:`LC_CTYPE`.
360367

361-
Except for the code ``'C'``, the language code corresponds to :rfc:`1766`.
362-
*language code* and *encoding* may be ``None`` if their values cannot be
368+
The language code has the same format as a :ref:`locale name <locale_name>`,
369+
but without encoding and ``@``-modifier.
370+
The language code and encoding may be ``None`` if their values cannot be
363371
determined.
372+
The "C" locale is represented as ``(None, None)``.
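
As a rough check of the wording added above, the sketch below switches ``LC_CTYPE`` to the "C" locale and reads it back through ``getlocale()``. The helper name ``getlocale_in_c_locale`` is an assumption made for illustration.

```python
import locale


def getlocale_in_c_locale(category=locale.LC_CTYPE):
    """Return what getlocale() reports while the "C" locale is active."""
    saved = locale.setlocale(category)
    try:
        locale.setlocale(category, "C")
        # getlocale() returns a (language code, encoding) tuple; per the
        # text above, the "C" locale is expected to appear as (None, None).
        return locale.getlocale(category)
    finally:
        # Restore the setting that was active before the call.
        locale.setlocale(category, saved)
```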
364373

365374

366375
.. function:: getpreferredencoding(do_setlocale=True)
@@ -615,6 +624,61 @@ whose high bit is set (i.e., non-ASCII bytes) are never converted or considered
615624
part of a character class such as letter or whitespace.
616625

617626

627+
.. _locale_name:
628+
629+
Locale names
630+
------------
631+
632+
The format of the locale name is platform dependent, and the set of supported
633+
locales can depend on the system configuration.
634+
635+
On Posix platforms, it usually has the format [1]_:
636+
637+
.. productionlist:: locale_name
638+
: language ["_" territory] ["." charset] ["@" modifier]
639+
640+
where *language* is a two- or three-letter language code from `ISO 639`_,
641+
*territory* is a two-letter country or region code from `ISO 3166`_,
642+
*charset* is a locale encoding, and *modifier* is a script name,
643+
a language subtag, a sort order identifier, or other locale modifier
644+
(for example, "latin", "valencia", "stroke" and "euro").
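
To make the production above concrete, here is a small sketch that splits a POSIX-style name into its four parts. It is not how the :mod:`locale` module parses names; the function ``split_posix_locale_name`` and the regular expression are hypothetical, and they check only the shape of the name, not whether the codes are valid ISO values.

```python
import re

# Mirrors: language ["_" territory] ["." charset] ["@" modifier]
_POSIX_LOCALE_RE = re.compile(
    r"(?P<language>[^_.@]+)"
    r"(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<charset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?"
)


def split_posix_locale_name(name):
    """Split a POSIX-style locale name into its named components.

    Components that are absent from the name are reported as None.
    """
    match = _POSIX_LOCALE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a POSIX-style locale name: {name!r}")
    return match.groupdict()
```

For instance, ``split_posix_locale_name("ca_ES.UTF-8@valencia")`` separates the language ``ca``, the territory ``ES``, the charset ``UTF-8`` and the modifier ``valencia``.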
645+
646+
On Windows, several formats are supported. [2]_ [3]_
647+
A subset of `IETF BCP 47`_ tags:
648+
649+
.. productionlist:: locale_name
650+
: language ["-" script] ["-" territory] ["." charset]
651+
: language ["-" script] "-" territory "-" modifier
652+
653+
where *language* and *territory* have the same meaning as in Posix,
654+
*script* is a four-letter script code from `ISO 15924`_,
655+
and *modifier* is a language subtag, a sort order identifier
656+
or custom modifier (for example, "valencia", "stroke" or "x-python").
657+
Both hyphen (``'-'``) and underscore (``'_'``) separators are supported.
658+
Only UTF-8 encoding is allowed for BCP 47 tags.
659+
660+
Windows also supports locale names in the format:
661+
662+
.. productionlist:: locale_name
663+
: language ["_" territory] ["." charset]
664+
665+
where *language* and *territory* are full names, such as "English" and
666+
"United States", and *charset* is either a code page number (for example, "1252")
667+
or UTF-8.
668+
Only the underscore separator is supported in this format.
669+
670+
The "C" locale is supported on all platforms.
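
Because the accepted formats and the installed locales differ between systems, portable code often tries several spellings and falls back to "C". The sketch below shows one way to do that under those assumptions; ``first_supported_locale`` is a hypothetical helper, and the candidate names in the usage note are only examples.

```python
import locale


def first_supported_locale(candidates, category=locale.LC_CTYPE):
    """Return the first candidate name that setlocale() accepts, or None.

    The previously active setting is restored before returning, so the
    function only probes for support and does not change the locale.
    """
    saved = locale.setlocale(category)
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
            except locale.Error:
                # This spelling is not supported here; try the next one.
                continue
            return name
        return None
    finally:
        locale.setlocale(category, saved)
```

A caller might pass ``["en_US.UTF-8", "en-US", "English_United States.1252", "C"]`` to cover the POSIX, BCP 47 and full-name spellings before settling on "C". Which of these succeeds depends on the platform and its configuration.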
671+
672+
.. _ISO 639: https://www.iso.org/iso-639-language-code
673+
.. _ISO 3166: https://www.iso.org/iso-3166-country-codes.html
674+
.. _IETF BCP 47: https://www.rfc-editor.org/info/bcp47
675+
.. _ISO 15924: https://www.unicode.org/iso15924/
676+
677+
.. [1] `IEEE Std 1003.1-2024; 8.2 Internationalization Variables <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html#tag_08_02>`_
678+
.. [2] `UCRT Locale names, Languages, and Country/Region strings <https://learn.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings>`_
679+
.. [3] `Locale Names <https://learn.microsoft.com/en-us/windows/win32/intl/locale-names>`_
680+
681+
618682
.. _embedding-locale:
619683

620684
For extension writers and programs that embed Python
