Commit 23988e8

Merge branch 'main' into pep750-concat-update
2 parents a0c1bb6 + a6566e4 commit 23988e8

3 files changed: +62 -2 lines changed

Doc/library/codecs.rst

Lines changed: 60 additions & 0 deletions
@@ -1484,6 +1484,66 @@ mapping. It is not supported by :meth:`str.encode` (which only produces
    Restoration of the ``rot13`` alias.
 
 
+:mod:`encodings` --- Encodings package
+--------------------------------------
+
+.. module:: encodings
+   :synopsis: Encodings package
+
+This module implements the following functions:
+
+.. function:: normalize_encoding(encoding)
+
+   Normalize encoding name *encoding*.
+
+   Normalization works as follows: all non-alphanumeric characters except the
+   dot used for Python package names are collapsed and replaced with a single
+   underscore, leading and trailing underscores are removed.
+   For example, ``' -;#'`` becomes ``'_'``.
+
+   Note that *encoding* should be ASCII only.
+
+
+.. note::
+   The following functions should not be used directly, except for testing
+   purposes; :func:`codecs.lookup` should be used instead.
+
+
+.. function:: search_function(encoding)
+
+   Search for the codec module corresponding to the given encoding name
+   *encoding*.
+
+   This function first normalizes the *encoding* using
+   :func:`normalize_encoding`, then looks for a corresponding alias.
+   It attempts to import a codec module from the encodings package using either
+   the alias or the normalized name. If the module is found and defines a valid
+   ``getregentry()`` function that returns a :class:`codecs.CodecInfo` object,
+   the codec is cached and returned.
+
+   If the codec module defines a ``getaliases()`` function any returned aliases
+   are registered for future use.
+
+
+.. function:: win32_code_page_search_function(encoding)
+
+   Search for a Windows code page encoding *encoding* of the form ``cpXXXX``.
+
+   If the code page is valid and supported, return a :class:`codecs.CodecInfo`
+   object for it.
+
+   .. availability:: Windows.
+
+   .. versionadded:: 3.14
+
+
+This module implements the following exception:
+
+.. exception:: CodecRegistryError
+
+   Raised when a codec is invalid or incompatible.
+
+
 :mod:`encodings.idna` --- Internationalized Domain Names in Applications
 ------------------------------------------------------------------------
 
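
The behaviour documented in this hunk can be tried out with a short sketch. It is not part of the commit; it only exercises the newly documented functions. The codec name ``no_such_codec_xyz`` is a made-up placeholder assumed not to exist, and ``search_function`` is called directly only for illustration, since the added note recommends :func:`codecs.lookup` for real use.

```python
import codecs
import encodings

# Runs of non-alphanumeric characters collapse into a single underscore.
normalized = encodings.normalize_encoding("latex+latin1")
hyphenated = encodings.normalize_encoding("utf-8")
print(normalized, hyphenated)

# search_function() normalizes the name, resolves aliases, and imports the
# matching module from the encodings package.
info_direct = encodings.search_function("utf-8")

# The supported entry point for application code is codecs.lookup().
info_lookup = codecs.lookup("utf-8")
print(info_direct.name, info_lookup.name)

# A name with no matching codec module yields None rather than an exception
# (hypothetical name, assumed not to exist).
missing = encodings.search_function("no_such_codec_xyz")
print(missing)
```

Both calls should return a ``codecs.CodecInfo`` describing the same UTF-8 codec; the direct call simply bypasses the registry that ``codecs.lookup`` consults.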

Doc/whatsnew/3.9.rst

Lines changed: 1 addition & 1 deletion
@@ -1139,7 +1139,7 @@ Changes in the Python API
   (Contributed by Christian Heimes in :issue:`36384`).
 
 * :func:`codecs.lookup` now normalizes the encoding name the same way as
-  :func:`!encodings.normalize_encoding`, except that :func:`codecs.lookup` also
+  :func:`encodings.normalize_encoding`, except that :func:`codecs.lookup` also
   converts the name to lower case. For example, ``"latex+latin1"`` encoding
   name is now normalized to ``"latex_latin1"``.
   (Contributed by Jordon Xu in :issue:`37751`.)
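
The lower-casing difference described in this entry can be observed with a small experiment. This is a sketch, not part of the commit: ``recording_search`` is a hypothetical helper that only records the name passed to it by :func:`codecs.lookup`, and the code assumes Python 3.9 or later, where the normalization described above applies.

```python
import codecs
import encodings

seen = []

def recording_search(name):
    """Hypothetical search function: record the name and decline to handle it."""
    seen.append(name)
    return None

codecs.register(recording_search)
try:
    # No such codec exists, so every registered search function is consulted
    # with the normalized name before LookupError is raised.
    codecs.lookup("LaTeX+Latin1")
except LookupError:
    pass
finally:
    # codecs.unregister() exists only on Python 3.10 and later.
    if hasattr(codecs, "unregister"):
        codecs.unregister(recording_search)

# normalize_encoding() keeps the original case; codecs.lookup() lower-cases.
preserved = encodings.normalize_encoding("LaTeX+Latin1")
print(preserved, seen)
```

Comparing ``preserved`` with the recorded name shows the one difference the entry describes: the punctuation is handled the same way, but only the name passed by ``codecs.lookup`` is lower-cased.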

Include/cpython/unicodeobject.h

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ struct _PyUnicodeObject_state {
    immediately follow the structure. utf8_length can be found
    in the length field; the utf8 pointer is equal to the data pointer. */
 typedef struct {
-    /* There are 4 forms of Unicode strings:
+    /* There are 3 forms of Unicode strings:
 
        - compact ascii:
 
