From 6479dce9afd4553f1b62824ca55ff15c999aeaae Mon Sep 17 00:00:00 2001 From: Stan Ulbrych Date: Fri, 1 Aug 2025 16:13:18 +0200 Subject: [PATCH 1/4] Commit --- Doc/library/codecs.rst | 2 + Doc/library/functions.rst | 78 +++++++++++++++++++++++++-------------- 2 files changed, 52 insertions(+), 28 deletions(-) diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst index f96f2f8281f450..2e243537d409d7 100644 --- a/Doc/library/codecs.rst +++ b/Doc/library/codecs.rst @@ -350,6 +350,8 @@ error handling schemes by accepting the *errors* string argument: The following error handlers can be used with all Python :ref:`standard-encodings` codecs: +.. The following tables are reproduced on the library/functions page under open. + .. tabularcolumns:: |l|L| +-------------------------+-----------------------------------------------+ diff --git a/Doc/library/functions.rst b/Doc/library/functions.rst index 80bd1275973f8d..c760d708403b39 100644 --- a/Doc/library/functions.rst +++ b/Doc/library/functions.rst @@ -1423,37 +1423,59 @@ are always available. They are listed here in alphabetical order. *errors* is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode. A variety of standard error handlers are available - (listed under :ref:`error-handlers`), though any - error handling name that has been registered with + (listed under :ref:`error-handlers`, and reproduced below for convenience), + though any error handling name that has been registered with :func:`codecs.register_error` is also valid. The standard names include: - * ``'strict'`` to raise a :exc:`ValueError` exception if there is - an encoding error. The default value of ``None`` has the same - effect. - - * ``'ignore'`` ignores errors. Note that ignoring encoding errors - can lead to data loss. - - * ``'replace'`` causes a replacement marker (such as ``'?'``) to be inserted - where there is malformed data. 
- - * ``'surrogateescape'`` will represent any incorrect bytes as low - surrogate code units ranging from U+DC80 to U+DCFF. - These surrogate code units will then be turned back into - the same bytes when the ``surrogateescape`` error handler is used - when writing data. This is useful for processing files in an - unknown encoding. - - * ``'xmlcharrefreplace'`` is only supported when writing to a file. - Characters not supported by the encoding are replaced with the - appropriate XML character reference :samp:`&#{nnn};`. - - * ``'backslashreplace'`` replaces malformed data by Python's backslashed - escape sequences. - - * ``'namereplace'`` (also only supported when writing) - replaces unsupported characters with ``\N{...}`` escape sequences. + .. list-table:: + :header-rows: 1 + + * - Error handler + - Description + * - ``'strict'`` + - Raise a :exc:`UnicodeError` (or a subclass) exception if there is + an error. The default value of ``None`` has the same effect. + * - ``'ignore'`` + - Ignore the malformed data and continue without further notice. + Note that ignoring encoding errors can lead to data loss. + * - ``'replace'`` + - Replace malformed data with a replacement marker. + On encoding, use ``?`` (ASCII character). + On decoding, use ``�`` (U+FFFD, the official REPLACEMENT CHARACTER) + * - ``'backslashreplace'`` + - Replace malformed data with backslashed escape sequences. + On encoding, use hexadecimal form of Unicode code point with formats + :samp:`\\x{hh}` :samp:`\\u{xxxx}` :samp:`\\U{xxxxxxxx}`. + On decoding, use hexadecimal form of byte value with format :samp:`\\x{hh}`. + * - ``'surrogateescape'`` + - Will represent any incorrect bytes as low + surrogate code units ranging from ``U+DC80`` to ``U+DCFF``. + These surrogate code units will then be turned back into + the same bytes when the ``'surrogateescape'`` error handler is used + when writing data. This is useful for processing files in an + unknown encoding. 
+ * - ``'surrogatepass'`` + - Only available for Unicode codecs. + Allow encoding and decoding surrogate code point + (``U+D800`` - ``U+DFFF``) as normal code point. Otherwise these codecs + treat the presence of surrogate code point in :class:`str` as an error. + + The following error handlers are only applicable to encoding (within + :term:`text encodings <text encoding>`): + + .. list-table:: + :header-rows: 1 + + * - Error handler + - Description + * - ``'xmlcharrefreplace'`` + - Only supported when writing to a file. + Characters not supported by the encoding are replaced with the + appropriate XML character reference :samp:`&#{nnn};`. + * - ``'namereplace'`` + - Only supported when writing. Replaces unsupported characters with + ``\N{...}`` escape sequences. .. index:: single: universal newlines; open() built-in function From fd1b26e4c1c8a2f3e72f9db1ea90f7b8cbef1872 Mon Sep 17 00:00:00 2001 From: Stan Ulbrych Date: Wed, 6 Aug 2025 17:23:45 +0200 Subject: [PATCH 2/4] Petr's suggestions --- Doc/library/functions.rst | 27 +++++++++------------------ 1 file changed, 9 insertions(+), 18 deletions(-) diff --git a/Doc/library/functions.rst b/Doc/library/functions.rst index c760d708403b39..9c1cfacfaef05c 100644 --- a/Doc/library/functions.rst +++ b/Doc/library/functions.rst @@ -1423,7 +1423,7 @@ are always available. They are listed here in alphabetical order. *errors* is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode. A variety of standard error handlers are available - (listed under :ref:`error-handlers`, and reproduced below for convenience), + (listed under :ref:`error-handlers`, and summarized below for convenience), though any error handling name that has been registered with :func:`codecs.register_error` is also valid. The standard names include: @@ -1441,13 +1441,13 @@ are always available. They are listed here in alphabetical order. Note that ignoring encoding errors can lead to data loss. 
* - ``'replace'`` - Replace malformed data with a replacement marker. - On encoding, use ``?`` (ASCII character). - On decoding, use ``�`` (U+FFFD, the official REPLACEMENT CHARACTER) + On writing, use ``?`` (ASCII character 63). + On reading, use ``�`` (U+FFFD, the official REPLACEMENT CHARACTER) * - ``'backslashreplace'`` - Replace malformed data with backslashed escape sequences. - On encoding, use hexadecimal form of Unicode code point with formats + On writing, use hexadecimal form of Unicode code points with formats :samp:`\\x{hh}` :samp:`\\u{xxxx}` :samp:`\\U{xxxxxxxx}`. - On decoding, use hexadecimal form of byte value with format :samp:`\\x{hh}`. + On reading, use hexadecimal form of byte value with format :samp:`\\x{hh}`. * - ``'surrogateescape'`` - Will represent any incorrect bytes as low surrogate code units ranging from ``U+DC80`` to ``U+DCFF``. @@ -1457,20 +1457,11 @@ are always available. They are listed here in alphabetical order. unknown encoding. * - ``'surrogatepass'`` - Only available for Unicode codecs. - Allow encoding and decoding surrogate code point - (``U+D800`` - ``U+DFFF``) as normal code point. Otherwise these codecs - treat the presence of surrogate code point in :class:`str` as an error. - - The following error handlers are only applicable to encoding (within - :term:`text encodings <text encoding>`): - - .. list-table:: - :header-rows: 1 - - * - Error handler - - Description + Allow encoding and decoding surrogate code points + (``U+D800`` - ``U+DFFF``) as normal code points. Otherwise these codecs + treat the presence of surrogate code points in :class:`str` as an error. * - ``'xmlcharrefreplace'`` - - Only supported when writing. + - Only supported when writing. Characters not supported by the encoding are replaced with the appropriate XML character reference :samp:`&#{nnn};`. 
* - ``'namereplace'`` From 3305a3f9ff0a57934f9c10c26cf3d039a9115e71 Mon Sep 17 00:00:00 2001 From: Stan Ulbrych Date: Tue, 12 Aug 2025 08:07:07 +0200 Subject: [PATCH 3/4] Link instead --- Doc/library/functions.rst | 45 ++------------------------------------- 1 file changed, 2 insertions(+), 43 deletions(-) diff --git a/Doc/library/functions.rst b/Doc/library/functions.rst index 9c1cfacfaef05c..c40399f0c88e68 100644 --- a/Doc/library/functions.rst +++ b/Doc/library/functions.rst @@ -1422,51 +1422,10 @@ are always available. They are listed here in alphabetical order. *errors* is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode. - A variety of standard error handlers are available - (listed under :ref:`error-handlers`, and summarized below for convenience), + A variety of standard error handlers are available, though any error handling name that has been registered with :func:`codecs.register_error` is also valid. The standard names - include: - - .. list-table:: - :header-rows: 1 - - * - Error handler - - Description - * - ``'strict'`` - - Raise a :exc:`UnicodeError` (or a subclass) exception if there is - an error. The default value of ``None`` has the same effect. - * - ``'ignore'`` - - Ignore the malformed data and continue without further notice. - Note that ignoring encoding errors can lead to data loss. - * - ``'replace'`` - - Replace malformed data with a replacement marker. - On writing, use ``?`` (ASCII character 63). - On reading, use ``�`` (U+FFFD, the official REPLACEMENT CHARACTER) - * - ``'backslashreplace'`` - - Replace malformed data with backslashed escape sequences. - On writing, use hexadecimal form of Unicode code points with formats - :samp:`\\x{hh}` :samp:`\\u{xxxx}` :samp:`\\U{xxxxxxxx}`. - On reading, use hexadecimal form of byte value with format :samp:`\\x{hh}`. 
- * - ``'surrogateescape'`` - - Will represent any incorrect bytes as low - surrogate code units ranging from ``U+DC80`` to ``U+DCFF``. - These surrogate code units will then be turned back into - the same bytes when the ``'surrogateescape'`` error handler is used - when writing data. This is useful for processing files in an - unknown encoding. - * - ``'surrogatepass'`` - - Only available for Unicode codecs. - Allow encoding and decoding surrogate code points - (``U+D800`` - ``U+DFFF``) as normal code points. Otherwise these codecs - treat the presence of surrogate code points in :class:`str` as an error. - * - ``'xmlcharrefreplace'`` - - Only supported when writing. - Characters not supported by the encoding are replaced with the - appropriate XML character reference :samp:`&#{nnn};`. - * - ``'namereplace'`` - - Only supported when writing. Replaces unsupported characters with - ``\N{...}`` escape sequences. + can be found in :ref:`error-handlers`. .. index:: single: universal newlines; open() built-in function From 82eed02d82ead1e1fb547bd3026c48a42cc29a31 Mon Sep 17 00:00:00 2001 From: Stan Ulbrych Date: Tue, 12 Aug 2025 08:08:15 +0200 Subject: [PATCH 4/4] Remove unnecessary comment --- Doc/library/codecs.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/Doc/library/codecs.rst b/Doc/library/codecs.rst index 2e243537d409d7..f96f2f8281f450 100644 --- a/Doc/library/codecs.rst +++ b/Doc/library/codecs.rst @@ -350,8 +350,6 @@ error handling schemes by accepting the *errors* string argument: The following error handlers can be used with all Python :ref:`standard-encodings` codecs: -.. The following tables are reproduced on the library/functions page under open. - .. tabularcolumns:: |l|L| +-------------------------+-----------------------------------------------+
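
Reviewer note (not part of the patch): the handler behavior described in the tables above can be checked with a short script. This is a minimal sketch, assuming a temporary file and the sample strings chosen here; the function name `roundtrip_demo` and the result keys are illustrative and do not come from the patch or from CPython.

```python
import os
import tempfile


def roundtrip_demo():
    """Collect what several *errors* handlers of open() produce.

    The function name and dictionary keys are hypothetical, chosen only
    for this illustration.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")

        # Writing: neither U+00E9 nor U+20AC can be encoded as ASCII.
        text = "caf\u00e9 \u20ac"
        for handler in ("replace", "backslashreplace",
                        "xmlcharrefreplace", "namereplace"):
            with open(path, "w", encoding="ascii", errors=handler) as f:
                f.write(text)
            with open(path, "rb") as f:
                results["write:" + handler] = f.read()

        # Reading: a lone 0xE9 byte is malformed UTF-8.
        with open(path, "wb") as f:
            f.write(b"caf\xe9")
        for handler in ("ignore", "replace", "backslashreplace",
                        "surrogateescape"):
            with open(path, "r", encoding="utf-8", errors=handler) as f:
                results["read:" + handler] = f.read()

        # The default (errors=None) behaves like 'strict' and raises.
        try:
            with open(path, "r", encoding="utf-8") as f:
                f.read()
        except UnicodeDecodeError:
            results["read:strict"] = "UnicodeDecodeError"

        # 'surrogateescape' on writing restores the original byte.
        with open(path, "w", encoding="utf-8",
                  errors="surrogateescape") as f:
            f.write(results["read:surrogateescape"])
        with open(path, "rb") as f:
            results["roundtrip"] = f.read()
    return results


if __name__ == "__main__":
    for key, value in roundtrip_demo().items():
        print(f"{key:28} {value!r}")
```

The sample text contains no newline characters, so newline translation in text mode does not affect the bytes compared here.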