
bpo-43510: Implement PEP 597 opt-in EncodingWarning. #19481

Merged: 51 commits, Mar 29, 2021
Commits
a3c014b
Raise a warning when encoding is omitted
methane Apr 7, 2020
050bd1b
add test
methane Apr 12, 2020
939f4a0
wrap encoding=None with text_encoding.
methane Apr 12, 2020
3c99777
Add io.LOCALE_ENCODING = "locale"
methane Jan 29, 2021
4016278
Add EncodingWarning.
methane Jan 29, 2021
c5c556c
Add sys.warn_default_encoding
methane Jan 29, 2021
d9a08c2
shorten option names
methane Jan 30, 2021
772648e
EncodingWarning extends Warning
methane Jan 30, 2021
1a8e305
make clinic
methane Jan 30, 2021
20966cd
fix test
methane Jan 30, 2021
2b80f42
remove wrong test case
methane Jan 30, 2021
760308c
fix exception_hierarchy.txt
methane Jan 30, 2021
a95dff2
Make sys.flags.encoding_warning int
methane Jan 31, 2021
31fb411
Fix text_embed.
methane Jan 31, 2021
096a0a3
Fix test_pickle
methane Jan 31, 2021
99fc938
configparser: use io.text_encoding()
methane Feb 13, 2021
6fdbcbc
Rename option names
methane Feb 22, 2021
3f362bc
Merge remote-tracking branch 'upstream/master' into open-encoding
methane Mar 16, 2021
674feff
Update docs
methane Mar 16, 2021
d9d850f
Add NEWS entry
methane Mar 16, 2021
16463ea
Add document for text_encoding and encoding="locale".
methane Mar 17, 2021
412d633
Suppress EncodingWarning from site.py
methane Mar 17, 2021
ee883d1
Remove io.LOCALE_ENCODING
methane Mar 18, 2021
6a15e2a
text_encoding() first argument is mandatory.
methane Mar 18, 2021
5d474b4
Apply suggestions from code review
methane Mar 18, 2021
c17016f
Simplify _PyPreCmdline and PyConfig
methane Mar 18, 2021
03f971c
Update EncodingWarning doc
methane Mar 18, 2021
9d26b7a
Update document
methane Mar 19, 2021
60e74cf
tweak warning message
methane Mar 19, 2021
a505b5f
Use stacklevel=2 for text_encoding() default
methane Mar 19, 2021
cbe22e2
fixup
methane Mar 19, 2021
a9f9f04
tweak for readability
methane Mar 19, 2021
3bea88f
make clinic
methane Mar 19, 2021
d260a4c
fix doc build error
methane Mar 19, 2021
049a269
tweak warning message
methane Mar 19, 2021
018ba64
fixup
methane Mar 19, 2021
3a9623e
Fix subprocess
methane Mar 23, 2021
737059e
Update Doc/library/io.rst
methane Mar 23, 2021
6a62211
Update Doc/library/io.rst
methane Mar 23, 2021
54c7dc6
Update Doc/library/io.rst
methane Mar 23, 2021
5b2830b
Update Doc/library/io.rst
methane Mar 23, 2021
14f2a6e
Apply suggestions from code review
methane Mar 23, 2021
06e2a32
Move EncodingWarnings
methane Mar 23, 2021
27d49d2
fix comment
methane Mar 23, 2021
80f4644
fix text_encoding() docstring
methane Mar 23, 2021
6ad0e7f
update what's new
methane Mar 23, 2021
73b27f1
fix doc build
methane Mar 23, 2021
c149d65
Update Doc/library/io.rst
methane Mar 24, 2021
4eb7655
Apply suggestions from code review
methane Mar 24, 2021
e3bce76
Apply suggestions from code review
methane Mar 24, 2021
c089fd7
Update Doc/library/io.rst
methane Mar 24, 2021
9 changes: 9 additions & 0 deletions Doc/c-api/init_config.rst
@@ -583,6 +583,15 @@ PyConfig

Default: ``0``.

.. c:member:: int warn_default_encoding

If non-zero, emit an :exc:`EncodingWarning` when :class:`io.TextIOWrapper`
uses its default encoding. See :ref:`io-encoding-warning` for details.

Default: ``0``.

.. versionadded:: 3.10

.. c:member:: wchar_t* check_hash_pycs_mode

Control the validation behavior of hash-based ``.pyc`` files:
9 changes: 9 additions & 0 deletions Doc/library/exceptions.rst
@@ -741,6 +741,15 @@ The following exceptions are used as warning categories; see the
Base class for warnings related to Unicode.


.. exception:: EncodingWarning

Base class for warnings related to encodings.

See :ref:`io-encoding-warning` for details.

.. versionadded:: 3.10


.. exception:: BytesWarning

Base class for warnings related to :class:`bytes` and :class:`bytearray`.
81 changes: 81 additions & 0 deletions Doc/library/io.rst
@@ -106,6 +106,56 @@ stream by opening a file in binary mode with buffering disabled::
The raw stream API is described in detail in the docs of :class:`RawIOBase`.


.. _io-text-encoding:

Text Encoding
-------------

The default encoding of :class:`TextIOWrapper` and :func:`open` is
locale-specific (:func:`locale.getpreferredencoding(False) <locale.getpreferredencoding>`).

However, many developers forget to specify the encoding when opening text files
encoded in UTF-8 (e.g. JSON, TOML, Markdown) since most Unix platforms use a
UTF-8 locale by default. This causes bugs because the locale encoding is not
UTF-8 for most Windows users. For example::

   # May not work on Windows when the file contains non-ASCII characters.
   with open("README.md") as f:
       long_description = f.read()

Additionally, while there is no concrete plan as of yet, Python may change
the default text file encoding to UTF-8 in the future.

Accordingly, it is highly recommended that you specify the encoding
explicitly when opening text files. If you want to use UTF-8, pass
``encoding="utf-8"``. To use the current locale encoding,
``encoding="locale"`` is supported in Python 3.10.

When you need to run existing code on Windows that attempts to open
UTF-8 files using the default locale encoding, you can enable the UTF-8
mode. See :ref:`UTF-8 mode on Windows <win-utf8-mode>`.

.. _io-encoding-warning:

Opt-in EncodingWarning
^^^^^^^^^^^^^^^^^^^^^^

.. versionadded:: 3.10
See :pep:`597` for more details.

To find where the default locale encoding is used, you can enable
the ``-X warn_default_encoding`` command line option or set the
:envvar:`PYTHONWARNDEFAULTENCODING` environment variable. Either one
causes an :exc:`EncodingWarning` to be emitted when the default encoding
is used.

If you are providing an API that uses :func:`open` or
:class:`TextIOWrapper` and passes ``encoding=None`` as a parameter, you
can use :func:`text_encoding` so that an :exc:`EncodingWarning` is
emitted for callers of the API that don't pass an ``encoding``. However,
please consider using UTF-8 by default (i.e. ``encoding="utf-8"``) for
new APIs.


High-level Module Interface
---------------------------

@@ -143,6 +193,32 @@ High-level Module Interface
.. versionadded:: 3.8


.. function:: text_encoding(encoding, stacklevel=2)

This is a helper function for callables that use :func:`open` or
:class:`TextIOWrapper` and have an ``encoding=None`` parameter.

This function returns *encoding* if it is not ``None`` and ``"locale"`` if
*encoding* is ``None``.

This function emits an :class:`EncodingWarning` if
:data:`sys.flags.warn_default_encoding <sys.flags>` is true and *encoding*
is ``None``. *stacklevel* specifies where the warning is emitted.
For example::

   def read_text(path, encoding=None):
       encoding = io.text_encoding(encoding)  # stacklevel=2
       with open(path, encoding=encoding) as f:
           return f.read()

In this example, an :class:`EncodingWarning` is emitted for the caller of
``read_text()``.

See :ref:`io-text-encoding` for more information.

.. versionadded:: 3.10


.. exception:: BlockingIOError

This is a compatibility alias for the builtin :exc:`BlockingIOError`
@@ -879,6 +955,8 @@ Text I/O
*encoding* gives the name of the encoding that the stream will be decoded or
encoded with. It defaults to
:func:`locale.getpreferredencoding(False) <locale.getpreferredencoding>`.
``encoding="locale"`` can be used to specify the current locale's encoding
explicitly. See :ref:`io-text-encoding` for more information.

*errors* is an optional string that specifies how encoding and decoding
errors are to be handled. Pass ``'strict'`` to raise a :exc:`ValueError`
@@ -930,6 +1008,9 @@ Text I/O
locale encoding using :func:`locale.setlocale`, use the current locale
encoding instead of the user preferred encoding.

.. versionchanged:: 3.10
The *encoding* argument now supports the ``"locale"`` dummy encoding name.

:class:`TextIOWrapper` provides these data attributes and methods in
addition to those from :class:`TextIOBase` and :class:`IOBase`:

15 changes: 15 additions & 0 deletions Doc/using/cmdline.rst
@@ -453,6 +453,9 @@ Miscellaneous options
* ``-X pycache_prefix=PATH`` enables writing ``.pyc`` files to a parallel
tree rooted at the given directory instead of to the code tree. See also
:envvar:`PYTHONPYCACHEPREFIX`.
* ``-X warn_default_encoding`` issues an :class:`EncodingWarning` when the
locale-specific default encoding is used for opening files.
See also :envvar:`PYTHONWARNDEFAULTENCODING`.

It also allows passing arbitrary values and retrieving them through the
:data:`sys._xoptions` dictionary.
@@ -482,6 +485,9 @@ Miscellaneous options

The ``-X showalloccount`` option has been removed.

.. versionadded:: 3.10
The ``-X warn_default_encoding`` option.

.. deprecated-removed:: 3.9 3.10
The ``-X oldparser`` option.

@@ -907,6 +913,15 @@ conflict.

.. versionadded:: 3.7

.. envvar:: PYTHONWARNDEFAULTENCODING

If this environment variable is set to a non-empty string, issue an
:class:`EncodingWarning` when the locale-specific default encoding is used.

See :ref:`io-encoding-warning` for details.

.. versionadded:: 3.10


Debug-mode variables
~~~~~~~~~~~~~~~~~~~~
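A rough way to try the new command line option described in `Doc/using/cmdline.rst` (not part of the patch): start a child interpreter with `-X warn_default_encoding` and look at its stderr. The helper name `stderr_with_warning_option()` is made up for this sketch, and the warning is only expected when the interpreter is Python 3.10 or later; older versions silently ignore the unknown `-X` option.

```python
import subprocess
import sys


def stderr_with_warning_option(code):
    # Run `code` in a child interpreter with -X warn_default_encoding and
    # return whatever the child wrote to stderr.
    result = subprocess.run(
        [sys.executable, "-X", "warn_default_encoding", "-c", code],
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
    )
    return result.stderr


# The child opens a file in text mode without passing encoding=...
child_code = "import os; open(os.devnull).close()"
stderr = stderr_with_warning_option(child_code)
print(stderr)
```

Setting `PYTHONWARNDEFAULTENCODING` to a non-empty string in the child's environment should have the same effect as the `-X` option, per the documentation added in this hunk.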
24 changes: 24 additions & 0 deletions Doc/whatsnew/3.10.rst
@@ -444,6 +444,30 @@ For the full specification see :pep:`634`. Motivation and rationale
are in :pep:`635`, and a longer tutorial is in :pep:`636`.


.. _whatsnew310-pep597:

Optional ``EncodingWarning`` and ``encoding="locale"`` option
-------------------------------------------------------------

The default encoding of :class:`TextIOWrapper` and :func:`open` is
platform and locale dependent. Since UTF-8 is used on most Unix
platforms, omitting the ``encoding`` option when opening UTF-8 files
(e.g. JSON, YAML, TOML, Markdown) is a very common bug. For example::

   # BUG: "rb" mode or encoding="utf-8" should be used.
   with open("data.json") as f:
       data = json.load(f)

To find this type of bug, an optional ``EncodingWarning`` is added.
It is emitted when :data:`sys.flags.warn_default_encoding <sys.flags>`
is true and the locale-specific default encoding is used.

The ``-X warn_default_encoding`` option and the
:envvar:`PYTHONWARNDEFAULTENCODING` environment variable are added to
enable the warning.

See :ref:`io-text-encoding` for more information.


New Features Related to Type Annotations
========================================

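For reference, the two fixes named in the "BUG" comment of the What's New entry could look like the following sketch (not part of the patch). The file name and the sample data are invented for the example.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"city": "Zürich"}, f, ensure_ascii=False)

    # Option 1: decode explicitly as UTF-8.
    with open(path, encoding="utf-8") as f:
        data_text = json.load(f)

    # Option 2: open in binary mode and let json detect the encoding.
    with open(path, "rb") as f:
        data_binary = json.load(f)

print(data_text, data_binary)
```

Both variants avoid the locale-dependent default, so they behave the same on Unix and Windows.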
1 change: 1 addition & 0 deletions Include/cpython/initconfig.h
@@ -153,6 +153,7 @@ typedef struct PyConfig {
PyWideStringList warnoptions;
int site_import;
int bytes_warning;
int warn_default_encoding;
int inspect;
int interactive;
int optimization_level;
1 change: 1 addition & 0 deletions Include/internal/pycore_initconfig.h
@@ -102,6 +102,7 @@ typedef struct {
int isolated; /* -I option */
int use_environment; /* -E option */
int dev_mode; /* -X dev and PYTHONDEVMODE */
int warn_default_encoding; /* -X warn_default_encoding and PYTHONWARNDEFAULTENCODING */
} _PyPreCmdline;

#define _PyPreCmdline_INIT \
1 change: 1 addition & 0 deletions Include/pyerrors.h
@@ -146,6 +146,7 @@ PyAPI_DATA(PyObject *) PyExc_FutureWarning;
PyAPI_DATA(PyObject *) PyExc_ImportWarning;
PyAPI_DATA(PyObject *) PyExc_UnicodeWarning;
PyAPI_DATA(PyObject *) PyExc_BytesWarning;
PyAPI_DATA(PyObject *) PyExc_EncodingWarning;
PyAPI_DATA(PyObject *) PyExc_ResourceWarning;


47 changes: 37 additions & 10 deletions Lib/_pyio.py
@@ -40,6 +40,29 @@
_CHECK_ERRORS = _IOBASE_EMITS_UNRAISABLE


def text_encoding(encoding, stacklevel=2):
    """
    A helper function to choose the text encoding.

    When encoding is not None, just return it.
    Otherwise, return the default text encoding (i.e. "locale").

    This function emits an EncodingWarning if *encoding* is None and
    sys.flags.warn_default_encoding is true.

    This can be used in APIs with an encoding=None parameter
    that pass it to TextIOWrapper or open.
    However, please consider using encoding="utf-8" for new APIs.
    """
    if encoding is None:
        encoding = "locale"
        if sys.flags.warn_default_encoding:
            import warnings
            warnings.warn("'encoding' argument not specified.",
                          EncodingWarning, stacklevel + 1)
    return encoding


def open(file, mode="r", buffering=-1, encoding=None, errors=None,
newline=None, closefd=True, opener=None):

@@ -248,6 +271,7 @@ def open(file, mode="r", buffering=-1, encoding=None, errors=None,
        result = buffer
        if binary:
            return result
        encoding = text_encoding(encoding)
        text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
        result = text
        text.mode = mode
@@ -2004,19 +2028,22 @@ class TextIOWrapper(TextIOBase):
     def __init__(self, buffer, encoding=None, errors=None, newline=None,
                  line_buffering=False, write_through=False):
         self._check_newline(newline)
-        if encoding is None:
+        encoding = text_encoding(encoding)
+
+        if encoding == "locale":
             try:
-                encoding = os.device_encoding(buffer.fileno())
+                encoding = os.device_encoding(buffer.fileno()) or "locale"
             except (AttributeError, UnsupportedOperation):
                 pass
-            if encoding is None:
-                try:
-                    import locale
-                except ImportError:
-                    # Importing locale may fail if Python is being built
-                    encoding = "ascii"
-                else:
-                    encoding = locale.getpreferredencoding(False)
+
+        if encoding == "locale":
+            try:
+                import locale
+            except ImportError:
+                # Importing locale may fail if Python is being built
+                encoding = "utf-8"
+            else:
+                encoding = locale.getpreferredencoding(False)
 
         if not isinstance(encoding, str):
             raise ValueError("invalid encoding: %r" % encoding)

Review comment (Member) on the added ``encoding = "utf-8"`` line:

I saw what you did there! :-D Mention it in the final commit message (I didn't read your 24 commit messages, GitHub UI isn't convenient for that :-( ).
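To illustrate why `text_encoding()` passes `stacklevel + 1` to `warnings.warn()`, here is a self-contained sketch (not part of the patch). `DemoEncodingWarning`, `demo_text_encoding()`, `read_config()` and `caller()` are hypothetical names, and the `warn_enabled` parameter stands in for `sys.flags.warn_default_encoding`, which cannot be toggled at runtime.

```python
import warnings


class DemoEncodingWarning(Warning):
    """Hypothetical stand-in for EncodingWarning so the sketch runs anywhere."""


def demo_text_encoding(encoding, stacklevel=2, warn_enabled=True):
    # Same shape as text_encoding() in the hunk above; warn_enabled replaces
    # the sys.flags.warn_default_encoding check.
    if encoding is None:
        encoding = "locale"
        if warn_enabled:
            warnings.warn("'encoding' argument not specified.",
                          DemoEncodingWarning, stacklevel + 1)
    return encoding


def read_config(path, encoding=None):
    # Hypothetical API that forwards its optional encoding argument.
    return demo_text_encoding(encoding)


def caller():
    return read_config("settings.ini")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = caller()

# The recorded line number points into caller(), not into read_config().
print(result, caught[0].lineno)
```

With the default `stacklevel=2`, the extra `+ 1` accounts for the helper's own frame, so the warning is attributed to the code that called the API without an encoding rather than to the API itself.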
1 change: 1 addition & 0 deletions Lib/bz2.py
@@ -311,6 +311,7 @@ def open(filename, mode="rb", compresslevel=9,
    binary_file = BZ2File(filename, bz_mode, compresslevel=compresslevel)

    if "t" in mode:
        encoding = io.text_encoding(encoding)
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file
1 change: 1 addition & 0 deletions Lib/configparser.py
@@ -690,6 +690,7 @@ def read(self, filenames, encoding=None):
        """
        if isinstance(filenames, (str, bytes, os.PathLike)):
            filenames = [filenames]
        encoding = io.text_encoding(encoding)
        read_ok = []
        for filename in filenames:
            try:
1 change: 1 addition & 0 deletions Lib/gzip.py
@@ -62,6 +62,7 @@ def open(filename, mode="rb", compresslevel=_COMPRESS_LEVEL_BEST,
        raise TypeError("filename must be a str or bytes object, or a file")

    if "t" in mode:
        encoding = io.text_encoding(encoding)
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file
2 changes: 1 addition & 1 deletion Lib/io.py
@@ -54,7 +54,7 @@
 from _io import (DEFAULT_BUFFER_SIZE, BlockingIOError, UnsupportedOperation,
                  open, open_code, FileIO, BytesIO, StringIO, BufferedReader,
                  BufferedWriter, BufferedRWPair, BufferedRandom,
-                 IncrementalNewlineDecoder, TextIOWrapper)
+                 IncrementalNewlineDecoder, text_encoding, TextIOWrapper)
 
 OpenWrapper = _io.open # for compatibility with _pyio

1 change: 1 addition & 0 deletions Lib/lzma.py
@@ -302,6 +302,7 @@ def open(filename, mode="rb", *,
                           preset=preset, filters=filters)

    if "t" in mode:
        encoding = io.text_encoding(encoding)
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file
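The `bz2`, `gzip` and `lzma` hunks above all route the text-mode *encoding* through `io.text_encoding()`. From the caller's side, a minimal sketch of avoiding the default (not part of the patch; file names and sample text are invented) is to pass `encoding` explicitly whenever a `"t"` mode is used:

```python
import bz2
import gzip
import lzma
import os
import tempfile

text = "naïve café\n"
results = {}

with tempfile.TemporaryDirectory() as tmp:
    for module in (bz2, gzip, lzma):
        path = os.path.join(tmp, "sample." + module.__name__)
        # Text mode ("wt"/"rt") wraps the compressed stream in a
        # TextIOWrapper, so the encoding matters here.
        with module.open(path, "wt", encoding="utf-8") as f:
            f.write(text)
        with module.open(path, "rt", encoding="utf-8") as f:
            results[module.__name__] = f.read()

print(results)
```

Binary modes (`"rb"`, `"wb"`) are unaffected, since no text decoding takes place.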
4 changes: 4 additions & 0 deletions Lib/pathlib.py
@@ -1241,6 +1241,8 @@ def open(self, mode='r', buffering=-1, encoding=None,
        Open the file pointed by this path and return a file object, as
        the built-in open() function does.
        """
        if "b" not in mode:
            encoding = io.text_encoding(encoding)
        return io.open(self, mode, buffering, encoding, errors, newline,
                       opener=self._opener)

@@ -1255,6 +1257,7 @@ def read_text(self, encoding=None, errors=None):
        """
        Open the file in text mode, read it, and close the file.
        """
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()

@@ -1274,6 +1277,7 @@ def write_text(self, data, encoding=None, errors=None, newline=None):
        if not isinstance(data, str):
            raise TypeError('data must be str, not %s' %
                            data.__class__.__name__)
        encoding = io.text_encoding(encoding)
        with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
            return f.write(data)

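A short sketch of the three `pathlib` entry points touched above, called with an explicit encoding (not part of the patch; the file name and contents are invented):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"
    # write_text() returns the number of characters written.
    written = path.write_text("日本語\n", encoding="utf-8")
    content = path.read_text(encoding="utf-8")
    with path.open(encoding="utf-8") as f:
        first_line = f.readline()

print(written, content, first_line)
```

Because `Path.open()` only calls `io.text_encoding()` when `"b"` is not in the mode, binary opens such as `path.open("rb")` would not be reported by the warning.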
4 changes: 3 additions & 1 deletion Lib/site.py
@@ -170,7 +170,9 @@ def addpackage(sitedir, name, known_paths):
     fullname = os.path.join(sitedir, name)
     _trace(f"Processing .pth file: {fullname!r}")
     try:
-        f = io.TextIOWrapper(io.open_code(fullname))
+        # locale encoding is not ideal especially on Windows. But we have used
+        # it for a long time. setuptools uses the locale encoding too.
+        f = io.TextIOWrapper(io.open_code(fullname), encoding="locale")
     except OSError:
         return
     with f: