PEP 597: Apply grammar, syntax and polish fixes, and clarify phrasing and terminology #1887

Merged
merged 6 commits into from
Apr 4, 2021
227 changes: 117 additions & 110 deletions pep-0597.rst
@@ -12,16 +12,17 @@ Python-Version: 3.10
Abstract
========

-Add a new warning category ``EncodingWarning``. It is emitted when
-``encoding`` option is omitted and the default encoding is a locale
-encoding.
+Add a new warning category ``EncodingWarning``. It is emitted when the
+``encoding`` argument to ``open()`` is omitted and the default
+locale-specific encoding is used.

-The warning is disabled by default. New ``-X warn_default_encoding``
-command-line option and ``PYTHONWARNDEFAULTENCODING`` environment
-variable are used to enable the warnings.
+The warning is disabled by default. A new ``-X warn_default_encoding``
+command-line option and a new ``PYTHONWARNDEFAULTENCODING`` environment
+variable can be used to enable it.

-``encoding="locale"`` option is added too. It is used to specify
-locale encoding explicitly.
+A ``"locale"`` argument value for ``encoding`` is added too. It
+explicitly specifies that the locale encoding should be used, silencing
+the warning.


Motivation
@@ -33,135 +34,142 @@ Using the default encoding is a common mistake
Developers using macOS or Linux may forget that the default encoding
is not always UTF-8.

-For example, ``long_description = open("README.md").read()`` in
-``setup.py`` is a common mistake. Many Windows users can not install
-the package if there is at least one non-ASCII character (e.g. emoji)
-in the ``README.md`` file which is encoded in UTF-8.
+For example, using ``long_description = open("README.md").read()`` in
+``setup.py`` is a common mistake. Many Windows users cannot install
+such packages if there is at least one non-ASCII character
+(e.g. emoji, author names, copyright symbols, and the like)
+in their UTF-8-encoded ``README.md`` file.
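Here is a minimal sketch of the fix, assuming the README is UTF-8 encoded. ``read_readme`` and the temporary README file are hypothetical stand-ins for a real ``setup.py``. Because the encoding is stated explicitly, the same bytes decode identically whatever the user's locale is:

```python
import os
import tempfile


def read_readme(path):
    # Hypothetical helper: the explicit encoding makes the result independent
    # of the locale encoding (e.g. cp1252 on many Windows machines).
    with open(path, encoding="utf-8") as f:
        return f.read()


# Hypothetical README containing non-ASCII text (an accented name and an emoji).
text = "Author: Zo\u00eb \N{SNAKE}\n"
with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    with open(readme, "w", encoding="utf-8") as f:
        f.write(text)
    long_description = read_readme(readme)
```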

-For example, 489 packages of the 4000 most downloaded packages from
-PyPI used non-ASCII characters in README. And 82 packages of them
-can not be installed from source package when locale encoding is
-ASCII. [1]_ They used the default encoding to read README or TOML
-file.
+Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII
+characters in their README, and 82 fail to install from source on
+non-UTF-8 locales due to not specifying an encoding for a non-ASCII
+file. [1]_

Another example is ``logging.basicConfig(filename="log.txt")``.
-Some users expect UTF-8 is used by default, but locale encoding is
-used actually. [2]_
+Some users might expect it to use UTF-8 by default, but the locale
+encoding is actually what is used. [2]_
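The same fix applies to logging. The sketch below assumes a UTF-8 log file is wanted and gives the handler an explicit ``encoding``. ``logging.basicConfig()`` has also accepted ``encoding=`` since Python 3.9. The logger name ``pep597.example`` and the file location are hypothetical:

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "log.txt")

# Hypothetical dedicated logger, so the root logger is not reconfigured.
logger = logging.getLogger("pep597.example")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = logging.FileHandler(log_path, encoding="utf-8")  # not locale-dependent
logger.addHandler(handler)

logger.info("Gr\u00fc\u00dfe \N{SNAKE}")

# Close the handler so the data is flushed and the file can be read back.
handler.close()
logger.removeHandler(handler)

with open(log_path, encoding="utf-8") as f:
    logged = f.read()
```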

-Even Python experts assume that default encoding is UTF-8.
-It creates bugs that happen only on Windows. See [3]_, [4]_, [5]_,
+Even Python experts may assume that the default encoding is UTF-8.
+This creates bugs that only happen on Windows; see [3]_, [4]_, [5]_,
and [6]_ for example.

-Emitting a warning when the ``encoding`` option is omitted will help
-to find such mistakes.
+Emitting a warning when the ``encoding`` argument is omitted will help
+find such mistakes.


Explicit way to use locale-specific encoding
--------------------------------------------

``open(filename)`` isn't explicit about which encoding is expected:

-* Expects ASCII (not a bug, but inefficient on Windows)
-* Expects UTF-8 (bug or platform-specific script)
-* Expects the locale encoding.
+* If ASCII is assumed, this isn't a bug, but may result in decreased
+performance on Windows, particularly with non-Latin-1 locale encodings
+* If UTF-8 is assumed, this may be a bug or a platform-specific script
+* If the locale encoding is assumed, the behavior is as expected
+(but could change if future versions of Python modify the default)

-In this point of view, ``open(filename)`` is not readable.
+From this point of view, ``open(filename)`` is not readable code.

``encoding=locale.getpreferredencoding(False)`` can be used to
-specify the locale encoding explicitly. But it is too long and easy
-to misuse. (e.g. forget to pass ``False`` to its parameter)
+specify the locale encoding explicitly, but it is too long and easy
+to misuse (e.g. one can forget to pass ``False`` as its argument).

This PEP provides an explicit way to specify the locale encoding.
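The three explicit spellings discussed in this section can be sketched with a throwaway file. The file's path and contents are assumptions of the example. ``encoding="locale"`` is only understood by Python 3.10 and later, so that case is guarded by a version check:

```python
import locale
import os
import sys
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="ascii") as f:
    f.write("plain ASCII text\n")

# 1. UTF-8, stated explicitly (also correct for pure-ASCII files).
with open(path, encoding="utf-8") as f:
    as_utf8 = f.read()

# 2. The locale encoding, spelled the pre-3.10 way. Passing False avoids a
#    possible setlocale() call inside getpreferredencoding().
with open(path, encoding=locale.getpreferredencoding(False)) as f:
    as_locale_old = f.read()

# 3. The locale encoding, spelled with the new "locale" value (3.10+ only).
if sys.version_info >= (3, 10):
    with open(path, encoding="locale") as f:
        as_locale_new = f.read()
else:
    as_locale_new = as_locale_old
```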


Prepare to change the default encoding to UTF-8
-----------------------------------------------

-Since UTF-8 becomes de-facto standard text encoding, we might change
-the default text encoding to UTF-8 in the future.
+Since UTF-8 has become the de-facto standard text encoding,
+we might default to it for opening files in the future.

-But this change will affect many applications and libraries. If we
-start emitting ``DeprecationWarning`` everywhere ``encoding`` option
-is omitted, it will be too noisy and painful.
+However, such a change will affect many applications and libraries.
+If we start emitting ``DeprecationWarning`` everywhere the ``encoding``
+argument is omitted, it will be too noisy and painful.

-Although this PEP doesn't propose to change the default encoding,
-this PEP will help the change:
+Although this PEP doesn't propose changing the default encoding,
+it will help enable that change by:

-* Reduce the number of omitted ``encoding`` options in many libraries
-before we start emitting the ``DeprecationWarning`` by default.
+* Reducing the number of omitted ``encoding`` arguments in libraries
+before we start emitting a ``DeprecationWarning`` by default.

-* Users will be able to use ``encoding="locale"`` option to suppress
-the warning without dropping Python 3.10 support.
+* Allowing users to pass ``encoding="locale"`` to suppress
+the current warning and any ``DeprecationWarning`` added in the future,
+as well as retaining consistent behavior if later Python versions
+change the default, ensuring support for any Python version >=3.10.


Specification
=============

``EncodingWarning``
---------------------
+-------------------

-Add a new ``EncodingWarning`` warning class which is a subclass of
-``Warning``. It is used to warn when the ``encoding`` option is
-omitted and the default encoding is locale-specific.
+Add a new ``EncodingWarning`` warning class as a subclass of
+``Warning``. It is emitted when the ``encoding`` argument is omitted and
+the default locale-specific encoding is used.
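The class hierarchy described above can be checked quickly on Python 3.10 or later, where ``EncodingWarning`` is a built-in. Because it derives from ``Warning`` rather than ``DeprecationWarning``, it can be filtered and recorded like any other category. The message text used in this sketch is illustrative:

```python
import sys
import warnings

if sys.version_info >= (3, 10):
    # EncodingWarning is a built-in on 3.10+.
    assert issubclass(EncodingWarning, Warning)
    assert not issubclass(EncodingWarning, DeprecationWarning)

    # Emit one manually and capture it by category.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", EncodingWarning)
        warnings.warn("'encoding' argument not specified", EncodingWarning)
    categories = [w.category for w in caught]
else:
    categories = []
```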


Options to enable the warning
-------------------------------
+-----------------------------

-``-X warn_default_encoding`` option and the
+The ``-X warn_default_encoding`` option and the
``PYTHONWARNDEFAULTENCODING`` environment variable are added. They
are used to enable ``EncodingWarning``.

-``sys.flags.encoding_warning`` is also added. The flag represents
+``sys.flags.warn_default_encoding`` is also added. The flag is true when
``EncodingWarning`` is enabled.

-When the option is enabled, ``io.TextIOWrapper()``, ``open()``, and
-other modules using them will emit ``EncodingWarning`` when the
-``encoding`` is omitted.
+When the flag is set, ``io.TextIOWrapper()``, ``open()`` and other
+modules using them will emit ``EncodingWarning`` when the ``encoding``
+argument is omitted.

Since ``EncodingWarning`` is a subclass of ``Warning``, they are
-shown by default, unlike ``DeprecationWarning``.
+shown by default (if the ``warn_default_encoding`` flag is set), unlike
+``DeprecationWarning``.
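The sketch below applies each of the two opt-in switches to a fresh child interpreter (Python 3.10+). The child script is hypothetical. It prints ``sys.flags.warn_default_encoding`` and then opens ``os.devnull`` without an ``encoding`` argument. The warning should therefore appear on stderr only when one of the switches is used:

```python
import os
import subprocess
import sys

# Hypothetical child script: report the flag, then omit encoding= once.
CHILD = (
    "import os, sys\n"
    "print(sys.flags.warn_default_encoding)\n"
    "open(os.devnull).close()\n"
)


def run_child(extra_args=(), extra_env=None):
    """Run CHILD in a new interpreter and return (stdout, stderr)."""
    env = dict(os.environ)
    env.pop("PYTHONWARNDEFAULTENCODING", None)  # start from a known state
    env.update(extra_env or {})
    proc = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        env=env,
    )
    return proc.stdout.strip(), proc.stderr


if sys.version_info >= (3, 10):
    plain = run_child()
    via_option = run_child(["-X", "warn_default_encoding"])
    via_env = run_child(extra_env={"PYTHONWARNDEFAULTENCODING": "1"})
```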


-``encoding="locale"`` option
-----------------------------
+``encoding="locale"``
+---------------------

-``io.TextIOWrapper`` accepts ``encoding="locale"`` option. It means
-same to current ``encoding=None``. But ``io.TextIOWrapper`` doesn't
-emit ``EncodingWarning`` when ``encoding="locale"`` is specified.
+``io.TextIOWrapper`` will accept ``"locale"`` as a valid argument to
+``encoding``. It has the same meaning as the current ``encoding=None``,
+except that ``io.TextIOWrapper`` doesn't emit ``EncodingWarning`` when
+``encoding="locale"`` is specified.
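You can observe the difference between an omitted ``encoding`` and ``encoding="locale"`` by running a child interpreter with the warning enabled and turned into an error via ``-W error::EncodingWarning``. This sketch targets Python 3.10+. ``child_ok`` is a hypothetical helper. The ``-S`` option (skip the ``site`` module) is used only so that nothing but the snippet under test can trigger the warning:

```python
import subprocess
import sys


def child_ok(source):
    """Hypothetical helper: run *source* with EncodingWarning as an error.

    Returns True if the child interpreter exits successfully.
    """
    proc = subprocess.run(
        [sys.executable, "-S", "-X", "warn_default_encoding",
         "-W", "error::EncodingWarning", "-c", source],
        capture_output=True,
    )
    return proc.returncode == 0


if sys.version_info >= (3, 10):
    # Same behavior as encoding=None, but stated explicitly: no warning.
    explicit_ok = child_ok("import os; open(os.devnull, encoding='locale').close()")
    # Omitted encoding: the warning fires, and -W error makes it fatal.
    omitted_ok = child_ok("import os; open(os.devnull).close()")
```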


``io.text_encoding()``
------------------------
+----------------------

-``io.text_encoding()`` is a helper function for functions having
-``encoding=None`` option and passing it to ``io.TextIOWrapper()`` or
+``io.text_encoding()`` is a helper for functions with an
+``encoding=None`` parameter that pass it to ``io.TextIOWrapper()`` or
``open()``.

-Pure Python implementation will be like this::
+A pure Python implementation will look like this::

     def text_encoding(encoding, stacklevel=1):
-        """Helper function to choose the text encoding.
+        """A helper function to choose the text encoding.

         When *encoding* is not None, just return it.
-        Otherwise, return the default text encoding (i.e., "locale").
+        Otherwise, return the default text encoding (i.e. "locale").

-        This function emits EncodingWarning if *encoding* is None and
-        sys.flags.encoding_warning is true.
+        This function emits an EncodingWarning if *encoding* is None and
+        sys.flags.warn_default_encoding is true.

-        This function can be used in APIs having encoding=None option
-        and pass it to TextIOWrapper or open.
-        But please consider using encoding="utf-8" for new APIs.
+        This function can be used in APIs with an encoding=None parameter
+        that pass it to TextIOWrapper or open.
+        However, please consider using encoding="utf-8" for new APIs.
         """
         if encoding is None:
-            if sys.flags.encoding_warning:
+            if sys.flags.warn_default_encoding:
                 import warnings
-                warnings.warn("'encoding' option is omitted",
-                              EncodingWarning, stacklevel + 2)
+                warnings.warn(
+                    "'encoding' argument not specified.",
+                    EncodingWarning, stacklevel + 2)
             encoding = "locale"
         return encoding

-For example, ``pathlib.Path.read_text()`` can use the function like:
+For example, ``pathlib.Path.read_text()`` can use it like this:

.. code-block::

@@ -174,18 +182,18 @@ By using ``io.text_encoding()``, ``EncodingWarning`` is emitted for
the caller of ``read_text()`` instead of ``read_text()`` itself.
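The sketch below shows how a third-party API with an ``encoding=None`` default might follow the same pattern. ``read_config`` is hypothetical. The ``hasattr`` check is an assumption added so the code also runs on Python < 3.10. On those versions ``io.text_encoding()`` does not exist and ``None`` already means the locale encoding. With its default ``stacklevel``, ``io.text_encoding()`` attributes any warning to the code that called ``read_config()``:

```python
import io
import os
import tempfile


def read_config(path, encoding=None):
    """Hypothetical library API that keeps encoding=None as its default."""
    if hasattr(io, "text_encoding"):  # Python 3.10+
        # Returns *encoding* unchanged when given; otherwise returns the
        # default ("locale", or "utf-8" in UTF-8 Mode) and, if the warning is
        # enabled, emits EncodingWarning pointing at read_config's caller.
        encoding = io.text_encoding(encoding)
    with open(path, encoding=encoding) as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "app.cfg")
with open(path, "w", encoding="utf-8") as f:
    f.write("name = caf\u00e9\n")

config = read_config(path, encoding="utf-8")  # explicit: never warns
```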


-Affected stdlibs
------------------
+Affected standard library modules
+---------------------------------

-Many stdlibs will be affected by this change.
+Many standard library modules will be affected by this change.

Most APIs accepting ``encoding=None`` will use ``io.text_encoding()``
as written in the previous section.

-Where using locale encoding as the default encoding is reasonable,
+Where using the locale encoding as the default encoding is reasonable,
``encoding="locale"`` will be used instead. For example,
-the ``subprocess`` module will use locale encoding for the default
-encoding of the pipes.
+the ``subprocess`` module will use the locale encoding as the default
+for pipes.
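Where the encoding of a pipe is actually known, callers can state it instead of relying on the locale default. This sketch assumes the child is a Python interpreter whose output encoding is forced with ``PYTHONIOENCODING``. The child command is illustrative:

```python
import os
import subprocess
import sys

# Make the (illustrative) child interpreter write UTF-8 to its pipes...
env = dict(os.environ, PYTHONIOENCODING="utf-8")

# ...and decode the pipe as UTF-8 here instead of the locale encoding.
result = subprocess.run(
    [sys.executable, "-c", "print('\\u00e9t\\u00e9')"],
    capture_output=True,
    encoding="utf-8",
    env=env,
)
```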

Many tests use ``open()`` without ``encoding`` specified to read
ASCII text files. They should be rewritten with ``encoding="ascii"``.
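The sketch below shows such a fixture read, assuming a fixture file that is pure ASCII. Both temporary files are hypothetical. Besides silencing the warning, ``encoding="ascii"`` makes an unexpected non-ASCII byte fail loudly with ``UnicodeDecodeError``. Without it, such a byte would decode differently depending on the locale:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
ok_path = os.path.join(tmp, "expected.txt")
bad_path = os.path.join(tmp, "not_ascii.txt")
with open(ok_path, "w", encoding="ascii") as f:
    f.write("expected output\n")
with open(bad_path, "w", encoding="utf-8") as f:
    f.write("caf\u00e9\n")

# Fixture known to be ASCII: the assumption is now stated in the code.
with open(ok_path, encoding="ascii") as f:
    fixture = f.read()

# A non-ASCII byte is now an error instead of a locale-dependent result.
try:
    with open(bad_path, encoding="ascii") as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```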
@@ -195,11 +203,11 @@ Rationale
=========

Opt-in warning
----------------
+--------------

-Although ``DeprecationWarning`` is suppressed by default, emitting
-``DeprecationWarning`` always when the ``encoding`` option is omitted
-would be too noisy.
+Although ``DeprecationWarning`` is suppressed by default, always
+emitting ``DeprecationWarning`` when the ``encoding`` argument is
+omitted would be too noisy.

Noisy warnings may lead developers to dismiss the
``DeprecationWarning``.
@@ -208,43 +216,43 @@ Noisy warnings may lead developers to dismiss the
"locale" is not a codec alias
-----------------------------

-We don't add the "locale" to the codec alias because locale can be
-changed in runtime.
+We don't add "locale" as a codec alias because the locale can be
+changed at runtime.

Additionally, ``TextIOWrapper`` checks ``os.device_encoding()``
-when ``encoding=None``. This behavior can not be implemented in
-the codec.
+when ``encoding=None``. This behavior cannot be implemented in
+a codec.
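Both points can be checked directly. In this sketch, file descriptor 0 is an arbitrary choice. ``codecs.lookup()`` does not know ``"locale"``. ``os.device_encoding()`` only reports an encoding for a file descriptor connected to a terminal and returns ``None`` otherwise, so its result depends on how the process was started:

```python
import codecs
import os

# "locale" is not a registered codec name, so codec lookups reject it.
try:
    codecs.lookup("locale")
    is_codec = True
except LookupError:
    is_codec = False

# Terminal-dependent: a string for a terminal, None for a pipe or file.
dev_enc = os.device_encoding(0)
```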


Backward Compatibility
======================

-The new warning is not emitted by default. So this PEP is 100%
-backward compatible.
+The new warning is not emitted by default, so this PEP is 100%
+backwards-compatible.


Forward Compatibility
=====================

-``encoding="locale"`` option is not forward compatible. Codes
-using the option will not work on Python older than 3.10. It will
-raise ``LookupError: unknown encoding: locale``.
+Passing ``"locale"`` as the argument to ``encoding`` is not
+forward-compatible. Code using it will not work on Python older than
+3.10, and will instead raise ``LookupError: unknown encoding: locale``.

Until developers can drop Python 3.9 support, ``EncodingWarning``
-can be used only for finding missing ``encoding="utf-8"`` options.
+can only be used for finding missing ``encoding="utf-8"`` arguments.
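Code that must still support Python 3.9 and earlier can still use ``"locale"`` where it exists by picking the value once, based on the running version. ``LOCALE_ENCODING`` and ``open_locale_text`` in this sketch are hypothetical names. The fallback relies on ``encoding=None`` meaning the locale encoding on older versions, which also have no ``EncodingWarning`` to silence:

```python
import os
import sys
import tempfile

# Hypothetical constant: "locale" where it is understood (3.10+), otherwise
# None, which older versions already treat as the locale encoding.
LOCALE_ENCODING = "locale" if sys.version_info >= (3, 10) else None


def open_locale_text(path, mode="r"):
    """Hypothetical helper: open a text file in the locale encoding."""
    return open(path, mode, encoding=LOCALE_ENCODING)


path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open_locale_text(path, "w") as f:
    f.write("hello\n")
with open_locale_text(path) as f:
    roundtrip = f.read()
```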


-How to teach this
+How to Teach This
=================

For new users
-------------

-Since ``EncodingWarning`` is used to write a cross-platform code,
-no need to teach it to new users.
+Since ``EncodingWarning`` is used to write cross-platform code,
+there is no need to teach it to new users.

-We can just recommend using UTF-8 for text files and use
-``encoding="utf-8"`` when opening test files.
+We can just recommend using UTF-8 for text files and using
+``encoding="utf-8"`` when opening them.


For experienced users
@@ -257,9 +265,9 @@ default encoding.
You can use ``-X warn_default_encoding`` or
``PYTHONWARNDEFAULTENCODING=1`` to find this type of mistake.

-Omitting ``encoding`` option is not a bug when opening text files
-encoded in locale encoding. But ``encoding="locale"`` is recommended
-after Python 3.10 because it is more explicit.
+Omitting the ``encoding`` argument is not a bug when opening text files
+encoded in the locale encoding, but ``encoding="locale"`` is recommended
+in Python 3.10 and later because it is more explicit.


Reference Implementation
@@ -277,22 +285,21 @@ https://mail.python.org/archives/list/python-dev@python.org/thread/SFYUP2TWD5JZ5

* Why not implement this in linters?

-* ``encoding="locale"`` and ``io.text_encoding()`` must be in
-Python.
+* ``encoding="locale"`` and ``io.text_encoding()`` must be implemented
+in Python.

-* It is difficult to find all caller of functions wrapping
-``open()`` or ``TextIOWrapper()``. (See ``io.text_encoding()``
-section.)
+* It is difficult to find all callers of functions wrapping
+``open()`` or ``TextIOWrapper()`` (see the ``io.text_encoding()``
+section).

* Many developers will not use the option.

-* Some developers use the option and report the warnings to
-libraries they use. So the option is worth enough even though
-many developers won't use it.
+* Some will, and report the warnings to libraries they use,
+so the option is worth it even if many developers don't enable it.

-* For example, I find [7]_ and [8]_ by running
-``pip install -U pip`` and find [9]_ by running ``tox``
-with the reference implementation. It demonstrates how this
+* For example, I found [7]_ and [8]_ by running
+``pip install -U pip``, and [9]_ by running ``tox``
+with the reference implementation. This demonstrates how this
option can be used to find potential issues.

