Should we support unicode in width/precision formatting fields? #135025
Comments
This was perhaps overlooked in the Python 2 to Python 3 transition. I think that non-ASCII digits should be deprecated here. |
Maybe. My archaeological research traced this back to the PEP 3101 implementation. Note that the PEP text is vague about width/precision fields. It says:
However, I see no tests for this "feature".
I think so. On the other hand, supporting it doesn't look too costly, and it should be easy to adjust the documentation and the fractions module code. CC @ericvsmith |
If we were doing it all over again, I'd argue that we should accept only ASCII numbers in format strings (for precision and width). At this point, as much as I'd like to deprecate it, I think we should just document it and move on. And I guess if we do that, we should allow it for Fractions, too. |
That does make sense. But,
maybe we can leave things as is here? This will complicate things for alternative implementations for no good reason. (I also guess that this "feature" was introduced unintentionally.) It seems that PyPy3.11 doesn't support it:
Python 3.11.11 (0253c85bf5f8, Feb 26 2025, 10:42:42)
[PyPy 7.3.19 with GCC 10.2.1 20210130 (Red Hat 10.2.1-11)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> f"{float('123'):.١١f}"
Traceback (most recent call last):
File "<python-input-0>", line 1, in <module>
f"{float('123'):.١١f}"
ValueError: no precision given |
Accepting non-ASCII digits creates security risks, because some non-ASCII digits look like ASCII digits with a different value.
>>> format(1/7, '.੪')
'0.1429'
Note that while non-ASCII digits are accepted in string to number conversion, they are not accepted in Python numerical literals.
>>> float('੪.੫')
4.5
>>> ੪.੫
  File "<python-input-1>", line 1
    ੪.੫
    ^
SyntaxError: invalid character '੪' (U+0A6A)
Support of non-ASCII digits in regular expressions was deprecated in 3.11 and removed in 3.12 (see #91760). We are currently in process of making See also https://peps.python.org/pep-0672/#confusable-digits . cc @encukou |
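To make the confusable-digit concern concrete, here is a minimal sketch of how calling code could detect or normalize non-ASCII decimal digits in a format spec before passing it to format(). The helper names non_ascii_digits and to_ascii_digits are hypothetical and not part of CPython; the sketch assumes that the digits in question are exactly the characters for which str.isdecimal() is true.

```python
import unicodedata


def non_ascii_digits(spec: str) -> list[str]:
    """Return the non-ASCII decimal digits found in a format spec.

    Hypothetical helper: it treats any character for which
    str.isdecimal() is true and str.isascii() is false as suspicious.
    """
    return [ch for ch in spec if ch.isdecimal() and not ch.isascii()]


def to_ascii_digits(spec: str) -> str:
    """Replace every non-ASCII decimal digit with its ASCII equivalent."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() and not ch.isascii() else ch
        for ch in spec
    )


# "\u0a6a" is GURMUKHI DIGIT FOUR, the digit used in the example above.
spec = ".\u0a6a"
print(non_ascii_digits(spec))                # the offending characters, if any
print(to_ascii_digits(spec))                 # the same spec with ASCII digits only
print(format(1 / 7, to_ascii_digits(spec)))  # formatted with the normalized spec
```

Normalizing first keeps the result independent of whether a given implementation accepts non-ASCII digits in the spec, and the list returned by non_ascii_digits can be used to reject such specs outright.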
I'd be +1 for deprecating these everywhere, including the We can't support all numeral systems anyway. From that point of view, supporting ones that use decimal digits is a rather arbitrary choice. ASCII only is consistent, predictable, and ultimately more secure. But I'm -1 for only deprecating this in less-important places, like formatting fields. That just feels like a way to avoid the discussion. If we, as the CPython project, have an opinion on this, it should be clear and consistent. For documentation, I think this should be pointed out as a CPython implementation detail. Other implementations should be free to not support it, if they don't mind the incompatibility. |
Bug report
Bug description:
Currently, the specification allows only [0-9] digits. However, the actual implementation permits Unicode digits for floats and Decimals, but not for Fractions:
Quick tests show no measurable performance penalty with Unicode support:
a patch
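The original examples are not preserved in this copy of the report. As a stand-in, the following sketch probes how the running interpreter treats a non-ASCII digit in the precision field for each of the three types. The names SPEC, accepts_spec and probe are hypothetical, and the results depend on the Python implementation and version, so no expected output is shown.

```python
from decimal import Decimal
from fractions import Fraction

# Precision field written with U+0A6A GURMUKHI DIGIT FOUR instead of "4".
SPEC = ".\u0a6af"


def accepts_spec(value, spec=SPEC):
    """Return True if format(value, spec) succeeds on this interpreter."""
    try:
        format(value, spec)
    except (ValueError, TypeError):
        # ValueError: the spec was rejected; TypeError: the type does not
        # implement format specs at all (e.g. Fraction on older versions).
        return False
    return True


def probe():
    """Map each numeric type name to whether it accepted the spec."""
    samples = {
        "float": 1 / 3,
        "Decimal": Decimal(1) / 3,
        "Fraction": Fraction(1, 3),
    }
    return {name: accepts_spec(value) for name, value in samples.items()}


if __name__ == "__main__":
    for name, ok in probe().items():
        print(f"{name}: {'accepted' if ok else 'rejected'}")
```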
CPython versions tested on:
CPython main branch
Operating systems tested on:
No response