This is really a corner case, but I ran into the problem today. The Unicode data file for East Asian widths (EastAsianWidth.txt) states:
# - All code points, assigned or unassigned, that are not listed
# explicitly are given the value "N".
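The default rule quoted above amounts to a range lookup with a fallback value. The following is a minimal sketch of that rule, not a description of how CPython's unicodedata is implemented. SAMPLE is a hypothetical excerpt in the file's data-line format (not the real file), and parse_ranges and east_asian_width are hypothetical helper names. The sketch models only the "not listed means N" rule quoted above.

```python
import bisect

# Hypothetical excerpt in the EastAsianWidth.txt data-line format
# ("start..end;value" or "codepoint;value"); NOT the real file contents.
SAMPLE = """\
0000..001F;N
0020;Na
3000;F
FF01..FF60;F
"""


def parse_ranges(text):
    """Return a sorted list of (start, end, width) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cps, width = (field.strip() for field in line.split(";"))
        start, _, end = cps.partition("..")  # single code point: end == ""
        ranges.append((int(start, 16), int(end or start, 16), width))
    ranges.sort()
    return ranges


def east_asian_width(cp, ranges, default="N"):
    """Look up cp; code points not listed explicitly get `default`."""
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, cp) - 1  # last range starting <= cp
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default


if __name__ == "__main__":
    ranges = parse_ranges(SAMPLE)
    print(east_asian_width(0xFF01, ranges))  # F (inside FF01..FF60)
    print(east_asian_width(0xFE75, ranges))  # N (not listed, so the default)
```

Under this rule, an unassigned code point such as U+FE75 that no line covers falls through to 'N'.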
However, the unicodedata module does not seem to follow this, e.g.:
```
$ python3
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> char = chr(0xfe75) # arbitrary unassigned code point
>>> unicodedata.name(char)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name
>>> unicodedata.east_asian_width(char)
'F'
```
I'd be happy to fix this if people agree that it should be fixed. FWIW, PyPy has always returned 'N' in this situation. For assigned code points, everything works as expected.
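To check whether U+FE75 is an isolated case, a script along these lines can list every unassigned code point (general category Cn) for which the installed unicodedata reports a width other than 'N'. This is a sketch: the helper name unassigned_non_n_widths is hypothetical, and the output depends on the Python version and its Unicode data version (unicodedata.unidata_version). A non-empty result is not by itself a bug, since an unassigned code point inside a range that the data file lists explicitly legitimately carries that range's value; only unassigned code points absent from the file should report 'N'.

```python
import sys
import unicodedata


def unassigned_non_n_widths(limit=sys.maxunicode):
    """Map each unassigned code point (category 'Cn') up to `limit`
    whose reported east_asian_width is not 'N' to that width."""
    results = {}
    for cp in range(limit + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cn":  # unassigned code point
            width = unicodedata.east_asian_width(ch)
            if width != "N":
                results[cp] = width
    return results


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    found = unassigned_non_n_widths()
    print(f"{len(found)} unassigned code points report a width other than 'N'")
    for cp, width in list(found.items())[:10]:  # show a few examples
        print(f"U+{cp:04X}: {width}")
```

Each reported code point would still need to be checked against the data file to tell explicitly listed ranges apart from genuine fallbacks.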