
formatting of single character codes in strings is truncated to 1 byte #3364


Open
goatchurchprime opened this issue Oct 10, 2017 · 3 comments

Comments

@goatchurchprime

The problem is with %c and {:c}, which are supposed to format an integer as a Unicode character. In MicroPython (tested on the ESP8266 and ESP32) they seem to truncate the argument to one byte.

```
MicroPython v1.9.2-276-ga9517c04 on 2017-10-09; ESP32 module with ESP32
Type "help()" for more information.
>>> print("a%cb" % (65+256))
aAb
>>> print("a{:c}b".format(65+256))
aAb
>>> print("a{:s}b".format(chr(65+256)))
aŁb
```

```
Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print("a%cb" % (65+256))
aŁb
>>> print("a{:c}b".format(65+256))
aŁb
>>> print("a{:c}b".format(65))
aAb
>>> print("a{:c}".format(0x10189))
a𐆉
```
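The MicroPython output fits a pattern where only the low byte of the argument is kept. 65+256 is 321 (U+0141, 'Ł'), and 321 & 0xFF is 65, which is 'A'. The sketch below is plain CPython and only models that observation. format_c_truncated and format_c_expected are hypothetical names, not MicroPython's actual code path.

```python
def format_c_truncated(code):
    """Hypothetical model of the reported bug: only the low byte of the
    argument survives, so 321 (U+0141) becomes 321 & 0xFF == 65 ('A')."""
    return chr(code & 0xFF)


def format_c_expected(code):
    """CPython semantics for %c / {:c}: the integer is a full code point."""
    return chr(code)


# ascii() keeps the printed output pure ASCII regardless of terminal encoding.
for n in (65, 65 + 256, 0x10189):
    print(hex(n), ascii(format_c_truncated(n)), ascii(format_c_expected(n)))
```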

(Hm. I did not know that Unicode went beyond 2 bytes. That must be a hassle to implement.)
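For reference, Unicode code points run up to U+10FFFF, so they do not fit in 2 bytes. UTF-8 encodes each code point in 1 to 4 bytes, so a fix for %c would presumably need to emit the full multi-byte sequence rather than a single byte. Below is a minimal sketch of that encoding step. utf8_encode_code_point is a hypothetical helper written for illustration, checked against CPython's own str.encode, and it does not describe how MicroPython implements this.

```python
def utf8_encode_code_point(cp):
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range: %#x" % cp)
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogates are not valid scalar values and cannot be UTF-8 encoded.
        raise ValueError("surrogate code point: %#x" % cp)
    if cp < 0x80:  # 7 bits -> 1 byte
        return bytes([cp])
    if cp < 0x800:  # 11 bits -> 2 bytes
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 16 bits -> 3 bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits -> 4 bytes (everything beyond the Basic Multilingual Plane)
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Compare against CPython's encoder for the characters used in this thread.
for cp in (0x41, 0x141, 0x10189):
    print(hex(cp), utf8_encode_code_point(cp) == chr(cp).encode("utf-8"))
```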

@ajlennon

ajlennon commented Oct 10, 2017

My guess is that this is related to the use of the underlying printf() function, which is probably ASCII-based on the ESP8266? Sometimes there is a build-time define to switch between byte-wide ASCII and word-wide Unicode...

ref: #209

You might also want to look at how strings are stored internally (8-bit or 16-bit), as this can cause problems if the code expects the wrong representation or the wrong array size.
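The internal representation matters because the same string has a different "length" depending on the unit being counted. This sketch runs under CPython. storage_sizes is a hypothetical helper, and it makes no claim about which representation a given MicroPython port uses.

```python
def storage_sizes(s):
    """Return the length of s measured in three different units."""
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        # utf-16-le writes no BOM, so every 2 bytes is one 16-bit code unit;
        # characters above U+FFFF take two units (a surrogate pair).
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }


# 'a' (1 byte in UTF-8), U+0141 (2 bytes), U+10189 (4 bytes, outside the BMP)
print(storage_sizes("a\u0141\U00010189"))
# {'code_points': 3, 'utf8_bytes': 7, 'utf16_units': 4}
```

Code that sizes a buffer in one unit while filling it in another can end up truncating or misaligning characters, which is the kind of mismatch described above.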

@dpgeorge
Member

Yes, this is an issue. Unicode is not handled well in the string formatting functions. For example, there are related issues with width formatting: print('|%4s|' % '\u0180') prints only 2 spaces of padding, not 3 as expected.
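One explanation fits the 2-space result, offered here as an assumption rather than a reading of the MicroPython source. '\u0180' encodes to 2 bytes in UTF-8, so if the field width is compared against the byte length, 4 - 2 = 2 spaces are added instead of 4 - 1 = 3. The two hypothetical helpers below contrast the two ways of counting.

```python
def pad_by_bytes(s, width):
    """Hypothetical model of the bug: width counted in UTF-8 bytes."""
    fill = max(0, width - len(s.encode("utf-8")))
    return " " * fill + s


def pad_by_chars(s, width):
    """CPython behaviour for '%Ns': width counted in characters."""
    fill = max(0, width - len(s))
    return " " * fill + s


# Report only the number of padding spaces so the output stays ASCII.
for pad in (pad_by_bytes, pad_by_chars):
    padded = pad("\u0180", 4)
    print(pad.__name__, "->", len(padded) - len(padded.lstrip(" ")), "spaces")
```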

@pfalcon
Copy link
Contributor

pfalcon commented Oct 28, 2017

This is prio-low, because CPython-compatible workarounds exist (use "%s", use chr(), etc.).
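Here is a minimal sketch of the workaround. Convert the integer with chr() first, then format the resulting one-character string with %s or {:s}. The original report already shows the {:s} form producing "aŁb" on the ESP32. Both forms are also valid CPython, so the same formatting expressions should work on either interpreter.

```python
code = 65 + 256  # U+0141, LATIN CAPITAL LETTER L WITH STROKE

# Instead of "a%cb" % code (truncated to one byte in the report), build the
# character with chr() and format it as a string.
via_percent = "a%sb" % chr(code)
via_format = "a{:s}b".format(chr(code))

# Print the code point of the middle character so the output stays ASCII.
print(hex(ord(via_percent[1])), hex(ord(via_format[1])))
```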

Labels
Projects
None yet
Development

No branches or pull requests

5 participants