formatting of single character codes in strings is truncated to 1 byte #3364

goatchurchprime · 2017-10-10T16:33:52Z

The problem is with %c and {:c}, which are supposed to format a number as a Unicode character. In MicroPython (tested on the ESP8266 and ESP32) the parameter seems to be truncated to one byte.

MicroPython v1.9.2-276-ga9517c04 on 2017-10-09; ESP32 module with ESP32
Type "help()" for more information.
>>> print("a%cb" % (65+256))
aAb
>>> print("a{:c}b".format(65+256))
aAb
>>> print("a{:s}b".format(chr(65+256)))
aŁb

Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print("a%cb" % (65+256))
aŁb
>>> print("a{:c}b".format(65+256))
aŁb
>>> print("a{:c}b".format(65))
aAb
>>> print("a{:c}".format(0x10189))
a𐆉

(Hm. I did not know that unicode went beyond 2 bytes. That's got to be a hassle to implement.)
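
The MicroPython output above is what you would see if the code point were reduced to its low 8 bits before being formatted. That mechanism is only a guess and is not confirmed anywhere in this thread; the helper names `truncate_to_byte` and `utf8_len` are made up for illustration. A minimal sketch, run on CPython:

```python
def truncate_to_byte(code_point):
    # Hypothetical model of the observed behaviour: keep only the low 8 bits.
    return code_point & 0xFF


def utf8_len(code_point):
    # Number of bytes the code point occupies when encoded as UTF-8.
    return len(chr(code_point).encode("utf-8"))


# Under the assumed truncation, 65+256 collapses to 65, i.e. 'A'.
observed = "a%cb" % truncate_to_byte(65 + 256)

# CPython formats the full code point U+0141.
expected = "a%cb" % (65 + 256)

# Encoded sizes of the code points used in the report.
sizes = {cp: utf8_len(cp) for cp in (65, 65 + 256, 0x10189)}
```

Here `sizes` shows why a single byte cannot carry these characters: U+0141 needs two bytes in UTF-8 and U+10189 needs four.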


ajlennon · 2017-10-10T17:47:47Z

My guess is that this is related to the use of the underlying printf() function, which is probably ASCII-based on the ESP8266. Sometimes there is a build-time define to switch between byte-wide ASCII and word-wide Unicode...

ref: #209

You might also want to have a look at how strings are stored internally (8-bit or 16-bit), as this can cause problems if you're expecting to deal with the wrong element type, the wrong array size, etc.

dpgeorge · 2017-10-11T00:23:35Z

Yes, this is an issue. Unicode is not handled well in the string formatting functions. For example, there are related issues with width formatting: print('|%4s|' % '\u0180') prints only 2 spaces of padding, not 3 as expected.
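
One way to read the padding example: '\u0180' is a single character but two bytes in UTF-8, so padding computed from the byte length would come up one space short. This is an assumption about the cause, and `pad_left` is a hypothetical helper, not MicroPython code. A small sketch, run on CPython:

```python
def pad_left(s, width, length):
    # Right-justify s in a field of `width`, trusting the supplied length.
    return " " * max(width - length, 0) + s


s = "\u0180"

# Padding based on the character count: 3 spaces, as CPython prints.
by_chars = "|" + pad_left(s, 4, len(s)) + "|"

# Padding based on the UTF-8 byte count: 2 spaces, as reported above.
by_bytes = "|" + pad_left(s, 4, len(s.encode("utf-8"))) + "|"
```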

pfalcon · 2017-10-28T10:14:43Z

This is prio-low, because CPython-compatible workarounds exist (use "%s", use chr(), etc.).
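
A short sketch of the workarounds mentioned: convert the code point with chr() first, then insert the result as a string. The {:s} form is the one shown working on MicroPython in the original report; that the %s and concatenation forms behave the same there is an assumption. On CPython all three give the same result:

```python
code_point = 65 + 256

# Convert to a one-character string first, then format it as a string.
via_percent_s = "a%sb" % chr(code_point)
via_format_s = "a{:s}b".format(chr(code_point))

# Plain concatenation avoids the formatting code path altogether.
via_concat = "a" + chr(code_point) + "b"
```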

dpgeorge added bug prio-low labels Oct 11, 2017

projectgus removed the prio-low label Aug 28, 2024
