py: Implement partial PEP-498 (f-string) support (v2) #6247

jimmo · 2020-07-16T03:10:48Z

This is a very minor fixup of #4998 by @klardotsh, just to address the outstanding review comments (code formatting, testing across all ports, etc).

Also enables by default on ESP32 and STM32 (as well as Unix and Windows from the original PR).

Thanks to @klardotsh for doing all the hard work implementing this!! Original PR description below:

This implements (most of) the PEP-498 spec for f-strings, with two
exceptions:

raw f-strings (fr or rf prefixes) raise NotImplementedError
one special corner case does not function as specified in the PEP
(more on that in a moment)

This is implemented in the core as a syntax translation, brute-forcing
all f-strings to run through String.format. For example, the statement
x='world'; print(f'hello {x}') gets translated at a syntax level
(injected into the lexer) to x='world'; print('hello {}'.format(x)).
While this may lead to weird column results in tracebacks, it seemed
like the fastest, most efficient, and likely most RAM-friendly option,
despite being implemented under the hood with a completely separate
vstr_t.

Since string concatenation of adjacent literals is implemented in the
lexer,
two side effects emerge:

All strings with at least one f-string portion are concatenated into a
single literal which must be run through String.format() wholesale,
and:
Concatenation of a raw string with interpolation characters with an
f-string will cause IndexError/KeyError, which is both different
from CPython and different from the corner case mentioned in the PEP
(which gave an example of the following:)

x = 10
y = 'hi'
assert ('a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e') == 'ab10{c}str< hi >de'

The above-linked commit detailed a pretty solid case for leaving string
concatenation in the lexer rather than putting it in the parser, and
undoing that decision would likely be disproportionately costly on
resources for the sake of a probably-low-impact corner case. An
alternative to become complaint with this corner case of the PEP would
be to revert to string concatenation in the parser only when an
f-string is part of concatenation, though I've done no investigation on
the difficulty or costs of doing this.

A decent set of tests is included. I've manually tested this on the
unix port on Linux and on a Feather M4 Express (atmel-samd) and
things seem sane.

jimmo · 2020-07-16T03:36:28Z

This is currently +432 bytes on PYBV11.

This implements (most of) the PEP-498 spec for f-strings, with two exceptions: - raw f-strings (`fr` or `rf` prefixes) raise `NotImplementedError` - one special corner case does not function as specified in the PEP (more on that in a moment) This is implemented in the core as a syntax translation, brute-forcing all f-strings to run through `String.format`. For example, the statement `x='world'; print(f'hello {x}')` gets translated *at a syntax level* (injected into the lexer) to `x='world'; print('hello {}'.format(x))`. While this may lead to weird column results in tracebacks, it seemed like the fastest, most efficient, and *likely* most RAM-friendly option, despite being implemented under the hood with a completely separate `vstr_t`. Since [string concatenation of adjacent literals is implemented in the lexer](micropython@534b7c3), two side effects emerge: - All strings with at least one f-string portion are concatenated into a single literal which *must* be run through `String.format()` wholesale, and: - Concatenation of a raw string with interpolation characters with an f-string will cause `IndexError`/`KeyError`, which is both different from CPython *and* different from the corner case mentioned in the PEP (which gave an example of the following:) ```python x = 10 y = 'hi' assert ('a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e') == 'ab10{c}str< hi >de' ``` The above-linked commit detailed a pretty solid case for leaving string concatenation in the lexer rather than putting it in the parser, and undoing that decision would likely be disproportionately costly on resources for the sake of a probably-low-impact corner case. An alternative to become complaint with this corner case of the PEP would be to revert to string concatenation in the parser *only when an f-string is part of concatenation*, though I've done no investigation on the difficulty or costs of doing this. A decent set of tests is included. I've manually tested this on the `unix` port on Linux and on a Feather M4 Express (`atmel-samd`) and things seem sane.

stinos · 2020-09-03T11:21:02Z

ping :)

dpgeorge · 2020-09-03T14:28:42Z

This rebases cleanly on v1.13 and has the following code-size changes:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +960 +0.190% 
unix nanbox:  +720 +0.161% 
      stm32:  +432 +0.112% PYBV10
     cc3200:    +0 +0.000% 
    esp8266:    +4 +0.001% GENERIC
      esp32:  +524 +0.038% GENERIC[incl +48(data)]
        nrf:    +0 +0.000% pca10040
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS

So the feature can be completely disabled, which is good.

@jimmo what was the status of this, were there somethings to improve/test?

jimmo · 2020-09-04T03:23:32Z

I started writing more comprehensive tests and started to doubt whether this is the right approach. f-strings are complicated! That said, I don't have a better idea!

I don't think we need to aim for a perfect implementation (which I think is just about impossible to do in a sufficiently "micro" way), I wanted to characterise exactly what we do and don't support, and ensure that we do something sensible for the unsupported cases. Right now the lexer gets a bit confused by nested braces. (You can put f-strings inside f-strings).

I'll dig up my notes and add more details soon.

dpgeorge · 2020-09-04T04:26:39Z

Ok, thanks. I'm happy to support f-strings in just a simple way to start with, with known limitations. And "simple" may anyway end up being enough.

Add supervisor.set_usb_identification(manufacturer, product, vid, pid) function

jimmo force-pushed the topic-pep-498-fstrings-upstream-fixups branch from c398ff9 to bc03a1b Compare July 16, 2020 03:36

jimmo force-pushed the topic-pep-498-fstrings-upstream-fixups branch 3 times, most recently from 4786dc5 to 12e5f38 Compare July 16, 2020 04:50

klardotsh and others added 2 commits July 16, 2020 15:12

stm32,esp32: Enable MICROPY_PY_FSTRING by default.

6a9bccc

jimmo force-pushed the topic-pep-498-fstrings-upstream-fixups branch from 12e5f38 to 6a9bccc Compare July 16, 2020 05:13

dpgeorge added the py-core Relates to py/ directory in source label Sep 3, 2020

jimmo closed this Aug 12, 2021

jimmo mentioned this pull request Aug 12, 2021

py: Implement partial PEP-498 (f-string) support (v3) #7649

Closed

Wind-stormger pushed a commit to BPI-STEAM/micropython that referenced this pull request Oct 13, 2022

Merge pull request micropython#6247 from s-ol/descriptor-override

e045415

Add supervisor.set_usb_identification(manufacturer, product, vid, pid) function

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

py: Implement partial PEP-498 (f-string) support (v2) #6247

py: Implement partial PEP-498 (f-string) support (v2) #6247

Uh oh!

jimmo commented Jul 16, 2020

Uh oh!

jimmo commented Jul 16, 2020

Uh oh!

stinos commented Sep 3, 2020

Uh oh!

dpgeorge commented Sep 3, 2020

Uh oh!

jimmo commented Sep 4, 2020

Uh oh!

dpgeorge commented Sep 4, 2020

Uh oh!

Uh oh!

Uh oh!

py: Implement partial PEP-498 (f-string) support (v2) #6247

py: Implement partial PEP-498 (f-string) support (v2) #6247

Uh oh!

Conversation

jimmo commented Jul 16, 2020

Uh oh!

jimmo commented Jul 16, 2020

Uh oh!

stinos commented Sep 3, 2020

Uh oh!

dpgeorge commented Sep 3, 2020

Uh oh!

jimmo commented Sep 4, 2020

Uh oh!

dpgeorge commented Sep 4, 2020

Uh oh!

Uh oh!