Skip to content

py: Implement partial PEP-498 (f-string) support (v2) #6247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jimmo
Copy link
Member

@jimmo jimmo commented Jul 16, 2020

This is a very minor fixup of #4998 by @klardotsh, just to address the outstanding review comments (code formatting, testing across all ports, etc).

Also enables by default on ESP32 and STM32 (as well as Unix and Windows from the original PR).

Thanks to @klardotsh for doing all the hard work implementing this!! Original PR description below:


This implements (most of) the PEP-498 spec for f-strings, with two
exceptions:

  • raw f-strings (fr or rf prefixes) raise NotImplementedError
  • one special corner case does not function as specified in the PEP
    (more on that in a moment)

This is implemented in the core as a syntax translation, brute-forcing
all f-strings to run through String.format. For example, the statement
x='world'; print(f'hello {x}') gets translated at a syntax level
(injected into the lexer) to x='world'; print('hello {}'.format(x)).
While this may lead to weird column results in tracebacks, it seemed
like the fastest, most efficient, and likely most RAM-friendly option,
despite being implemented under the hood with a completely separate
vstr_t.

Since string concatenation of adjacent literals is implemented in the
lexer
,
two side effects emerge:

  • All strings with at least one f-string portion are concatenated into a
    single literal which must be run through String.format() wholesale,
    and:
  • Concatenation of a raw string with interpolation characters with an
    f-string will cause IndexError/KeyError, which is both different
    from CPython and different from the corner case mentioned in the PEP
    (which gave an example of the following:)
x = 10
y = 'hi'
assert ('a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e') == 'ab10{c}str< hi >de'

The above-linked commit detailed a pretty solid case for leaving string
concatenation in the lexer rather than putting it in the parser, and
undoing that decision would likely be disproportionately costly on
resources for the sake of a probably-low-impact corner case. An
alternative to become complaint with this corner case of the PEP would
be to revert to string concatenation in the parser only when an
f-string is part of concatenation
, though I've done no investigation on
the difficulty or costs of doing this.

A decent set of tests is included. I've manually tested this on the
unix port on Linux and on a Feather M4 Express (atmel-samd) and
things seem sane.

@jimmo jimmo force-pushed the topic-pep-498-fstrings-upstream-fixups branch from c398ff9 to bc03a1b Compare July 16, 2020 03:36
@jimmo
Copy link
Member Author

jimmo commented Jul 16, 2020

This is currently +432 bytes on PYBV11.

@jimmo jimmo force-pushed the topic-pep-498-fstrings-upstream-fixups branch 3 times, most recently from 4786dc5 to 12e5f38 Compare July 16, 2020 04:50
klardotsh and others added 2 commits July 16, 2020 15:12
This implements (most of) the PEP-498 spec for f-strings, with two
exceptions:

- raw f-strings (`fr` or `rf` prefixes) raise `NotImplementedError`
- one special corner case does not function as specified in the PEP
(more on that in a moment)

This is implemented in the core as a syntax translation, brute-forcing
all f-strings to run through `String.format`. For example, the statement
`x='world'; print(f'hello {x}')` gets translated *at a syntax level*
(injected into the lexer) to `x='world'; print('hello {}'.format(x))`.
While this may lead to weird column results in tracebacks, it seemed
like the fastest, most efficient, and *likely* most RAM-friendly option,
despite being implemented under the hood with a completely separate
`vstr_t`.

Since [string concatenation of adjacent literals is implemented in the
lexer](micropython@534b7c3),
two side effects emerge:

- All strings with at least one f-string portion are concatenated into a
single literal which *must* be run through `String.format()` wholesale,
and:
- Concatenation of a raw string with interpolation characters with an
f-string will cause `IndexError`/`KeyError`, which is both different
from CPython *and* different from the corner case mentioned in the PEP
(which gave an example of the following:)

```python
x = 10
y = 'hi'
assert ('a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e') == 'ab10{c}str< hi >de'
```

The above-linked commit detailed a pretty solid case for leaving string
concatenation in the lexer rather than putting it in the parser, and
undoing that decision would likely be disproportionately costly on
resources for the sake of a probably-low-impact corner case. An
alternative to become complaint with this corner case of the PEP would
be to revert to string concatenation in the parser *only when an
f-string is part of concatenation*, though I've done no investigation on
the difficulty or costs of doing this.

A decent set of tests is included. I've manually tested this on the
`unix` port on Linux and on a Feather M4 Express (`atmel-samd`) and
things seem sane.
@jimmo jimmo force-pushed the topic-pep-498-fstrings-upstream-fixups branch from 12e5f38 to 6a9bccc Compare July 16, 2020 05:13
@stinos
Copy link
Contributor

stinos commented Sep 3, 2020

ping :)

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Sep 3, 2020
@dpgeorge
Copy link
Member

dpgeorge commented Sep 3, 2020

This rebases cleanly on v1.13 and has the following code-size changes:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:  +960 +0.190% 
unix nanbox:  +720 +0.161% 
      stm32:  +432 +0.112% PYBV10
     cc3200:    +0 +0.000% 
    esp8266:    +4 +0.001% GENERIC
      esp32:  +524 +0.038% GENERIC[incl +48(data)]
        nrf:    +0 +0.000% pca10040
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS

So the feature can be completely disabled, which is good.

@jimmo what was the status of this, were there somethings to improve/test?

@jimmo
Copy link
Member Author

jimmo commented Sep 4, 2020

I started writing more comprehensive tests and started to doubt whether this is the right approach. f-strings are complicated! That said, I don't have a better idea!

I don't think we need to aim for a perfect implementation (which I think is just about impossible to do in a sufficiently "micro" way), I wanted to characterise exactly what we do and don't support, and ensure that we do something sensible for the unsupported cases. Right now the lexer gets a bit confused by nested braces. (You can put f-strings inside f-strings).

I'll dig up my notes and add more details soon.

@dpgeorge
Copy link
Member

dpgeorge commented Sep 4, 2020

Ok, thanks. I'm happy to support f-strings in just a simple way to start with, with known limitations. And "simple" may anyway end up being enough.

@jimmo jimmo closed this Aug 12, 2021
Wind-stormger pushed a commit to BPI-STEAM/micropython that referenced this pull request Oct 13, 2022
Add supervisor.set_usb_identification(manufacturer, product, vid, pid) function
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants