
Surprising tokenization of f-strings #135251

Open
nedbat opened this issue Jun 8, 2025 · 1 comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs), topic-parser, type-bug (An unexpected behavior, bug, or error)

Comments

@nedbat
Member

nedbat commented Jun 8, 2025

Bug report

Bug description:

Tokenizing an f-string with double braces produces tokens with single braces:

import tokenize, token

TEXT = b"f'{hello:.23f} this: {{braces}} done'"
# tokenize.tokenize() expects a readline-style callable that returns bytes.
f = iter([TEXT]).__next__

for ty, st, _, _, _ in tokenize.tokenize(f):
    print(f"{token.tok_name[ty]}, {st!r}")

Running this with 3.12 shows:

ENCODING, 'utf-8'
FSTRING_START, "f'"
OP, '{'
NAME, 'hello'
OP, ':'
FSTRING_MIDDLE, '.23f'
OP, '}'
FSTRING_MIDDLE, ' this: {'
FSTRING_MIDDLE, 'braces}'
FSTRING_MIDDLE, ' done'
FSTRING_END, "'"
NEWLINE, ''
ENDMARKER, ''

Should the FSTRING_MIDDLE tokens have single braces? Will it stay this way? Are they guaranteed to be split at the braces as shown here, or might they become one FSTRING_MIDDLE token ' this: {braces} done'? To recreate the original source, is it safe to always double the braces found in an FSTRING_MIDDLE token, or are there edge cases I haven't thought of?
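
For what it's worth, here is a minimal round-trip sketch built on exactly the assumption being asked about (that doubling every brace found in an FSTRING_MIDDLE token recreates the original source); it is not a documented guarantee:

import io, token, tokenize

def recreate_source(text: bytes) -> str:
    # Rebuild the f-string source from its tokens, re-doubling the braces
    # that the tokenizer reports singly in FSTRING_MIDDLE tokens.
    pieces = []
    for tok in tokenize.tokenize(io.BytesIO(text).readline):
        if tok.type == token.ENCODING:
            continue
        if tok.type == token.FSTRING_MIDDLE:
            # The assumption under test: every literal brace shows up once
            # here and must be doubled to get the original text back.
            pieces.append(tok.string.replace("{", "{{").replace("}", "}}"))
        else:
            pieces.append(tok.string)
    return "".join(pieces)

print(recreate_source(b"f'{hello:.23f} this: {{braces}} done'"))
# f'{hello:.23f} this: {{braces}} done'

The naive join only works here because the whole input is a single f-string; for general source you would splice tokens back together using their positions.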

Related to nedbat/coveragepy#1980

CPython versions tested on:

3.12, 3.13, 3.14, CPython main branch

Operating systems tested on:

No response

@nedbat nedbat added the type-bug An unexpected behavior, bug, or error label Jun 8, 2025
@picnixz picnixz added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-parser labels Jun 8, 2025
@ericvsmith
Member

As far as the "splitting at the braces" thing goes: I would not rely on this not changing. Back when I wrote the original f-string tokenizer, the splitting was just an optimization so I could play C games with pointers to null-terminated strings. I'd temporarily change "this: {{braces}} done" into these strings, in turn:

"this: {\0"
"braces}\0"
" done\0"

After I was done, I'd replace the '\0' with whatever was originally there. Doing it this way, I didn't have to allocate space for a new string without the doubled braces. I can easily imagine a future where this tradeoff changes, or where the trick is only used for strings longer than some fixed-size temporary buffer, or something like that.

I don't know if the PEP 701 tokenizer kept this behavior deliberately for compatibility, or if it was just easier for them, too.
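
If it helps, one way for a consumer to stay robust to that changing (a defensive sketch, not a behavior the tokenizer promises) is to coalesce runs of adjacent FSTRING_MIDDLE tokens before looking at them, so it doesn't matter whether the literal text arrives as one token or several:

import io, token, tokenize

def coalesced_tokens(text: bytes):
    # Yield (type, string) pairs, merging adjacent FSTRING_MIDDLE tokens
    # into one so the split points chosen by the tokenizer don't matter.
    pending = None
    for tok in tokenize.tokenize(io.BytesIO(text).readline):
        if tok.type == token.FSTRING_MIDDLE:
            pending = tok.string if pending is None else pending + tok.string
            continue
        if pending is not None:
            yield token.FSTRING_MIDDLE, pending
            pending = None
        yield tok.type, tok.string

for ty, st in coalesced_tokens(b"f'{hello:.23f} this: {{braces}} done'"):
    print(f"{token.tok_name[ty]}, {st!r}")

With the example from the report, the three middle pieces come out as the single token ' this: {braces} done'.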

For the rest of your questions: @pablogsal
