Surprising tokenization of f-strings #135251

Closed
nedbat opened this issue Jun 8, 2025 · 5 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-parser type-bug An unexpected behavior, bug, or error

Comments

@nedbat (Member) commented Jun 8, 2025

Bug report

Bug description:

Tokenizing an f-string with double braces produces tokens with single braces:

import tokenize, token

TEXT = b"f'{hello:.23f} this: {{braces}} done'"
f = iter([TEXT]).__next__

for ty, st, _, _, _ in tokenize.tokenize(f):
    print(f"{token.tok_name[ty]}, {st!r}")

Running this with 3.12 shows:

ENCODING, 'utf-8'
FSTRING_START, "f'"
OP, '{'
NAME, 'hello'
OP, ':'
FSTRING_MIDDLE, '.23f'
OP, '}'
FSTRING_MIDDLE, ' this: {'
FSTRING_MIDDLE, 'braces}'
FSTRING_MIDDLE, ' done'
FSTRING_END, "'"
NEWLINE, ''
ENDMARKER, ''

Should the FSTRING_MIDDLE tokens have single braces? Will it stay this way? Are they guaranteed to be split at the braces as shown here, or might they become one FSTRING_MIDDLE token ' this: {braces} done'? To recreate the original source, is it safe to always double the braces found in an FSTRING_MIDDLE token, or are there edge cases I haven't thought of?

Related to nedbat/coveragepy#1980

CPython versions tested on:

3.12, 3.13, 3.14, CPython main branch

Operating systems tested on:

No response

@nedbat nedbat added the type-bug An unexpected behavior, bug, or error label Jun 8, 2025
@picnixz picnixz added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-parser labels Jun 8, 2025
@ericvsmith (Member) commented:
As far as the "splitting at the braces" thing goes: I would not rely on this not changing. Back when I wrote the original f-string tokenizer, this was just an optimization so I could play C games with pointers to null-terminated strings. I'd temporarily change "this: {{braces}} done" to be these strings, in turn:

"this: {\0"
"braces}\0"
" done\0"

After I was done, I'd replace the '\0' with whatever was originally there. Doing it this way, I didn't have to allocate space for a new string without the doubled braces. I can easily imagine a future where this tradeoff changes, or where the trick is only used for strings longer than some fixed-size temporary buffer, or something like that.

I don't know if the PEP 701 tokenizer kept this behavior deliberately for compatibility, or if it was just easier for them, too.

For the rest of your questions: @pablogsal

@terryjreedy (Member) commented:
Ned, you can sign up in .github/CODEOWNERS to be notified (emailed) when a PR is submitted that changes particular files (at least those that are tracked).

@pablogsal (Member) commented Jun 9, 2025

We kept this behavior for compatibility. We also had to deal with this in the untokenizer:

cpython/Lib/tokenize.py, lines 255 to 260 at a58026a:

if '{' in token or '}' in token:
    token = self.escape_brackets(token)
    last_line = token.splitlines()[-1]
    end_line, end_col = end
    extra_chars = last_line.count("{{") + last_line.count("}}")
    end = (end_line, end_col + extra_chars)
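A quick round-trip check (assuming a Python with the escape_brackets fix, 3.12+) showing that tokenize.untokenize re-doubles the braces when reconstructing the source:

```python
import io
import tokenize

SRC = b"f'{hello:.23f} this: {{braces}} done'\n"

# Tokenize, then untokenize with full 5-tuples so exact
# positions are used for reconstruction.
toks = list(tokenize.tokenize(io.BytesIO(SRC).readline))
round_tripped = tokenize.untokenize(toks)

# escape_brackets has turned the single braces in the
# FSTRING_MIDDLE tokens back into doubled braces.
print(round_tripped)
```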

We could explore changing this if everyone agrees as this also was kind of a problem in the REPL:

if (token.type in {T.FSTRING_MIDDLE, T.TSTRING_MIDDLE}
        and token.string.endswith(("{", "}"))):
    # gh-134158: a visible trailing brace comes from a double brace in input
    end_offset += 1

On the other hand, the change would be backwards incompatible, so I am not sure what's the best thing to do here.

@nedbat (Member, Author) commented Jun 9, 2025

Thanks for all the details. I've adjusted coverage.py for the tokenization as it is now, and I don't depend on the tokens breaking at the braces, so there's no need for a change on my account. If you do make a change, my tests should alert me!

@pablogsal (Member) commented:
Are you OK if we close this issue? What are your thoughts, @ericvsmith?

@terryjreedy closed this as not planned, Jun 10, 2025