Surprising tokenization of f-strings #135251
Comments
As far as the "splitting at the braces" thing goes: I would not rely on this not changing. Back when I wrote the original f-string tokenizer, this was just an optimization so I could play C games with pointers to null-terminated strings. I'd temporarily change "this: {{braces}} done" to be these strings, in turn:
After I was done, I'd replace the '\0' with whatever was originally there. Doing it this way, I didn't have to allocate space for a new string without the doubled braces. I can easily imagine a future where this tradeoff changes, or where the trick is only used for strings longer than some fixed-size temporary buffer, or something like that. I don't know if the PEP 701 tokenizer kept this behavior deliberately for compatibility, or if it was just easier for them, too. For the rest of your questions: @pablogsal
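The splitting behavior being described can be sketched in Python (the function name and code are hypothetical — the real implementation is the C tokenizer's pointer trick described above, which avoids allocating a de-doubled copy by emitting one segment per doubled brace, keeping a single brace at the segment's end):

```python
def split_middle(text):
    """Hypothetical sketch: yield FSTRING_MIDDLE-style segments for the
    literal part of an f-string, splitting at each doubled brace and
    keeping only one brace of the pair."""
    out, start, i = [], 0, 0
    while i < len(text) - 1:
        if text[i] == text[i + 1] and text[i] in "{}":
            out.append(text[start:i + 1])  # keep one brace, drop its twin
            i += 2
            start = i
        else:
            i += 1
    if start < len(text):
        out.append(text[start:])
    return out

# Splits " this: {{braces}} done" into segments ending at each halved brace.
print(split_middle(" this: {{braces}} done"))
```

This mirrors the observed token stream: each doubled brace ends a segment, and the segment carries a single brace.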
Ned, you can sign up in .github/CODEOWNERS to be notified (emailed) when a PR is submitted that changes particular files (at least those that are tracked).
We kept this behavior for compatibility. We also had to deal with this in the untokenizer: Lines 255 to 260 in a58026a
We could explore changing this if everyone agrees, as it was also something of a problem in the REPL: Lines 45 to 49 in a58026a
On the other hand, the change would be backwards incompatible... so I am not sure what the best thing to do here is.
Thanks for all the details. I've adjusted coverage.py for the tokenization as it is now, and I don't depend on the tokens breaking at the braces. So no change is needed on my account. If you do make a change, my tests should alert me!
Are you OK if we close this issue? What are your thoughts, @ericvsmith?
Bug report
Bug description:
Tokenizing an f-string with double braces produces tokens with single braces:
Running this with 3.12 shows:
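The original snippets did not survive in this copy of the issue; a minimal reproduction consistent with the description would look something like this (the exact source string is an assumption based on the quoted token below):

```python
import io
import tokenize
from token import tok_name

code = "f' this: {{braces}} done'\n"
tokens = list(tokenize.generate_tokens(io.StringIO(code).readline))
for tok in tokens:
    print(tok_name[tok.type], repr(tok.string))

# On 3.12+, the literal text arrives as FSTRING_MIDDLE tokens, split at
# the braces, with each doubled brace reduced to a single brace.
middles = [t.string for t in tokens if tok_name[t.type] == "FSTRING_MIDDLE"]
```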
Should the FSTRING_MIDDLE tokens have single braces? Will it stay this way? Are they guaranteed to be split at the braces as shown here, or might they become one FSTRING_MIDDLE token ' this: {braces} done'? To recreate the original source, is it safe to always double the braces found in an FSTRING_MIDDLE token, or are there edge cases I haven't thought of?
Related to nedbat/coveragepy#1980
CPython versions tested on:
3.12, 3.13, 3.14, CPython main branch
Operating systems tested on:
No response