Skip to content

gh-131507: Add support for syntax highlighting in PyREPL #133247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

ambv
Copy link
Contributor

@ambv ambv commented May 1, 2025

This is a much improved version of gh-131562. It uses the tokenizer for better speed and pattern matching for more robust handling of soft keywords. While it still won't hit all cases correctly, it's better than idlelib's regular expression-based colorizer and unlike our glorious PEG parser it supports incomplete input, which is crucial for an interactive shell.

Relatedly, pasting support was tweaked to be way faster. Now the entire contents of Frankenstein can be pasted within 3 seconds both on Unix and Windows as long as bracketed pasting is supported by the terminal. This is a necessary tweak for syntax highlighting not to cripple performance of pastes above 5kB.

There is experimental support for theming through _colorize.set_theme() that's mentioned in "What's New" but otherwise undocumented so far.


📚 Documentation preview 📚: https://cpython-previews--133247.org.readthedocs.build/

@ambv ambv added 🔨 test-with-buildbots Test PR w/ buildbots; report in status section topic-repl Related to the interactive shell labels May 1, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ambv for commit ffebbbe 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133247%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 1, 2025
Copy link
Member

@tomasr8 tomasr8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being too nitpicky, but I still find the blue used for keywords to be too dark when coupled with dark-themed terminals (e.g. #131562 (comment)). Do you think we could use a more contrasting color?

@ambv
Copy link
Contributor Author

ambv commented May 1, 2025

I disagree about the blue color, the fact that Ubuntu colors it too dim is not actionable for the Python project. Adjust your terminal.

@ambv
Copy link
Contributor Author

ambv commented May 1, 2025

The refleak failures are unrelated, see #133258.

@ambv ambv added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 1, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ambv for commit 9b60382 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133247%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 1, 2025
@chris-eibl
Copy link
Member

@ambv: your changes in this PR regarding chunked reading in Windows are more or less what I did in #132889, so maybe I should close my PR?

@chris-eibl
Copy link
Member

Having a closer look, I see you are only reading in chunks in case of getpending, so my PR might have added value. Let's merge this first and then I can rebase my PR if it is worthwile to do chunked reading in the regular case, too?

@ambv
Copy link
Contributor Author

ambv commented May 2, 2025

@chris-eibl I simplified the Unix case to remove the additional buffering since it complicates the codebase. Your PR was adding that same buffering to Windows. Given that we now achieve the same performance without double-buffering, I'd say we don't need the other PR.

Comment on lines +12 to +23
type ColorTag = (
Literal["PROMPT"]
| Literal["KEYWORD"]
| Literal["BUILTIN"]
| Literal["COMMENT"]
| Literal["STRING"]
| Literal["NUMBER"]
| Literal["OP"]
| Literal["DEFINITION"]
| Literal["SOFT_KEYWORD"]
| Literal["RESET"]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type ColorTag = (
Literal["PROMPT"]
| Literal["KEYWORD"]
| Literal["BUILTIN"]
| Literal["COMMENT"]
| Literal["STRING"]
| Literal["NUMBER"]
| Literal["OP"]
| Literal["DEFINITION"]
| Literal["SOFT_KEYWORD"]
| Literal["RESET"]
)
type ColorTag = Literal[
"PROMPT",
"KEYWORD",
"BUILTIN",
"COMMENT",
"STRING",
"NUMBER",
"OP",
"DEFINITION",
"SOFT_KEYWORD",
"RESET",
]

Comment on lines +559 to +560
Syntax highlighting in PyREPL
-----------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's list this in the "Summary – release highlights".

"STRING": colors.GREEN,
"NUMBER": colors.YELLOW,
"OP": colors.RESET,
"DEFINITION": colors.BOLD_WHITE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on macOS with dark theme:

image

But I think we should avoid bold white for light theme:

image

Instead, we can add BOLD = "\x1b[1m" to ANSIColors with no specific colour.

Suggested change
"DEFINITION": colors.BOLD_WHITE,
"DEFINITION": colors.BOLD,

Then we get a bold version of whatever colour the user's terminal has set:

image image

self.reader.dirty = True
class perform_bracketed_paste(Command):
def do(self) -> None:
done = "\x1b[201~"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps:

Suggested change
done = "\x1b[201~"
done = "\x1b[201~" # Disable bracketed paste

wrapcount = (sum(char_widths) + prompt_len) // self.console.width
trace("wrapcount = {wrapcount}", wrapcount=wrapcount)
# trace("wrapcount = {wrapcount}", wrapcount=wrapcount)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete? Or useful to keep?

Suggested change
# trace("wrapcount = {wrapcount}", wrapcount=wrapcount)

Comment on lines +113 to +120
if (
msg.startswith("unterminated string literal")
or msg.startswith("unterminated f-string literal")
or msg.startswith("unterminated t-string literal")
or msg.startswith("EOF in multi-line string")
or msg.startswith("unterminated triple-quoted f-string literal")
or msg.startswith("unterminated triple-quoted t-string literal")
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (
msg.startswith("unterminated string literal")
or msg.startswith("unterminated f-string literal")
or msg.startswith("unterminated t-string literal")
or msg.startswith("EOF in multi-line string")
or msg.startswith("unterminated triple-quoted f-string literal")
or msg.startswith("unterminated triple-quoted t-string literal")
):
if msg.startswith(
(
"unterminated string literal",
"unterminated f-string literal",
"unterminated t-string literal",
"EOF in multi-line string",
"unterminated triple-quoted f-string literal",
"unterminated triple-quoted t-string literal",
)
):

Single call with a tuple can be 3x as fast:

python3 -m timeit -s 'msg = "unterminated triple-quoted t-string literal (detected at line 2)"' 'msg.startswith("unterminated string literal") or msg.startswith("unterminated f-string literal") or msg.startswith("unterminated t-string literal") or msg.startswith("EOF in multi-line string") or msg.startswith("unterminated triple-quoted f-string literal") or msg.startswith("unterminated triple-quoted t-string literal")'
5000000 loops, best of 5: 99.8 nsec per looppython3 -m timeit -s 'msg = "unterminated triple-quoted t-string literal (detected at line 2)"' 'msg.startswith(("unterminated string literal", "unterminated f-string literal", "unterminated t-string literal", "EOF in multi-line string", "unterminated triple-quoted f-string literal", "unterminated triple-quoted t-string literal"))'
10000000 loops, best of 5: 33.9 nsec per loop

# the next call to `disp_str()` will revive it.
chars[-1] += _colorize.theme["RESET"]

# trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Suggested change
# trace("disp_str({buffer}) = {s}, {b}", buffer=repr(buffer), s=chars, b=char_widths)

@skirpichev
Copy link
Member

Sorry for being too nitpicky, but I still find the blue used for keywords to be too dark when coupled with dark-themed terminals

I think that the default theme polishing can be addressed in following pr(s?), as per pr description:

There is experimental support for theming through _colorize.set_theme() that's mentioned in "What's New" but otherwise undocumented so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review topic-repl Related to the interactive shell
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants