gh-131507: Add support for syntax highlighting in PyREPL #133247

ambv · 2025-05-01T11:02:32Z

This is a much improved version of gh-131562. It uses the tokenizer for better speed and pattern matching for more robust handling of soft keywords. While it still won't hit all cases correctly, it's better than idlelib's regular expression-based colorizer and unlike our glorious PEG parser it supports incomplete input, which is crucial for an interactive shell.
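To illustrate why a tokenizer suits an interactive shell, here is a minimal sketch (not the code in this PR) that finds hard keywords in input that is still being typed. `keyword_spans` is a hypothetical helper name; the sketch assumes only the standard `tokenize` and `keyword` modules and keeps whatever tokens were produced before the tokenizer gave up on the incomplete input.

```python
import io
import keyword
import tokenize


def keyword_spans(source: str) -> list[tuple[int, int, str]]:
    """Return (row, col, text) for each hard keyword found in source."""
    spans = []
    readline = io.StringIO(source).readline
    try:
        for tok in tokenize.generate_tokens(readline):
            if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
                spans.append((tok.start[0], tok.start[1], tok.string))
    except (tokenize.TokenError, SyntaxError):
        # Incomplete input (for example an unclosed bracket) ends the
        # token stream early; the tokens seen so far are still usable.
        pass
    return spans


# The brackets are never closed, yet the keywords are still located.
print(keyword_spans("def f(x:\n    return [x for"))
```

A full parser would reject this input outright, which is the trade-off the description refers to.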

Relatedly, pasting support was tweaked to be way faster. Now the entire contents of Frankenstein can be pasted within 3 seconds both on Unix and Windows as long as bracketed pasting is supported by the terminal. This is a necessary tweak for syntax highlighting not to cripple performance of pastes above 5kB.
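The idea of consuming the whole paste before touching the display can be sketched as follows. This is an assumption-laden illustration rather than the PR's implementation: `read_bracketed_paste` and its `chunks` argument are hypothetical, and only the `\x1b[201~` end marker is taken from the diff under review.

```python
PASTE_END = "\x1b[201~"  # terminal sends this when a bracketed paste ends


def read_bracketed_paste(chunks):
    """Collect text until the paste-end marker; return (pasted, rest)."""
    data = ""
    for chunk in chunks:
        # Re-scan only the tail of the old data, so a marker that is
        # split across two chunks is still found without rescanning
        # the whole buffer on every read.
        search_from = max(0, len(data) - len(PASTE_END) + 1)
        data += chunk
        end = data.find(PASTE_END, search_from)
        if end != -1:
            return data[:end], data[end + len(PASTE_END):]
    # The marker never arrived: treat everything as pasted text.
    return data, ""


pasted, rest = read_bracketed_paste(["hello\n", "world\x1b[20", "1~tail"])
```

Under this scheme the screen would be refreshed once, after the function returns, instead of once per pasted character.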

There is experimental support for theming through _colorize.set_theme() that's mentioned in "What's New" but otherwise undocumented so far.

Issue: Syntax highlighting in PyREPL #131507

📚 Documentation preview 📚: https://cpython-previews--133247.org.readthedocs.build/

bedevere-bot · 2025-05-01T11:31:56Z

🤖 New build scheduled with the buildbot fleet by @ambv for commit ffebbbe 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133247%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

tomasr8

Sorry for being too nitpicky, but I still find the blue used for keywords to be too dark when coupled with dark-themed terminals (e.g. #131562 (comment)). Do you think we could use a more contrasting color?

ambv · 2025-05-01T14:20:33Z

I disagree about the blue color; the fact that Ubuntu colors it too dim is not actionable for the Python project. Adjust your terminal.

ambv · 2025-05-01T15:19:46Z

The refleak failures are unrelated, see #133258.

chris-eibl · 2025-05-02T06:49:17Z

Having a closer look, I see you are only reading in chunks in the case of getpending, so my PR might have added value. Let's merge this first, and then I can rebase my PR if it is worthwhile to do chunked reading in the regular case, too?

ambv · 2025-05-02T08:36:57Z

@chris-eibl I simplified the Unix case to remove the additional buffering since it complicates the codebase. Your PR was adding that same buffering to Windows. Given that we now achieve the same performance without double-buffering, I'd say we don't need the other PR.

skirpichev · 2025-05-02T09:43:23Z

> Sorry for being too nitpicky, but I still find the blue used for keywords to be too dark when coupled with dark-themed terminals

I think that polishing the default theme can be addressed in follow-up PR(s), as per the PR description:

> There is experimental support for theming through _colorize.set_theme() that's mentioned in "What's New" but otherwise undocumented so far.

ambv · 2025-05-02T12:42:33Z

So this passed all buildbots save for s390x. The s390x failure is unrelated, Eric is dealing with it: #133265.

chris-eibl · 2025-05-02T14:01:38Z

> @chris-eibl I simplified the Unix case to remove the additional buffering since it complicates the codebase. Your PR was adding that same buffering to Windows.

Sorry for only spotting the windows_console.py change - I must be biased :)

Yeah, simple is always better 👍

> Given that we now achieve the same performance without double-buffering, I'd say we don't need the other PR.

I can confirm: this PR is now way faster pasting in the virtual terminal mode on Windows 🚀

pablogsal · 2025-05-02T16:56:36Z

Lib/_pyrepl/commands.py

+        done = "\x1b[201~"
+        data = ""
+        import time
+        start = time.time()


Leftover from testing?

The trace below shows time. I can move the import up.

Then use perf_counter please
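For context on that request: `time.perf_counter()` is a monotonic, high-resolution clock intended for measuring short durations, whereas `time.time()` follows the system clock and can jump. A minimal sketch of the timing pattern, where `timed` is a hypothetical helper and not something in the PR:

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


total, seconds = timed(sum, range(1000))
print(f"sum took {seconds:.6f}s")
```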

pablogsal

I left a bunch of questions and nitpicks but overall this looks fantastic.

I've reviewed the tokenizer-related parts of the implementation and I'm comfortable with the current approach. We have already discussed this offline, but for everyone else reading: the soft keyword detection uses heuristics, and this is a reasonable compromise. A more correct solution would require a full parser run, which would be much heavier and slower for this use case. We also don't currently have a way to handle partial input in the PEG parser, so whatever we do will be in Python and much slower.
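A rough sketch of what a token-level heuristic for soft keywords can look like is shown below. It is not the logic used in this PR: `looks_like_soft_keyword` and the `IDENTIFIER_FOLLOWERS` set are hypothetical, and the rule (a leading `match` or `case` counts as a keyword unless the next token suggests ordinary identifier use) is deliberately simplistic.

```python
import io
import tokenize

# Tokens that, right after `match`/`case`, suggest a plain identifier.
IDENTIFIER_FOLLOWERS = {"=", ".", ",", ")", "]", "}", ":", ";"}
SKIPPED = {tokenize.INDENT, tokenize.DEDENT, tokenize.NL, tokenize.COMMENT}


def looks_like_soft_keyword(line: str) -> bool:
    """Guess whether a line starts with `match` or `case` as a keyword."""
    toks = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(line).readline):
            if tok.type in SKIPPED:
                continue
            toks.append(tok)
            if len(toks) == 2:  # only the first two tokens matter here
                break
    except (tokenize.TokenError, SyntaxError):
        pass  # incomplete input: decide with the tokens we have
    if len(toks) < 2:
        return False
    first, second = toks
    if first.type != tokenize.NAME or first.string not in {"match", "case"}:
        return False
    if second.type in (tokenize.NEWLINE, tokenize.ENDMARKER):
        return False  # a bare `match` is just a name
    return second.string not in IDENTIFIER_FOLLOWERS


print(looks_like_soft_keyword("match x:"))   # keyword use
print(looks_like_soft_keyword("match = 3"))  # identifier use
```

A line such as `match(x)` stays ambiguous at this level (function call or match statement), which is the kind of case a heuristic may get wrong and a full parser would not.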

I've also run some performance benchmarks and things look really good: the impact on responsiveness is minimal, even with syntax highlighting enabled by default. And on Windows it seems to be super fast.

I'm happy to explore further optimizations in the future, such as avoiding repeated tokenization during screen refreshes. But as it stands, this looks solid. Great work!
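One possible shape for such an optimization, purely as a sketch and not something proposed in this PR, is to memoize tokenization on the buffer text so that a refresh with an unchanged buffer does no tokenizer work. `cached_tokens` is a hypothetical name, and the cache size is an arbitrary assumption.

```python
import functools
import io
import tokenize


@functools.lru_cache(maxsize=32)
def cached_tokens(buffer: str) -> tuple[tuple[int, str], ...]:
    """Tokenize buffer once; repeated calls with equal text hit the cache."""
    out = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(buffer).readline):
            out.append((tok.type, tok.string))
    except (tokenize.TokenError, SyntaxError):
        pass  # keep the tokens produced before the input became invalid
    return tuple(out)
```

Returning an immutable tuple matters here: callers share the cached object, so it must not be modified in place.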

pablogsal · 2025-05-02T17:39:10Z

The only open question is what to do about setting the theme officially (right now it is an "experimental" API), but since that affects other parts of the interpreter (such as tracebacks), I think it is out of scope for this particular PR, so I propose to discuss it separately.

chris-eibl · 2025-05-02T18:10:32Z

In a legacy Windows console, the prompt is no longer colored, but e.g. a SyntaxError still is.

Also, syntax highlighting is turned off (this maybe is wanted behaviour?)

chris-eibl · 2025-05-02T18:14:22Z

can_colorize is True, but _colorize.theme has empty strings for all values.

ambv · 2025-05-02T18:21:42Z

Good catch, @chris-eibl. I'll fix it forward in a subsequent PR.

chris-eibl · 2025-05-02T18:29:33Z

Calling _colorize.set_theme() right after virtual terminal processing is enabled via

cpython/Lib/_pyrepl/windows_console.py

Lines 150 to 155 in fac41f5

    
    SetConsoleMode(
        OutHandle,
        ENABLE_WRAP_AT_EOL_OUTPUT
        | ENABLE_PROCESSED_OUTPUT
        | ENABLE_VIRTUAL_TERMINAL_PROCESSING,
    )

does fix it for me.
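For readers unfamiliar with the console call quoted above, here is a standalone sketch of enabling virtual terminal processing through `ctypes`. It is an illustration under assumptions, not the code in `windows_console.py`: `enable_virtual_terminal` is a hypothetical helper, and it simply returns `False` on non-Windows platforms. The flag values are the documented Win32 console mode constants.

```python
import sys

STD_OUTPUT_HANDLE = -11
ENABLE_PROCESSED_OUTPUT = 0x0001
ENABLE_WRAP_AT_EOL_OUTPUT = 0x0002
ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004


def enable_virtual_terminal() -> bool:
    """Try to enable ANSI escape processing on stdout; True on success."""
    if sys.platform != "win32":
        return False  # nothing to do outside Windows
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [
        wintypes.HANDLE,
        ctypes.POINTER(wintypes.DWORD),
    ]
    kernel32.GetConsoleMode.restype = wintypes.BOOL
    kernel32.SetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.SetConsoleMode.restype = wintypes.BOOL

    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # stdout is not a console (for example, redirected)
    wanted = (
        mode.value
        | ENABLE_WRAP_AT_EOL_OUTPUT
        | ENABLE_PROCESSED_OUTPUT
        | ENABLE_VIRTUAL_TERMINAL_PROCESSING
    )
    return bool(kernel32.SetConsoleMode(handle, wanted))
```

The ordering issue reported in this thread follows from this: any decision about colors that is made before the mode switch succeeds reflects a console that cannot yet render escape sequences, so it has to be re-evaluated afterwards.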

ambv added 15 commits April 29, 2025 18:05

pythongh-131507: Add support for syntax highlighting in PyREPL

e921a80

Add Blurb

fb95911

Fix irrelevant Windows tests

b428513

Replace idlelib.colorizer with a faster solution

2bdcd06

Slurp the entire input buffer before refreshing display during bracketed paste

8c70c45

Implement getpending() on Windows

4d7ae36

Adapt tests

9585bd6

Support soft keywords (fight fire with fire)

b1f2557

Fix test

01e1129

Remove unnecessary import

20eff49

Add test for prev_next_window

8d3648a

Windows: bracketed pasting of 448692 chars done in 2.38s ✊

656fea3

Remove colors from Windows low-level console tests

dac8961

Fix lint and stuff

7891fa7

Add experimental theming support for syntax highlighting and the prompt

362a21b

ambv requested review from hugovk, pablogsal and lysnikolaou as code owners May 1, 2025 11:02

bedevere-app bot mentioned this pull request May 1, 2025

Syntax highlighting in PyREPL #131507

Open

bedevere-app bot added the awaiting core review label May 1, 2025

Fix lint

ffebbbe

ambv added the 🔨 test-with-buildbots and topic-repl labels May 1, 2025

bedevere-bot removed the 🔨 test-with-buildbots label May 1, 2025

tomasr8 reviewed May 1, 2025

ambv added 2 commits May 1, 2025 19:17

Merge branch 'main' into pyrepl-syntax-highlighting-tokens

388e494

Add t-string support to syntax highlighting

9b60382

Viicos reviewed May 2, 2025

hugovk reviewed May 2, 2025

ambv and others added 3 commits May 2, 2025 14:48

Apply suggestions from code review

f835dba

Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>

Add _colorize.ANSIColors.BOLD

9003d05

Remove - and + from first sets matching for match

ff1f92b

chris-eibl mentioned this pull request May 2, 2025

GH-130328: further speedup of pasting in new REPL on Windows by reading in chunks #132889

Closed