gh-119118: Fix performance regression in tokenize module #119615
Conversation
- Cache the line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference (see the sketch below).
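A minimal Python sketch of the second optimization, assuming a UTF-8 source line; the function and variable names are illustrative, not the ones used in the actual C tokenizer:

```python
# Convert a byte offset within a line into a character (column) offset by
# decoding only the prefix of the line up to the token, instead of
# re-decoding the entire line for every token.
def byte_offset_to_col_offset(line_bytes: bytes, byte_offset: int) -> int:
    # Token boundaries always fall on character boundaries, so decoding
    # just the prefix is safe.
    return len(line_bytes[:byte_offset].decode("utf-8"))

line = "x = 'café'  # comment"
line_bytes = line.encode("utf-8")
token_byte_start = line_bytes.index(b"#")  # byte offset of the comment token
col = byte_offset_to_col_offset(line_bytes, token_byte_start)
assert line[col] == "#"  # column 12, even though the byte offset is 13
```

The point of decoding only the prefix is that the cost is proportional to the token's position in the line rather than to the full line length.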
Hummm, it seems that this solution also fails with
Another idea: we do have the token already as unicode (
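A rough sketch of that idea, under the assumption that the already-decoded token text can supply the end column directly for single-line tokens (hypothetical helper, not the actual implementation):

```python
# If the token text is already available as a str, the end column of a
# single-line token is just the start column plus the token's length in
# characters, with no extra byte-offset conversion needed.
def end_col_offset(start_col: int, token_str: str) -> int:
    assert "\n" not in token_str  # only valid for tokens on a single line
    return start_col + len(token_str)

assert end_col_offset(4, "'café'") == 10
```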
I'm discussing another fix with @lysnikolaou offline.
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
The docs failure seems unrelated. @hugovk do we know what may be happening here?
The latest results are:
The test failures appear to be unrelated. We can probably merge this.
Thanks @lysnikolaou for the PR, and @pablogsal for merging it 🌮🎉. I'm working now to backport this PR to 3.12 and 3.13.
…nGH-119615)

* pythongh-119118: Fix performance regression in tokenize module

- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.

(cherry picked from commit d87b015)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
GH-119682 is a backport of this pull request to the 3.13 branch.
…nGH-119615)

* pythongh-119118: Fix performance regression in tokenize module

- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.

(cherry picked from commit d87b015)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
GH-119683 is a backport of this pull request to the 3.12 branch.
…19615) (#119682)

- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.

(cherry picked from commit d87b015)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
…19615) (#119683)

- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.

(cherry picked from commit d87b015)

Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
…n#119615)

* pythongh-119118: Fix performance regression in tokenize module

- Cache line object to avoid creating a Unicode object for all of the tokens in the same line.
- Speed up byte offset to column offset conversion by using the smallest buffer possible to measure the difference.

Co-authored-by: Pablo Galindo <pablogsal@gmail.com>
Linked issue: tokenize.generate_tokens() performance regression in 3.12 #119118