bpo-34515: lib2to3: support non-ASCII identifiers #8950
This will fix google/yapf#607.
Lib/lib2to3/tests/test_parser.py
```python
def test_non_ascii_identifiers(self):
    self.validate("Örter = 'places'\ngrün = 'green'")
```
It's probably a good idea to add a CJK-specific (or at least non-Latin-1) test as well, such as:

```python
蟒 = 3
錦蛇 = 1
```
See also: https://github.com/python/cpython/blob/master/Lib/test/test_unicode_identifiers.py
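A lib2to3 version of the suggested test would presumably pass a source string like the one below to `self.validate(...)`, as the existing test does. Because lib2to3 is no longer shipped with recent Python releases, this runnable sketch instead checks the same Latin-1 and CJK names against the standard `tokenize` module, which already accepted non-ASCII identifiers. The helper name `name_tokens` is hypothetical.

```python
import io
import tokenize

def name_tokens(source):
    """Return the text of every NAME token the stdlib tokenizer produces."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.NAME]

# Latin-1 names from the existing test plus the suggested CJK (non-Latin-1) ones.
SOURCE = "Örter = 'places'\ngrün = 'green'\n蟒 = 3\n錦蛇 = 1\n"

print(name_tokens(SOURCE))  # ['Örter', 'grün', '蟒', '錦蛇']
```

A tokenizer under test could be compared against this list: every identifier should come back as a single NAME token, not be split or rejected.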
Branch updated from 1b3072b to 50f189e.
Lib/lib2to3/pgen2/tokenize.py
```diff
@@ -56,7 +56,7 @@ def _combinations(*l):
 Whitespace = r'[ \f\t]*'
 Comment = r'#[^\r\n]*'
 Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
-Name = r'[a-zA-Z_]\w*'
+Name = r'\w(?<!\d)\w*'
```
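To illustrate what the changed `Name` pattern does, this sketch compares the old and new patterns with Python's `re` module directly, outside lib2to3. For `str` patterns, `\w` is already Unicode-aware in Python 3, so the old pattern only rejected a non-ASCII *first* character. The new pattern matches any word character and then uses the look-behind `(?<!\d)` to reject it if it was a decimal digit. The helper `leading_name` and the sample inputs are hypothetical, chosen for illustration.

```python
import re

# The two Name patterns from the lib2to3/pgen2/tokenize.py diff.
OLD_NAME = re.compile(r'[a-zA-Z_]\w*')   # first character must be ASCII
NEW_NAME = re.compile(r'\w(?<!\d)\w*')   # any word character except a decimal digit first

def leading_name(pattern, text):
    """Return the prefix of text matched by pattern, or None if there is no match."""
    m = pattern.match(text)
    return m.group() if m else None

# Illustrative inputs (not taken from the PR's tests).
for src in ["Örter", "grün", "蟒", "_private", "3abc", "²x"]:
    print(f"{src!r:12} old={leading_name(OLD_NAME, src)!r:10} "
          f"new={leading_name(NEW_NAME, src)!r:10} "
          f"isidentifier={src.isidentifier()}")
```

With these inputs, `Örter` and `蟒` are rejected by the old pattern and accepted by the new one, `grün` is accepted by both, and `3abc` is rejected by both. `²x` (superscript two is a Unicode digit but not a decimal digit, so `\w` matches it and `\d` does not) is accepted by the new pattern even though `'²x'.isidentifier()` is `False`. The regex therefore approximates Python's identifier rules rather than reproducing them exactly.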
`Lib/tokenize.py` appears to try numbers before identifiers, which avoids the need for a look-behind assertion here. Can we take that approach here, too?
@benjaminp I copied the code from `Lib/tokenize.py`.
In `Lib/tokenize.py`, I see:

```python
Name = r'\w+'
```
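The ordering idea can be sketched as follows: `Name = r'\w+'` is only safe because the combined token pattern tries numbers first, and `re` alternation takes the first alternative that matches. The `NUMBER`, `NAME` and `OP` patterns and the `tokenize_line` helper here are hypothetical, heavily simplified stand-ins, not the real definitions from `Lib/tokenize.py`.

```python
import re

# Hypothetical, heavily simplified token patterns; the real ones in
# Lib/tokenize.py also cover floats, hex/octal/binary literals, strings, etc.
NUMBER = r'\d+'
NAME = r'\w+'            # as in Lib/tokenize.py: no look-behind assertion
OP = r'[=+\-*/()]'

# re tries alternatives left to right, so listing NUMBER before NAME means a
# leading digit is consumed as a number before NAME gets a chance to match.
TOKEN = re.compile(rf'\s*(?:(?P<NUMBER>{NUMBER})|(?P<NAME>{NAME})|(?P<OP>{OP}))')

# Counter-example: with NAME tried first, r'\w+' also swallows plain numbers.
NAME_FIRST = re.compile(rf'\s*(?:(?P<NAME>{NAME})|(?P<NUMBER>{NUMBER}))')

def tokenize_line(line):
    """Split one line into (kind, text) pairs using TOKEN."""
    line = line.rstrip()
    pos, tokens = 0, []
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:
            raise SyntaxError(f"no token matches at column {pos}")
        kind = m.lastgroup               # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

print(tokenize_line("蟒2 = 錦蛇 + 1"))
# [('NAME', '蟒2'), ('OP', '='), ('NAME', '錦蛇'), ('OP', '+'), ('NUMBER', '1')]
print(tokenize_line("3abc"))          # [('NUMBER', '3'), ('NAME', 'abc')]
print(NAME_FIRST.match("42").lastgroup)  # NAME
```

The trade-off is that the correctness of `NAME` now depends on alternation order elsewhere in the tokenizer, whereas the look-behind in this PR keeps the `Name` pattern correct on its own.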
Branch updated from 50f189e to 37d8770.
Thanks @holymonson for the PR, and @benjaminp for merging it 🌮🎉. I'm working now to backport this PR to 3.7.
GH-9333 is a backport of this pull request to the 3.7 branch.
@benjaminp What we actually want is to merge the two pure Python tokenizers. This pull request makes that harder.
See BPO-33338.
That does seem like a better solution. Do you want me to revert this?
I'm still thinking it over. My change will only affect Python 3.8, so the backport PR (GH-9333) does make life better for YAPF users in the interim. I'll revert this on master only when I rebase my tokenizer-merge pull request.
IOW, let's leave it for now.
https://bugs.python.org/issue34515