Skip to content

Conversation

holymonson
Copy link
Contributor

@holymonson holymonson commented Aug 27, 2018

@holymonson
Copy link
Contributor Author

this will fix google/yapf#607

def test_non_ascii_identifiers(self):
self.validate("Örter = 'places'\ngrün = 'green'")


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good idea to add a CJK-specific test (or non-Latin-1), such as

蟒 = 3
錦蛇 = 1

See also
https://github.com/python/cpython/blob/master/Lib/test/test_unicode_identifiers.py

@holymonson holymonson force-pushed the non_ascii_identifiers branch from 1b3072b to 50f189e Compare August 27, 2018 12:51
@@ -56,7 +56,7 @@ def _combinations(*l):
Whitespace = r'[ \f\t]*'
Comment = r'#[^\r\n]*'
Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
Name = r'[a-zA-Z_]\w*'
Name = r'\w(?<!\d)\w*'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lib/tokenize.py appears to parse numbers before identifiers to avoid having a look-behind assertion here. Can we take that approach here, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjaminp copied codes in Lib/tokenize.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Lib/tokenize.py, I see:

Name = r'\w+'

@holymonson holymonson force-pushed the non_ascii_identifiers branch from 50f189e to 37d8770 Compare September 15, 2018 02:53
@benjaminp benjaminp merged commit 10a428b into python:master Sep 15, 2018
@miss-islington
Copy link
Contributor

Thanks @holymonson for the PR, and @benjaminp for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 15, 2018
…-8950)

(cherry picked from commit 10a428b)

Co-authored-by: Monson Shao <holymonson@gmail.com>
@bedevere-bot
Copy link

GH-9333 is a backport of this pull request to the 3.7 branch.

@ambv
Copy link
Contributor

ambv commented Sep 15, 2018

@benjaminp What we actually want is to merge the two pure Python tokenizers. This pull request makes this harder.

@ambv
Copy link
Contributor

ambv commented Sep 15, 2018

See BPO-33338.

@benjaminp
Copy link
Contributor

benjaminp commented Sep 15, 2018 via email

@ambv
Copy link
Contributor

ambv commented Sep 15, 2018

I'm thinking. My change will only affect Python 3.8 so the backport PR (GH-9333) does make life of YAPF users better in the interim. I'll revert on master only when I rebase my tokenizer merge pull request.

@ambv
Copy link
Contributor

ambv commented Sep 15, 2018

IOW, let's leave it for now.

miss-islington added a commit that referenced this pull request Sep 15, 2018
(cherry picked from commit 10a428b)

Co-authored-by: Monson Shao <holymonson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants