bpo-34515: lib2to3: support non-ASCII identifiers #8950

holymonson · 2018-08-27T06:54:17Z

https://bugs.python.org/issue34515

holymonson · 2018-08-27T07:18:18Z

kamahen · 2018-08-27T11:43:40Z

Lib/lib2to3/tests/test_parser.py

+    def test_non_ascii_identifiers(self):
+        self.validate("Örter = 'places'\ngrün = 'green'")
+
+


It would probably be a good idea to add a CJK-specific test (or at least a non-Latin-1 one), such as:

蟒 = 3
錦蛇 = 1

See also
https://github.com/python/cpython/blob/master/Lib/test/test_unicode_identifiers.py
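As a rough illustration of what the suggested names exercise, here is a minimal sketch that checks each sample with `str.isidentifier()` and the built-in `compile()`. It is not the test added in this PR: `SAMPLES`, `identifier_of`, and `all_valid` are hypothetical names, and the check goes through the interpreter's own rules rather than lib2to3.

```python
# Hypothetical helper, not part of the PR: confirms that each sample line
# assigns to a valid Python 3 identifier and that the line compiles.
SAMPLES = ["Örter = 'places'", "grün = 'green'", "蟒 = 3", "錦蛇 = 1"]


def identifier_of(line):
    """Return the assignment target, i.e. the text left of the first '='."""
    return line.split("=")[0].strip()


def all_valid(samples):
    """True if every target is an identifier and every line compiles."""
    for line in samples:
        if not identifier_of(line).isidentifier():
            return False
        # compile() raises SyntaxError if the interpreter rejects the line.
        compile(line, "<sample>", "exec")
    return True


RESULT = all_valid(SAMPLES)
```

The Latin-1 names (`Örter`, `grün`) and the CJK names (`蟒`, `錦蛇`) go through the same check, so a tokenizer that only handled Latin-1 would be caught by the latter two.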

benjaminp · 2018-09-10T18:51:30Z

Lib/lib2to3/pgen2/tokenize.py

@@ -56,7 +56,7 @@ def _combinations(*l):
 Whitespace = r'[ \f\t]*'
 Comment = r'#[^\r\n]*'
 Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
-Name = r'[a-zA-Z_]\w*'
+Name = r'\w(?<!\d)\w*'


Lib/tokenize.py appears to parse numbers before identifiers to avoid having a look-behind assertion here. Can we take that approach here, too?

@benjaminp I copied the code from Lib/tokenize.py.

In Lib/tokenize.py, I see:

Name = r'\w+'
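The difference between the three patterns discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions: `TOKEN` is a deliberately simplified stand-in for "try numbers before names" (its `\d+` number pattern is an assumption, not the real `Number` expression from either tokenizer), and `full` and `first_token` are hypothetical helpers.

```python
import re

ASCII_NAME = re.compile(r'[a-zA-Z_]\w*')       # pattern removed by the diff
LOOKBEHIND_NAME = re.compile(r'\w(?<!\d)\w*')  # pattern added by the diff
PLAIN_NAME = re.compile(r'\w+')                # pattern quoted from Lib/tokenize.py

# Simplified assumption: numbers are tried before names, so a plain \w+
# never gets the chance to swallow a token that starts with a digit.
TOKEN = re.compile(r'(?P<number>\d+)|(?P<name>\w+)')


def full(pattern, text):
    """True if the pattern matches the whole string."""
    return pattern.fullmatch(text) is not None


def first_token(text):
    """Return (kind, lexeme) for the first token under the TOKEN sketch."""
    m = TOKEN.match(text)
    return (m.lastgroup, m.group())
```

On its own, `PLAIN_NAME` accepts `3x`, which is why it depends on the ordering; `LOOKBEHIND_NAME` rejects a leading digit by itself. Note that in Python 3 `\w` already matches non-ASCII word characters for `str` patterns, so the old pattern only failed when the first character was non-ASCII.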

miss-islington · 2018-09-15T17:32:32Z

Thanks @holymonson for the PR, and @benjaminp for merging it 🌮🎉. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

bedevere-bot · 2018-09-15T17:32:49Z

GH-9333 is a backport of this pull request to the 3.7 branch.

ambv · 2018-09-15T17:36:35Z

@benjaminp What we actually want is to merge the two pure Python tokenizers. This pull request makes this harder.

ambv · 2018-09-15T17:37:32Z

See BPO-33338.

benjaminp · 2018-09-15T17:40:34Z

That does seem like a better solution. Do you want me to revert this?

ambv · 2018-09-15T17:44:49Z

I'm thinking. My change will only affect Python 3.8, so the backport PR (GH-9333) does make life better for YAPF users in the interim. I'll revert on master only when I rebase my tokenizer merge pull request.

ambv · 2018-09-15T17:45:01Z

IOW, let's leave it for now.

the-knights-who-say-ni added the CLA signed label Aug 27, 2018

bedevere-bot added the awaiting review label Aug 27, 2018

holymonson mentioned this pull request Aug 27, 2018

PEP 3131 -- Supporting Non-ASCII Identifiers google/yapf#607

Closed

kamahen reviewed Aug 27, 2018

holymonson force-pushed the non_ascii_identifiers branch from 1b3072b to 50f189e August 27, 2018 12:51

benjaminp reviewed Sep 10, 2018

holymonson added 4 commits September 15, 2018 10:33

lib2to3: support non-ASCII identifiers

6cf1258

add news

0774738

add more test

fb3a039

imitate tokenize.py

37d8770

holymonson force-pushed the non_ascii_identifiers branch from 50f189e to 37d8770 September 15, 2018 02:53

benjaminp added the needs backport to 3.7 label Sep 15, 2018

benjaminp merged commit 10a428b into python:master Sep 15, 2018

bedevere-bot removed the awaiting review label Sep 15, 2018

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 15, 2018

closes bpo-34515: Support non-ASCII identifiers in lib2to3. (pythonGH…

a9f58c2

…-8950) (cherry picked from commit 10a428b) Co-authored-by: Monson Shao <holymonson@gmail.com>

bedevere-bot removed the needs backport to 3.7 label Sep 15, 2018

miss-islington added a commit that referenced this pull request Sep 15, 2018

closes bpo-34515: Support non-ASCII identifiers in lib2to3. (GH-8950)

51dbae8

(cherry picked from commit 10a428b) Co-authored-by: Monson Shao <holymonson@gmail.com>
