gh-102856: Python tokenizer implementation for PEP 701 #104323

Merged (20 commits) on May 21, 2023.
4 changes: 4 additions & 0 deletions Doc/library/token-list.inc

(Generated file; diff not rendered.)

2 changes: 2 additions & 0 deletions Doc/library/token.rst
@@ -50,11 +50,13 @@ The following token type values aren't used by the C tokenizer but are needed for
 the :mod:`tokenize` module.
 
 .. data:: COMMENT
+   :noindex:
 
    Token value used to indicate a comment.
 
 
 .. data:: NL
+   :noindex:
 
    Token value used to indicate a non-terminating newline. The
    :data:`NEWLINE` token indicates the end of a logical line of Python code;
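The `:noindex:` flags are presumably needed because the regenerated token list (token-list.inc above) now documents COMMENT and NL as well, which would otherwise create duplicate index targets. As a minimal sketch of how the two token types behave through the public tokenize module, assuming a build with this PR:

```python
import io
import tokenize

code = "# just a comment\n\nx = 1\n"
for tok in tokenize.generate_tokens(io.StringIO(code).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
# COMMENT '# just a comment'   <- the comment itself
# NL '\n'                      <- non-terminating newline (no logical line yet)
# NL '\n'                      <- the blank line
# NAME 'x' ... NUMBER '1'      (abridged)
# NEWLINE '\n'                 <- ends the logical line `x = 1`
# ENDMARKER ''
```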
4 changes: 2 additions & 2 deletions Grammar/Tokens
@@ -64,9 +64,9 @@ SOFT_KEYWORD
 FSTRING_START
 FSTRING_MIDDLE
 FSTRING_END
+COMMENT
+NL
 ERRORTOKEN
 
 # These aren't used by the C tokenizer but are needed for tokenize.py
-COMMENT
-NL
 ENCODING
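Grammar/Tokens is the source from which Lib/token.py, Include/internal/pycore_token.h, and Doc/library/token-list.inc are regenerated (via make regen-token), which is why the generated files in this diff change in lockstep. A quick sanity check, assuming a build with this change:

```python
import token

# COMMENT and NL moved ahead of ERRORTOKEN in the regenerated numbering.
print(token.tok_name[token.COMMENT])  # 'COMMENT'
print(token.tok_name[token.NL])       # 'NL'
assert token.COMMENT < token.NL < token.ERRORTOKEN
```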
1 change: 1 addition & 0 deletions Include/internal/pycore_global_objects_fini_generated.h

(Generated file; diff not rendered.)

1 change: 1 addition & 0 deletions Include/internal/pycore_global_strings.h
@@ -406,6 +406,7 @@ struct _Py_global_strings {
     STRUCT_FOR_ID(exception)
     STRUCT_FOR_ID(exp)
     STRUCT_FOR_ID(extend)
+    STRUCT_FOR_ID(extra_tokens)
     STRUCT_FOR_ID(facility)
     STRUCT_FOR_ID(factory)
     STRUCT_FOR_ID(false)
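The new interned string backs an extra_tokens keyword that this PR threads through the C tokenizer: when true, the tokenizer also emits the tokenize-only extras such as COMMENT and NL. A hedged sketch of the private hook; the helper name and signature are assumptions based on this PR and may differ in other versions:

```python
# Private CPython internals; the helper below is an assumption based on
# this PR and is not a stable API.
import tokenize

for tok in tokenize._generate_tokens_from_c_tokenizer(
        "x = 1  # hi\n", extra_tokens=True):
    print(tok)
```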
1 change: 1 addition & 0 deletions Include/internal/pycore_runtime_init_generated.h

(Generated file; diff not rendered.)

4 changes: 3 additions & 1 deletion Include/internal/pycore_token.h
@@ -77,7 +77,9 @@ extern "C" {
 #define FSTRING_START 61
 #define FSTRING_MIDDLE 62
 #define FSTRING_END 63
-#define ERRORTOKEN 64
+#define COMMENT 64
+#define NL 65
+#define ERRORTOKEN 66
 #define N_TOKENS 68
 #define NT_OFFSET 256
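Lib/token.py is regenerated from the same Grammar/Tokens file, so the Python-level constants mirror these #defines; ENCODING (which exists only for tokenize.py) slots in at 67, which is why N_TOKENS stays 68. Illustrative values on a build with this change:

```python
import token

print(token.COMMENT, token.NL, token.ERRORTOKEN)  # 64 65 66
print(token.ENCODING, token.N_TOKENS)             # 67 68
```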
3 changes: 3 additions & 0 deletions Include/internal/pycore_unicodeobject_generated.h

(Generated file; diff not rendered.)

4 changes: 2 additions & 2 deletions Lib/inspect.py
@@ -2187,15 +2187,15 @@ def _signature_strip_non_python_syntax(signature):
             if string == ',':
                 current_parameter += 1
 
-        if (type == ERRORTOKEN) and (string == '$'):
+        if (type == OP) and (string == '$'):
             assert self_parameter is None
             self_parameter = current_parameter
             continue
 
         add(string)
         if (string == ','):
             add(' ')
-    clean_signature = ''.join(text)
+    clean_signature = ''.join(text).strip()
     return clean_signature, self_parameter
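Both edits follow from tokenize now being backed by the C tokenizer: a stray '$' (used in Argument Clinic signatures to mark the self parameter) is reported as an OP token rather than ERRORTOKEN, and the added .strip() drops whitespace that the new token stream can leave at either end of the joined text. A quick way to observe the token type, assuming a build with this PR:

```python
import io
import tokenize

sig = "($self, /, value)\n"
for tok in tokenize.generate_tokens(io.StringIO(sig).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
# With this change, '$' shows up as OP rather than ERRORTOKEN.
```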
10 changes: 10 additions & 0 deletions Lib/tabnanny.py
@@ -107,6 +107,10 @@ def check(file):
         errprint("%r: Token Error: %s" % (file, msg))
         return
 
+    except SyntaxError as msg:
+        errprint("%r: Token Error: %s" % (file, msg))
+        return
+
     except IndentationError as msg:
         errprint("%r: Indentation Error: %s" % (file, msg))
         return
@@ -272,6 +276,12 @@ def format_witnesses(w):
     return prefix + " " + ', '.join(firsts)
 
 def process_tokens(tokens):
+    try:
+        _process_tokens(tokens)
+    except TabError as e:
+        raise NannyNag(e.lineno, e.msg, e.text)
+
+def _process_tokens(tokens):
     INDENT = tokenize.INDENT
     DEDENT = tokenize.DEDENT
     NEWLINE = tokenize.NEWLINE
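Since tokenize is now driven by the C tokenizer, iterating a token stream can raise SyntaxError subclasses such as TabError directly; the new except clause and the process_tokens wrapper convert those into tabnanny's existing reporting paths. A small reproduction, using the same tab/space mix as the test below, assuming this PR's behavior:

```python
import io
import tokenize

# Line 3 indents with a tab where line 2 used spaces.
source = 'if True:\n    print("hello")\n\tprint("world")\n'
try:
    for _ in tokenize.generate_tokens(io.StringIO(source).readline):
        pass
except TabError as e:
    print(e.lineno, e.msg)  # 3 inconsistent use of tabs and spaces in indentation
```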
4 changes: 2 additions & 2 deletions Lib/test/test_tabnanny.py
@@ -223,7 +223,7 @@ def test_when_nannynag_error_verbose(self):
         with TemporaryPyFile(SOURCE_CODES["nannynag_errored"]) as file_path:
             out = f"{file_path!r}: *** Line 3: trouble in tab city! ***\n"
             out += "offending line: '\\tprint(\"world\")\\n'\n"
-            out += "indent not equal e.g. at tab size 1\n"
+            out += "inconsistent use of tabs and spaces in indentation\n"
 
             tabnanny.verbose = 1
             self.verify_tabnanny_check(file_path, out=out)
@@ -315,7 +315,7 @@ def test_with_errored_file(self):
     def test_with_errored_file(self):
         """Should displays error when errored python file is given."""
         with TemporaryPyFile(SOURCE_CODES["wrong_indented"]) as file_path:
-            stderr = f"{file_path!r}: Indentation Error: "
+            stderr = f"{file_path!r}: Token Error: "
             stderr += ('unindent does not match any outer indentation level'
                        ' (<tokenize>, line 3)')
             self.validate_cmd(file_path, stderr=stderr, expect_failure=True)