Commit 01b6064

Update grammar.rst and compiler.rst to describe the PEG parser
1 parent 907ebaf commit 01b6064

2 files changed (+30 -73 lines)

compiler.rst

+12 -47
@@ -10,8 +10,8 @@ Abstract
 
 In CPython, the compilation from source code to bytecode involves several steps:
 
-1. Parse source code into a parse tree (:file:`Parser/pgen.c`)
-2. Transform parse tree into an Abstract Syntax Tree (:file:`Python/ast.c`)
+1. Tokenize the source code (:file:`Parser/tokenizer.c`)
+2. Parse the stream of tokens into an Abstract Syntax Tree (:file:`Parser/parser.c`)
 3. Transform AST into a Control Flow Graph (:file:`Python/compile.c`)
 4. Emit bytecode based on the Control Flow Graph (:file:`Python/compile.c`)
 
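The four compilation steps in the updated list can be observed from Python itself. The following is a minimal sketch that assumes Python 3.9 or later and uses only the standard library: `tokenize` for step 1, `ast.parse()` for step 2, and `compile()` for steps 3 and 4 together, because the control flow graph is internal to `Python/compile.c` and is not exposed to Python code. The sample source and variable names are illustrative.

```python
import ast
import dis
import io
import token
import tokenize

SOURCE = "x = 1 + 2\n"  # illustrative input

# Step 1: tokenize. The tokenize module gives a Python-level view of the
# token stream; it is not necessarily the same code as Parser/tokenizer.c.
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
token_names = [token.tok_name[tok.type] for tok in tokens]

# Step 2: parse the tokens into an AST. ast.parse() goes through the same
# C parser that compile() uses (the PEG parser in Python 3.9+).
tree = ast.parse(SOURCE)

# Steps 3 and 4: compile() builds the control flow graph and emits
# bytecode internally; only the resulting code object is returned.
code = compile(tree, "<example>", "exec")

if __name__ == "__main__":
    print(token_names)
    print(ast.dump(tree))
    dis.dis(code)
```

Running it prints the token names, the dumped AST, and the disassembled bytecode, one stage per line of output.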
@@ -23,49 +23,18 @@ in terms of the how the entire system works. You will most likely need
 to read some source to have an exact understanding of all details.
 
 
-Parse Trees
------------
+Parsing
+-------
 
-Python's parser is an LL(1) parser mostly based off of the
-implementation laid out in the Dragon Book [Aho86]_.
+As of Python 3.9, Python's parser is a PEG parser of a somewhat
+unusual design (since its input is a stream of tokens rather than a
+stream of characters as is more common with PEG parsers).
 
-The grammar file for Python can be found in :file:`Grammar/Grammar` with the
-numeric value of grammar rules stored in :file:`Include/graminit.h`. The
-list of types of tokens (literal tokens, such as ``:``, numbers, etc.) can
-be found in :file:`Grammar/Tokens` with the numeric value stored in
-:file:`Include/token.h`. The parse tree is made up
-of ``node *`` structs (as defined in :file:`Include/node.h`).
-
-Querying data from the node structs can be done with the following
-macros (which are all defined in :file:`Include/node.h`):
-
-``CHILD(node *, int)``
-        Returns the nth child of the node using zero-offset indexing
-``RCHILD(node *, int)``
-        Returns the nth child of the node from the right side; use
-        negative numbers!
-``NCH(node *)``
-        Number of children the node has
-``STR(node *)``
-        String representation of the node; e.g., will return ``:`` for a
-        ``COLON`` token
-``TYPE(node *)``
-        The type of node as specified in :file:`Include/graminit.h`
-``REQ(node *, TYPE)``
-        Assert that the node is the type that is expected
-``LINENO(node *)``
-        Retrieve the line number of the source code that led to the
-        creation of the parse rule; defined in :file:`Python/ast.c`
-
-For example, consider the rule for 'while':
-
-.. productionlist::
-   while_stmt: "while" `expression` ":" `suite` : ["else" ":" `suite`]
-
-The node representing this will have ``TYPE(node) == while_stmt`` and
-the number of children can be 4 or 7 depending on whether there is an
-'else' statement. ``REQ(CHILD(node, 2), COLON)`` can be used to access
-what should be the first ``:`` and require it be an actual ``:`` token.
+The grammar file for Python can be found in
+:file:`Grammar/python.gram`. The numeric values for literal tokens
+(such as ``:``, numbers, etc.) can be found in :file:`Grammar/Tokens`.
+Various C files, including :file:`Parser/parser.c` are generated from
+these (see :doc:`grammar`).
 
 
 Abstract Syntax Trees (AST)
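The new "Parsing" text describes a PEG parser whose input is a stream of tokens rather than characters. The sketch below illustrates that idea under explicit assumptions: the four-rule grammar, the class name `ToyPegParser`, and the tuple-based tree are all hypothetical, and tokens come from the standard-library `tokenize` module rather than `Parser/tokenizer.c`. It shows the two PEG ingredients that matter here: ordered choice (the first alternative that matches wins) and backtracking (a failed alternative rewinds to a saved position). The parser that pegen generates into `Parser/parser.c` is far larger and also memoizes some rule results (packrat style), which this sketch omits.

```python
import io
import token
import tokenize

# Layout tokens carry no meaning for this toy grammar, so they are dropped.
SKIP = {token.NEWLINE, token.NL, token.COMMENT, token.ENDMARKER}


def tokenize_source(source):
    """Turn source text into a list of (type, string) pairs."""
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(t.type, t.string) for t in toks if t.type not in SKIP]


class ToyPegParser:
    """Hypothetical grammar (illustrative, NOT from Grammar/python.gram):

    statement:  assignment | expression     # ordered choice
    assignment: NAME '=' expression
    expression: term (('+' | '-') term)*
    term:       NAME | NUMBER | '(' expression ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0  # index of the next unconsumed token

    def expect(self, tok_type, string=None):
        """Consume and return one matching token string, else return None."""
        if self.pos < len(self.tokens):
            t_type, t_str = self.tokens[self.pos]
            if t_type == tok_type and (string is None or t_str == string):
                self.pos += 1
                return t_str
        return None

    def statement(self):
        start = self.pos
        node = self.assignment()  # first alternative wins if it matches
        if node is not None:
            return node
        self.pos = start  # backtrack before trying the next alternative
        return self.expression()

    def assignment(self):
        start = self.pos
        name = self.expect(token.NAME)
        if name is not None and self.expect(token.OP, "=") is not None:
            value = self.expression()
            if value is not None:
                return ("Assign", name, value)
        self.pos = start  # failed: leave the input untouched
        return None

    def expression(self):
        left = self.term()
        if left is None:
            return None
        while True:  # the (...)* repetition
            start = self.pos
            op = self.expect(token.OP, "+") or self.expect(token.OP, "-")
            right = self.term() if op else None
            if right is None:
                self.pos = start  # do not consume a dangling operator
                return left
            left = ("BinOp", op, left, right)

    def term(self):
        start = self.pos
        name = self.expect(token.NAME)
        if name is not None:
            return ("Name", name)
        number = self.expect(token.NUMBER)
        if number is not None:
            return ("Num", number)
        if self.expect(token.OP, "(") is not None:
            inner = self.expression()
            if inner is not None and self.expect(token.OP, ")") is not None:
                return inner
        self.pos = start
        return None


def parse(source):
    parser = ToyPegParser(tokenize_source(source))
    tree = parser.statement()
    # The parse only succeeds if every token was consumed.
    if tree is None or parser.pos != len(parser.tokens):
        raise SyntaxError(f"cannot parse {source!r}")
    return tree
```

For `parse("x = a + 1")` the `assignment` alternative succeeds immediately. For `parse("a - (b + 2)")` it fails at the `-` token, the parser rewinds to the start of the statement, and the `expression` alternative matches instead.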
@@ -569,10 +538,6 @@ thanks to having to support both classic and new-style classes.
 References
 ----------
 
-.. [Aho86] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman.
-   `Compilers: Principles, Techniques, and Tools`,
-   https://www.amazon.com/exec/obidos/tg/detail/-/0201100886/104-0162389-6419108
-
 .. [Wang97] Daniel C. Wang, Andrew W. Appel, Jeff L. Korn, and Chris
    S. Serra. `The Zephyr Abstract Syntax Description Language.`_
    In Proceedings of the Conference on Domain-Specific Languages, pp.
grammar.rst

+18 -26
@@ -7,52 +7,44 @@ Abstract
 --------
 
 There's more to changing Python's grammar than editing
-:file:`Grammar/Grammar`. This document aims to be a
-checklist of places that must also be fixed.
+:file:`Grammar/python.gram`. Here's a checklist.
 
-It is probably incomplete. If you see omissions, submit a bug or patch.
-
-This document is not intended to be an instruction manual on Python
-grammar hacking, for several reasons.
-
-
-Rationale
-----------
-
-People are getting this wrong all the time; it took well over a
-year before someone `noticed <https://bugs.python.org/issue676521>`_
-that adding the floor division
-operator (``//``) broke the :mod:`parser` module.
+NOTE: These instructions are for Python 3.9 and beyond. Earlier
+versions use a different parser technology. You probably shouldn't
+try to change the grammar of earlier Python versions, but if you
+really want to, use GitHub to track down the earlier version of this
+file in the devguide. (Python 3.9 itself actually supports both
+parsers; the old parser can be invoked by passing ``-X oldparser``.)
 
 
 Checklist
 ---------
 
 Note: sometimes things mysteriously don't work. Before giving up, try ``make clean``.
 
-* :file:`Grammar/Grammar`: OK, you'd probably worked this one out. :-) After changing
-  it, run ``make regen-grammar``, to regenerate :file:`Include/graminit.h` and
-  :file:`Python/graminit.c`. (This runs Python's parser generator, ``Python/pgen``).
+* :file:`Grammar/python.gram`: The grammar, with actions that build AST nodes. After changing
+  it, run ``make regen-pegen``, to regenerate :file:`Parser/parser.c`.
+  (This runs Python's parser generator, ``Tools/peg_generator``).
 
 * :file:`Grammar/Tokens` is a place for adding new token types. After
   changing it, run ``make regen-token`` to regenerate :file:`Include/token.h`,
   :file:`Parser/token.c`, :file:`Lib/token.py` and
-  :file:`Doc/library/token-list.inc`. If you change both ``Grammar`` and ``Tokens``,
-  run ``make regen-tokens`` before ``make regen-grammar``.
+  :file:`Doc/library/token-list.inc`. If you change both ``python.gram`` and ``Tokens``,
+  run ``make regen-token`` before ``make regen-pegen``.
 
-* :file:`Parser/Python.asdl` may need changes to match the Grammar. Then run ``make
+* :file:`Parser/Python.asdl` may need changes to match the grammar. Then run ``make
   regen-ast`` to regenerate :file:`Include/Python-ast.h` and :file:`Python/Python-ast.c`.
 
 * :file:`Parser/tokenizer.c` contains the tokenization code. This is where you would
   add a new type of comment or string literal, for example.
 
-* :file:`Python/ast.c` will need changes to create the AST objects involved with the
-  Grammar change.
+* :file:`Python/ast.c` will need changes to validate AST objects involved with the
+  grammar change.
 
-* The :doc:`compiler` has its own page.
+* :file:`Python/ast_unparse.c` will need changes to unparse AST objects involved with the
+  grammar change ("unparsing" is used to turn annotations into strings per :pep:`563`).
 
-* The :mod:`parser` module. Add some of your new syntax to ``test_parser``,
-  bang on :file:`Modules/parsermodule.c` until it passes.
+* The :doc:`compiler` has its own page.
 
 * Add some usage of your new syntax to ``test_grammar.py``.
 
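The new checklist entry for `Python/ast_unparse.c` says unparsing turns annotations into strings per PEP 563. That behavior is observable from Python: the sketch below compiles a function with the `annotations` future flag (equivalent to writing `from __future__ import annotations` at the top of a module) and reads back the stored strings. It assumes Python 3.9 or later (for `ast.unparse()`); the source snippet and variable names are illustrative. `ast.unparse()` lives in `Lib/ast.py` and is a separate implementation from the C unparser; it appears here only for comparison.

```python
import __future__
import ast

SOURCE = "def f(x: a+b) -> List[int]: pass\n"

# Compile with the PEP 563 future flag: annotations are stored as strings
# produced by the C unparser instead of being evaluated, so the undefined
# names a, b and List are never looked up.
flags = __future__.annotations.compiler_flag
namespace = {}
exec(compile(SOURCE, "<pep563>", "exec", flags=flags, dont_inherit=True), namespace)
annotations = namespace["f"].__annotations__

# The Python-level unparser in Lib/ast.py, for comparison.
expr_text = ast.unparse(ast.parse("a+b", mode="eval").body)

if __name__ == "__main__":
    print(annotations)
    print(expr_text)
```

Note that the stored annotation is the normalized text `a + b`, not the original `a+b`: the string is regenerated from the AST rather than copied from the source, which is why a grammar change that adds new AST nodes also needs unparser support.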