@@ -10,8 +10,8 @@ Abstract
10
10
11
11
In CPython, the compilation from source code to bytecode involves several steps:
12
12
13
- 1. Parse source code into a parse tree (:file: `Parser/pgen .c `)
14
- 2. Transform parse tree into an Abstract Syntax Tree (:file: `Python/ast .c `)
13
+ 1. Tokenize the source code (:file: `Parser/tokenizer .c `)
14
+ 2. Parse the stream of tokens into an Abstract Syntax Tree (:file: `Parser/parser .c `)
15
15
3. Transform AST into a Control Flow Graph (:file: `Python/compile.c `)
16
16
4. Emit bytecode based on the Control Flow Graph (:file: `Python/compile.c `)
17
17
@@ -23,49 +23,18 @@ in terms of the how the entire system works. You will most likely need
23
23
to read some source to have an exact understanding of all details.
24
24
25
25
26
- Parse Trees
27
- -----------
26
+ Parsing
27
+ -------
28
28
29
- Python's parser is an LL(1) parser mostly based off of the
30
- implementation laid out in the Dragon Book [Aho86 ]_.
29
+ As of Python 3.9, Python's parser is a PEG parser of a somewhat
30
+ unusual design (since its input is a stream of tokens rather than a
31
+ stream of characters as is more common with PEG parsers).
31
32
32
- The grammar file for Python can be found in :file: `Grammar/Grammar ` with the
33
- numeric value of grammar rules stored in :file: `Include/graminit.h `. The
34
- list of types of tokens (literal tokens, such as ``: ``, numbers, etc.) can
35
- be found in :file: `Grammar/Tokens ` with the numeric value stored in
36
- :file: `Include/token.h `. The parse tree is made up
37
- of ``node * `` structs (as defined in :file: `Include/node.h `).
38
-
39
- Querying data from the node structs can be done with the following
40
- macros (which are all defined in :file: `Include/node.h `):
41
-
42
- ``CHILD(node *, int) ``
43
- Returns the nth child of the node using zero-offset indexing
44
- ``RCHILD(node *, int) ``
45
- Returns the nth child of the node from the right side; use
46
- negative numbers!
47
- ``NCH(node *) ``
48
- Number of children the node has
49
- ``STR(node *) ``
50
- String representation of the node; e.g., will return ``: `` for a
51
- ``COLON `` token
52
- ``TYPE(node *) ``
53
- The type of node as specified in :file: `Include/graminit.h `
54
- ``REQ(node *, TYPE) ``
55
- Assert that the node is the type that is expected
56
- ``LINENO(node *) ``
57
- Retrieve the line number of the source code that led to the
58
- creation of the parse rule; defined in :file: `Python/ast.c `
59
-
60
- For example, consider the rule for 'while':
61
-
62
- .. productionlist ::
63
- while_stmt: "while" `expression ` ":" `suite ` : ["else" ":" `suite `]
64
-
65
- The node representing this will have ``TYPE(node) == while_stmt `` and
66
- the number of children can be 4 or 7 depending on whether there is an
67
- 'else' statement. ``REQ(CHILD(node, 2), COLON) `` can be used to access
68
- what should be the first ``: `` and require it be an actual ``: `` token.
33
+ The grammar file for Python can be found in
34
+ :file: `Grammar/python.gram `. The numeric values for literal tokens
35
+ (such as ``: ``, numbers, etc.) can be found in :file: `Grammar/Tokens `.
36
+ Various C files, including :file: `Parser/parser.c ` are generated from
37
+ these (see :doc: `grammar `).
69
38
70
39
71
40
Abstract Syntax Trees (AST)
@@ -569,10 +538,6 @@ thanks to having to support both classic and new-style classes.
569
538
References
570
539
----------
571
540
572
- .. [Aho86 ] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman.
573
- `Compilers: Principles, Techniques, and Tools `,
574
- https://www.amazon.com/exec/obidos/tg/detail/-/0201100886/104-0162389-6419108
575
-
576
541
.. [Wang97 ] Daniel C. Wang, Andrew W. Appel, Jeff L. Korn, and Chris
577
542
S. Serra. `The Zephyr Abstract Syntax Description Language. `_
578
543
In Proceedings of the Conference on Domain-Specific Languages, pp.
0 commit comments