Skip to content

py/parse.c: Implement function keyword argument syntax restriction. #8479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

RayaneCTX
Copy link
Contributor

@RayaneCTX RayaneCTX commented Apr 1, 2022

Synopsis: Implements the function keyword argument syntax restriction added with Python 3.8 (listed under issue #7899).

Details: This is done by adding a switch in the push_result_rule function in parse.c to turn off the parentheses removal optimization for atom_paren rules encountered while parsing an arglist rule (meaning this atom_paren rule is a function argument) and when the enclosed node is a lone ID. Keeping the atom_paren rule in the parse tree allows the compiler to catch the syntax error when the rule happens to be a keyword argument.

In order to determine whether an atom_paren rule is a function argument, a bit is added to the parser structure, which is set in mp_parse when an arglist rule is being parsed (unset once done with it). Other options were considered: hard-coding the distance on the rule stack between an arglist rule and an atom_paren rule in the added switch, using a global boolean variable, using a local boolean variable (by expanding the function signature of the push_result_rule function). Adding a field to the parser structure seemed to provide the best balance between code size increase, code changes, memory usage overhead, code maintainability, etc.

Because the compile_atom_paren function did not expect this rule to be a parent for a lone ID, a switch is added in that function to handle this.

A new test file fun_kwargs_syntax.py is added to make sure the syntax restriction is in effect, and a test is added to fun2.py to make sure extraneous parentheses around ID function arguments does not
cause any issues.

Implemented function keyword argument syntax restriction added with
Python 3.8 (listed under issue micropython#7899). This is done by adding a switch
in the push_result_rule function in parse.c to turn off the parentheses
removal optimization for atom_paren rules encountered while parsing
an arglist rule (meaning this atom_paren rule is a function argument)
and when the enclosed node is a lone ID. Keeping the atom_paren rule
in the parse tree allows the compiler to catch the syntax error when
the rule happens to be a keyword argument.

In order to determine whether an atom_paren rule is a function argument,
a bit is added to the parser structure, which is set in mp_parse when
an arglist rule is being parsed (unset once done with it). Other
options were considered: hard-coding the distance on the rule stack
between an arglist rule and an atom_paren rule in the added switch,
using a global boolean variable, using a local boolean variable (by
expanding the function signature of the push_result_rule function).
Adding a field to the parser structure seemed to provide the best
balance between code size increase, code changes, memory usage
overhead, code maintainability, etc.

Because the compile_atom_paren function did not expect this rule to
be a parent for a lone ID, a switch is added in that function to
handle this.

A new test file fun_kwargs_syntax.py is added to make sure the
syntax restriction is in effect, and a test is added to fun2.py to
make sure extraneous parenthese around ID function arguments does not
cause any issues.

Signed-off-by: Rayane Chatrieux <rayane.chatrieux@gmail.com>
@dpgeorge
Copy link
Member

dpgeorge commented Apr 2, 2022

Thanks for the contribution!

As far as I understand, this is now restricting parenthesis around id's in a keyword argument, like f((a)=2), is that correct? Because things like f(a or b=2) are already a SyntaxError in MicroPython.

@@ -1073,12 +1080,21 @@ mp_parse_tree_t mp_parse(mp_lexer_t *lex, mp_parse_input_kind_t input_kind) {
default: {
assert((rule_act & RULE_ACT_KIND_MASK) == RULE_ACT_LIST);

if (rule_id == RULE_arglist && !MP_PARSE_IS_PARSING_ARGLIST(&parser)) {
parser.cur_view |= MP_PARSE_PARSING_ARGLIST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of maintaining this bit, did you consider peeking into the rule stack? This information should already be there, you just need to do something like parser->rule_stack[parser->rule_stack_top - 1].rule_id == RULE_arglist.

Copy link
Contributor Author

@RayaneCTX RayaneCTX Apr 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, and I think the correct code to do this would be parser->rule_stack[parser->rule_stack_top - 15].rule_id == RULE_arglist, where 15 is that distance between the RULE_arglist and RULE_atom_paren that would guarantee that the second is a child of the first. I was able to implement the restriction this way. My concern with doing this is that it creates a tight dependence on the grammar file, i.e. if someone makes a change to it that should seemingly have no effect on this, this could silently break. I propose the bit solution to hopefully add some robustness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the approach her work with nested function calls, eg f(f(), (a)=2)?

Copy link
Contributor Author

@RayaneCTX RayaneCTX Apr 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. On this input code, after parsing through the first argument f(), the parser will revisit the arglist rule, allowing it to re-assert the bit, before recursing down through the (a) = 2 argument. Added a test a did a quick run just now to confirm.

@RayaneCTX
Copy link
Contributor Author

RayaneCTX commented Apr 2, 2022

Thanks for the contribution!

As far as I understand, this is now restricting parenthesis around id's in a keyword argument, like f((a)=2), is that correct? Because things like f(a or b=2) are already a SyntaxError in MicroPython.

Yes, the intent is to restrict syntax such as f((a)=2) only.

@codecov-commenter
Copy link

Codecov Report

Merging #8479 (3622ed5) into master (665f0e2) will decrease coverage by 0.14%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #8479      +/-   ##
==========================================
- Coverage   98.40%   98.25%   -0.15%     
==========================================
  Files         153      154       +1     
  Lines       20188    20296     +108     
==========================================
+ Hits        19865    19941      +76     
- Misses        323      355      +32     
Impacted Files Coverage Δ
py/parse.h 100.00% <ø> (ø)
py/compile.c 99.94% <100.00%> (-0.01%) ⬇️
py/parse.c 99.16% <100.00%> (ø)
py/bc.c 81.88% <0.00%> (-1.21%) ⬇️
py/persistentcode.c 97.17% <0.00%> (-0.10%) ⬇️
py/showbc.c 98.32% <0.00%> (-0.04%) ⬇️
py/vm.c 98.98% <0.00%> (-0.01%) ⬇️
py/bc.h 100.00% <0.00%> (ø)
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 665f0e2...3622ed5. Read the comment docs.

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Apr 10, 2022
tannewt added a commit to tannewt/circuitpython that referenced this pull request Oct 17, 2023
…n-main

Translations update from Hosted Weblate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants