Include comment text in token #4426

harupy · 2023-01-06T14:30:07Z

Include the comment text in the comment token so that a linter like Ruff, which inspects comments to decide which errors to ignore, can read it directly.

DimitrisJim · 2023-01-06T14:54:52Z

You use the comment text in some way in Ruff? IIRC CPython doesn't keep the text around but I could be remembering wrong.

fanninpm · 2023-01-06T15:05:06Z

I'm assuming @charliermarsh wants something like the # noqa: E404 annotations in Flake8. It turns out that Flake8 grabs the physical line and regexes it (which is rather inefficient).
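For readers unfamiliar with the approach, here is a minimal sketch of matching a physical line against a regular expression to find a noqa marker. The pattern and the helper name `noqa_codes` are simplified assumptions for illustration, not Flake8's actual expression or API.

```python
import re

# Simplified, hypothetical pattern: an optional list of codes after "# noqa".
# This is an illustration only and is NOT the expression Flake8 uses.
NOQA_RE = re.compile(
    r"#\s*noqa(?::\s*(?P<codes>[A-Z][0-9]+(?:[,\s]+[A-Z][0-9]+)*))?",
    re.IGNORECASE,
)


def noqa_codes(physical_line):
    """Return None if the line has no noqa marker, [] for a blanket
    "# noqa", or the list of codes named after "# noqa:"."""
    match = NOQA_RE.search(physical_line)
    if match is None:
        return None
    codes = match.group("codes")
    if codes is None:
        return []
    return [code.upper() for code in re.split(r"[,\s]+", codes) if code]


print(noqa_codes("import os  # noqa: E404"))
```

One limitation of this sketch: because it scans the raw line, it also matches `# noqa` inside a string literal, which a token-based check would not.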

harupy · 2023-01-06T15:09:34Z

@DimitrisJim

"CPython doesn't keep the text around but I could be remembering wrong."

Looks like it does:

> cat a.py
# foo
print(1)

> python -m tokenize a.py
0,0-0,0:            ENCODING       'utf-8'        
1,0-1,5:            COMMENT        '# foo'        
1,5-1,6:            NL             '\n'           
2,0-2,5:            NAME           'print'        
2,5-2,6:            OP             '('            
2,6-2,7:            NUMBER         '1'            
2,7-2,8:            OP             ')'            
2,8-2,9:            NEWLINE        ''             
3,0-3,0:            ENDMARKER      ''
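The same information is available programmatically. The sketch below uses the standard-library `tokenize` module to collect the COMMENT tokens from a source string; the helper name `comment_tokens` is an assumption made for this example.

```python
import io
import tokenize


def comment_tokens(source):
    """Return (row, column, text) for every COMMENT token in `source`."""
    readline = io.StringIO(source).readline
    return [
        (tok.start[0], tok.start[1], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.COMMENT
    ]


# Same input as a.py above.
print(comment_tokens("# foo\nprint(1)\n"))  # [(1, 0, '# foo')]
```

Each token carries both its location and its text, so a caller does not need to go back to the source to read the comment.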

harupy · 2023-01-06T15:14:01Z

@DimitrisJim I found this.

https://github.com/python/cpython/blob/15c44789bb125b93e96815a336ec73423c47508e/Parser/tokenizer.c#L1603

    /* Skip comment, unless it's a type comment */
    if (c == '#') {
        const char *prefix, *p, *type_start;
        int current_starting_col_offset;

        while (c != EOF && c != '\n') {
            c = tok_nextc(tok);
        }

        if (tok->type_comments) {
            p = tok->start;
            current_starting_col_offset = tok->starting_col_offset;
            prefix = type_comment_prefix;
            while (*prefix && p < tok->cur) {
                if (*prefix == ' ') {
                    while (*p == ' ' || *p == '\t') {
                        p++;
                        current_starting_col_offset++;
                    }
                } else if (*prefix == *p) {
                    p++;
                    current_starting_col_offset++;
                } else {
                    break;
                }

                prefix++;
            }

It looks like CPython's C tokenizer only produces tokens for type comments.
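This behaviour can be observed from Python through `ast.parse`, which accepts a `type_comments` flag. In this small sketch the helper name `type_comment_of` is an assumption; it shows that a type comment is kept on the parsed node while an ordinary comment is not.

```python
import ast


def type_comment_of(source):
    """Parse `source` with type comments enabled and return the
    type_comment attribute of its first statement (None if absent)."""
    module = ast.parse(source, type_comments=True)
    return module.body[0].type_comment


print(type_comment_of("x = []  # type: List[int]\n"))  # List[int]
print(type_comment_of("x = []  # foo\n"))              # None
```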

charliermarsh · 2023-01-06T15:22:31Z

Right now, we use the locations on the token and extract the comments from the source code. It works fine. It'd be convenient to have the comment text directly, and likely more performant for Ruff, but if it's a CPython incompatibility and the RustPython team would prefer not to include it, I'd understand that too.
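To make the location-based workaround concrete, here is a rough Python sketch of recovering a token's text from the source using only its start and end positions. It is not Ruff's code (Ruff is written in Rust). The helper name `slice_source` is hypothetical, and the location convention (1-based rows, 0-based columns, exclusive end) is assumed to match Python's `tokenize`.

```python
def slice_source(source, start, end):
    """Return the text between two (row, col) positions.

    Rows are 1-based and columns 0-based; `end` is exclusive.
    """
    lines = source.splitlines(keepends=True)
    (start_row, start_col), (end_row, end_col) = start, end
    if start_row == end_row:
        return lines[start_row - 1][start_col:end_col]
    # Multi-line span: tail of the first line, whole middle lines,
    # head of the last line.
    parts = [lines[start_row - 1][start_col:]]
    parts.extend(lines[start_row:end_row - 1])
    parts.append(lines[end_row - 1][:end_col])
    return "".join(parts)


source = "# foo\nprint(1)\n"
print(slice_source(source, (1, 0), (1, 5)))  # # foo
```

When the token already carries its text, this extra lookup into the source is unnecessary.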

harupy · 2023-01-06T15:30:51Z

@charliermarsh Thanks for the comment!

harupy · 2023-01-06T15:30:59Z

https://github.com/python/cpython/blob/72263f2a20002ceff443e3a231c713f2e14fe3fe/Lib/tokenize.py#L17

It is designed to match the working of the Python tokenizer exactly, except
that it produces COMMENT tokens for comments and gives type OP for all
operators. ...

It's unclear why they made this decision.

charliermarsh · 2023-01-06T15:35:35Z

@harupy - Always grateful for all the work you're doing to improve Ruff and RustPython :)

DimitrisJim · 2023-01-06T15:50:33Z

"Right now, we use the locations on the token and extract the comments from the source code."

Oh, that's rough, pun intended. I think it's fine if the comment contents are captured, since it doesn't change the semantics in any way; it's a relatively small change that's easily maintainable, and it helps the Ruff team simplify things.

harupy · 2023-01-06T15:52:08Z

@DimitrisJim Awesome! I love the pun.

DimitrisJim

lgtm, let's let @youknowone also give the ok and merge it.

compiler/parser/python.lalrpop

charliermarsh · 2023-01-07T03:05:32Z

Thanks @DimitrisJim, grateful for all the collaboration that's happening between these two projects!

youknowone

This incompatibility is not a big deal. Let's go.

RustPython/RustPython#4426 has been merged. We can simplify code using text in comment tokens.

Include comment text in token

d532160

DimitrisJim approved these changes Jan 6, 2023

harupy commented Jan 7, 2023

youknowone approved these changes Jan 7, 2023

youknowone merged commit ddc2e1b into RustPython:main Jan 7, 2023

harupy mentioned this pull request Jan 7, 2023

Use text in comment token astral-sh/ruff#1714

Merged

charliermarsh pushed a commit to astral-sh/ruff that referenced this pull request Jan 7, 2023

Use text in comment token (#1714)

5cdd7cc

