SkipTo Erroneously Steps Into Grammar's ignoreExpr #475

@Elkniwt

I believe I've found a bug in the way SkipTo increments tmploc as it scans for its expr.

TL;DR: It is not sufficient for SkipTo to call preParse only once, before the main while tmploc <= instrlen: loop in its parseImpl. The tmploc += 1 step can advance into a region that the grammar's ignoreExpr matches, and that region may itself contain a match for SkipTo's expr.

For example:

from pyparsing import SkipTo, Word, alphanums, python_style_comment

some_grammar = Word(alphanums) + ":=" + SkipTo(';') + ';'
some_grammar.ignore(python_style_comment)
some_grammar.parse_string("""
var1 := 2 # 3; <== this semi-colon will match!
      + 1;
""", parse_all=True)

(The grammar's?) preParse will skip ignoreExprs to get to pre_loc in order to pass it in to SkipTo's parseImpl, but SkipTo has this in parseImpl:

            try:
                self_expr_parse(instring, tmploc, doActions=False, callPreParse=False)
            except (ParseException, IndexError):
                # no match, advance loc in string
                tmploc += 1  # <===== will increment right into an ignoreExpr!

SkipTo probably also needs to call some form of preParse as it increments tmploc, to avoid stepping into a matching ignoreExpr that was set via grammar.ignore(expr).
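
To make the stepping problem concrete without depending on any particular pyparsing version, here is a minimal toy model of the scan loop. It is only a sketch: naive_skip_to and COMMENT are hypothetical names invented for illustration, not pyparsing APIs, and the regex stands in for a grammar-level ignore expression.

```python
import re

# Hypothetical stand-in for a grammar-level ignore expression:
# a '#' comment running to the end of the line.
COMMENT = re.compile(r"#[^\n]*")


def naive_skip_to(text: str, loc: int, target: str) -> int:
    """Return the index of the first `target` at or after `loc`,
    advancing one character at a time with no ignore handling."""
    tmploc = loc
    while tmploc <= len(text):
        if text.startswith(target, tmploc):
            return tmploc
        tmploc += 1  # may step straight into a comment
    raise ValueError(f"{target!r} not found")


source = "var1 := 2 # 3; <== this semi-colon will match!\n      + 1;\n"
start = source.index(":=") + 2

hit = naive_skip_to(source, start, ";")
comment = COMMENT.search(source)

# True when the scan stopped on a character that lies inside the comment
in_comment = comment.start() <= hit < comment.end()
print(hit, in_comment, repr(source[start:hit]))
```

In this model the scan stops on the semicolon inside the comment, because nothing tells the loop that the comment region is off limits once it has started stepping.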

Even if the same expr is passed to SkipTo via its ignore=expr parameter, the grammar.ignoreExpr will be skipped first, in ignoreExpr.preParse (unless perhaps one should set callPreparse to False for SkipTo and pass in the proper ignoreExprs?). Otherwise, it feels like preParse should be called to advance tmploc at the top of each iteration of SkipTo's main loop.

I don't know the proper fix for this, but I've found that this works:

...
        tmploc = loc
        while tmploc <= instrlen:
            if self.callPreparse:
                tmploc = self.preParse(instring, tmploc) # skip grammar-ignored expressions

            if self_failOn_canParseNext is not None:
...

around line 5308 or so of core.py.
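
The idea behind that change can be sketched outside pyparsing as well. The following toy model is an assumption-laden illustration, not the library's code: pre_parse, skip_to and IGNORABLE are hypothetical names, and the regex merely imitates "skip whitespace and ignored comments". It re-runs the skip step at the top of every iteration, as in the patch above.

```python
import re

# Hypothetical stand-in for what preParse skips:
# runs of whitespace and '#' comments.
IGNORABLE = re.compile(r"(?:\s+|#[^\n]*)+")


def pre_parse(text: str, loc: int) -> int:
    """Return the first position at or after `loc` that is not ignorable."""
    m = IGNORABLE.match(text, loc)
    return m.end() if m else loc


def skip_to(text: str, loc: int, target: str) -> int:
    """Scan for `target`, skipping ignorable regions on every iteration."""
    tmploc = loc
    while tmploc <= len(text):
        tmploc = pre_parse(text, tmploc)  # jump over comments before testing
        if text.startswith(target, tmploc):
            return tmploc
        tmploc += 1  # no match here, advance one character
    raise ValueError(f"{target!r} not found")


source = "var1 := 2 # 3; <== this semi-colon will match!\n      + 1;\n"
start = source.index(":=") + 2

hit = skip_to(source, start, ";")
print(hit, repr(source[hit - 2:hit + 1]))
```

With the skip step inside the loop, the scan in this model jumps over the whole comment and stops on the semicolon that ends the statement on the second line.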

Am I making any sense?

--Jim
