
Infinite loop or exception when trying to parse empty lines #593

Closed
Kylotan opened this issue Jan 5, 2025 · 4 comments

@Kylotan

Kylotan commented Jan 5, 2025

I have a grammar where I want to be able to parse an empty line as a "null statement", like Python's pass. It's probably not strictly necessary for my language, but it will be helpful during development.

The problem is that neither of the two approaches I have tried works.

The code below ends up in an infinite loop:

import unittest

class PyParsingTests(unittest.TestCase):
    def test_compound_statements(self):
        import pyparsing as pp
        # No warning is emitted
        pp.enable_all_warnings()
        # doesn't matter whether I remove newline from the set of skippable whitespace characters, or not
        # pp.ParserElement.set_default_whitespace_chars(' \t')

        empty_line = pp.rest_of_line
        null_statement = empty_line
        # Doesn't matter which of the two formulations below I use - same result in each case
        #compound_statement = pp.OneOrMore(null_statement)
        compound_statement = null_statement + null_statement[...]

        # I know this is deprecated, but using here just in case. No RecursiveGrammarException is raised
        #compound_statement.validate()

        # Expected result here - parses 3 'empty_line' elements.
        # Observed result - seems to loop forever
        compound_statement.parse_string("\n\n\n", parse_all=True)

        # Same happens even without the parse_all
        #compound_statement.parse_string("\n\n\n")

        # And with whitespace in each line
        #compound_statement.parse_string(" \n \n \n")


if __name__ == '__main__':
    unittest.main()

I guessed that this is because pp.rest_of_line does not consume the end-of-line character, so the parser never makes progress. That makes sense, but I can't imagine the infinite loop is intended.
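The guess above can be checked with nothing but the standard re module, assuming rest_of_line behaves like the regex .* (the maintainer confirms this further down the thread). Because . does not match a newline, .* matches the empty string at the start of "\n\n\n" and consumes nothing. The sketch below uses a hypothetical helper, match_repeatedly, that imitates a OneOrMore-style loop; it is not pyparsing's code. It stops on a zero-width match, whereas a loop without that progress check would retry position 0 forever.

```python
import re


def match_repeatedly(pattern: re.Pattern, text: str, pos: int = 0) -> tuple[list[str], int]:
    """Hypothetical OneOrMore-style loop (not pyparsing's implementation).

    Applies `pattern` at successive positions and returns the matched
    strings plus the final position reached.
    """
    matches = []
    while True:
        m = pattern.match(text, pos)
        if m is None:
            break
        matches.append(m.group())
        if m.end() == pos:
            # Zero-width match: nothing was consumed, so retrying at the same
            # position would match the same empty string forever.
            break
        pos = m.end()
    return matches, pos


rest_of_line = re.compile(r".*")  # assumed stand-in for pp.rest_of_line
print(match_repeatedly(rest_of_line, "\n\n\n"))
```

Here the result is ([''], 0): one empty match and no progress past the first newline.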

If I change the empty_line definition to empty_line = pp.rest_of_line + "\n", I get the following exceptions:

Error
Traceback (most recent call last):
  File "E:\Code\whatever\.venv\Lib\site-packages\pyparsing\core.py", line 846, in _parseNoCache
    loc, tokens = self.parseImpl(instring, pre_loc, do_actions)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Code\whatever\.venv\Lib\site-packages\pyparsing\core.py", line 2492, in parseImpl
    if instring[loc] == self.firstMatchChar:
       ~~~~~~~~^^^^^
IndexError: string index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Code\whatever\tests\test_random.py", line 22, in test_compound_statements
    compound_statement.parse_string("\n\n\n", parse_all=True)
  File "E:\Code\whatever\.venv\Lib\site-packages\pyparsing\core.py", line 1212, in parse_string
    raise exc.with_traceback(None)
pyparsing.exceptions.ParseException: Expected '\n', found end of text  (at char 3), (line:4, col:1)

I am not sure how I would avoid the second exception, which seems to complain that it can't parse a 4th line even though the grammar only asks for "one or more". Also, the first exception going unhandled before the second one is raised just looks like a bug.

@ptmcg
Member

ptmcg commented Mar 16, 2025

Looking at this this weekend. There is an odd behavior in the Python re module, which I use in the Regex class (rest_of_line is implemented as Regex(".*")):

import re
rest_of_line = re.compile(r".*")

source = "ABC"
for i in range(5):
    print(i, rest_of_line.match(source, pos=i))

prints

0 <re.Match object; span=(0, 3), match='ABC'>
1 <re.Match object; span=(1, 3), match='BC'>
2 <re.Match object; span=(2, 3), match='C'>
3 <re.Match object; span=(3, 3), match=''>
4 <re.Match object; span=(3, 3), match=''>

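It seems I had assumed that any attempt to match a regex beyond the end of the input string would return None, but that is not the case: re treats a pos past the end as if it were the end of the string, so a pattern that can match the empty string, like .*, still succeeds there.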
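One way a caller could guard against this clamping is to reject positions past the end of the string before calling match. The wrapper below, match_in_bounds, is a hypothetical helper for illustration and not necessarily how pyparsing 3.2.2 addresses the issue. It still allows pos == len(text), since an empty rest-of-line at the very end of the input is a legitimate match.

```python
import re


def match_in_bounds(pattern: re.Pattern, text: str, pos: int):
    """Hypothetical guard: return None instead of letting re clamp pos.

    pos == len(text) is still allowed, so an empty match at the very end
    of the input (e.g. an empty last line) keeps working.
    """
    if pos > len(text):
        return None
    return pattern.match(text, pos)


rest_of_line = re.compile(r".*")
source = "ABC"
for i in range(5):
    print(i, match_in_bounds(rest_of_line, source, i))
```

With the wrapper, positions 0 to 3 behave as before and position 4 yields None.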

I've just about got this fixed in the upcoming version of pyparsing. Until then, you can work around the issue by defining your empty_line as:

        NL = pp.LineEnd().suppress()
        EOT = pp.StringEnd().suppress()
        empty_line = ~EOT + pp.rest_of_line + NL
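To see why the workaround makes progress, here is a rough stdlib analogue of its shape, written with re rather than pyparsing. Mapping (?!\Z) to ~EOT and (?:\n|\Z) to NL is an assumption that ignores pyparsing's whitespace-skipping rules, and parse_lines is a hypothetical helper. Every successful match consumes at least one character, so the loop always terminates.

```python
import re

# Rough analogue (an assumption, not pyparsing itself) of
#   empty_line = ~EOT + pp.rest_of_line + NL
# (?!\Z)     -> ~EOT: refuse to start a line at end of text
# (.*)       -> rest_of_line: the line's content, possibly empty
# (?:\n|\Z)  -> NL: consume the newline, or accept end of text
EMPTY_LINE = re.compile(r"(?!\Z)(.*)(?:\n|\Z)")


def parse_lines(text: str) -> list[str]:
    """Hypothetical helper: apply EMPTY_LINE repeatedly, collecting line contents."""
    lines = []
    pos = 0
    while (m := EMPTY_LINE.match(text, pos)) is not None:
        lines.append(m.group(1))
        # Each match consumes at least one character (a newline or a
        # non-newline character), so pos strictly increases.
        pos = m.end()
    return lines


print(parse_lines("\n\n\n"))
```

parse_lines("\n\n\n") returns ['', '', ''], the three empty lines the original test expected.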

@ptmcg

ptmcg commented Mar 16, 2025

And I'm sorry it took so long to get back to you. This took a bit of digging, but I'm glad to have found this interesting wrinkle in the Python re module.

ptmcg self-assigned this Mar 17, 2025
@ptmcg

ptmcg commented Mar 18, 2025

Will be fixed in 3.2.2.

@ptmcg

ptmcg commented Mar 24, 2025

Released in 3.2.2

ptmcg closed this as completed Mar 24, 2025