Infinite loop or exception when trying to parse empty lines #593

Kylotan · 2025-01-05T13:16:23Z

I have a grammar where I want to be able to parse an empty line as a "null statement", like Python's pass. It's probably not strictly necessary for my language but it will be helpful during development.

Problem is, the two approaches I have tried don't work.

This code below ends up in an infinite loop:

import unittest

class PyParsingTests(unittest.TestCase):
    def test_compound_statements(self):
        import pyparsing as pp
        # No warning is emitted
        pp.enable_all_warnings()
        # doesn't matter whether I remove newline from the set of skippable whitespace characters, or not
        # pp.ParserElement.set_default_whitespace_chars(' \t')

        empty_line = pp.rest_of_line
        null_statement = empty_line
        # Doesn't matter which of the two formulations below I use - same result in each case
        #compound_statement = pp.OneOrMore(null_statement)
        compound_statement = null_statement + null_statement[...]

        # I know this is deprecated, but using here just in case. No RecursiveGrammarException is raised
        #compound_statement.validate()

        # Expected result here - parses 3 'empty_line' elements.
        # Observed result - seems to loop forever
        compound_statement.parse_string("\n\n\n", parse_all=True)

        # Same happens even without the parse_all
        #compound_statement.parse_string("\n\n\n")

        # And with whitespace in each line
        #compound_statement.parse_string(" \n \n \n")


if __name__ == '__main__':
    unittest.main()

I guessed that this is because pp.rest_of_line does not consume the end of line character, meaning the parser would never make progress. This makes sense but I can't imagine the infinite loop is desired.
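The "no progress" guess can be illustrated outside pyparsing with plain re. This is only a sketch under stated assumptions: match_repeatedly is a hypothetical helper, not pyparsing's repetition logic, and it does not model pyparsing's whitespace skipping. It shows that ".*" matches zero characters in front of a newline, so a repetition needs a progress check to terminate, while a pattern that also consumes the newline advances on every match and simply fails at the end of the text.

```python
import re

# Same pattern that rest_of_line uses, per the discussion in this thread
rest_of_line = re.compile(r".*")
# Variant that also consumes the line ending
line_with_newline = re.compile(r".*\n")


def match_repeatedly(pattern, source):
    """Hypothetical helper: apply pattern repeatedly from the start of source.

    Stops when the pattern fails, or when a match consumes nothing
    (otherwise the loop would never advance).
    """
    loc = 0
    matches = []
    while True:
        m = pattern.match(source, loc)
        if m is None:
            break  # pattern no longer matches: normal end of repetition
        matches.append(m.group())
        if m.end() == loc:
            break  # zero-width match: no progress, so stop here
        loc = m.end()
    return matches


stalled = match_repeatedly(rest_of_line, "\n\n\n")
advanced = match_repeatedly(line_with_newline, "\n\n\n")
```

Here stalled holds a single empty match, because the first attempt already consumes nothing, and advanced holds one entry per newline.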

If I amend the empty_line definition to this: empty_line = pp.rest_of_line + "\n", then I get the following exceptions:

Error
Traceback (most recent call last):
  File "E:\Code\whatever\.venv\Lib\site-packages\pyparsing\core.py", line 846, in _parseNoCache
    loc, tokens = self.parseImpl(instring, pre_loc, do_actions)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Code\whatever\.venv\Lib\site-packages\pyparsing\core.py", line 2492, in parseImpl
    if instring[loc] == self.firstMatchChar:
       ~~~~~~~~^^^^^
IndexError: string index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Code\whatever\tests\test_random.py", line 22, in test_compound_statements
    compound_statement.parse_string("\n\n\n", parse_all=True)
  File "E:\Code\whatever\.venv\Lib\site-packages\pyparsing\core.py", line 1212, in parse_string
    raise exc.with_traceback(None)
pyparsing.exceptions.ParseException: Expected '\n', found end of text  (at char 3), (line:4, col:1)

I am not sure how I would avoid the second exception, which seems to be complaining that it can't parse a 4th line even though the grammar only wants "one or more". The first exception being unhandled before the second is thrown also just looks like a bug.

ptmcg · 2025-03-16T23:17:24Z

Looking at this this weekend. There is an odd behavior in the Python re module, which I use in the Regex class (rest_of_line is implemented as Regex(".*")):

import re
rest_of_line = re.compile(r".*")

source = "ABC"
for i in range(5):
    print(i, rest_of_line.match(source, pos=i))

prints

0 <re.Match object; span=(0, 3), match='ABC'>
1 <re.Match object; span=(1, 3), match='BC'>
2 <re.Match object; span=(2, 3), match='C'>
3 <re.Match object; span=(3, 3), match=''>
4 <re.Match object; span=(3, 3), match=''>

It seems that I assumed that any attempt at matching any regex beyond the end of the input string would return None, but such is not the case.
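A minimal sketch of guarding against this behavior, assuming the caller wants None for any position past the end of the input. The helper name match_within is hypothetical and this is not the change that went into pyparsing; it only shows the kind of explicit bounds check that re does not perform on its own.

```python
import re

rest_of_line = re.compile(r".*")


def match_within(pattern, source, pos):
    """Hypothetical helper: like pattern.match(source, pos), but returns
    None when pos lies beyond the end of source instead of letting re
    clamp pos to len(source)."""
    if pos > len(source):
        return None
    return pattern.match(source, pos)


source = "ABC"

# re treats pos=4 like pos=3, so this is an empty match with span (3, 3)
clamped = rest_of_line.match(source, 4)

# The guarded version rejects the out-of-range position
guarded = match_within(rest_of_line, source, 4)

# A position exactly at the end of the input is still allowed
at_end = match_within(rest_of_line, source, 3)
```

Whether pos == len(source) should count as matchable is a design choice. The sketch allows it, since an empty match at the very end of the input is legitimate for a pattern like ".*".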

I've just about got this fixed in the upcoming version of pyparsing. Until then, you can work around the issue by defining your empty_line as:

        NL = pp.LineEnd().suppress()
        EOT = pp.StringEnd().suppress()
        empty_line = ~EOT + pp.rest_of_line + NL

ptmcg · 2025-03-16T23:18:39Z

And I'm sorry this took so long to get back to you. This took a bit of digging, but I'm glad to find this interesting wrinkle in the Python re module.

ptmcg · 2025-03-18T03:06:11Z

Will be fixed in 3.2.2.

ptmcg · 2025-03-24T12:31:08Z

Released in 3.2.2

ptmcg self-assigned this Mar 17, 2025

ptmcg closed this as completed Mar 24, 2025
