diff --git a/CHANGES b/CHANGES
index abb74f38..eca5e61b 100644
--- a/CHANGES
+++ b/CHANGES
@@ -7,10 +7,92 @@ RELEASE PLANNING NOTES:
 In the pyparsing release 3.3.0, use of many of the pre-PEP8 methods (such as
 `ParserElement.parseString`) will start to raise `DeprecationWarnings`. I plan to
 completely drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release
-until some time in 2025. So there is plenty of time to convert existing parsers to
+until some time in 2026. So there is plenty of time to convert existing parsers to
 the new function names before the old functions are completely removed. (Big help from
 Devin J. Pohly in structuring the code to enable this peaceful transition.)
+===========================================================================================
+  The version 3.3.0 release will begin emitting `DeprecationWarnings` for pyparsing methods
+  that have been renamed to PEP8-compliant names (introduced in pyparsing 3.0.0, in August,
+  2021, with legacy names retained as aliases). In preparation, I have added in pyparsing
+  3.2.2 a utility for finding and replacing the legacy method names with the new names.
+  This utility is located at `pyparsing/tools/cvt_pyparsing_pep8_names.py`. This script will scan all
+  Python files specified on the command line, and if the `-u` option is selected, will
+  replace all occurrences of the old method names with the new PEP8-compliant names,
+  updating the files in place.
+
+  Here is an example that converts all the files in the pyparsing `/examples` directory:
+
+      python -m pyparsing.tools.cvt_pyparsing_pep8_names -u examples/*.py
+
+  The new names are compatible with pyparsing versions 3.0.0 and later.
+===========================================================================================
+
+
+Required Python versions by pyparsing version
+---------------------------------------------
+
++--------------------------------------------------+-------------------+
+| pyparsing version                                | Required Python   |
++==================================================+===================+
+| 3.2.0 - 3.2.2                                    | 3.9 or later      |
+| 3.0.8 - 3.1.4                                    | 3.6.8 or later    |
+| 3.0.0 - 3.0.7 (these versions are discouraged)   | 3.6 or later      |
+| 2.4.7                                            | 2.7 or later      |
+| 1.5.7                                            | 2.6 - 2.7         |
++--------------------------------------------------+-------------------+
+
+
+Version 3.2.2 - March, 2025
+---------------------------
+- Released `cvt_pyparsing_pep8_names.py` conversion utility to upgrade pyparsing-based
+  programs and libraries that use legacy camelCase names to use the new PEP8-compliant
+  snake_case method names. The converter can also be imported into other scripts as
+
+      from pyparsing.tools.cvt_pyparsing_pep8_names import pep8_converter
+
+- Fixed bug in `nested_expr` where nested contents were stripped of whitespace when
+  the default whitespace characters were cleared (raised in this StackOverflow
+  question https://stackoverflow.com/questions/79327649 by Ben Alan). Also addressed
+  bug in resolving PEP8 compliant argument name and legacy argument name.
+
+- Fixed bug in `rest_of_line` and the underlying `Regex` class, in which matching a
+  pattern that could match an empty string (such as `".*"` or `"[A-Z]*"`) would not raise
+  a `ParseException` at or beyond the end of the input string. This could cause an
+  infinite parsing loop when parsing `rest_of_line` at the end of the input string.
+  Reported by user Kylotan, thanks! (Issue #593)
+
+- Enhancements and extra input validation for `pyparsing.util.make_compressed_re` - see
+  usage in `examples/complex_chemical_formulas.py` and result in the generated railroad
+  diagram `examples/complex_chemical_formulas_diagram.html`.
Properly escapes characters + like "." and "*" that have special meaning in regular expressions. + +- Fixed bug in `one_of()` to properly escape characters that are regular expression markers + (such as '*', '+', '?', etc.) before building the internal regex. + +- Better exception message for `MatchFirst` and `Or` expressions, showing all alternatives + rather than just the first one. Fixes Issue #592, reported by Focke, thanks! + +- Added return type annotation of "-> None" for all `__init__()` methods, to satisfy + `mypy --strict` type checking. PR submitted by FeRD, thank you! + +- Added optional argument `show_hidden` to `create_diagram` to show + elements that are used internally by pyparsing, but are not part of the actual + parser grammar. For instance, the `Tag` class can insert values into the parsed + results but it does not actually parse any input, so by default it is not included + in a railroad diagram. By calling `create_diagram` with `show_hidden` = `True`, + these internal elements will be included. (You can see this in the tag_metadata.py + script in the examples directory.) + +- Fixed bug in `number_words.py` example. Also added `ebnf_number_words.py` to demonstrate + using the `ebnf.py` EBNF parser generator to build a similar parser directly from + EBNF. + +- Fixed syntax warning raised in `bigquery_view_parser.py`, invalid escape sequence "\s". + Reported by sameer-google, nice catch! (Issue #598) + +- Added support for Python 3.14. + Version 3.2.1 - December, 2024 ------------------------------ @@ -31,7 +113,7 @@ Version 3.2.1 - December, 2024 - Added generated diagrams for many of the examples. -- Replaced old examples/0README.html file with examples/README.md file. +- Replaced old `examples/0README.html` file with `examples/README.md` file. 
Version 3.2.0 - October, 2024 diff --git a/README.rst b/README.rst index 24d603c7..cfb9889f 100644 --- a/README.rst +++ b/README.rst @@ -26,7 +26,7 @@ Here is a program to parse ``"Hello, World!"`` (or any greeting of the form from pyparsing import Word, alphas greet = Word(alphas) + "," + Word(alphas) + "!" hello = "Hello, World!" - print(hello, "->", greet.parseString(hello)) + print(hello, "->", greet.parse_string(hello)) The program outputs the following:: @@ -36,7 +36,7 @@ The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of '+', '|' and '^' operator definitions. -The parsed results returned from ``parseString()`` is a collection of type +The parsed results returned from ``parse_string()`` is a collection of type ``ParseResults``, which can be accessed as a nested list, a dictionary, or an object with named attributes. diff --git a/docs/HowToUsePyparsing.rst b/docs/HowToUsePyparsing.rst index f23047f0..03d4d925 100644 --- a/docs/HowToUsePyparsing.rst +++ b/docs/HowToUsePyparsing.rst @@ -998,7 +998,7 @@ Exception classes and Troubleshooting expr = pp.Word(pp.alphanums).set_name("word").set_debug() print(ppt.with_line_numbers(data)) - expr[...].parseString(data) + expr[...].parse_string(data) prints:: @@ -1155,7 +1155,7 @@ Helper methods expr = infix_notation(int_expr, [ - (one_of("+ -"), 2, opAssoc.LEFT), + (one_of("+ -"), 2, OpAssoc.LEFT), ], lpar="<", rpar=">" @@ -1170,7 +1170,7 @@ Helper methods expr = infix_notation(int_expr, [ - (one_of("+ -"), 2, opAssoc.LEFT), + (one_of("+ -"), 2, OpAssoc.LEFT), ], lpar=Literal("<"), rpar=Literal(">") @@ -1489,6 +1489,8 @@ This will result in the railroad diagram being written to ``street_address_diagr - ``show_groups`` - bool flag whether groups should be highlighted with an unlabeled surrounding box +- ``show_hidden`` - bool flag whether internal pyparsing elements that are normally omitted in diagrams should be shown (default=False) + - ``embed`` - bool 
flag whether generated HTML should omit <HEAD>, <BODY>, and <DOCTYPE> tags to embed the resulting HTML in an enclosing HTML source (such as PyScript HTML) diff --git a/docs/whats_new_in_3_2.rst b/docs/whats_new_in_3_2.rst index daf85ad4..c210d800 100644 --- a/docs/whats_new_in_3_2.rst +++ b/docs/whats_new_in_3_2.rst @@ -117,7 +117,7 @@ New / Enhanced Examples Robert Nystrom's "Crafting Interpreters" (http://craftinginterpreters.com/). - Added ``complex_chemical_formulas.py`` example, to add parsing capability for - formulas such as "3(C₆H₅OH)₂". + formulas such as "Ba(BrO₃)₂·H₂O". - Updated ``tag_emitter.py`` to use new ``Tag`` class, introduced in pyparsing 3.1.3. diff --git a/examples/README.md b/examples/README.md index 73f3fd0c..efb50325 100644 --- a/examples/README.md +++ b/examples/README.md @@ -3,6 +3,16 @@ This directory contains a number of examples of parsers created using pyparsing. They fall into a few general categories (several examples include supporting railroad diagrams): + +* [Pyparsing tutorial and language feature demonstrations](#pyparsing-tutorial-and-language-feature-demonstrations) +* [Language parsers](#language-parsers) +* [Domain Specific Language parsers](#domain-specific-language-parsers) +* [Search and query language parsers](#search-and-query-language-parsers) +* [Data format parsers](#data-format-parsers) +* [Logical and arithmetic infix notation parsers and examples](#logical-and-arithmetic-infix-notation-parsers-and-examples) +* [Helpful utilities](#helpful-utilities) + + ## Pyparsing tutorial and language feature demonstrations * Hello World! 
* [greeting.py](./greeting.py) @@ -18,7 +28,7 @@ categories (several examples include supporting railroad diagrams): * Unicode text handling * [tag_metadata.py](./tag_metadata.py) [(diagram)](./tag_metadata_diagram.html) * chemical formulas - * [chemical_formula.py](./chemical_formula.py) + * [chemical_formulas.py](./chemical_formulas.py) * [complex_chemical_formulas.py](./complex_chemical_formulas.py) * API checker * [apicheck.py](./apicheck.py) [(diagram)](./apicheck_diagram.html) @@ -59,12 +69,15 @@ categories (several examples include supporting railroad diagrams): * rosetta code * [rosettacode.py](./rosettacode.py) [(diagram)](./rosettacode_diagram.html) ## Domain Specific Language parsers - * adventureEngine + * adventureEngine - interactive fiction parser and game runner * [adventureEngine.py](./adventureEngine.py) [(diagram)](./adventure_game_parser_diagram.html) - * pgn + * pgn - Chess notation parser * [pgn.py](./pgn.py) - * TAP + * TAP - Test results parser * [TAP.py](./TAP.py) [(diagram)](./TAP_diagram.html) + * EBNF - Extended Backus-Naur Format parser (and compiler to a running pyparsing parser) + * [ebnf.py](./ebnf.py) [(diagram)](./ebnf_diagram.html) + * [ebnf_number_words.py](./ebnf_number_words.py) [(diagram)](./ebnf_number_parser_diagram.html) ## Search and query language parsers * basic search * [searchparser.py](./searchparser.py) [demo](./searchParserAppDemo.py) diff --git a/examples/adventureEngine.py b/examples/adventureEngine.py index c4d155b6..b258b7c7 100644 --- a/examples/adventureEngine.py +++ b/examples/adventureEngine.py @@ -454,7 +454,7 @@ def make_bnf(self): quitVerb = pp.one_of("QUIT Q", caseless=True) lookVerb = pp.one_of("LOOK L", caseless=True) doorsVerb = pp.CaselessLiteral("DOORS") - helpVerb = pp.one_of("H HELP ?", caseless=True) + helpVerb = pp.one_of("H HELP ?", caseless=True).set_name("HELP | H | ?") itemRef = pp.OneOrMore(pp.Word(pp.alphas)).set_parse_action(self.validate_item_name).setName("item_ref") nDir = pp.one_of("N 
NORTH", caseless=True).set_parse_action(pp.replace_with("N")) @@ -512,7 +512,12 @@ def make_bnf(self): )("command").set_name("command") with contextlib.suppress(Exception): - parser.create_diagram("adventure_game_parser_diagram.html", vertical=2, show_groups=True) + parser.create_diagram( + "adventure_game_parser_diagram.html", + vertical=3, + show_groups=True, + show_results_names=True + ) return parser diff --git a/examples/adventure_game_parser_diagram.html b/examples/adventure_game_parser_diagram.html index e5ff5ca0..2a556301 100644 --- a/examples/adventure_game_parser_diagram.html +++ b/examples/adventure_game_parser_diagram.html @@ -21,55 +21,71 @@

[Railroad diagram SVG markup omitted. Diagram title: `command`. Verb alternatives shown: INVENTORY | INV | I; USE | U (item_ref, IN | ON, item_ref); OPEN | O; CLOSE | CL; DROP | LEAVE; TAKE | PICKUP ('PICK' 'UP'); MOVE | GO with NORTH | N, SOUTH | S, EAST | E, WEST | W; LOOK | L; EXAMINE | EX | X; DOORS; HELP | H | ?; QUIT | Q.]
[Railroad diagram SVG markup omitted. Diagram title: `time_ref_present`, containing Tag:time_ref_present=True.]
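
The renames applied by hand in the README.rst and HowToUsePyparsing.rst hunks above (`parseString` to `parse_string`, `opAssoc` to `OpAssoc`) are the kind of change the new `cvt_pyparsing_pep8_names` utility is described as automating. As a rough illustration only, here is a stdlib-only sketch of whole-word renaming. The `LEGACY_TO_PEP8` table and the `convert_legacy_names` helper are hypothetical names invented for this sketch; they are not the utility's actual implementation, which may work quite differently.

```python
import re

# Small hand-picked subset of legacy -> PEP8 renames (illustrative, not exhaustive)
LEGACY_TO_PEP8 = {
    "parseString": "parse_string",
    "setParseAction": "set_parse_action",
    "setName": "set_name",
    "asList": "as_list",
    "opAssoc": "OpAssoc",
}

# One regex matching any legacy name as a whole word, so an identifier such as
# "my_parseString_helper" is left untouched ("_" counts as a word character)
_LEGACY_NAME = re.compile(
    r"\b(?:" + "|".join(map(re.escape, LEGACY_TO_PEP8)) + r")\b"
)


def convert_legacy_names(source: str) -> str:
    """Return source with each known legacy name replaced by its PEP8 name."""
    return _LEGACY_NAME.sub(lambda m: LEGACY_TO_PEP8[m.group()], source)


print(convert_legacy_names("greet.parseString(hello)"))
```

A plain textual rename like this would also rewrite matches inside strings and comments, so the real converter should be preferred for actual code bases.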
diff --git a/examples/ebnf.py b/examples/ebnf.py index 4843d40c..96749f7e 100644 --- a/examples/ebnf.py +++ b/examples/ebnf.py @@ -1,14 +1,16 @@ # This module tries to implement ISO 14977 standard with pyparsing. # pyparsing version 1.1 or greater is required. +from typing import Any # ISO 14977 standardize The Extended Backus-Naur Form(EBNF) syntax. # You can read a final draft version here: # https://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html # # Submitted 2004 by Seo Sanghyeon +# Updated to current pyparsing styles 2025 by Paul McGuire # -from pyparsing import * +import pyparsing as pp all_names = """ @@ -27,147 +29,157 @@ syntax """.split() +LBRACK, RBRACK, LBRACE, RBRACE, LPAR, RPAR, DASH, STAR, EQ, SEMI = pp.Suppress.using_each( + "[]{}()-*=;" +) -integer = Word(nums) -meta_identifier = Word(alphas, alphanums + "_") -terminal_string = Suppress("'") + CharsNotIn("'") + Suppress("'") ^ Suppress( - '"' -) + CharsNotIn('"') + Suppress('"') -definitions_list = Forward() -optional_sequence = Suppress("[") + definitions_list + Suppress("]") -repeated_sequence = Suppress("{") + definitions_list + Suppress("}") -grouped_sequence = Suppress("(") + definitions_list + Suppress(")") +integer = pp.common.integer() +meta_identifier = pp.common.identifier() +terminal_string = pp.Regex( + r'"[^"]*"' + r"|" + r"'[^']*'" +).add_parse_action(pp.remove_quotes) + +definitions_list = pp.Forward() +optional_sequence = LBRACK + definitions_list + RBRACK +repeated_sequence = LBRACE + definitions_list + RBRACE +grouped_sequence = LPAR + definitions_list + RPAR syntactic_primary = ( optional_sequence - ^ repeated_sequence - ^ grouped_sequence - ^ meta_identifier - ^ terminal_string + | repeated_sequence + | grouped_sequence + | meta_identifier + | terminal_string ) -syntactic_factor = Optional(integer + Suppress("*")) + syntactic_primary -syntactic_term = syntactic_factor + Optional(Suppress("-") + syntactic_factor) -single_definition = delimitedList(syntactic_term, ",") -definitions_list << 
delimitedList(single_definition, "|") -syntax_rule = meta_identifier + Suppress("=") + definitions_list + Suppress(";") +syntactic_factor = pp.Optional(integer + STAR) + syntactic_primary +syntactic_term = syntactic_factor + pp.Optional(DASH + syntactic_factor) +single_definition = pp.DelimitedList(syntactic_term, ",") +definitions_list <<= pp.DelimitedList(single_definition, "|") +syntax_rule = meta_identifier + EQ + definitions_list + SEMI ebnfComment = ( - ("(*" + ZeroOrMore(CharsNotIn("*") | ("*" + ~Literal(")"))) + "*)") + ("(*" + (pp.CharsNotIn("*") | ("*" + ~pp.Literal(")")))[...] + "*)") .streamline() .setName("ebnfComment") ) -syntax = OneOrMore(syntax_rule) +syntax = syntax_rule[1, ...] syntax.ignore(ebnfComment) -def do_integer(str, loc, toks): +def do_integer(toks): return int(toks[0]) -def do_meta_identifier(str, loc, toks): +def do_meta_identifier(toks): if toks[0] in symbol_table: return symbol_table[toks[0]] else: - forward_count.value += 1 - symbol_table[toks[0]] = Forward() + symbol_table[toks[0]] = pp.Forward() return symbol_table[toks[0]] -def do_terminal_string(str, loc, toks): - return Literal(toks[0]) +def do_terminal_string(toks): + return pp.Literal(toks[0]) -def do_optional_sequence(str, loc, toks): - return Optional(toks[0]) +def do_optional_sequence(toks): + return pp.Optional(toks[0]) -def do_repeated_sequence(str, loc, toks): - return ZeroOrMore(toks[0]) +def do_repeated_sequence(toks): + return pp.ZeroOrMore(toks[0]) -def do_grouped_sequence(str, loc, toks): - return Group(toks[0]) +def do_grouped_sequence(toks): + return pp.Group(toks[0]) -def do_syntactic_primary(str, loc, toks): +def do_syntactic_primary(toks): return toks[0] -def do_syntactic_factor(str, loc, toks): - if len(toks) == 2: +def do_syntactic_factor(toks): + if len(toks) == 2 and toks[0] > 1: # integer * syntactic_primary - return And([toks[1]] * toks[0]) + return pp.And([toks[1]] * toks[0]) else: # syntactic_primary return [toks[0]] -def do_syntactic_term(str, loc, 
toks): +def do_syntactic_term(toks): if len(toks) == 2: # syntactic_factor - syntactic_factor - return NotAny(toks[1]) + toks[0] + return pp.NotAny(toks[1]) + toks[0] else: # syntactic_factor return [toks[0]] -def do_single_definition(str, loc, toks): +def do_single_definition(toks): toks = toks.asList() if len(toks) > 1: # syntactic_term , syntactic_term , ... - return And(toks) + return pp.And(toks) else: # syntactic_term return [toks[0]] -def do_definitions_list(str, loc, toks): +def do_definitions_list(toks): toks = toks.asList() if len(toks) > 1: # single_definition | single_definition | ... - return Or(toks) + return pp.Or(toks) else: # single_definition return [toks[0]] -def do_syntax_rule(str, loc, toks): +def do_syntax_rule(toks): # meta_identifier = definitions_list ; assert toks[0].expr is None, "Duplicate definition" - forward_count.value -= 1 - toks[0] << toks[1] + toks[0] <<= toks[1] return [toks[0]] -def do_syntax(str, loc, toks): +def do_syntax(): # syntax_rule syntax_rule ... 
return symbol_table -symbol_table = {} - - -class forward_count: - pass - - -forward_count.value = 0 for name in all_names: expr = vars()[name] action = vars()["do_" + name] - expr.setName(name) - expr.setParseAction(action) - # ~ expr.setDebug() + expr.set_name(name) + expr.add_parse_action(action) + # expr.setDebug() + + +symbol_table: dict[str, pp.Forward] = {} -def parse(ebnf, given_table={}): +def parse(ebnf, given_table=None, *, enable_debug=False): + given_table = given_table or {} symbol_table.clear() symbol_table.update(given_table) - forward_count.value = 0 - table = syntax.parseString(ebnf)[0] - assert forward_count.value == 0, "Missing definition" - for name in table: - expr = table[name] - expr.setName(name) - # ~ expr.setDebug() + table = syntax.parse_string(ebnf, parse_all=True)[0] + missing_definitions = [ + k for k, v in table.items() + if k not in given_table and v.expr is None + ] + assert not missing_definitions, f"Missing definitions for {missing_definitions}" + for name, expr in table.items(): + expr.set_name(name) + expr.set_debug(enable_debug) return table + + +if __name__ == '__main__': + try: + syntax.create_diagram("ebnf_diagram.html") + except Exception as e: + print("Failed to create diagram for EBNF syntax parser" + f" - {type(e).__name__}: {e}") diff --git a/examples/ebnf_diagram.html b/examples/ebnf_diagram.html new file mode 100644 index 00000000..74ec4443 --- /dev/null +++ b/examples/ebnf_diagram.html @@ -0,0 +1,656 @@ + + + + + + + + + + + + + + + +
[Generated HTML/SVG markup for examples/ebnf_diagram.html omitted. Railroad diagrams, in order: syntax, syntax_rule, definitions_list, single_definition, syntactic_term, syntactic_factor, integer (W:(0-9)), syntactic_primary, optional_sequence, repeated_sequence, grouped_sequence, meta_identifier (W:(A-Z_a-zªµºÀ-Ö..., 0-9A-Z_a-zªµ·...)), terminal_string ("[^"]*"|'[^']*').]
diff --git a/examples/ebnf_number_parser_diagram.html b/examples/ebnf_number_parser_diagram.html
new file mode 100644
index 00000000..2f1b534b
--- /dev/null
+++ b/examples/ebnf_number_parser_diagram.html
@@ -0,0 +1,531 @@
[Generated HTML/SVG markup omitted. Railroad diagrams, in order: number, thousands, one_to_99, units, teens, ten, multiples_of_ten, and, hundreds, hundreds_mult.]
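
The `one_to_99` diagram in this file combines `units`, `teens`, `ten`, and `multiples_of_ten` (the last optionally followed by '-' and `units`). As a worked example of what that rule accepts, here is a hand-written stdlib-only evaluator. It is a sketch for illustration, not the pyparsing parser generated from the EBNF, and the table and function names are assumptions of this sketch. It is also slightly more lenient than the grammar, since it treats every dash as whitespace.

```python
UNITS = {w: n for n, w in enumerate(
    "one two three four five six seven eight nine".split(), start=1)}
TEENS = {w: n for n, w in enumerate(
    "eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split(),
    start=11)}
TEN = {"ten": 10}
MULTIPLES_OF_TEN = {w: 10 * n for n, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def one_to_99(text: str) -> int:
    """Evaluate: units | teens | ten | multiples_of_ten, [["-"], units]."""
    # Lenient tokenizing: dashes are treated like spaces
    words = text.lower().replace("-", " ").split()
    if len(words) == 1:
        for table in (UNITS, TEENS, TEN, MULTIPLES_OF_TEN):
            if words[0] in table:
                return table[words[0]]
    elif len(words) == 2 and words[0] in MULTIPLES_OF_TEN and words[1] in UNITS:
        return MULTIPLES_OF_TEN[words[0]] + UNITS[words[1]]
    raise ValueError(f"not a number from 1 to 99: {text!r}")


for phrase in ("seven", "twelve", "twenty six", "forty-two"):
    print(phrase, "->", one_to_99(phrase))
```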
+ + + + diff --git a/examples/ebnf_number_words.py b/examples/ebnf_number_words.py new file mode 100644 index 00000000..8d6b46f2 --- /dev/null +++ b/examples/ebnf_number_words.py @@ -0,0 +1,77 @@ +# +# ebnftest_number_parser.py +# +# BNF from number_parser.py: +# +# optional_and ::= ["and" | "-"] +# optional_dash ::= ["-"] +# units ::= "one" | "two" | "three" | ... | "nine" +# tens ::= "twenty" | "thirty" | ... | "ninety" +# one_to_99 ::= units | ten | teens | (tens [optional_dash units]) +# ten ::= "ten" +# teens ::= "eleven" | "twelve" | ... | "nineteen" +# hundreds ::= (units | teens_only | tens optional_dash units) "hundred" +# thousands ::= one_to_99 "thousand" +# +# # number from 1-999,999 +# number ::= [thousands [optional_and]] [hundreds[optional_and]] one_to_99 +# | [thousands [optional_and]] hundreds +# | thousands +# + +import ebnf + +grammar = """ + (* + EBNF for number_words.py + *) + number = [thousands, [and]], [hundreds, [and]], [one_to_99]; + thousands = one_to_99, "thousand"; + hundreds_mult = units | teens | multiples_of_ten, ["-"], units; + hundreds = hundreds_mult, "hundred"; + teens = + "eleven" + | "twelve" + | "thirteen" + | "fourteen" + | "fifteen" + | "sixteen" + | "seventeen" + | "eighteen" + | "nineteen" + ; + one_to_99 = units | teens | ten | multiples_of_ten, [["-"], units]; + ten = "ten"; + multiples_of_ten = "twenty" | "thirty" | "forty" | "fifty" | "sixty" | "seventy" | "eighty" | "ninety"; + units = "one" | "two" | "three" | "four" | "five" | "six" | "seven" | "eight" | "nine"; + and = "and" | "-"; + """ + +parsers = ebnf.parse(grammar) +number_parser = parsers["number"] + +try: + number_parser.create_diagram("ebnf_number_parser_diagram.html") +except Exception as e: + print("Failed to create diagram for EBNF-generated number parser" + f" - {type(e).__name__}: {e}") + +number_parser.run_tests( + """ + one + seven + twelve + twenty six + forty-two + two hundred + twelve hundred + one hundred and eleven + seven thousand and six + 
twenty five hundred and one + ninety nine thousand nine hundred and ninety nine + + # invalid + twenty hundred + """, + full_dump=False +) \ No newline at end of file diff --git a/examples/ebnftest.py b/examples/ebnftest.py index 7b1ff759..88b88bf1 100644 --- a/examples/ebnftest.py +++ b/examples/ebnftest.py @@ -6,70 +6,54 @@ # Submitted 2004 by Seo Sanghyeon # print("Importing pyparsing...") -from pyparsing import * +import pyparsing as pp print("Constructing EBNF parser with pyparsing...") import ebnf grammar = """ -syntax = (syntax_rule), {(syntax_rule)}; -syntax_rule = meta_identifier, '=', definitions_list, ';'; -definitions_list = single_definition, {'|', single_definition}; -single_definition = syntactic_term, {',', syntactic_term}; -syntactic_term = syntactic_factor,['-', syntactic_factor]; -syntactic_factor = [integer, '*'], syntactic_primary; -syntactic_primary = optional_sequence | repeated_sequence | - grouped_sequence | meta_identifier | terminal_string; -optional_sequence = '[', definitions_list, ']'; -repeated_sequence = '{', definitions_list, '}'; -grouped_sequence = '(', definitions_list, ')'; -(* -terminal_string = "'", character - "'", {character - "'"}, "'" | - '"', character - '"', {character - '"'}, '"'; - meta_identifier = letter, {letter | digit}; -integer = digit, {digit}; -*) + (* + ISO 14977 standardize The Extended Backus-Naur Form(EBNF) syntax. 
+ You can read a final draft version here: + https://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html + *) + syntax = (syntax_rule), {(syntax_rule)}; + syntax_rule = meta_identifier, '=', definitions_list, ';'; + definitions_list = single_definition, {'|', single_definition}; + single_definition = syntactic_term, {',', syntactic_term}; + syntactic_term = syntactic_factor,['-', syntactic_factor]; + syntactic_factor = [integer, '*'], syntactic_primary; + syntactic_primary = optional_sequence | repeated_sequence | + grouped_sequence | meta_identifier | terminal_string; + optional_sequence = '[', definitions_list, ']'; + repeated_sequence = '{', definitions_list, '}'; + grouped_sequence = '(', definitions_list, ')'; + (* + terminal_string = "'", character - "'", {character - "'"}, "'" | + '"', character - '"', {character - '"'}, '"'; + meta_identifier = letter, {letter | digit}; + integer = digit, {digit}; + *) """ -table = {} -# ~ table['character'] = Word(printables, exact=1) -# ~ table['letter'] = Word(alphas + '_', exact=1) -# ~ table['digit'] = Word(nums, exact=1) -table["terminal_string"] = sglQuotedString -table["meta_identifier"] = Word(alphas + "_", alphas + "_" + nums) -table["integer"] = Word(nums) +table: dict[str, pp.ParserElement] = { + # "character": pp.Char(pp.printables), + # "letter": pp.Char(pp.alphas + '_'), + # "digit": pp.Char(nums), + "terminal_string": pp.sgl_quoted_string | pp.dbl_quoted_string, + "meta_identifier": pp.Word(pp.alphas + "_", pp.alphas + "_" + pp.nums), + "integer": pp.common.integer, +} print("Parsing EBNF grammar with EBNF parser...") parsers = ebnf.parse(grammar, table) ebnf_parser = parsers["syntax"] -commentcharcount = 0 -commentlocs = set() - - -def tallyCommentChars(s, l, t): - global commentcharcount, commentlocs - # only count this comment if we haven't seen it before - if l not in commentlocs: - charCount = len(t[0]) - len(list(filter(str.isspace, t[0]))) - commentcharcount += charCount - commentlocs.add(l) - return l, t - - -# 
ordinarily, these lines wouldn't be necessary, but we are doing extra stuff with the comment expression -ebnf.ebnfComment.setParseAction(tallyCommentChars) ebnf_parser.ignore(ebnf.ebnfComment) -print("Parsing EBNF grammar with generated EBNF parser...\n") -parsed_chars = ebnf_parser.parseString(grammar) -parsed_char_len = len(parsed_chars) +ebnf_parser.create_diagram("ebnftest_diagram.html") -print("],\n".join(str(parsed_chars.asList()).split("],"))) - -# ~ grammar_length = len(grammar) - len(filter(str.isspace, grammar))-commentcharcount - -# ~ assert parsed_char_len == grammar_length - -print("Ok!") +print("Parsing EBNF grammar with generated EBNF parser...\n") +parsed_chars = ebnf_parser.parse_string(grammar, parse_all=True) +print("\n".join(str(pc) for pc in parsed_chars.as_list())) diff --git a/examples/infix_math_parser.py b/examples/infix_math_parser.py new file mode 100644 index 00000000..5ff9ef3f --- /dev/null +++ b/examples/infix_math_parser.py @@ -0,0 +1,172 @@ +"""Defines a recursive parser for parsing mathematical expressions in infix notation. + +Supports binary, unary, and variadic operations. These can also be customized +in the InfixExpressionParser class variables. Utilizes some regex to improve its +performance. 
+
+Examples of parsing:
+
+The expression "f_1 + f_2 - 1e-3" is parsed into [['f_1', '+', 'f_2', '-', 0.001]]
+
+The expression "Max(Ln(x) + Lb(Abs(y)), Ceil(Sqrt(garlic) * 3), (potato ** 2) / 4, Abs(cosmic) + 10)"
+is parsed into
+[['Max', [[['Ln', ['x']], '+', ['Lb', [['Abs', ['y']]]]], ['Ceil', [[['Sqrt', ['garlic']], '*', 3]]], [['potato', '**', 2], '/', 4], [['Abs', ['cosmic']], '+', 10]]]]
+"""
+
+from pyparsing import (
+    Forward,
+    Group,
+    Literal,
+    ParserElement,
+    Suppress,
+    DelimitedList,
+    infix_notation,
+    one_of,
+    OpAssoc,
+    pyparsing_common,
+    ParseResults,
+    Regex,
+)
+
+# Enable Packrat for better performance in recursive parsing
+ParserElement.enablePackrat(None)
+
+
+class InfixExpressionParser:
+    """A class for defining infix notation parsers."""
+
+    # Supported infix binary operators, e.g., '1+1'. The key is the notation of the operator in infix format,
+    # and the value is the notation in parsed format.
+    BINARY_OPERATORS: dict[str, str] = {
+        "+": "Add",
+        "-": "Subtract",
+        "*": "Multiply",
+        "/": "Divide",
+        "**": "Power",
+    }
+
+    # Supported infix unary operators, e.g., 'Cos(90)'. The key is the notation of the operator in infix format,
+    # and the value is the notation in parsed format.
+    UNARY_OPERATORS: dict[str, str] = {
+        "Cos": "Cos",
+        "Sin": "Sin",
+        "Tan": "Tan",
+        "Exp": "Exp",
+        "Ln": "Ln",
+        "Lb": "Lb",
+        "Lg": "Lg",
+        "LogOnePlus": "LogOnePlus",
+        "Sqrt": "Sqrt",
+        "Square": "Square",
+        "Abs": "Abs",
+        "Ceil": "Ceil",
+        "Floor": "Floor",
+        "Arccos": "Arccos",
+        "Arccosh": "Arccosh",
+        "Arcsin": "Arcsin",
+        "Arcsinh": "Arcsinh",
+        "Arctan": "Arctan",
+        "Arctanh": "Arctanh",
+        "Cosh": "Cosh",
+        "Sinh": "Sinh",
+        "Tanh": "Tanh",
+        "Rational": "Rational",
+    }
+
+    # Supported infix variadic operators (operators that take one or more comma-separated arguments),
+    # e.g., 'Max(1,2, Cos(3))'. The key is the notation of the operator in infix format,
+    # and the value is the notation in parsed format.
+    VARIADIC_OPERATORS: dict[str, str] = {"Max": "Max"}
+
+    def __init__(self):
+        """A parser for infix notation, i.e., the human-readable way of notating mathematical expressions.
+
+        The parser can parse infix notation stored in a string. For instance,
+        "Cos(2 + f_1) - 7.2 + Max(f_2, -f_3)" is parsed to the list:
+        ['Cos', [[2, '+', 'f_1']]], '-', 7.2, '+', ['Max', ['f_2', ['-', 'f_3']]].
+
+        """
+        # Scope limiters
+        lparen = Suppress("(")
+        rparen = Suppress(")")
+
+        # Define keywords (Note that binary operators must be defined manually)
+        symbols_variadic = set(InfixExpressionParser.VARIADIC_OPERATORS)
+        symbols_unary = set(InfixExpressionParser.UNARY_OPERATORS)
+
+        # Define binary operation symbols (this is the manual part)
+        # If new binary operators are to be added, they must be defined here.
+        signop = one_of("+ -")
+        multop = one_of("* /")
+        plusop = one_of("+ -")
+        expop = Literal("**")
+
+        # Dynamically create a Regex that matches the variadic function names as whole words
+        variadic_pattern = r"\b(" + f"{'|'.join([*symbols_variadic])}" + r")\b"
+        variadic_func_names = Regex(variadic_pattern).set_name("variadic function")
+
+        # Dynamically create a Regex that matches the unary function names as whole words
+        unary_pattern = r"\b(" + f"{'|'.join([*symbols_unary])}" + r")\b"
+        unary_func_names = Regex(unary_pattern).set_name("unary function")
+
+        # Define operands
+        # Integers
+        integer = pyparsing_common.integer.set_name("integer")
+
+        # Scientific notation
+        scientific = pyparsing_common.sci_real.set_name("float")
+
+        # Complete regex pattern with exclusions and identifier pattern
+        exclude = f"{'|'.join([*symbols_variadic, *symbols_unary])}"
+        pattern = r"(?!\b(" + exclude + r")\b)(\b[a-zA-Z_][a-zA-Z0-9_]*\b)"
+        variable = Regex(pattern).set_name("variable")
+
+        operands = variable | scientific | integer
+
+        # Forward declarations of variadic and unary function calls
+        variadic_call = Forward()
+        unary_call = Forward()
+
+        # The parsed expressions are assumed to follow a standard infix syntax. The operands
+        # of the infix syntax can be either the literal 'operands' defined above (these are singletons),
+        # or a variadic function call or a unary function call. These latter two will be
+        # defined to be recursive.
+        #
+        # Note that the order of the operators in the second argument (the list) of infix_notation matters!
+        # The operation with the highest precedence is listed first.
+        infix_expn = infix_notation(
+            operands | variadic_call | unary_call,
+            [
+                (expop, 2, OpAssoc.LEFT),
+                (signop, 1, OpAssoc.RIGHT),
+                (multop, 2, OpAssoc.LEFT),
+                (plusop, 2, OpAssoc.LEFT),
+            ],
+        )
+
+        # These are recursive definitions of the forward declarations of the two types of function calls.
+        # In essence, the recursion continues until a singleton operand is encountered.
+        variadic_call <<= Group(
+            variadic_func_names + lparen + Group(DelimitedList(infix_expn)) + rparen
+        )
+        unary_call <<= Group(unary_func_names + lparen + Group(infix_expn) + rparen)
+
+        self.expn = infix_expn
+
+    def parse(self, str_expr: str) -> ParseResults:
+        """Parse a string expression into a list."""
+        return self.expn.parse_string(str_expr, parse_all=True)
+
+
+if __name__ == "__main__":
+    infix_parser = InfixExpressionParser()
+
+    expressions = [
+        "f_1 + f_2 - 1e-3",
+        "(x_1 + (x_2 * (c_1 + 3.3) / (x_3 - 2))) * 1.5",
+        "Max(Ln(x) + Lb(Abs(y)), Ceil(Sqrt(garlic) * 3), (potato ** 2) / 4, Abs(cosmic) + 10)",
+        "Max(Sqrt(Abs(x) + y ** 2), Lg(Max(cosmic, potato)), Ceil(Tanh(x) + Arctan(garlic)))",
+        "((garlic**3 - 2**Lb(cosmic)) + Ln(x**2 + 1)) / (Sqrt(Square(y) + LogOnePlus(potato + 3.1)))",
+    ]
+
+    infix_parser.expn.run_tests(expressions)
diff --git a/examples/number_words.py b/examples/number_words.py
index aa3ea09f..8eeb577d 100644
--- a/examples/number_words.py
+++ b/examples/number_words.py
@@ -12,22 +12,22 @@
 #
 #
 # BNF:
-"""
-    optional_and ::= ["and" | "-"]
-    optional_dash ::= ["-"]
-    units ::= one | two | three | ... | nine
-    teens ::= ten | teens_only
-    tens ::= twenty | thirty | ... | ninety
-    one_to_99 ::= units | teens | (tens [optional_dash units])
-    teens_only ::= eleven | twelve | ... | nineteen
-    hundreds ::= (units | teens_only | tens optional_dash units) "hundred"
-    thousands ::= one_to_99 "thousand"
-
-    # number from 1-999,999
-    number ::= [thousands [optional_and]] [hundreds[optional_and]] one_to_99
-               | [thousands [optional_and]] hundreds
-               | thousands
-"""
+#     optional_and ::= ["and" | "-"]
+#     optional_dash ::= ["-"]
+#     units ::= "one" | "two" | "three" | ... | "nine"
+#     ten ::= "ten"
+#     tens ::= "twenty" | "thirty" | ... | "ninety"
+#     one_to_99 ::= units | ten | teens | (tens [optional_dash units])
+#     teens ::= "eleven" | "twelve" | ... | "nineteen"
+#     hundreds ::= (units | teens | tens optional_dash units) "hundred"
+#     thousands ::= one_to_99 "thousand"
+#
+#     # number from 1-999,999
+#     number ::= [thousands [optional_and]] [hundreds[optional_and]] one_to_99
+#                | [thousands [optional_and]] hundreds
+#                | thousands
+#
+
 import pyparsing as pp
 from operator import mul
 
@@ -70,24 +70,26 @@ def multiply(t):
 opt_and = pp.Opt((pp.CaselessKeyword("and") | "-").suppress()).set_name("'and/-'")
 
 units = define_numeric_word_range("one two three four five six seven eight nine", 1, 9)
-teens_only = define_numeric_word_range(
+teens = define_numeric_word_range(
     "eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen",
     11,
     19,
 )
 ten = define_numeric_word_range("ten", 10)
-teens = ten | teens_only
 
 tens = define_numeric_word_range(
     "twenty thirty forty fifty sixty seventy eighty ninety", 20, 90, 10
 )
-one_to_99 = (units | teens | (tens + pp.Opt(opt_dash + units))).set_name("1-99")
-one_to_99.add_parse_action(sum)
 
 hundred = define_numeric_word_range("hundred", 100)
 thousand = define_numeric_word_range("thousand", 1000)
 
-hundreds = (units | teens_only | (tens + opt_dash + units)) + hundred
+one_to_99_except_tens = (units | teens | (tens + opt_dash + units)).set_name("1-99 except tens")
+one_to_99_except_tens.add_parse_action(sum)
+one_to_99 = (one_to_99_except_tens | ten | tens).set_name("1-99")
+one_to_99.add_parse_action(sum)
+
+hundreds = one_to_99_except_tens + hundred
 hundreds.set_name("100s")
 
 one_to_999 = (
@@ -128,6 +130,9 @@ def multiply(t):
     two hundred
     twelve hundred
     one hundred and eleven
+    seven thousand and six
+    twenty five hundred
+    twenty five hundred and one
     ninety nine thousand nine hundred and ninety nine
     nine hundred thousand nine hundred and ninety nine
     nine hundred and ninety nine thousand nine hundred and ninety nine
diff --git a/examples/number_words_diagram.html b/examples/number_words_diagram.html
index 626f7cb8..d7784206 100644
--- a/examples/number_words_diagram.html
+++ b/examples/number_words_diagram.html
@@ -18,7 +18,7 @@
 [Regenerated railroad diagram "numeric_words": the SVG/HTML markup was lost in extraction; only node labels survive.]
@@ -30,22 +30,22 @@
 [Diagram "numeric_words": nodes 1000s, 'and/-', 100s, 1-99 changed on both sides of the diff.]
 [Diagram "1-99": the removed side listed the unit words 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine' inline.]