Skip to content

dot_parser: Define parser elements in a class #464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 5, 2025

Conversation

ferdnyc
Copy link
Member

@ferdnyc ferdnyc commented Apr 4, 2025

This PR is centered around one major, should-be-api/user-transparent change to the dot_parser code which will change everything for us:

The final ParserElement that we use for parsing, as well as all of the component parsing fragments used to build it, are now defined in the top level of a class definition named GraphParser.

The class is not meant to be instantiated (though it can be), and it consists only of class-member variables, all of which are the ParserElement components of our parser definition. The final element, GraphParser.parser, is what used to be stored in the global graphparser symbol. (Which is preserved for compatibility, even though our own code won't use it anymore.)

Having access to the parser components is a necessary step towards being able to finally debug, test, and modify the parser. With the new class, I can do things like this:

>>> from pydot.dot_parser import GraphParser

>>> # If I want to test our parsing of attribute lists...
>>> GraphParser.attr_list.parse_string(
... '''[color="red", style="dashed, filled", width=12.5]''')
ParseResults([P_AttrList({'color': '"red"', 'style': '"dashed, filled"', 'width': '12.5'})], {})

>>> # Or edge statements...
>>> elist = GraphParser.edge_stmt.parse_string(
... '''a:5:sw -- b:4:ne [color=blue]''')
>>> print(elist[0].to_string())
a:5:sw -- b:4:ne [color=blue];

>>> # Or if I want to start trying to figure out why our
>>> # parsing of complex edges is totally broken...
>>> g = GraphParser.parser.parse_string(
... '''graph G { a -- { b; c; } -- d; }''')
>>> print(g[0].to_string(indent=2))
graph G {
  a -- subgraph {
    b;
    c;
  };
}

>>> # Well, that's clearly not right. What is the edge
>>> # parser doing?
>>> for e in GraphParser.edge_stmt.parse_string(
... '''a -- b -- c [color=blue]'''):
...     print(e.to_string())
... 
a -- b [color=blue];
b -- c [color=blue];

>>> # So far so good, but...
>>> GraphParser.edge_stmt.parse_string(
... '''a:5:sw -- { n; m } -- b:4:ne [color=blue]''')
ParseResults([<pydot.core.Edge object at 0x7fcc1089e5d0>], {})
>>> elist = GraphParser.edge_stmt.parse_string(
... '''a:5:sw -- { n; m } -- b:4:ne [color=blue]''')
>>> elist[0].to_string()
'a:5:sw -- subgraph {\nn;\nm;\n} [color=blue];'

So now I at least have a starting point to look into the code. It's also possible to disable parse actions (just call .set_parse_action() with no arguments on any of the ParserElement objects that have one, to clear it out) in order to get a look at the values being sent into the parse action.

And we can write unit tests for all of the parser pieces, now, instead of only testing it as a black box.

Other changes made

  • dot_parser now only imports, and only refers to, pydot.core.* classes; the compatibility exports at the pydot level are never used. That makes the dependency graph more manageable.
  • By the same token, the top-level import of pydot.dot_parser is gone from core.py. It's now only imported on demand, when graph_from_dot_data() needs it.
  • All type hints in dot_parser are now real references to imported classes, it no longer uses string annotations.
  • Unused s and loc arguments are removed from parse action functions that don't use them (something PyParsing has supported for a long while, now).
  • The no-op __init__ is removed from the HTML class, since its unused type-ignore comment was annoying MyPy and it served no purpose anyway. OK, correction, HTML.__init__() can't be removed, just its # type: ignore comment.

Notes

This will cause huge conflicts with my other open PR, but it's worth the disruption. I'll do the work of resolving those once they occur.

ferdnyc added 5 commits April 4, 2025 14:04
Instead of a global variable `graphparser` that's set by calling
a function the first time it's referenced, set up the parser
definition in a class `GraphParser`, so that it can be accessed
as `GraphParser.parser`.

This has the advantage that all of its _component_ parts (like
the definition of `subgraph_stmt`, or `attr_list` can also
be accessed as `GraphParser.subgraph_stmt` or
`GraphParser.attr_list`, etc. (Much better for debugging/testing.)

In addition, to improve dependency relationships:
- Only import pydot.core into dot_parser, not `pydot`
- Change all `pydot.Foo` references to `pydot.core.Foo`
- Use concrete `pydot.core.Foo`s in type annotations
  (instead of strings)
Strictly for compatibility, in case anyone is using it directly.
They probably shouldn't be, but this way we won't break their code.
@ferdnyc ferdnyc force-pushed the define-parser-class branch from 804e0be to ce3c952 Compare April 4, 2025 19:06
Copy link

github-actions bot commented Apr 4, 2025

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  src/pydot
  core.py
  dot_parser.py 362, 378
Project Total  

This report was generated by python-coverage-comment-action

@ferdnyc
Copy link
Member Author

ferdnyc commented Apr 4, 2025

So, I just added another commit that moves the lower-end mypy-check test run from the Python 3.8 environment to the Python 3.9 environment.

Rationale

The code as it currently stands passes MyPy checks on Python 3.9 - 3.13, it only upsets MyPy on Python 3.8 because PyParsing is untyped on that version, and always will be as it's no longer supported by newer PyParsing.

And since Python 3.8 is also EOL, while we may still support it, I don't think we should write our code for that version.

If the code works on Python 3.8 (and it does), and if it passes type checks on supported Python versions (which it does), then I think that's the highest level of Python 3.8 support we can be expected to provide.

@ferdnyc ferdnyc requested a review from lkk7 April 4, 2025 19:17
Copy link
Member

@lkk7 lkk7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I read it twice and didn't see anything to complain about.
It passes the tests, so...

@lkk7 lkk7 merged commit 7dd60c3 into pydot:main Apr 5, 2025
23 checks passed
@ferdnyc ferdnyc deleted the define-parser-class branch April 5, 2025 11:40
@ferdnyc
Copy link
Member Author

ferdnyc commented Apr 5, 2025

Now I just have to take advantage of this newfound power and actually figure out why push_edge_stmt is so very, VERY broken.

And also possibly hack in support for the undocumented syntax I found last week (#465).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants