Skip to content

Conversation

ferdnyc
Copy link
Member

@ferdnyc ferdnyc commented Jun 17, 2025

To tighten up the way the API is used, add an * to the argument
list of appropriate methods, to signify that the arguments beyond
that point are keyword-only.

Specifically...

  • in to_string methods, indent is keyword-only (as is the
    internal indent_level)

  • in del_ methods, the optional index is keyword-only

  • quote_id_if_necessary's unquoted_keywords is keyword-only

  • in all constructors except Common.__init__, obj_dict is
    keyword-only, and moved to the end of the keyword argument list.
    (It's not kw-only in Common, because it's the ONLY argument and
    the other constructors call it as super().__init__(obj_dict).)

  • Constructor arguments like strict, suppress_disconnected, etc.
    for Graph types are keyword-only.

  • graph_type is explicitly not kw-only, meaning a graph
    could be constructed as pydot.Graph("G", "digraph") which seems
    perfectly fine.

  • Dot.create used to be called internally by Dot.write as
    Dot.create(prog, format, encoding=) which didn't seem like a
    very comfortable API for outside users, so now the create method
    takes format as the only positional argument, and the call is
    changed to Dot.create(format, prog=, encoding=)

  • Dot.write itself takes only the path as a positional argument,
    the rest are keyword-only. While it might be nice to allow something
    like Dot.write('/some/path', 'pdf'), since the format is the
    first positional argument of Dot.create() that's potentially
    confusing. (Ideally maybe Dot.write() should be able to infer the
    format based on the filename.)

Rationale

It used to be possible, at least in theory, to make calls like these:

g = pydot.Dot("G", None, "digraph", True, False, True)

g.to_string(3)           # (indent=3)
g.to_string("    ")      # (indent="    ") 
g.del_edge("a", "b", 3)  # (index=3)

pydot.quote_id_if_necessary(some_id, ("node", "edge", "graph"))

d = pydot.Dot("", g.obj_dict)
d.write('/tmp/out.svg', "dot", "svg", "utf-8")
d.create("neato", "svg", "latin1")

...all of which seem inadvisable and best avoided.

We're pretty good about keyword args in our own code. (I only had to change one call to conform with this, an ancient create() call in Dot.write that used a weird set of positional arguments.) But, still, better that it's enforced by the method definitions. So, create some enforcement.

With this change, it's made explicit that the correct way to make the calls above is:

# (Either of these is fine)
g = pydot.Graph("G", "digraph", strict=True, simplify=True)
g = pydot.Graph("G", graph_type="digraph", strict=True, simplify=True)

g.to_string(indent=3)
g.to_string(indent="    ")
g.del_edge("a", "b", index=3)

pydot.quote_id_if_necessary(
    some_id, unquoted_keywords=("node", "edge", "graph")
)
d = pydot.Dot(obj_dict=g.obj_dict)
d.write('/tmp/out.svg', format="svg", prog="dot", encoding="utf-8")
d.create("svg", prog="neato", encoding="latin1")

Other changes

A second commit tightens the type for .to_string(indent) from Any to str | int | float.

ferdnyc added 2 commits June 17, 2025 17:23
Instead of `Any`, only accept `str | int | float`.
To tighten up the way the API is used, add an `*` to the argument
list of appropriate methods, to signify that the arguments beyond
that point are keyword-only.

Specifically:
- in `to_string` methods, `indent` is keyword-only (as is the
  internal `indent_level`)

- in `del_` methods, the optional `index` is keyword-only

- `quote_id_if_necessary`'s `unquoted_keywords` is keyword-only

- in all constructors except `Common.__init__`, `obj_dict` is
  keyword-only, and moved to the end of the keyword argument list.
  (It's not kw-only in `Common`, because it's the ONLY argument and
  the other constructors call it as `super().__init__(obj_dict)`.)

- Constructor arguments like `strict`, `suppress_disconnected`, etc.
  for `Graph` types are keyword-only.

- `graph_type` is explicitly _not_ kw-only, meaning a graph
  could be constructed as `pydot.Graph("G", "digraph")` which seems
  perfectly fine.

- `Dot.create` used to be called internally by `Dot.write` as
  `Dot.create(prog, format, encoding=)` which didn't seem like a
  very comfortable API for outside users, so now the `create` method
  takes `format` as the only positional argument, and the call is
  changed to `Dot.create(format, prog=, encoding=)`

- `Dot.write` itself takes only the `path` as a positional argument,
  the rest are keyword-only. While it might be nice to allow something
  like `Dot.write('/some/path', 'pdf')`, since the format is the
  _first_ positional argument of `Dot.create()` that's potentially
  confusing. (Ideally maybe `Dot.write()` should be able to infer the
  format based on the filename.)
Copy link

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  src/pydot
  core.py
Project Total  

This report was generated by python-coverage-comment-action

@ferdnyc
Copy link
Member Author

ferdnyc commented Jun 17, 2025

This is technically an API change.

In fact, it's a fairly significant one if you happen to be abusing the API with things like pydot.Graph("name", None, "graph", True, True, False), because it will break those calls.

I think it's still a good idea, but I guess it should be targeted for Pydot 5.0.

@ferdnyc ferdnyc added this to the Pydot 5.0 milestone Jun 17, 2025
@ferdnyc
Copy link
Member Author

ferdnyc commented Jun 17, 2025

I really am wondering if I should make d.write('path', 'format') legal, though. Or, alternatively, if we should try to infer the format based on the filename passed. (e.g.):

d.write('/tmp/mygraph.svg')  # (format="svg")
d.write('/tmp/mygraph.png')  # (format="png")
d.write('/tmp/mygraph.jpg')  # (format="jpg")

dot already helps us a lot with that, by e.g. permitting all of jpg, jpeg, and jpe for the JPEG format. That means that we could probably just take the file extension, whatever it is, from the filename and use that as the format.

We might want a couple of special cases (like .txt means either dot or canon, or .html means... svg_inline?), but it wouldn't involve that much code.

(Basically, I just really hate that our default format is "ps" because who wants PostScript anymore?)

Copy link
Member

@lkk7 lkk7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably break a lot of old code, but it's a cool change 😄 Definitely for 5.0! (BTW, 5.0? It flew by so quickly)
Maybe I'll scan some more well-known repos, like networkX, in my free time to see how this would affect them. Although good luck finding them there: https://github.com/pydot/pydot/network/dependents

@ferdnyc
Copy link
Member Author

ferdnyc commented Jun 18, 2025

Ideally I'd get un-lazy and add some tests before this gets merged, anyway. So, no rush.

@ferdnyc
Copy link
Member Author

ferdnyc commented Jul 24, 2025

In addition to the tests I've so far still been too lazy to write, I wonder if there's enough old, breakable code out there to make it worth providing a transitional wrapper layer for this?

Without too much work, I could add a module src/pydot/legacy.py that includes wrapper classes for all of the core.py ones, but using the old signatures. For example,

# src/pydot/legacy.py
import pydot.core

class Node(pydot.core.Node):

   def __init__(
        self,
        name: str = "",
        obj_dict: AttributeDict | None = None,
        **attrs: Any,
    ) -> None:
    super().__init__(name, obj_dict=obj_dict, **attrs)


class Graph(pydot.core.Graph):

    def __init__(
        self,
        graph_name: str = "G",
        obj_dict: AttributeDict | None = None,
        graph_type: str = "digraph",
        strict: bool = False,
        suppress_disconnected: bool = False,
        simplify: bool = False,
        **attrs: Any,
    ) -> None:
        super().__init__(
            graph_name,
            graph_type,
            obj_dict=obj_dict,
            strict=strict,
            suppress_disconnected=suppress_disconnected,
            simplify=simplify,
            **attrs)

   def del_node(self, name: str | Node, index: int | None = None) -> bool:
       return super().del_node(name, index=index)

   # etc...

Might need some @override decorators to appease mypy, but basically that's the gist of it. Inheritance means that docstrings and unchanged methods will all be copied from the parent class.

Then, someone upgrading who has calls that violate the new API has the choice between either fixing their calls, or changing their imports to pull in the pydot.legacy versions of the classes instead, as a stopgap. Because in a potentially large-ish codebase, fixing every pydot call could be a lot more work than this:

- from pydot import Graph, Node, Edge
+ from pydot.legacy import Graph, Node, Edge

Or, if I also import all of the unchanged symbols into pydot.legacy, perhaps even this:

- import pydot
+ import pydot.legacy as pydot

Obviously, pydot.legacy would have to be born deprecated, and explicitly going away within one or two major releases at most.

...Ooh, one potential wrinkle: All of the methods that create objects may also have to be overridden, since (for example) Graph.get_node_list is defined like this:

    def get_node_list(self) -> list[Node]:
        node_objs: list[Node] = []
        for node in self.obj_dict["nodes"]:
            obj_dict_list = self.obj_dict["nodes"][node]
            node_objs.extend([Node(obj_dict=obj_d) for obj_d in obj_dict_list])
        return node_objs

...and for pydot.legacy.Graph we want it to be creating and returning pydot.legacy.Node objects, not pydot.core.Node objects.

(This is probably a sign that e.g. pydot.core.Graph should have methods like new_node(), new_edge(), etc. that it uses to create objects, instead of calling the constructors directly in its get_* methods (or any others). That way, the new_* methods can be overridden in the legacy subclass, as well as any other derived classes that we or anyone else may have, to tweak internal object creation without needing to make any other changes.)

Edit: Actually, not even new_node(), that sounds like a factory method. Which this isn't, in the traditional sense. So let's call it make_node(), or even make_node_obj() if we want to be more explicit. A method which takes only one obj_dict argument, passes it to the appropriate constructor, and returns the result.

@lkk7
Copy link
Member

lkk7 commented Jul 25, 2025

I'd like to answer at length, but I agree with everything, so...
Yeah, it's a bit of work, and then there's the whole deprecation process. Need to make sure that we have clear deprecation warnings, and are prepared for annoyed users 😄 But this is one of the easiest migrations I've seen:

- import pydot
+ import pydot.legacy as pydot

So it's fine by me.

@ferdnyc
Copy link
Member Author

ferdnyc commented Jul 26, 2025

But this is one of the easiest migrations I've seen:

- import pydot
+ import pydot.legacy as pydot

If it works. I still have to puzzle through all of the implications of that, beyond the Graph class's uses of other class constructors.

Offhand, the entirety of dot_parser would also require some sort of "legacy mode switch" -- but I think that should be doable, especially now that pydot.dot_parser is no longer imported into pydot.core, but only imported on demand if graph_from_dot_data is called. That should make it easy for pydot.legacy.graph_from_dot_data to import a dot_parser configured to use pydot.legacy classes instead of pydot.core classes. (I hope.)

@ferdnyc
Copy link
Member Author

ferdnyc commented Jul 27, 2025

Wait... I'm an idiot.

Nothing about dot_parser has to change for pydot.legacy — the same extraction of obj_dicts and discarding of objects that makes subclassing a pain will actually save us, there. dot_parser can go right on creating its graphs using pydot.core.* objects, and all pydot.legacy.graph_from_dot_data has to be is this:

def graph_from_dot_data(s: str) -> list[pydot.legacy.Dot] | None:
    graphs = pydot.core.graph_from_dot_data(s)
    if graphs is not None:
        return [
            pydot.legacy.Dot(obj_dict=g.obj_dict)
            for g in graphs
        ]
    return graphs

Done.

Same for every other graph_from_... function, actually. We could probably use an API generator like we do for the various attribute- and format-specific methods, except for the fact that the graph functions all take different arguments. I don't think that would work with argument-heterogeneous functions, unless the wrappers were defined to just take (*args, **kwargs) — which I doubt MyPy could be made OK with.

Maybe a generator could use some inspect magic to extract the argument signature from the parent function, and copy it to the wrapper. I wonder if that would work?

Eh. I guess there's only six graph_from_... functions total (why does it feel like more?), including _from_dot_file and _from_dot_data. Probably not even worth it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants