Skip to content

Enable subscripting generic types in annotations. #401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

udifuchs
Copy link
Contributor

Some classes like lxml.etree._ElementTree are generic types that can contain different kinds of elements. The type of these elements has to be specified in type annotations. For example:

element_tree: lxml.etree._ElementTree[lxml.etree._Element]

This requires adding a __class_getitem__ class method to these class, as specified in PEP 560. This commit adds these class methods.

The list of generic types in lxml was taken from:

https://github.com/abelcheung/types-lxml/wiki/Using-specialised-class-directly

Some classes like lxml.etree._ElementTree are generic types that can contain
different kinds of elements. The type of these elements has to be specified
in type annotations. For example:

element_tree: lxml.etree._ElementTree[lxml.etree._Element]

This requires adding a __class_getitem__ class method to these class, as
specified in PEP 560. This commit adds these class methods.

The list of generic types in lxml was taken from:

https://github.com/abelcheung/types-lxml/wiki/Using-specialised-class-directly
@scoder
Copy link
Member

scoder commented Jan 25, 2024

Hmm. Thanks. I'm reluctant to apply this, because I'd rather not see users import non-public classes in their code. The internal element classes deliberately use underscore names to mark them as internal.

@udifuchs
Copy link
Contributor Author

I also don't like using non-public classes in type annotations. The problem is that currently we cannot use the public symbols because they are defined as functions and not as classes. Trying to use lxml.etree.Element in type annotations results in the mypy error:

Function "lxml.etree._factory_func.Element" is not valid as a type  [valid-type]

The _factory_func module is defined in the lxml stubs from https://github.com/abelcheung/types-lxml.
I am not sure if Element was defined as a function in the typing stub just to be compatible with lxml or if there was a deeper reason. In any case, if this is fixed and people start using ElementTree instead of _ElementTree in their type annotations, then this PR would become useless.

What do think about applying this patch to lxml/etree.pyx:

-def Element(_tag, attrib=None, nsmap=None, **_extra):
+class Element(_Element):
+  def __new__(cls, _tag, attrib=None, nsmap=None, **_extra):

It looks a bit disruptive, but all the tests pass. I will have to make similar changes to: Comment, SubElement, ElementTree, Entity, ProcessingInstruction. Then we could worry about updating the stubs.

@scoder
Copy link
Member

scoder commented Jan 27, 2024

Note that Element() is intentionally a factory function, not a specific class. The class implementation is decoupled from the instance creation, and the implementation classes are kept internal, except for the user facing ElementBase subclass that users can publicly inherit from.

Honestly, users could just avoid using generic types for this. After all, static typing is optional and designed to be incomplete across a code base. That's a good thing. Not everything needs to be typed. Most programming errors that exact typing of _Element versus its wide range of subclasses (including user defined ones) could detect are much more easily, accurately and safely found with testing.

I'm certainly not going to change anything in the API just to help users with statically typing their code. Static typing is not a design goal. Usability and abstraction are.

@udifuchs
Copy link
Contributor Author

In the types-lxml tests I see two uses for this generic:

  1. _ElementTree[_Element]
  2. _ElementTree[HTMLElement]

This seems like a reasonable use of generics. Otherwise, every time someone tries to access an HTML method they will have to cast _Element to HTMLElement to prevent mypy from complaining.

I'm certainly not going to change anything in the API just to help users with statically typing their code. Static typing is not a design goal. Usability and abstraction are.

From my experience, adding type annotations helps me design better code. If the type annotations become too convoluted, it is usually a sign that either I made some wrong assumptions in my types specifications or that the API is not optimal.

In the case of lxml, the typing stubs seem to suggest that _Element should be part of the public API.
ElementBase looks like a replacement, but it does not help because ElementBase is a subclass of _Element and not a base class.

@scoder
Copy link
Member

scoder commented Feb 12, 2024

I'm wondering if types-lxml would be better off defining Protocols instead of exposing the actual _Element etc. types. That's really what an Element is, it's a kind of tree object that has a certain behaviour.

@udifuchs
Copy link
Contributor Author

A typing.Protocol is essentially an abstract class. Therefore, it does not make sense to define:

class ElementProtocol(Protocol): ...
def Element() -> ElementProtocol: ...

because the return type of a function should not be an abstract class.

I just made another PR that transforms Element from a factory function into a class.
I am not convinced that this is the right approach, but it is easier to judge the idea with actual working code.

@scoder
Copy link
Member

scoder commented Feb 17, 2024

A typing.Protocol is essentially an abstract class. Therefore, it does not make sense to define:

class ElementProtocol(Protocol): ...
def Element() -> ElementProtocol: ...

because the return type of a function should not be an abstract class.

That's not how I understand Protocols. I see them as Python's form of interfaces – they define methods and their signatures and arbitrary classes can implement them, even without saying that they implement them. That's very much what lxml's XML element classes are: arbitrary objects that implement a common interface. What you get back from the parse/select/etc. functions and methods is some object, class not interesting, but with a well defined interface.

The more I think about this, the more it seems a perfect match for lxml's API. I created an issue in types-lxml: abelcheung/types-lxml#31

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants