We propose adding an `Annotated` type to the `typing` module to decorate existing types with context-specific metadata. Specifically, a type `T` can be annotated with metadata `x` via the typehint `Annotated[T, x]`. This metadata can be used for either static analysis or at runtime. If a library (or tool) encounters a typehint `Annotated[T, x]` and has no special logic for metadata `x`, it should ignore it and simply treat the type as `T`. Unlike the `no_type_check` functionality that currently exists in the `typing` module, which completely disables typechecking annotations on a function or a class, the `Annotated` type allows for both static typechecking of `T` (e.g., via MyPy or Pyre, which can safely ignore `x`) and runtime access to `x` within a specific application. We believe that the introduction of this type would address a diverse set of use cases of interest to the broader Python community.
Motivating examples:
Reading binary data
The `struct` module provides a way to read and write C structs directly from their byte representation. It currently relies on a string representation of the C type to read in values:

```python
from struct import unpack

record = b'raymond   \x32\x12\x08\x01\x08'
name, serialnum, school, gradelevel = unpack('<10sHHb', record)
```
The documentation suggests using a named tuple to unpack the values and make this a bit more tractable:
```python
from collections import namedtuple

Student = namedtuple('Student', 'name serialnum school gradelevel')
Student._make(unpack('<10sHHb', record))
# Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
```
However, this recommendation is somewhat problematic; as we add more fields, it's going to get increasingly tedious to match the properties in the named tuple with the arguments in `unpack`.
Instead, annotations can provide better interoperability with a typechecker or an IDE without adding any special logic outside of the `struct` module:
```python
from typing import Annotated, NamedTuple

UnsignedShort = Annotated[int, struct.ctype('H')]
SignedChar = Annotated[int, struct.ctype('b')]

@struct.packed
class Student(NamedTuple):
    # MyPy typechecks the 'name' field as 'str'
    name: Annotated[str, struct.ctype("<10s")]
    serialnum: UnsignedShort
    school: UnsignedShort
    gradelevel: SignedChar

# 'unpack' only uses the metadata within the type annotations
Student.unpack(record)
# Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
```
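To make the runtime side concrete, here is a minimal sketch of how a decorator like the proposed `struct.packed` could consume this metadata. It assumes Python 3.9+ (where `Annotated` and `get_type_hints(..., include_extras=True)` exist) and uses a stand-in `ctype` marker class, since no such marker ships in the `struct` module today:

```python
import struct
from typing import Annotated, NamedTuple, get_type_hints

class ctype:
    """Stand-in for the hypothetical struct.ctype marker."""
    def __init__(self, format: str) -> None:
        self.format = format

def packed(cls):
    # Build a struct format string from the ctype metadata on each field,
    # ignoring any metadata we don't recognize.
    fmt = '<'
    for hint in get_type_hints(cls, include_extras=True).values():
        for meta in getattr(hint, '__metadata__', ()):
            if isinstance(meta, ctype):
                fmt += meta.format.lstrip('<>=!@')
    cls.unpack = classmethod(lambda c, record: c._make(struct.unpack(fmt, record)))
    return cls

@packed
class Student(NamedTuple):
    name: Annotated[str, ctype("<10s")]
    serialnum: Annotated[int, ctype('H')]
    school: Annotated[int, ctype('H')]
    gradelevel: Annotated[int, ctype('b')]

record = b'raymond   \x32\x12\x08\x01\x08'
print(Student.unpack(record))
# Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
```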
dataclasses
Here's an example with dataclasses that is problematic from a typechecking standpoint:
```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class C:
    myint: int = 0
    # the field tells the @dataclass decorator that the default action in the
    # constructor of this class is to set "self.mylist = list()"
    mylist: List[int] = field(default_factory=list)
```
Even though one might expect that `mylist` is a class attribute accessible via `C.mylist` (like `C.myint` is) due to the assignment syntax, that is not the case. Instead, the `@dataclass` decorator strips out the assignment to this attribute, leading to an `AttributeError` upon access:
```python
C.myint   # Ok: 0
C.mylist  # AttributeError: type object 'C' has no attribute 'mylist'
```
This can lead to confusion for newcomers to the library who may not expect this behavior. Furthermore, the typechecker needs to understand the semantics of dataclasses and know not to treat the above example as a normal assignment operation (which translates to additional complexity).
It makes more sense to move the information contained in `field` to an annotation:
```python
@dataclass
class C:
    myint: int = 0
    mylist: Annotated[List[int], field(default_factory=list)]

# now, the AttributeError is more intuitive because there is no assignment operator
C.mylist  # AttributeError

# the constructor knows how to use the annotations to set the 'mylist' attribute
c = C()
c.mylist  # []
```
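As a sketch of how such a constructor could work (illustrative only, not the actual `dataclasses` implementation; the `annotated_dataclass` name is hypothetical), a decorator can pull the `Field` objects out of the annotation metadata at class-creation time:

```python
import dataclasses
from typing import Annotated, List, get_type_hints

def annotated_dataclass(cls):
    # Collect default factories from Annotated metadata rather than
    # from class-level assignments.
    hints = get_type_hints(cls, include_extras=True)
    factories = {
        name: meta.default_factory
        for name, hint in hints.items()
        for meta in getattr(hint, '__metadata__', ())
        if isinstance(meta, dataclasses.Field)
        and meta.default_factory is not dataclasses.MISSING
    }

    def __init__(self, **kwargs):
        for name in hints:
            if name in kwargs:
                setattr(self, name, kwargs[name])
            elif name in factories:
                setattr(self, name, factories[name]())
            else:
                # fall back to a plain class-level default such as 'myint = 0'
                setattr(self, name, getattr(type(self), name))

    cls.__init__ = __init__
    return cls

@annotated_dataclass
class C:
    myint: int = 0
    mylist: Annotated[List[int], dataclasses.field(default_factory=list)]

c = C()
print(c.mylist)  # []
```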
The main benefit of writing annotations like this is that it provides a way for clients to gracefully degrade when they don't know what to do with the extra annotations (by just ignoring them). If you used a typechecker that didn't have any special handling for dataclasses and the `field` annotation, you would still be able to run checks as though the type were simply:
```python
class C:
    myint: int = 0
    mylist: List[int]
```
Lowering barriers to developing new types
Typically, when adding a new type, we need to upstream that type to the `typing` module and change MyPy, PyCharm, Pyre, pytype, etc. This is particularly important when working on open-source code that makes use of our new types, seeing as the code would not be immediately transportable to other developers' tools without additional logic (this is a limitation of MyPy plugins, which allow for extending MyPy but require a consumer of the new typehints to be using MyPy and have the same plugin installed). As a result, there is a high cost to developing and trying out new types in a codebase. Ideally, we should be able to introduce new types in a manner that allows for graceful degradation when clients do not have a custom MyPy plugin, which would lower the barrier to development and ensure some degree of backward compatibility.
For example, suppose that we wanted to add support for tagged unions to Python. One way to accomplish this would be to annotate a `TypedDict` such that only one field is allowed to be set:
```python
Currency = Annotated[
    TypedDict('Currency', {'dollars': float, 'pounds': float}, total=False),
    TaggedUnion,
]
```
This is a somewhat cumbersome syntax, but it allows us to iterate on this proof-of-concept and have people with non-patched IDEs work in a codebase with tagged unions. We could easily test this proposal and iron out the kinks before trying to upstream tagged unions to `typing`, MyPy, etc. Moreover, tools that do not have support for parsing the `TaggedUnion` annotation would still be able to treat `Currency` as a `TypedDict`, which is still a close approximation (slightly less strict).
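As a sketch of the runtime check a supporting tool could perform (with `TaggedUnion` as a plain marker class we define ourselves, since no such annotation exists in `typing`; requires Python 3.9+ for `Annotated`):

```python
from typing import Annotated, TypedDict

class TaggedUnion:
    """Hypothetical marker: exactly one field of the TypedDict may be set."""

Currency = Annotated[
    TypedDict('Currency', {'dollars': float, 'pounds': float}, total=False),
    TaggedUnion,
]

def check_tagged_union(hint, value: dict) -> None:
    # Tools that recognize the marker enforce the one-field rule; tools
    # that do not simply fall back to ordinary TypedDict semantics.
    if TaggedUnion in getattr(hint, '__metadata__', ()) and len(value) != 1:
        raise ValueError(f"expected exactly one field, got {sorted(value)}")

check_tagged_union(Currency, {'dollars': 1.0})                 # fine
check_tagged_union(Currency, {'dollars': 1.0, 'pounds': 2.0})  # ValueError
```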
Details of proposed changes to `typing`
Syntax
`Annotated` is parameterized with a type and an arbitrary list of Python values that represent the annotations. Here are the specific details of the syntax:

- The first argument to `Annotated` must be a valid `typing` type.
- Multiple type annotations are supported (`Annotated` supports variadic arguments): `Annotated[int, ValueRange(3, 10), ctype("char")]`
- When called with no extra arguments, `Annotated` returns the underlying type: `Annotated[int] == int`
- The order of the annotations is preserved and matters for equality checks: `Annotated[int, ValueRange(3, 10), ctype("char")] != Annotated[int, ctype("char"), ValueRange(3, 10)]`
- Nested `Annotated` types are flattened, with metadata ordered starting with the innermost annotation: `Annotated[Annotated[int, ValueRange(3, 10)], ctype("char")] == Annotated[int, ValueRange(3, 10), ctype("char")]`
- Duplicated annotations are not removed: `Annotated[int, ValueRange(3, 10)] != Annotated[int, ValueRange(3, 10), ValueRange(3, 10)]`
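These rules can be checked directly on Python 3.9+, where `Annotated` eventually landed in `typing` (with one deviation: the shipped version requires at least one metadata argument, so the bare `Annotated[int]` rule is not exercised here). `ValueRange` and `ctype` are stand-in classes defined only for the example:

```python
from dataclasses import dataclass
from typing import Annotated

@dataclass(frozen=True)
class ValueRange:
    lo: int
    hi: int

@dataclass(frozen=True)
class ctype:
    format: str

# order of annotations matters for equality
assert Annotated[int, ValueRange(3, 10), ctype("char")] != \
       Annotated[int, ctype("char"), ValueRange(3, 10)]

# nested Annotated types are flattened, innermost metadata first
assert Annotated[Annotated[int, ValueRange(3, 10)], ctype("char")] == \
       Annotated[int, ValueRange(3, 10), ctype("char")]

# duplicated annotations are not removed
assert Annotated[int, ValueRange(3, 10)] != \
       Annotated[int, ValueRange(3, 10), ValueRange(3, 10)]
```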
Consuming annotations
Ultimately, how to interpret the annotations (if at all) is the responsibility of the tool or library encountering the `Annotated` type. A tool or library encountering an `Annotated` type can scan through the annotations to determine if they are of interest (e.g., using `isinstance`).
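For instance, a generic helper of this shape (a sketch, not a proposed `typing` API; it reuses the stand-in classes from the previous snippet) lets a tool pick out just the metadata it understands:

```python
def find_metadata(hint, kind):
    # Collect the metadata entries a tool understands; ignore the rest.
    return [m for m in getattr(hint, '__metadata__', ()) if isinstance(m, kind)]

find_metadata(Annotated[int, ValueRange(3, 10), ctype("char")], ValueRange)
# -> [ValueRange(lo=3, hi=10)]
```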
Unknown annotations:
When a tool or a library does not support annotations or encounters an unknown annotation, it should just ignore it and treat the annotated type as the underlying type. For example, if we were to add an annotation that is not an instance of `struct.ctype` to the annotation for `name` (e.g., `Annotated[str, 'foo', struct.ctype("<10s")]`), the `unpack` method should simply ignore it.
Namespacing annotations:
We do not need namespaces for annotations since the class used by the annotations acts as a namespace.
Multiple annotations:
It's up to the tool consuming the annotations to decide whether the client is allowed to have several annotations on one type and how to merge those annotations.
Since the `Annotated` type allows you to put several annotations of the same (or different) type(s) on any node, the tools or libraries consuming those annotations are in charge of dealing with potential duplicates. For example, if you are doing value-range analysis you might allow this:
```python
T1 = Annotated[int, ValueRange(-10, 5)]
T2 = Annotated[T1, ValueRange(-20, 3)]
```
Flattening nested annotations, this translates to:
```python
T2 = Annotated[int, ValueRange(-10, 5), ValueRange(-20, 3)]
```
An application consuming this type might choose to reduce these annotations via an intersection of the ranges, in which case `T2` would be treated as equivalent to `Annotated[int, ValueRange(-10, 3)]`. An alternative application might reduce them via a union, in which case `T2` would be treated as equivalent to `Annotated[int, ValueRange(-20, 5)]`.
In this example, whether we reduce those annotations using a union or an intersection can be context dependent (covariant vs. contravariant); this is why we have to preserve all of them and let the consumers decide how to merge them.
Other applications may decide not to support multiple annotations and throw an exception.
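A sketch of those two merge strategies, reusing the stand-in `ValueRange` class and the `T2` definition from above (the merge function names are illustrative):

```python
def merge_intersection(ranges):
    return ValueRange(max(r.lo for r in ranges), min(r.hi for r in ranges))

def merge_union(ranges):
    return ValueRange(min(r.lo for r in ranges), max(r.hi for r in ranges))

# T2.__metadata__ is (ValueRange(-10, 5), ValueRange(-20, 3)) after flattening
ranges = [m for m in T2.__metadata__ if isinstance(m, ValueRange)]
assert merge_intersection(ranges) == ValueRange(-10, 3)
assert merge_union(ranges) == ValueRange(-20, 5)
```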
Related bugs
- Issue 482: Mixing typing and non-typing information in annotations has some discussion about this problem, but none of the proposed solutions (using intersection types, passing dictionaries of annotations) seemed to gain enough traction. We hope this solution is non-intrusive and compelling enough to make it into the standard library.