-
Notifications
You must be signed in to change notification settings - Fork 258
Speed improvements for typing #439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This now only touches Python 3 version. If Jukka's results will be promising, then I will also backport this to Python 2. |
I'm also seeing a few % speedup running mypy on my codebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense as a speedup as long as you can convince me the semantics are unchanged.
""" | ||
assert isinstance(a, GenericMeta) and isinstance(b, GenericMeta) | ||
# Reduce each to its origin. | ||
return _gorg(a) is _gorg(b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is _gorb(b)
entirely redundant here? I see that you've inlined this as a._gorg is b
everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably this is just a coincidence, maybe it was different in older versions, but now b
(and its ._gorg
) is known "statically" in all places where _geqv
is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha!
src/typing.py
Outdated
new_bases = [] | ||
for base in bases: | ||
if isinstance(base, GenericMeta): | ||
bextra = getattr(base, '__extra__', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as we're micro-optimizing, this call is wasted when extra is falsey or origin is truthy (if I understand the condition below correctly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I reshuffled code here, so that we quickly escape if origin
is truthy and then if extra
is falsey.
src/typing.py
Outdated
if isinstance(base, GenericMeta): | ||
bextra = getattr(base, '__extra__', None) | ||
if (extra and bextra and not origin and bextra not in bases and | ||
type(bextra) is abc.ABCMeta): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this is equivalent to the original code (I would have expected something simpler, matching _gorg(b) if isinstance(b, GenericMeta) else b
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is this is not totally equivalent. This makes more "aggressive" erasure for classes with __extra__
, this shortens MROs for special typing types. Compare:
Before:
>>> pprint(List[int].__mro__)
(typing.List[int],
typing.List,
<class 'list'>,
typing.MutableSequence,
<class 'collections.abc.MutableSequence'>,
typing.Sequence,
<class 'collections.abc.Sequence'>,
typing.Reversible,
<class 'collections.abc.Reversible'>,
typing.Collection,
<class 'collections.abc.Collection'>,
<class 'collections.abc.Sized'>,
typing.Iterable,
<class 'collections.abc.Iterable'>,
typing.Container,
<class 'collections.abc.Container'>,
typing.Generic,
<class 'object'>)
and after:
(typing.List[int],
typing.List,
<class 'list'>,
<class 'collections.abc.MutableSequence'>,
<class 'collections.abc.Sequence'>,
<class 'collections.abc.Reversible'>,
<class 'collections.abc.Collection'>,
<class 'collections.abc.Sized'>,
<class 'collections.abc.Iterable'>,
<class 'collections.abc.Container'>,
<class 'object'>)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually a bit worried maybe this is too aggressive, since sometimes we even lose Generic
, for example:
>>> pprint(MutableMapping.__mro__)
(typing.MutableMapping,
<class 'collections.abc.MutableMapping'>,
<class 'collections.abc.Mapping'>,
<class 'collections.abc.Collection'>,
<class 'collections.abc.Sized'>,
<class 'collections.abc.Iterable'>,
<class 'collections.abc.Container'>,
<class 'object'>)
Although all tests pass, and I tried to play with this and it works as expected (it is still an instance of GenericMeta
). Probably, I will add Generic
at the end of bases in such cases (just for consistency).
PS. I'm not sure how we would test this speedup with real code. Did you ask for us to have this patched into the Python version we use to run mypy and then run mypy over our large codebase? That makes sense except the time there is pretty variable (since it runs so long that all sorts of other activities on the same machine affect the timing). |
I am marking this as WIP until I figure out what is going on with Python 2.
Yes, even if it is not super-stable it would be great to have few more data points. |
I decided to keep only obvious optimizations, and leave more aggressive things (that might change semantics) out for now. I think you should be happy with this now. I will probably make another PR soon dealing with (excessively) long MROs. |
python2/typing.py
Outdated
|
||
# remove bare Generic from bases if there are other generic bases | ||
if any(isinstance(b, GenericMeta) and b is not Generic for b in bases): | ||
bases = tuple(b for b in bases if b is not Generic) | ||
namespace.update({'__origin__': origin, '__extra__': extra}) | ||
self = super(GenericMeta, cls).__new__(cls, name, bases, namespace) | ||
super(GenericMeta, self).__setattr__('_gorg', self if not origin | ||
else origin._gorg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whitespace nit: I don't love how this line is broken up; either split before the comma or indent the else
to align with the if
.
""" | ||
assert isinstance(a, GenericMeta) and isinstance(b, GenericMeta) | ||
# Reduce each to its origin. | ||
return _gorg(a) is _gorg(b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha!
@@ -1568,7 +1546,7 @@ def no_type_check(arg): | |||
if isinstance(arg, type): | |||
arg_attrs = arg.__dict__.copy() | |||
for attr, val in arg.__dict__.items(): | |||
if val in arg.__bases__: | |||
if val in arg.__bases__ + (arg,): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this added (arg,)
needed? And why not in the PY2 version? (If there's a good reason please add a comment.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is better to have them in both versions, @no_type_chek
can be also used in Python 2, fixed this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, it's needed because arg._gorg == arg
. Fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, it's needed because
arg._gorg == arg
Exactly. Sorry, I have answered only half of your question.
@@ -1568,7 +1546,7 @@ def no_type_check(arg): | |||
if isinstance(arg, type): | |||
arg_attrs = arg.__dict__.copy() | |||
for attr, val in arg.__dict__.items(): | |||
if val in arg.__bases__: | |||
if val in arg.__bases__ + (arg,): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, it's needed because arg._gorg == arg
. Fine.
I'm torn about whether to merge this. Maybe we should wait until 3.6.2 has actually been released? The RC is supposed to be cut today, I expect the final release in a few weeks (see PEP 494). |
This makes sense. We can keep the master "clean" (i.e. identical with current CPython state) for any emergency fixes (just in case). Then I can merge this right after the release of 3.6.2.final. |
OK, Python 3.6.2 was released earlier today, so that I am merging this now. |
Incorporates change made by python#439, which was recently merged.
This PR contains two performance improvements based on profiling and inspired by #432:
_gorg
and_geqv
This gives roughly 5% speed-up for
mypy mypy
. @JukkaL I will be grateful if you could do more benchmarking with some real-life large codebases.