-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-92810: Reduce memory usage by ABCMeta.__subclasscheck__ #131914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Lib/_py_abc.py
Outdated
# >>> class Ancestor: __subclasses__ = lambda: [Other] | ||
# >>> class Other: pass | ||
# >>> isinstance(Other, Ancestor) is True | ||
# Do not iterate over cls.__subclasses__() because it returns the entire class tree, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. __subclasses__()
is meant to only return the direct children that are still alive according to the docs:
Each class keeps a list of weak references to its immediate subclasses. This method returns a list of all those references still alive. The list is in definition order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. There are cases.
Case 1:
class A:
pass
class B(A):
pass
assert A.__subclasses__() == [B]
assert B.__mro__ = (B, A, object)
Case 2:
class A:
__subclasses__ = lambda: [B]
class B:
pass
assert A.__subclasses__() == [B]
assert B.__mro__ = (B, object)
For normal classes, isinstancecheck
/issubclass
ignores __subclasses__
entirely. But for ABCMeta it leads to recursive __instancecheck__
/__subclasscheck__
calls in case 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for ABCMeta it leads to recursive instancecheck/subclasscheck calls in case 1.
Unfortunately, and you said it in the issue, this makes some constructions fail and this is a breaking change, and a quite serious one. While the GitHub search wasn't fruitful, it may be because the class tree is extremely deep, and only the top class is an ABC. I don't think we're able to know in advance whether we should or shouldn't use __subclasses__
so I'm afraid we need another solution.
Here:
for scls in cls.__subclasses__():
if issubclass(subclass, scls):
cls._abc_cache.add(subclass)
return True
Observe that issubclass()
is checked multiple times, and issubclass
can itself call cls.__subclasses__
. Can we first check if subclass
is in the _abc_cache
before as it wouldn't call issubclass
? or can we use some recursive guard for issubclass(subclass, scls)
where we don't do the same tests? or maybe we can first try to check if cls
is in the MRO of subclass
(it might not be sufficient but is mro
already computed or not?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand that this is breaking, and cannot be accepted in current form. That's why this PR is in draft, until a solution can be found.
maybe we can first try to check if cls is in the MRO of subclass
I've tried this approach. It saves RAM because negative cache of all subclasses is not updated anymore. But scls.__mro__
computation takes time, and because this is done for every class in the inheritance tree (and result is not cached because class can override its own .mro()
method), this still O(n^2) solution which takes 25 seconds on tree with 6k classes (same as was before the change). This can be a temporal solution for the issue, but it is not a final one.
or can we use some recursive guard for issubclass(subclass, scls) where we don't do the same tests?
Unfortunately, issubclass(A, B)
does not allow passing custom arguments to __subclasscheck__(A, B)
method, to notify nested classes that some checks should be avoided.
Introducing cls._is_inside_subclasscheck: bool
variables doesn't work either, because it is set only for current class, and not for children. Iterating over cls.__subclasses__()
to update their own guard is both expensive and error-prone (as users may override methods).
This probably can be done by checking if parent class has a guard enabled/disabled, and we can use super()
for resolving it recursively. I haven't tried this approach yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've returned back __subclasses__
clause, and added a global context to ABCMeta class with a check. Now child class knows that isinstance
/issubclass
was called by some parent class. This allows to skip adding entries to child class _abc_cache
/_abc_negative_cache
, reducing memory footprint for 8k classes from almost 4Gb to just 60Mb.
As I'm new to Python's C API, I didn't yet get how to implement the same global context in _abc.c
. But let's first discuss if approach used in _py_abc.py
is valid or not.
Modules/_abc.c
Outdated
if (scls == NULL) { | ||
goto end; | ||
} | ||
int r = PyObject_IsSubclass(subclass, scls); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have a UAF here. PyObject_IsSubclass
can call __subclasscheck__
which can itseslf call arbitrary code so you might mutate subclasses
. The issue already exists with the existing code but can you confirm that we can indeed produce a UAF? (if you don't know how to do it, I'll try to investigate this separately tomorrow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you confirm that we can indeed produce a UAF?
Sorry, my C knowledge is very minimal, I don't know anything about this yet
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
3 similar comments
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Lib/_py_abc.py
Outdated
# >>> class Ancestor: __subclasses__ = lambda: [Other] | ||
# >>> class Other: pass | ||
# >>> isinstance(Other, Ancestor) is True | ||
# Do not iterate over cls.__subclasses__() because it returns the entire class tree, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for ABCMeta it leads to recursive instancecheck/subclasscheck calls in case 1.
Unfortunately, and you said it in the issue, this makes some constructions fail and this is a breaking change, and a quite serious one. While the GitHub search wasn't fruitful, it may be because the class tree is extremely deep, and only the top class is an ABC. I don't think we're able to know in advance whether we should or shouldn't use __subclasses__
so I'm afraid we need another solution.
Here:
for scls in cls.__subclasses__():
if issubclass(subclass, scls):
cls._abc_cache.add(subclass)
return True
Observe that issubclass()
is checked multiple times, and issubclass
can itself call cls.__subclasses__
. Can we first check if subclass
is in the _abc_cache
before as it wouldn't call issubclass
? or can we use some recursive guard for issubclass(subclass, scls)
where we don't do the same tests? or maybe we can first try to check if cls
is in the MRO of subclass
(it might not be sufficient but is mro
already computed or not?)
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
abf4bfe
to
b7603e0
Compare
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
_abc._abc_subclasscheck
has very poor performance and (I think) a memory leak #92810