Skip to content

gh-92810: Reduce memory usage by ABCMeta.__subclasscheck__ #131914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

dolfinus
Copy link

@dolfinus dolfinus commented Mar 30, 2025

@bedevere-app
Copy link

bedevere-app bot commented Mar 30, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Lib/_py_abc.py Outdated
# >>> class Ancestor: __subclasses__ = lambda: [Other]
# >>> class Other: pass
# >>> isinstance(Other, Ancestor) is True
# Do not iterate over cls.__subclasses__() because it returns the entire class tree,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. __subclasses__() is meant to only return the direct children that are still alive according to the docs:

Each class keeps a list of weak references to its immediate subclasses. This method returns a list of all those references still alive. The list is in definition order.

Copy link
Author

@dolfinus dolfinus Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. There are cases.

Case 1:

class A:
  pass

class B(A):
  pass

assert A.__subclasses__() == [B]
assert B.__mro__ = (B, A, object)

Case 2:

class A:
  __subclasses__ = lambda: [B]

class B:
  pass

assert A.__subclasses__() == [B]
assert B.__mro__ = (B, object)

For normal classes, isinstancecheck/issubclass ignores __subclasses__ entirely. But for ABCMeta it leads to recursive __instancecheck__/__subclasscheck__ calls in case 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for ABCMeta it leads to recursive instancecheck/subclasscheck calls in case 1.

Unfortunately, and you said it in the issue, this makes some constructions fail and this is a breaking change, and a quite serious one. While the GitHub search wasn't fruitful, it may be because the class tree is extremely deep, and only the top class is an ABC. I don't think we're able to know in advance whether we should or shouldn't use __subclasses__ so I'm afraid we need another solution.

Here:

for scls in cls.__subclasses__():
    if issubclass(subclass, scls):
        cls._abc_cache.add(subclass)
        return True

Observe that issubclass() is checked multiple times, and issubclass can itself call cls.__subclasses__. Can we first check if subclass is in the _abc_cache before as it wouldn't call issubclass? or can we use some recursive guard for issubclass(subclass, scls) where we don't do the same tests? or maybe we can first try to check if cls is in the MRO of subclass (it might not be sufficient but is mro already computed or not?)

Copy link
Author

@dolfinus dolfinus Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand that this is breaking, and cannot be accepted in current form. That's why this PR is in draft, until a solution can be found.

maybe we can first try to check if cls is in the MRO of subclass

I've tried this approach. It saves RAM because negative cache of all subclasses is not updated anymore. But scls.__mro__ computation takes time, and because this is done for every class in the inheritance tree (and result is not cached because class can override its own .mro() method), this still O(n^2) solution which takes 25 seconds on tree with 6k classes (same as was before the change). This can be a temporal solution for the issue, but it is not a final one.

or can we use some recursive guard for issubclass(subclass, scls) where we don't do the same tests?

Unfortunately, issubclass(A, B) does not allow passing custom arguments to __subclasscheck__(A, B) method, to notify nested classes that some checks should be avoided.

Introducing cls._is_inside_subclasscheck: bool variables doesn't work either, because it is set only for current class, and not for children. Iterating over cls.__subclasses__() to update their own guard is both expensive and error-prone (as users may override methods).

This probably can be done by checking if parent class has a guard enabled/disabled, and we can use super() for resolving it recursively. I haven't tried this approach yet.

Copy link
Author

@dolfinus dolfinus Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've returned back __subclasses__ clause, and added a global context to ABCMeta class with a check. Now child class knows that isinstance/issubclass was called by some parent class. This allows to skip adding entries to child class _abc_cache/_abc_negative_cache, reducing memory footprint for 8k classes from almost 4Gb to just 60Mb.

As I'm new to Python's C API, I didn't yet get how to implement the same global context in _abc.c. But let's first discuss if approach used in _py_abc.py is valid or not.

Modules/_abc.c Outdated
if (scls == NULL) {
goto end;
}
int r = PyObject_IsSubclass(subclass, scls);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a UAF here. PyObject_IsSubclass can call __subclasscheck__ which can itseslf call arbitrary code so you might mutate subclasses. The issue already exists with the existing code but can you confirm that we can indeed produce a UAF? (if you don't know how to do it, I'll try to investigate this separately tomorrow)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you confirm that we can indeed produce a UAF?

Sorry, my C knowledge is very minimal, I don't know anything about this yet

@bedevere-app
Copy link

bedevere-app bot commented Mar 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

3 similar comments
@bedevere-app
Copy link

bedevere-app bot commented Mar 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bedevere-app
Copy link

bedevere-app bot commented Mar 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bedevere-app
Copy link

bedevere-app bot commented Mar 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Lib/_py_abc.py Outdated
# >>> class Ancestor: __subclasses__ = lambda: [Other]
# >>> class Other: pass
# >>> isinstance(Other, Ancestor) is True
# Do not iterate over cls.__subclasses__() because it returns the entire class tree,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for ABCMeta it leads to recursive instancecheck/subclasscheck calls in case 1.

Unfortunately, and you said it in the issue, this makes some constructions fail and this is a breaking change, and a quite serious one. While the GitHub search wasn't fruitful, it may be because the class tree is extremely deep, and only the top class is an ABC. I don't think we're able to know in advance whether we should or shouldn't use __subclasses__ so I'm afraid we need another solution.

Here:

for scls in cls.__subclasses__():
    if issubclass(subclass, scls):
        cls._abc_cache.add(subclass)
        return True

Observe that issubclass() is checked multiple times, and issubclass can itself call cls.__subclasses__. Can we first check if subclass is in the _abc_cache before as it wouldn't call issubclass? or can we use some recursive guard for issubclass(subclass, scls) where we don't do the same tests? or maybe we can first try to check if cls is in the MRO of subclass (it might not be sufficient but is mro already computed or not?)

@python-cla-bot
Copy link

python-cla-bot bot commented Apr 6, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Apr 21, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
@dolfinus dolfinus force-pushed the improvement/ABCMeta_subclasscheck branch from abf4bfe to b7603e0 Compare April 21, 2025 11:03
@bedevere-app
Copy link

bedevere-app bot commented Apr 21, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bedevere-app
Copy link

bedevere-app bot commented Apr 21, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bedevere-app
Copy link

bedevere-app bot commented Apr 21, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@dolfinus dolfinus changed the title gh-92810: Avoid O(n^2) complexity in ABCMeta.__subclasscheck__ gh-92810: Reduce memory usage by ABCMeta.__subclasscheck__ Apr 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants