Skip to content

gh-100926: Move ctype's pointers cache to StgInfo #131282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 63 commits into
base: main
Choose a base branch
from

Conversation

sergey-miryanov
Copy link
Contributor

@sergey-miryanov sergey-miryanov commented Mar 15, 2025

This is a bit straightfoward.
I keep the global cache for manually registered pairs of pointers and classes, particularly for comtypes (gh-124520, PyCSimpleTypeAsMetaclassTest).

@picnixz picnixz requested a review from encukou March 16, 2025 16:44
sergey-miryanov and others added 2 commits March 17, 2025 12:21
Co-authored-by: neonene <53406459+neonene@users.noreply.github.com>
Co-authored-by: neonene <53406459+neonene@users.noreply.github.com>
@encukou encukou removed the skip news label Mar 17, 2025
Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this needs a NEWS blurb. I'd even add a What's New entry -- _pointer_type_cache is nominally internal API, but it's been there for a long time and people are using it.

@sergey-miryanov
Copy link
Contributor Author

IMO, this needs a NEWS blurb. I'd even add a What's New entry -- _pointer_type_cache is nominally internal API, but it's been there for a long time and people are using it.

Thanks! I'll add both entries later today.

@sergey-miryanov
Copy link
Contributor Author

I have added news and whatsnew entries, but I believe we should update them after we decided what to do with _ctypes_ptrtype_cache behaviour.

@encukou
Copy link
Member

encukou commented Mar 18, 2025

As far as I can see, this numeric ID key is only used for a check, and I think that it's OK to remove a check for obsolete functionality from a deprecated function.

The feature is "incomplete pointers" -- ones that were created with a name, but weren't assigned an underlying type yet. See test_ctypes.test_incomplete.py for the intended usage. That test references "the tutorial", but the current tutorial doesn't include this -- instead, you can create incomplete structs for this use case.
Back when ctypes was an independent project, this started working in ctypes 0.6.3 (2003) and was made obsolete in 0.9.5 (2005). ctypes was added to Python 2.5 (2006), without SetPointerType mentioned in docs.


What I'm a bit worried about is the duplication between _pointer_type_cache and info->pointer_type. Which one should have priority? When one is updated, should the other one be updated too?
What kind of relationship is expected and/or guaranteed are between info->proto (ptr_type._type_) and info->pointer_type?

Should we expose the pointer type as a Python attribute?
If we do, with a setter to allow registration, could _pointer_type_cache be turned into a deprecated proxy for that? Something like:

        class PointerTypeCache:
            def __setitem__(self, cls, pointer_type):
                cls.pointer_type = pointer_type

            def __getitem__(self, cls):
                try:
                    return cls.pointer_type
                except AttributeError:
                    raise KeyError(cls)

            def __contains__(self, cls):
                return hasattr(cls, 'pointer_type')

        _pointer_type_cache = PointerTypeCache()

Would having all the info in one place be worth that?

Or am I overthinking it, and we should fix the immediate issue and merge this PR as is?
Entirely possible, but I'd like to have solid understanding first.

@sergey-miryanov
Copy link
Contributor Author

What I'm a bit worried about is the duplication between _pointer_type_cache and info->pointer_type. Which one should have priority? When one is updated, should the other one be updated too? What kind of relationship is expected and/or guaranteed are between info->proto (ptr_type._type_) and info->pointer_type?

AFAIU, if we introduce info->pointer_type then the only use for _pointer_type_cache is a manual registration like for comtypes, where POINTER for non-ctypes type created and registered in the cache. And for incomplete types as well. So, since we create POINTER type from our factory, it doesn't matter how _pointer_type_cache and info->pointer_type are related. They store different data. Also, I expect that ctypes POINTER types will not be registered in _pointer_type_cache and I see the only way to control this is to deprecate _pointer_type_cache and add a registration function, that should check where to store the POINTER type (like register_pointer_type(type, pointer_type) and if type is a non-ctypes then register their pointer in _pointer_type_cache, also same for incomplete types). So, I expect no regular ctypes pointer types will be stored in _pointer_type_cache.

I'm not that familiar with info->proto and ptr_type._type_. I need to dig deeper to answer this.

Should we expose the pointer type as a Python attribute? If we do, with a setter to allow registration, could _pointer_type_cache be turned into a deprecated proxy for that? Something like:

If we allow to poison regular Python classes with our pointer_type attribute, then it is worth idea. But this is will not work for incomplete types (cause cls will be str for all of them), we should strictly check that cls is not a builtin type and so on. I'm not sure it will be an easy way to go :)

Co-authored-by: neonene <53406459+neonene@users.noreply.github.com>
@neonene
Copy link
Contributor

neonene commented Mar 18, 2025

Just in my gut feeling, but would the following make sense?

  • Enhance PyCPointerType.set_type() (currently used in the deprecated SetPointerType()) to store pointer_type in stginfo.

  • Make _pointer_type_cache a dict subclass, where the __setitem__() uses the set_type(). The dict would be like {None: c_void_p}.

@sergey-miryanov
Copy link
Contributor Author

Just in my gut feeling, but would the following make sense?

  • Enhance PyCPointerType.set_type() (currently used in the deprecated SetPointerType()) to store pointer_type in stginfo.
  • Make _pointer_type_cache a dict subclass, where the __setitem__() uses the set_type(). The dict would be like {None: c_void_p}.

Yeah, this is like @encukou proposed for PointerTypeCache with addition of mapping interface that would be useful for introspection for example. But please see my reply about incomplete types and non-ctypes types above.

@encukou
Copy link
Member

encukou commented Mar 19, 2025

As far as I can see, only three kinds of keys are acceptable in _pointer_type_cache:

  • a ctypes type, which do have stginfo. Note that this includes comtypes (those inherit from ctypes)
  • an integer -- the id of an incomplete pointer. AFAICS, such an entry in _pointer_type_cache is only used as a “flag” to allow SetPointerType to work; this info might as well be stored in stginfo instead. (I imagine it's equivalent to info->proto == NULL, but I haven't checked for further complications.)
  • None, a special case that can be handled elsewhere

Is there a use case for putting something else in _pointer_type_cache? If we can't find one, let's not support it. We don't want to break comtypes and such (i.e. users who don't have another way of doing what they need), but on the other hand, _pointer_type_cache is an internal implementation detail that's subject to change.

Make _pointer_type_cache a dict subclass

A dict-like object, please. Subclassing dict is too fragile, plus this proxy this can't support iteration.


There's one more aspect to think about: the registration that comtypes does isn't really the right thing for comtypes to do.

comtypes needs to add some functionality to all of its types, so it defines subclasses of the ctypes (meta)classes. But, it also wants to add some functionality to pointers to its types, and pointers are normally created on demand.
So, for every type ctypes defines, it also defines a pointer type, and registers that. It only does that for one “level” though: a pointer to a pointer to a comtype is a plain old ctypes pointer. comtypes could register another level, but not infintely many of them.
IMO, the proper solution here would be to add a make_pointer_type method to the metaclasses, which comtypes could override, so that its pointer types are created on demand using the regular mechanism.
This would make registration unnecessary for comtypes.

I can't rule out another use case for registration, but I don't see one. Perhaps we can deprecate _pointer_type_cache and plan to remove it entirely unless someone speaks up.
(Deprecation/removal would also be easier if it becomes a simple proxy for set_type & co., rather than a source of truth.)

@sergey-miryanov
Copy link
Contributor Author

I'm very new to the internals of ctypes, but I agree with three of the use cases of _pointer_type_cache that you mentioned.
One comment: I may not have checked correctly. I remember that when I analyzed PyCSimpleTypeAsMetaclassTest, I saw that a non-ctypes cls was passed to POINTER.
As far as I can tell, this is because the code ptr_bases = (self, POINTER(bases[0]) called with CtBase that hasn't StgInfo. I'm not sure that I understand how we can replace this with regular pointer type creation and avoid registration.

@encukou
Copy link
Member

encukou commented Mar 19, 2025

Ah! Looks like you're correct, thanks!

I'm not on a Windows box now to check.
I believe CtBase mirrors comtypes' IUnknown, which has an interesting comment:

        `IUnknown` behaves as a ctypes type, and `POINTER` can take it.
        This behavior is defined by some metaclasses in runtime.

So, IUnknown is a ctypes type in spirit, but not in reality :)

We could expose info->pointer_type as a reserved Python attribute __pointer_type__, and for classes without stginfo, look up the attribute. Do you think that's worth pursuing?
I don't think we need to worry builtin types (and others that disable setting attributes).

Some concerns about having _pointer_type_cache store only the manually registered types:

  • lookups in _pointer_type_cache, as a way to check if a pointer type is assigned (without creating the type if it's not), stop working.
  • iteration over _pointer_type_cache will give fewer results.

But, given that it is an internal implementation detail, perhaps that's acceptable.

Copy link
Contributor

@neonene neonene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can your tests have a comment to explain the purpose? Also, can you add the tests below?

    def test_custom_pointer_cache_for_ctypes_type1(self):
        # Check if PyCPointerType.__init__() caches a pointer type
        # customized in the metatype's __new__().
        class PointerMeta(PyCPointerType):
            def __new__(cls, name, bases, namespace):
                namespace["_type_"] = C
                return super().__new__(cls, name, bases, namespace)

            def __init__(self, name, bases, namespace):
                assert C.__pointer_type__ is None
                super().__init__(name, bases, namespace)
                assert C.__pointer_type__ is self

        class C(c_void_p):  # ctypes type
            pass

        class P(ctypes._Pointer, metaclass=PointerMeta):
            _type_ = "dummy"

        self.assertIs(P._type_, C)
        self.assertIs(P, POINTER(C))

    def test_custom_pointer_cache_for_ctypes_type2(self):
        # Check if PyCPointerType.__init__() caches a pointer type
        # customized in the metatype's __init__().
        class PointerMeta(PyCPointerType):
            def __init__(self, name, bases, namespace):
                self._type_ = namespace["_type_"] = C
                assert C.__pointer_type__ is None
                super().__init__(name, bases, namespace)
                assert C.__pointer_type__ is self

        class C(c_void_p):  # ctypes type
            pass

        class P(ctypes._Pointer, metaclass=PointerMeta):
            _type_ = "dummy"

        self.assertIs(P._type_, C)
        self.assertIs(P, POINTER(C))

@neonene
Copy link
Contributor

neonene commented Apr 2, 2025

Consider the docstring for __pointer_type__, where we could possibly provide internal information. There is also a possibility that __pointer_type__ would be non-public.

@sergey-miryanov
Copy link
Contributor Author

Consider the docstring for __pointer_type__, where we could possibly provide internal information. There is also a possibility that __pointer_type__ would be non-public.

I'm afraid I'm out of words 😢

Copy link
Contributor

@neonene neonene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relying on the deprecation/error messages of the dict proxy, we could reconsider the documentation.

@junkmd
Copy link
Contributor

junkmd commented Apr 5, 2025

I think this PR should have the topic-ctypes label attached.

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Sorry for the delay. I am back now.

I started reviewing; here are some initial comments.

Unfortunately, feature freeze is next week. I think this could still make it though. Let me know if I should take over.

@sergey-miryanov
Copy link
Contributor Author

@encukou Thanks for review. I have fixed your suggestions. Please take a look.

Comment on lines +308 to +309
except AttributeError:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly -1. This just denies the purpose of AttributeError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure. Previous implementation allows silently rewrite entry in the _pointer_type_cache. If we raise exception here it may break some code (maybe it is worth to do). If we silently swallow exception, then it may add some strange heisenbugs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, is there any reason why the C setter is not the right place to supress the error?

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sent a few more suggestions as a PR: sergey-miryanov#1

  • cache old-style incomplete types, with a "fallback cache" used when __pointer_type__ is unavailable
  • allow setting __pointer_type__ to arbitrary values (_ctypes doesn't care)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants