Add sys.set_object_tags() and sys.get_object_tags() APIs for debugging and experimental Use #134819

Open
corona10 opened this issue May 28, 2025 · 25 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@corona10
Member

corona10 commented May 28, 2025

Background

CPython currently exposes several internal implementation details via APIs such as:

  • sys._defer_refcount
  • sys._is_immortal
  • sys._is_interned

These APIs leak implementation-specific details and create implicit expectations of cross-version (e.g., 3.13, 3.14, 3.15...) and cross-implementation (e.g., CPython, PyPy, RustPython) compatibility, even though they are not part of any formal public API contract.

For example, other Python implementations may not support or emulate these features, and yet their presence in CPython can create unintentional backward compatibility burdens when new releases are made.

Proposal

To address this, I would like to propose introducing two weak introspection APIs in the sys module:

sys.set_tags(obj, ["defer_refcount", "tag2"]) -> None
sys.get_tags(obj) -> tuple

sys.set_tags(obj, tags: Iterable[str]) -> None

  • Sets optional "tags" on an object.
  • Tags are hints for the Python implementation and are not guaranteed to be applied or have any effect.
  • The implementation may accept or ignore any or all provided tags.
  • These tags are advisory only, intended primarily for debugging, experimentation, and tooling used by Python implementation developers.

sys.get_tags(obj) -> tuple[str, ...]

  • Returns the tags currently associated with the object.
  • These reflect only the tags actually recognized and retained by the interpreter.
  • For example:
    sys.set_tags(o, ["defer_refcount", "tag2"])
    print(sys.get_tags(o))  # May return: ('defer_refcount',)
  • If the object is already immortal due to previous operations, you might see:
    sys.get_tags(o)  # May return: ('defer_refcount', 'immortal')
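The advisory semantics described above can be sketched in pure Python. This is only an illustrative emulation, not the proposed implementation: the names `set_tags_sketch`, `get_tags_sketch`, `_RECOGNIZED`, and the id-keyed registry are assumptions made for the example, and a real interpreter would presumably derive tags from internal object state instead.

```python
from typing import Iterable

# Hypothetical: the tags this imaginary implementation understands.
_RECOGNIZED = frozenset({"defer_refcount", "immortal"})

# Hypothetical registry: id(obj) -> (obj, retained tags).
# Storing obj alongside its tags keeps the id from being reused.
_registry: dict[int, tuple[object, tuple[str, ...]]] = {}


def set_tags_sketch(obj: object, tags: Iterable[str]) -> None:
    """Retain recognized tags and silently ignore everything else."""
    accepted = tuple(t for t in tags if t in _RECOGNIZED)
    previous = _registry.get(id(obj), (obj, ()))[1]
    # dict.fromkeys removes duplicates while preserving order.
    merged = tuple(dict.fromkeys(previous + accepted))
    _registry[id(obj)] = (obj, merged)


def get_tags_sketch(obj: object) -> tuple[str, ...]:
    """Return only the tags that were actually retained."""
    entry = _registry.get(id(obj))
    return entry[1] if entry is not None else ()


o = object()
set_tags_sketch(o, ["defer_refcount", "tag2"])
print(get_tags_sketch(o))  # ('defer_refcount',)
```

The unknown `"tag2"` is dropped without an error, which matches the "may accept or ignore" wording of the proposal.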

Goals and Non-Goals

Goals:

  • Provide a mechanism to annotate or mark objects for introspection/debugging.
  • Allow developers of Python implementations or advanced tools to experiment with internal object states in a controlled manner.

Non-Goals:

  • These APIs are not intended to be stable or relied upon for program behavior.
  • No tag is guaranteed to have any effect or to be preserved between runs, interpreter versions, or across implementations.

Documentation and Guarantees

We will clearly document that:

  • These APIs are for Python implementation developers only.
  • The presence or absence of any particular tag does not imply any behavioral guarantees.
  • Tags may be implementation-specific, and unsupported tags will be silently ignored.
  • It may be possible to document Python-specific tags somewhere, with a note that they are not guaranteed to persist across versions.

cc @ZeroIntensity @vstinner @Fidget-Spinner @colesbury

Linked PRs

@corona10 corona10 changed the title Add sys.set_tags() and sys.get_tags() APIs for Debugging and Experimental Use Add sys.set_tags() and sys.get_tags() APIs for debugging and experimental Use May 28, 2025
@corona10
Member Author

corona10 commented May 28, 2025

FYI, I am even fine with sys._set_tags and sys._get_tags, but I believe this would be better than providing a separate Python API for each implementation.

@corona10
Member Author

And please let me know if there are better names.

@emmatyping emmatyping added type-feature A feature request or enhancement interpreter-core (Objects, Python, Grammar, and Parser dirs) labels May 28, 2025
@corona10 corona10 self-assigned this May 28, 2025
@ZeroIntensity
Member

I like the general idea, but I have a few notes/concerns:

  • Should set_tags be plural? I'd think that in most cases, you'd want to set one tag only.
  • set_tags seems too misleading if the interpreter is allowed to ignore it. If I see anything called "set", I'd expect it to actually set something upon calling it. How about request_tags?
  • get_tags/set_tags only covers implementation details for objects themselves. They won't work for experimental APIs that need parameters.

An alternative could be to properly expose unstable APIs like we do with PyUnstable in the C API. Maybe something like sys.unstable_defer_refcount, or an unstable module (from unstable.sys import defer_refcount) could work.

@corona10
Member Author

An alternative could be to properly expose unstable APIs like we do with PyUnstable in the C API. Maybe something like sys.unstable_defer_refcount, or an unstable module (from unstable.sys import defer_refcount) could work.

I no longer like adding more such APIs. The basic idea of this proposal is to stop turning CPython implementation details into APIs. Such APIs break other implementations and cause compatibility issues. unstable.sys will not solve the current situation.

Should set_tags be plural? I'd think that in most cases, you'd want to set one tag only.

Well, the API will not care whether the user adds multiple attributes or a single attribute anyway.

set_tags seems too misleading if the interpreter is allowed to ignore it. If I see anything called "set", I'd expect it to actually set something upon calling it. How about request_tags?

I don't care about the naming; I thought that get/set is conventional naming. For me, this is a matter of documentation, and I still think that people should avoid using this API as much as possible.

get_tags/set_tags only covers implementation details for objects themselves. They won't work for experimental APIs that need parameters

Would you like to provide a concrete example? Currently, we only care about defer_refcount and immortal, so I didn't think about it. Well, we could change the signature of set_tags to be set_tag and make it receive parameters.

@ZeroIntensity
Member

I no longer like adding more such APIs. The basic idea of this proposal is to stop turning CPython implementation details into APIs. Such APIs break other implementations and cause compatibility issues. unstable.sys will not solve the current situation.

I'm worried that get_tags isn't much better. If someone were to write something like this, it would not be portable to other implementations:

def do_something_to_constant(op):
    # Use immortality as a notion of constant-ness
    if "immortal" not in sys.get_tags(op):
        raise ValueError()

Couldn't other implementations just implement _is_interned or whatever as just return False?

Would you like to provide a concrete example?

What if we wanted to provide an API for object flags someday?

It's also not totally clear to me if get_tags/set_tags is supposed to cover general object implementation details (e.g., immortality and DRC), or something specific to a type (e.g., string interning).

@corona10
Member Author

I'm worried that get_tags isn't much better. If someone were to write something like this, it would not be portable to other implementations:

The key point is where the focus lies. If you care about portability, then you shouldn’t rely on unstable or implementation-specific APIs, which may not exist in other versions or implementations. However, sys.get_tags itself will be available consistently across implementations and versions. As I mentioned earlier, we don’t guarantee the specific output — and if a third-party library depends on certain tags being present, that’s their responsibility.

Consider the case where we want to remove sys._is_immortal(). With sys.get_tags, we can simply stop returning the "immortal" tag — the code using it won’t break; only the implementation detail changes, which is exactly what we want. On the other hand, sys._is_immortal() is a different story: in some cases, we might have to keep the API even if we no longer want to support it.
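The difference in failure modes argued above can be illustrated with a small sketch. `sys.get_tags` does not exist, so it is represented here by a hypothetical `get_tags` callable passed in as a parameter; the helper names are likewise invented for the example. The private-API variant has to guard the attribute lookup itself, because `sys._is_immortal` is not present in every version or implementation.

```python
import sys


def is_immortal_via_private_api(obj: object) -> bool:
    # Calling sys._is_immortal directly raises AttributeError wherever the
    # function is missing, so callers must guard the lookup themselves.
    checker = getattr(sys, "_is_immortal", None)
    return bool(checker(obj)) if checker is not None else False


def is_immortal_via_tags(obj: object, get_tags=lambda o: ()) -> bool:
    # get_tags is a stand-in for the proposed sys.get_tags (hypothetical).
    # If an implementation stops reporting "immortal", this returns False
    # and the calling code keeps running.
    return "immortal" in get_tags(obj)


print(is_immortal_via_tags(None))                            # False
print(is_immortal_via_tags(None, lambda o: ("immortal",)))   # True
```

Under the tag-based design the caller's code path is the same whether or not the tag is supported; only the answer changes.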

@ZeroIntensity
Member

Consider the case where we want to remove sys._is_immortal(). With sys.get_tags, we can simply stop returning the "immortal" tag — the code using it won’t break; only the implementation detail changes, which is exactly what we want. On the other hand, sys._is_immortal() is a different story: in some cases, we might have to keep the API even if we no longer want to support it.

Ok, that makes sense.

The place where I'm getting a little tripped up is that the whole point of the _ prefix was that we could remove it in any version--it's supposed to be a private API, we just document it and thus shift the maintenance responsibility to users. I don't see it as much different than using a private method (prefixed with _). Why doesn't that work?

@corona10
Member Author

What if we wanted to provide an API for object flags someday?

I'm open to making the API design more flexible, but we should still try to avoid exposing implementation details whenever possible. So, should we plan to support object flags in the future? The reason I mention this is that we can not cover all cases :)

How about this: sys.set_tag(obj, tag: str, *, options: dict[str, Any] = {}) -> None?

@corona10
Member Author

The place where I'm getting a little tripped up is that the whole point of the _ prefix was that we could remove it in any version--it's supposed to be a private API, we just document it and thus shift the maintenance responsibility to users. I don't see it as much different than using a private method (prefixed with _). Why doesn't that work?

See: #134762 (comment), this is a real-world example.
There are also several alternative Python implementations, such as PyPy, GraalPython, and RustPython, which often copy parts of the CPython implementation and adapt them to their own runtimes. Introducing this API would help reduce their catch-up burden and make the CPython runtime less tied to specific implementation details :)

@ZeroIntensity
Member

I was under the impression that it'd be totally fine to remove sys._getframe, we just won't in practice because frames are exposed in other public APIs (e.g., inspect.currentframe). I think we might just need some additional rules on when something is private (or "unstable") and not.

@vstinner
Member

Please rename your API to get/set_object_tags(). get_tags() name is too generic: tags of what?

@corona10
Member Author

Please rename your API to get/set_object_tags(). get_tags() name is too generic: tags of what?

Looks good!

@vstinner vstinner changed the title Add sys.set_tags() and sys.get_tags() APIs for debugging and experimental Use Add sys.set_object_tags() and sys.get_object_tags() APIs for debugging and experimental Use May 28, 2025
@vstinner
Member

I would prefer a sys.set_tags(obj, defer_refcount=True) API. That way it would be possible to clear a hypothetical future tag using sys.set_tags(obj, future_tag=False), or to set values other than just True/False: sys.set_tags(obj, future_tag=123).

And get_tags() should return a dictionary with values, like: get_object_tags(obj) -> {'defer_refcount': True, 'interned': True}.

By the way, is it possible to mark a string as interned with your API? Something like: sys.set_tags(obj, interned=True). Does it fail with non-string objects?

I suppose that sys.set_tags(obj, immortal=True) cannot be implemented, or maybe it should work on immortal objects and fail on non-immortal objects?
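The keyword-based variant suggested here could be sketched as follows. This is an illustration under stated assumptions, not a proposed implementation: the function names, the `_RECOGNIZED` set, and the registry are hypothetical, and raising `TypeError` for `interned=True` on a non-string is only one possible answer to the open question above.

```python
from typing import Any

_RECOGNIZED = frozenset({"defer_refcount", "interned"})  # hypothetical

# Hypothetical registry: id(obj) -> (obj, {tag name: value}).
_state: dict[int, tuple[object, dict[str, Any]]] = {}


def set_object_tags_sketch(obj: object, **tags: Any) -> None:
    """Set, update, or clear tags passed as keyword arguments."""
    _, current = _state.setdefault(id(obj), (obj, {}))
    for name, value in tags.items():
        if name not in _RECOGNIZED:
            continue  # unknown keywords are ignored in this sketch
        if name == "interned" and not isinstance(obj, str):
            raise TypeError("'interned' only applies to str objects")
        if value is False:
            current.pop(name, None)  # False clears the tag
        else:
            current[name] = value


def get_object_tags_sketch(obj: object) -> dict[str, Any]:
    """Return a copy of the tags retained for obj."""
    entry = _state.get(id(obj))
    return dict(entry[1]) if entry else {}


s = "example"
set_object_tags_sketch(s, defer_refcount=True, interned=True, future_tag=123)
print(get_object_tags_sketch(s))  # {'defer_refcount': True, 'interned': True}
set_object_tags_sketch(s, interned=False)
print(get_object_tags_sketch(s))  # {'defer_refcount': True}
```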

@Fidget-Spinner
Member

I would prefer sys.set_tags(obj, defer_refcount=True) API.

What do we do if we remove the tag in the future though? E.g. if one day we remove defer_refcount. Wouldn't that break the API?

@vstinner
Member

vstinner commented May 28, 2025

Should we include GC "tags" (related to the PyGC_Head structure) in get_object_tags()?

  • gc_tracked (bool): similar to gc.is_tracked(obj)
  • finalized (bool): similar to gc.is_finalized(obj)

Not sure about set_object_tags(). Should it be possible to track/untrack using set_object_tags()? It sounds dangerous, maybe don't allow that.

And what about other low-level object attributes?

  • managed_dict (bool): test type(obj).tp_flags & Py_TPFLAGS_MANAGED_DICT
  • managed_weakref (bool): test type(obj).tp_flags & Py_TPFLAGS_MANAGED_WEAKREF
  • inline_values (bool): test type(obj).tp_flags & Py_TPFLAGS_INLINE_VALUES

Would it be interesting to expose these tags? They cannot be modified by set_object_tags().
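For illustration, the read-only tags listed above can already be approximated from Python on CPython. The sketch below relies on `gc.is_tracked()`, `gc.is_finalized()`, and the CPython-specific `type.__flags__` attribute. The bit positions are copied from CPython's `Include/object.h` as of roughly 3.13; they are internal, should be treated as an assumption, and may differ in other versions. The function name `low_level_tags_sketch` is hypothetical.

```python
import gc

# Assumed bit positions (CPython internal, version-dependent):
_TPFLAGS = {
    "inline_values": 1 << 2,    # Py_TPFLAGS_INLINE_VALUES
    "managed_weakref": 1 << 3,  # Py_TPFLAGS_MANAGED_WEAKREF
    "managed_dict": 1 << 4,     # Py_TPFLAGS_MANAGED_DICT
}


def low_level_tags_sketch(obj: object) -> dict[str, bool]:
    """Collect read-only, implementation-level facts about obj."""
    flags = type(obj).__flags__  # CPython-specific attribute
    tags = {name: bool(flags & bit) for name, bit in _TPFLAGS.items()}
    tags["gc_tracked"] = gc.is_tracked(obj)
    tags["finalized"] = gc.is_finalized(obj)
    return tags


class Plain:
    pass


print(low_level_tags_sketch(Plain()))
print(low_level_tags_sketch(0)["gc_tracked"])  # False: ints are not GC-tracked
```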

@corona10
Member Author

corona10 commented May 28, 2025

Would it be interesting to expose these tags? They cannot be modified by set_object_tags().

Even if some tags cannot be set by set_object_tags(), get_object_tags() can still return them.
It doesn't need to be 1:1. It's just for checking object status.

@corona10
Member Author

I would prefer a sys.set_tags(obj, defer_refcount=True) API. That way it would be possible to clear a hypothetical future tag using

I still prefer string-based tags, since those keywords would be meaningless for other implementations.

@corona10
Member Author

Should we include GC "tags" (related to the PyGC_Head structure) in get_object_tags()?

I am not sure about these, given that gc.is_tracked(obj) and gc.is_finalized(obj) are already public APIs and widely used.
Let's focus on internal implementation details first. If it turns out that some tags are used frequently, we can promote them to dedicated public APIs, like gc.is_tracked(obj) and gc.is_finalized(obj).

@colesbury
Contributor

colesbury commented May 28, 2025

I think we'd be better off with dedicated APIs for the different pieces of functionality. I think combining lots of functionality into a single Swiss army knife style API isn't good design: it makes it harder to use, harder to discover, harder to implement correctly, and harder to add new features. To quote Robert C. Martin: "Functions should do one thing. They should do it well. They should do it only."

I am not convinced that this style of design makes it any easier for implementers of alternate Python runtimes.

It is incredibly easy to support APIs like sys._defer_refcount, sys._is_immortal, sys._is_interned even if the runtime doesn't support deferred reference counting, immortality, and interning: just do nothing (for sys._defer_refcount) or return False, which is exactly what we would do in the default build for sys._defer_refcount.

On the other hand, the semantics of sys.set_tags and sys.get_tags is more confusing for both implementers and users: are unknown tags ignored? Okay, now it's easy to introduce bugs due to minor spelling errors in strings. Are unknown tags preserved between sys.set_tags and sys.get_tags? (My guess is no.)
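The spelling hazard mentioned above is easy to demonstrate with a sketch. The helper `request_tags_sketch` and the class `DedicatedApiSketch` are hypothetical stand-ins: the first models a lenient string-tag API that drops unknown tags, the second models a dedicated function that a runtime without deferred reference counting could stub to return False.

```python
_RECOGNIZED = frozenset({"defer_refcount"})  # hypothetical


def request_tags_sketch(tags):
    """Return the subset of tags a lenient implementation would keep."""
    return tuple(t for t in tags if t in _RECOGNIZED)


class DedicatedApiSketch:
    """Stand-in for a namespace of dedicated functions."""

    @staticmethod
    def defer_refcount(obj):
        # A runtime without the feature can simply report False.
        return False


# A misspelled tag is silently dropped: no error, no effect.
print(request_tags_sketch(["defer_refcout"]))  # ()

# The same typo against a dedicated function fails loudly.
try:
    DedicatedApiSketch.defer_refcout(object())
except AttributeError as exc:
    print("caught:", type(exc).__name__)  # caught: AttributeError
```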

@colesbury
Contributor

To expand on my previous comment: if you want the APIs to be clearly labeled as experimental or implementation details, stick them in a namespace whose name makes that clear.

@vstinner
Member

I concur with @colesbury, I also prefer multiple functions rather than a single one (two in practice).

@corona10
Member Author

I think we'd be better off with dedicated APIs for the different pieces of functionality.
On the other hand, the semantics of sys.set_tags and sys.get_tags is more confusing for both implementers and users

In most cases this is the correct approach and a reasonable concern, but the intention of this API is not to recommend that people use it. Its users would be limited, and we would prefer to use it for our internal purposes, which lets us change anything without any burden.

@corona10
Member Author

On the other hand, the semantics of sys.set_tags and sys.get_tags is more confusing for both implementers and users

Once the API is added, most implementers will not need to care about it. In most cases it will set nothing and return nothing.

As for users, I believe they should not rely on this API, even for the current use cases.

@corona10
Member Author

Okay, now it's easy to introduce bugs due to minor spelling errors in strings.

Yeah, that’s true. But why should we care about people who rely on private APIs like sys._xxx?

If the current stance is that these sys._xxx APIs can be removed freely without documentation, that would make things much less stressful.
But when I look at this PR (#134762), I wonder, why are we trying to expose and document them?
Why are we shooting ourselves in the foot by making implementation details public?

@corona10
Member Author

My motivation for this API is that if exposing implementation details is inevitable, then why aren't we exposing a sandbox API that can be easily managed, and nothing needs to be guaranteed?

Projects
None yet
Development

No branches or pull requests

6 participants