Skip to content

[mypyc] feat: true_dict_rprimitive for optimized dict fastpath usage #19499

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

BobTheBuidler
Copy link
Contributor

@BobTheBuidler BobTheBuidler commented Jul 24, 2025

This draft PR adds a true_dict_rprimitive which allows mypyc to differentiate between an actual dict instance and a subclass of dict.

Since this impacts any function that builds kwargs, plus dict comprehensions and other things, the special casing seems reasonable.

I chose not to map builtins.dict to the new rprimitive because we cannot guarantee types of input arguments. I intended to ONLY use it in instances where the dict type is certain. I'm not sure if this decision is correct or not, looking for opinions.

There is one test failure I still need to figure out, but I'd appreciate any feedback on the direction.

Copy link
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@brianschubert
Copy link
Member

(Just a passing remark, I haven't looked at the actual change)

I think "exact dict" would be a better term than "true dict" to describe instances of dict not including instances of subtypes of dict. Similar terminology is used in Python's C API, e.g. PyDict_CheckExact.

@BobTheBuidler
Copy link
Contributor Author

Should we maybe just rename dict_rprimitive to dict_subclass_rprimitive?

I noticed while coding this PR that it isn't mapped the same way as the other rprimitive types, and this is not clear from the definition in rtypes, so maybe renaming that var and then changing true_dict_rprimitive to dict_rprimitive is the optimal choice.

@BobTheBuidler
Copy link
Contributor Author

BobTheBuidler commented Jul 26, 2025

I decided to go ahead and map builtins.dict to the new primitive, and updated the naming as per @brianschubert recommendation. Everything seems to be working except for some issue with defaultdict. For some reason, defaultdict is now also mapped to the new dict primitive. but my if check is very clearly name== "builtins.dict" which should not be the case for defaultdict?

If I remove my new line from the mapper, which leaves "builtins.dict" mapped to the existing primitive, all tests turn green.

If we could decide we want the check, we need to figure out why mypy is treating defaultdict like a non-subclass dict. One of you guys might already know why. Maybe mypy handles defaultdict uniquely. It might be the case that this is not possible to workaround in a reasonable fashion.

I need an opinion from a maintainer on the best way to proceed here. I could remove the new check from the mapper and we could merge this in basically now.

@BobTheBuidler BobTheBuidler marked this pull request as ready for review July 26, 2025 23:01
@@ -716,6 +727,25 @@ static inline char CPyDict_CheckSize(PyObject *dict, Py_ssize_t size) {
return 1;
}

// Unsafe because it assumes dict is actually a dict.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure why this goes here, but I just stuck it with the other version of the same

@BobTheBuidler
Copy link
Contributor Author

This PR apparently shaves about 0.4% off of self-check

sterliakov/mypy-issues#131

@BobTheBuidler
Copy link
Contributor Author

Would it make this one easier to review if I split it up? I know this is my largest PR so far, and its a bit much to review, but its also the most impactful since so many things go thru dictionaries.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a full review, but left some comments. I've thought about doing something similar myself, since dict operations are very common and often we can figure out that dict subclasses aren't possible.

rest = r11
r12 = PyDict_DelItem(r11, r2)
r13 = r12 >= 0 :: signed
r12 = cast(dict, r11)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems redundant. An exact dict should be a runtime subtype of dict (see is_runtime_subtype)?

dict_rprimitive: Final = RPrimitive("builtins.dict", is_unboxed=False, is_refcounted=True)
"""A primitive that represents instances of builtins.dict or subclasses of dict."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: We don't use docstrings with variables.

# Python dict object (or an instance of a subclass of dict).
# Python dict object.
exact_dict_rprimitive: Final = RPrimitive(
"builtins.dict[exact]", is_unboxed=False, is_refcounted=True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could generalize this to other types by adding an is_exact attribute to RPrimitive. Not sure whether it's worth it though -- dict is likely the one that will likely benefit the most from this.

Copy link
Contributor Author

@BobTheBuidler BobTheBuidler Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when I was setting this up originally, I noticed that dict is unique in that we specialize operations on subclasses by considering all of them instances of dict_rprimitive. We don't currently do that for subclasses of other primitive types, but with the addition of this flag we easily could.

I think that reason alone might make it worthwhile. Then we can support faster ops for cases like the first class in the below example, without breaking the second class in the example.

Example with str.lower:

class MyStringSpecializedCallCase(str):
    ...

class MyStringGenericCallCase(str):
    def lower(self) -> str:
        return "x"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking a few steps ahead here of course, but the is_exact flag would be a critical first step

@@ -30,7 +31,7 @@
# bytes(obj)
function_op(
name="builtins.bytes",
arg_types=[RUnion([list_rprimitive, dict_rprimitive, str_rprimitive])],
arg_types=[RUnion([list_rprimitive, dict_rprimitive, exact_dict_rprimitive, str_rprimitive])],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exact dict can be a subtype of dict_primitive. In that case this change shouldn't be required.

@@ -503,11 +504,11 @@ L19:
L20:
return 1
def fn_typeddict(t):
t :: dict
t :: dict[exact]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When coercing from say Any to a TypedDict type, do we raise a type error if the source is an instance of a dict subclass? This is needed for type safety, and we need a test for this (unless one already exists). This includes both implicit coercions in IR and wrapper methods.

@JukkaL
Copy link
Collaborator

JukkaL commented Aug 13, 2025

Would it make this one easier to review if I split it up?

Yes please. Maybe start with a PR that adds the exact dict type but only uses it in one use case only. Then you can add more use cases in later PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants