Make dataclass attr collection no longer worst-case quadratic #13539

Michael0x2a · 2022-08-28T05:54:40Z

While working on #13531, I noticed that DataclassTransformer's collect_attributes method was doing basically this:

all_attrs = []
known_attrs = set()
for stmt in current_class:
    attr = convert_stmt_to_dataclass_attr(stmt)
    all_attrs.append(attr)
    known_attrs.add(attr.name)

for info in current_class.mro[1:-1]:
    if info is not a dataclass:
        continue

    super_attrs = []
    for attr in info.dataclass_attributes:
        # ...snip...
        if attr.name not in known_attrs:
            super_attrs.append(attr)
            known_attrs.add(attr.name)
        elif all_attrs:
            for other_attr in all_attrs:
                if other_attr.name == attr.name:
                    all_attrs.remove(attr)
                    super_attrs.append(attr)
                    break
    all_attrs = super_attrs + all_attrs
    all_attrs.sort(key=lambda a: a.kw_only)

validate all_attrs

Constantly searching through and removing items from all_attrs, then pre-pending the superclass attrs will result in worst-case quadratic behavior in the edge case where subtype is overriding every attribute defined in the supertype.

This edge case is admittedly pretty unlikely to happen, but I wanted to clean up the code a bit by reversing the order in which we process everything so we naturally record attrs in the correct order. The code now looks more like this:

found_attrs = {}   # Mypy requires Python 3.7+, so this will preserve insertion order
found_dataclass_supertype = False
for info in reversed(current_class.mro[1:-1]):
    if info is not a dataclass:
        continue

    found_dataclass_supertype = True
    for attr in info.dataclass_attributes:
        # ...snip...
        found_attrs[attr.name] = attr

for stmt in current_class:
    attr = convert_stmt_to_dataclass_attr(stmt)
    found_attrs[attr.name] = attr

all_attrs = list(found_attrs.values())
if found_dataclass_supertype:
    all_attrs.sort(key=lambda a: a.kw_only)

validate all_attrs

One quirk of the old implementation I found was that we only move kw-only attrs to the end of the attrs list when we find at least one dataclass supertype. I tried changing this logic so we unconditionally sort/do not sort the list, but both alternatives ended up breaking some tests. I'm not sure I really understand what's supposed to happen here, so opted to just keep this part of mypy as-is.

While working on python#13531, I noticed that DataclassTransformer's `collect_attributes` method was doing basically this: ```python all_attrs = [] known_attrs = set() for stmt in current_class: attr = convert_stmt_to_dataclass_attr(stmt) all_attrs.append(attr) known_attrs.add(attr.name) for info in current_class.mro[1:-1]: if info is not a dataclass: continue super_attrs = [] for attr in info.dataclass_attributes: # ...snip... if attr.name not in known_attrs: super_attrs.append(attr) known_attrs.add(attr.name) elif all_attrs: for other_attr in all_attrs: if other_attr.name == attr.name: all_attrs.remove(attr) super_attrs.append(attr) break all_attrs = super_attrs + all_attrs all_attrs.sort(key=lambda a: a.kw_only) validate all_attrs ``` Constantly searching through and removing items from `all_attrs`, then pre-pending the superclass attrs will result in worst-case quadratic behavior in the edge case where subtype is overriding every attribute defined in the supertype. This edge case is admittedly pretty unlikely to happen, but I wanted to clean up the code a bit by reversing the order in which we process everything so we naturally record attrs in the correct order. One quirk of the old implementation I found was that we do not sort the attrs list and move kw-only attrs to the end when none of the supertypes are dataclasses. I tried changing this logic so we unconditionally sort the list, but this actually broke a few of our tests for some reason. I didn't want to get too deep in the weeds, so opted to preserve this behavior.

github-actions · 2022-08-28T06:19:15Z

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

JukkaL

Thanks, the new code looks much cleaner (and it's also faster as a bonus).

JukkaL approved these changes Aug 31, 2022

View reviewed changes

JukkaL merged commit 2857736 into python:master Aug 31, 2022

sterliakov mentioned this pull request May 3, 2025

Move dataclass kw_only fields to the end of the signature #19018

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make dataclass attr collection no longer worst-case quadratic #13539

Make dataclass attr collection no longer worst-case quadratic #13539

Michael0x2a commented Aug 28, 2022

github-actions bot commented Aug 28, 2022

JukkaL left a comment

Make dataclass attr collection no longer worst-case quadratic #13539

Make dataclass attr collection no longer worst-case quadratic #13539

Conversation

Michael0x2a commented Aug 28, 2022

github-actions bot commented Aug 28, 2022

JukkaL left a comment

Choose a reason for hiding this comment