Skip to content

Make dataclass attr collection no longer worst-case quadratic #13539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 31, 2022

Conversation

Michael0x2a
Copy link
Collaborator

While working on #13531, I noticed that DataclassTransformer's collect_attributes method was doing basically this:

all_attrs = []
known_attrs = set()
for stmt in current_class:
    attr = convert_stmt_to_dataclass_attr(stmt)
    all_attrs.append(attr)
    known_attrs.add(attr.name)

for info in current_class.mro[1:-1]:
    if info is not a dataclass:
        continue

    super_attrs = []
    for attr in info.dataclass_attributes:
        # ...snip...
        if attr.name not in known_attrs:
            super_attrs.append(attr)
            known_attrs.add(attr.name)
        elif all_attrs:
            for other_attr in all_attrs:
                if other_attr.name == attr.name:
                    all_attrs.remove(attr)
                    super_attrs.append(attr)
                    break
    all_attrs = super_attrs + all_attrs
    all_attrs.sort(key=lambda a: a.kw_only)

validate all_attrs

Constantly searching through and removing items from all_attrs, then pre-pending the superclass attrs will result in worst-case quadratic behavior in the edge case where subtype is overriding every attribute defined in the supertype.

This edge case is admittedly pretty unlikely to happen, but I wanted to clean up the code a bit by reversing the order in which we process everything so we naturally record attrs in the correct order. The code now looks more like this:

found_attrs = {}   # Mypy requires Python 3.7+, so this will preserve insertion order
found_dataclass_supertype = False
for info in reversed(current_class.mro[1:-1]):
    if info is not a dataclass:
        continue

    found_dataclass_supertype = True
    for attr in info.dataclass_attributes:
        # ...snip...
        found_attrs[attr.name] = attr

for stmt in current_class:
    attr = convert_stmt_to_dataclass_attr(stmt)
    found_attrs[attr.name] = attr

all_attrs = list(found_attrs.values())
if found_dataclass_supertype:
    all_attrs.sort(key=lambda a: a.kw_only)

validate all_attrs

One quirk of the old implementation I found was that we only move kw-only attrs to the end of the attrs list when we find at least one dataclass supertype. I tried changing this logic so we unconditionally sort/do not sort the list, but both alternatives ended up breaking some tests. I'm not sure I really understand what's supposed to happen here, so opted to just keep this part of mypy as-is.

While working on python#13531, I noticed that DataclassTransformer's
`collect_attributes` method was doing basically this:

```python
all_attrs = []
known_attrs = set()
for stmt in current_class:
    attr = convert_stmt_to_dataclass_attr(stmt)
    all_attrs.append(attr)
    known_attrs.add(attr.name)

for info in current_class.mro[1:-1]:
    if info is not a dataclass:
        continue

    super_attrs = []
    for attr in info.dataclass_attributes:
        # ...snip...
        if attr.name not in known_attrs:
            super_attrs.append(attr)
            known_attrs.add(attr.name)
        elif all_attrs:
            for other_attr in all_attrs:
                if other_attr.name == attr.name:
                    all_attrs.remove(attr)
                    super_attrs.append(attr)
                    break
    all_attrs = super_attrs + all_attrs
    all_attrs.sort(key=lambda a: a.kw_only)

validate all_attrs
```

Constantly searching through and removing items from `all_attrs`,
then pre-pending the superclass attrs will result in worst-case
quadratic behavior in the edge case where subtype is overriding
every attribute defined in the supertype.

This edge case is admittedly pretty unlikely to happen, but I wanted
to clean up the code a bit by reversing the order in which we process
everything so we naturally record attrs in the correct order.

One quirk of the old implementation I found was that we do not sort
the attrs list and move kw-only attrs to the end when none of the
supertypes are dataclasses. I tried changing this logic so we
unconditionally sort the list, but this actually broke a few of our
tests for some reason. I didn't want to get too deep in the weeds,
so opted to preserve this behavior.
@github-actions
Copy link
Contributor

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the new code looks much cleaner (and it's also faster as a bonus).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants