Skip to content

gh-121192: Add fast path to deepcopy() for empty list/tuple/dict/set #121193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

lgeiger
Copy link
Contributor

@lgeiger lgeiger commented Jul 1, 2024

deepcopy() can be surprisingly slow when called with empty containers like lists, tuples, dicts, sets or frozensets.

This adds a fast path for this case similar to #114266 which speeds up these cases by about 4x - 21x while having a small impact on the default path.

Here are some benchmarks comparing this PR against main. Let me know if there are more benchmarks that I can run to verify that there a no performance regressions for the common case:

import pyperf
runner = pyperf.Runner()

setup = """
import copy

a = {"list": [1, 2 ,3 ,4], "t": (1, 2, 3), "str": "hello", "subdict": {"a": True}}

t = ()
fs = frozenset()
l = []
s = set()
d = {}
deep = [[], (), {}, set(), frozenset()]
"""

runner.timeit(name="deepcopy dict", stmt=f"b = copy.deepcopy(a)", setup=setup)
runner.timeit(name="deepcopy empty tuple", stmt=f"b = copy.deepcopy(t)", setup=setup)
runner.timeit(name="deepcopy empty frozenset", stmt=f"b = copy.deepcopy(fs)", setup=setup)
runner.timeit(name="deepcopy empty list", stmt=f"b = copy.deepcopy(l)", setup=setup)
runner.timeit(name="deepcopy empty set", stmt=f"b = copy.deepcopy(s)", setup=setup)
runner.timeit(name="deepcopy empty dict", stmt=f"b = copy.deepcopy(d)", setup=setup)
runner.timeit(name="deepcopy multiple empty containers", stmt=f"b = copy.deepcopy(deep)", setup=setup)
deepcopy empty tuple: Mean +- std dev: [baseline] 287 ns +- 4 ns -> [opt-empty] 49.8 ns +- 1.7 ns: 5.77x faster
deepcopy empty frozenset: Mean +- std dev: [baseline] 1.46 us +- 0.02 us -> [opt-empty] 67.1 ns +- 1.8 ns: 21.70x faster
deepcopy empty list: Mean +- std dev: [baseline] 327 ns +- 6 ns -> [opt-empty] 67.1 ns +- 8.2 ns: 4.88x faster
deepcopy empty set: Mean +- std dev: [baseline] 1.44 us +- 0.02 us -> [opt-empty] 67.9 ns +- 3.8 ns: 21.26x faster
deepcopy empty dict: Mean +- std dev: [baseline] 329 ns +- 4 ns -> [opt-empty] 68.1 ns +- 3.0 ns: 4.83x faster

Benchmark hidden because not significant (2): deepcopy dict, deepcopy multiple empty containers

Geometric mean: 4.85x faster

@lgeiger
Copy link
Contributor Author

lgeiger commented Jul 1, 2024

I ran the relevant pyperformance deepcopy benchmark which does seem to suggest that this change makes the non empty case slightly slower:EDIT: I re-ran the pyperformance benchmarks with the latest changes in d87901e and I now can't see any performance regression for the common case.

Performance version: 1.11.0
Python version: 3.14.0a0 (64-bit) revision 1a84bdc237
Report on macOS-14.5-arm64-arm-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2024-07-02 01:23:14.690599
End date: 2024-07-02 01:23:51.844329

### deepcopy ###
Mean +- std dev: 123 us +- 12 us -> 120 us +- 1 us: 1.03x faster
Significant (t=2.25)

### deepcopy_memo ###
Mean +- std dev: 14.1 us +- 0.2 us -> 14.1 us +- 0.3 us: 1.00x slower
Not significant

### deepcopy_reduce ###
Mean +- std dev: 1.22 us +- 0.06 us -> 1.21 us +- 0.01 us: 1.00x faster
Not significant

@lgeiger lgeiger force-pushed the deepcopy-empty-list-dict branch from d7b07eb to d87901e Compare July 2, 2024 19:38
@eendebakpt
Copy link
Contributor

The optimization applied in the latest iteration of the PR is only applied when no memo is passed to deepcopy. The advantage is that any recursive structures like {'a': [1, 2, 3, 4, ,5]} will have no (or near zero) overhead. So I agree that performance regression for many common cases is not a big concern.

The question then is whether a deepcopy directly on an empty builtin like like and set (and not a deepcopy on a recursive structure containing empty builtins, e.g, ( {}, [], ())) is a common enough operation to apply this PR. I am not convinced on this, In the corresponding issue an example is given for pydantic basemodels. But there the deepcopy overhead can be avoided by using a default factory. Also the corresponding situation for plain dataclasses cannot occur:

from dataclasses import dataclass

@dataclass
class DC:
    a : list = []
    
x=DC()

raises a ValueError: mutable default <class 'list'> for field a is not allowed: use default_factory. With attrs an empty list is allowed, but no deepcopy is performed on instantiation. For example

import attrs
@attrs.define
class DC:
    a : list = []
    
x=DC()
y=DC()

x.a.append(1)

print(x)
print(y)

results in

DC(a=[1])
DC(a=[1])

@eendebakpt
Copy link
Contributor

For reference: I performed a benchmark of main, this PR and the deepcopy C implementation from #91610. On benchmark

import pyperf
runner = pyperf.Runner()

setup="""
import copy
from dataclasses import dataclass

@dataclass
class A:
    a : list
    b : str
    c : bool
    d: tuple
    e : set
    
dc=A([1,2,3], 'hello', True, (), {})

empty_tuple = tuple()
empty_list = []
"""

runner.timeit(name="deepcopy empty tuple", stmt="b=copy.deepcopy(empty_tuple)", setup=setup)
runner.timeit(name="deepcopy empty list", stmt="b=copy.deepcopy(empty_list)", setup=setup)
runner.timeit(name="deepcopy dataclass", stmt="b=copy.deepcopy(dc)", setup=setup)

Results are

deepcopy empty tuple: Mean +- std dev: [dc_main] 827 ns +- 37 ns -> [dc_fastpath_empty] 203 ns +- 7 ns: 4.06x faster
deepcopy empty list: Mean +- std dev: [dc_main] 1.00 us +- 0.09 us -> [dc_fastpath_empty] 233 ns +- 24 ns: 4.29x faster
deepcopy dataclass: Mean +- std dev: [dc_main] 8.02 us +- 0.12 us -> [dc_fastpath_empty] 8.22 us +- 0.07 us: 1.02x slower

Geometric mean: 2.57x faster
(env312) c:\projects\misc\cpython\PCbuild>python -m pyperf compare_to dc_main.json dc_c.json
deepcopy empty tuple: Mean +- std dev: [dc_main] 827 ns +- 37 ns -> [dc_c] 90.6 ns +- 9.8 ns: 9.13x faster
deepcopy empty list: Mean +- std dev: [dc_main] 1.00 us +- 0.09 us -> [dc_c] 216 ns +- 24 ns: 4.63x faster
deepcopy dataclass: Mean +- std dev: [dc_main] 8.02 us +- 0.12 us -> [dc_c] 4.06 us +- 0.43 us: 1.97x faster

Geometric mean: 4.37x faster

@sobolevn
Copy link
Member

sobolevn commented Jul 3, 2024

@eendebakpt it can happen for dataclass, example:

from dataclasses import dataclass, as_dict

@dataclass 
class Some:
    field: set[int]

as_dict(Some(set()))

@eendebakpt
Copy link
Contributor

@eendebakpt it can happen for dataclass, example:

Indeed. Here a more complete example, including the cases where this PR helps

from dataclasses import dataclass, asdict

@dataclass 
class Some:
    a: str
    t: tuple[int]
    field: set[int] # benefits
    l : list[int]
    d : dict[str, str]
    fs: frozenset[int] # benefits
    nonempty : set[int]

s=Some('x', tuple(), set(), [], d={}, fs=frozenset(), nonempty={1, 2})
asdict(s)

@lgeiger
Copy link
Contributor Author

lgeiger commented Jul 3, 2024

Indeed. Here a more complete example, including the cases where this PR helps

Thanks for the example! This PR indeed shows a decent improvement here:

import pyperf
runner = pyperf.Runner()

setup = """
from dataclasses import dataclass, asdict

@dataclass
class Some:
    a: str
    t: tuple[int]
    field: set[int] # benefits
    l : list[int]
    d : dict[str, str]
    fs: frozenset[int] # benefits
    nonempty : set[int]

s = Some('x', tuple(), set(), [], d={}, fs=frozenset(), nonempty={1, 2})
"""

runner.timeit(name="dataclass asdict", stmt=f"b = asdict(s)", setup=setup)
Mean +- std dev: [asdict-baseline] 6.26 us +- 0.07 us -> [asdict-opt] 3.39 us +- 0.18 us: 1.85x faster

@sobolevn
Copy link
Member

sobolevn commented Jul 3, 2024

Other use-cases that we can find in our own code:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants