
gh-114264: Optimize performance of copy.deepcopy by adding a fast path for atomic types #114266


Merged

merged 23 commits into main from eendebakpt:deepcopy_atomic_types on Jun 7, 2024

Commits (23)
9990245
Optimize performance of copy.deepcopy by adding a fast path for atomi…
eendebakpt Jan 18, 2024
06f1980
match order with main
eendebakpt Jan 18, 2024
dd9fabf
📜🤖 Added by blurb_it.
blurb-it[bot] Jan 18, 2024
becbb59
format news entry
eendebakpt Jan 18, 2024
339c2b9
put d variable back
eendebakpt Jan 19, 2024
89ecad1
Merge branch 'deepcopy_atomic_types' of github.com:eendebakpt/cpython…
eendebakpt Jan 19, 2024
9aacf87
Merge branch 'main' into deepcopy_atomic_types
eendebakpt Jan 30, 2024
5197f94
Merge branch 'main' into deepcopy_atomic_types
eendebakpt Feb 4, 2024
18914f3
Merge branch 'main' into deepcopy_atomic_types
eendebakpt Feb 26, 2024
8f294c7
Merge branch 'main' into deepcopy_atomic_types
eendebakpt Feb 26, 2024
3b57fc1
Merge branch 'main' into deepcopy_atomic_types
eendebakpt Mar 18, 2024
53d63a7
Merge branch 'main' into deepcopy_atomic_types
eendebakpt Mar 29, 2024
eb8e6ba
improve performance of _deepcopy_list with a list comprehension
eendebakpt Apr 13, 2024
e468b6c
avoid generation of exception in _deepcopy_tuple
eendebakpt Apr 13, 2024
2b4abe6
Merge branch 'main' into deepcopy_atomic_types
eendebakpt Apr 13, 2024
e3310f5
revert comprehension changes
eendebakpt Apr 14, 2024
1186a5b
Merge branch 'deepcopy_atomic_types' of github.com:eendebakpt/cpython…
eendebakpt Apr 14, 2024
ce7273b
Merge branch 'main' into deepcopy_atomic_types
eendebakpt Apr 20, 2024
4bc74d3
Update Lib/copy.py
eendebakpt May 1, 2024
d69d8da
Merge branch 'main' into deepcopy_atomic_types
eendebakpt May 1, 2024
c66b24d
Apply suggestions from code review
eendebakpt May 1, 2024
142b0fc
Merge branch 'main' into deepcopy_atomic_types
eendebakpt May 6, 2024
d6fa670
Merge branch 'main' into deepcopy_atomic_types
eendebakpt Jun 1, 2024
Lib/copy.py (10 additions, 21 deletions)

@@ -121,6 +121,11 @@ def deepcopy(x, memo=None, _nil=[]):
     See the module's __doc__ string for more info.
     """
 
+    cls = type(x)
+
+    if cls in _atomic_types:
+        return x
+
Comment on lines +124 to +128
Member:

Adding a fast path here would cause a potential behavioral change if memo is involved. This should be mentioned in NEWS.

Member:

The safer way might be to simply keep the original behavior - put the fast path after the memo lookup. This is a user-observable change and it might matter, for example if the user is using the memo dict to keep track of all the objects copied.

It would also be interesting to see the benchmark if we keep the memo as it is - how much of the performance gain comes from not updating the memo?
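For reference, a minimal self-contained sketch of the "fast path after the memo lookup" ordering discussed here (the names and the reduced type set are illustrative only; this is not the code of this PR or of any of the linked branches):

import types

# Illustrative subset of atomic types; the set used in this PR is larger.
_atomic_types_sketch = {types.NoneType, int, float, bool, complex, bytes, str}

def deepcopy_after_memo_sketch(x, memo=None, _nil=object()):
    # The memo is consulted first, so a user-supplied memo entry still takes
    # precedence over the atomic-type fast path.
    d = id(x)
    if memo is None:
        memo = {}
    else:
        y = memo.get(d, _nil)
        if y is not _nil:
            return y              # user-supplied memo wins, exactly as before
    if type(x) in _atomic_types_sketch:
        return x                  # fast path, but only after the memo lookup
    # full dispatch on _deepcopy_dispatch / __deepcopy__ / reductors omitted
    raise NotImplementedError("non-atomic handling omitted in this sketch")

assert deepcopy_after_memo_sketch(3.5) == 3.5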

Contributor Author (eendebakpt):

Putting the fast path after the memo results in the following (main vs. the variation):

deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [pr_v2] 6.06 us +- 0.08 us: 1.13x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [pr_v2] 6.43 us +- 0.17 us: 1.03x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [pr_v2] 1.01 us +- 0.02 us: 1.06x faster

Geometric mean: 1.08x faster

Still worthwhile, but a smaller improvement.

I am trying to find a case where the behavior changes (so I can also add a test). But the atomic types are not added to the memo:

...
    # If is its own copy, don't memoize.
    if y is not x:
        memo[d] = y
        _keep_alive(x, memo) # Make sure x lives at least as long as d
...

The only case I could think of where behavior changes is when users supply their own memo, and then the behavior change (different id for the same string) could be considered an implementation detail:

import copy

# large str, so it is not interned
s = 'sds' * 12312312
s2 = 'sds' * 12312312

print(id(s), id(s2))  # different id's

memo = {id(s2): s}
t = copy.deepcopy(s2, memo)
print(id(t), id(s), id(s) == id(t), id(s2))  # with this PR id(s) and id(t) are not equal, although s and t are equal as strings

Are there any cases where behavior changes that I am missing?

Member:

I wonder how the times will change if you use "if copier is _deepcopy_atomic" instead of "if cls in _atomic_types"? You can also try to use a different special value, for example ... instead of _deepcopy_atomic, to avoid a lookup in globals.
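A minimal sketch of that suggestion (illustrative names, not the actual copy.py code): atomic types map to a sentinel value in the dispatch dict, and the hot path compares the looked-up copier against that sentinel, so a single dict lookup covers both atomic and dispatched types.

# Sketch: Ellipsis serves as a cheap sentinel, so no extra function name has to
# be resolved in the module globals on the hot path.
_ATOMIC = ...
_dispatch_sketch = {int: _ATOMIC, float: _ATOMIC, str: _ATOMIC, bool: _ATOMIC}

def lookup_copier_sketch(x):
    copier = _dispatch_sketch.get(type(x))
    if copier is _ATOMIC:
        return x                  # atomic: the object is its own deep copy
    if copier is not None:
        return copier(x)          # dispatched copier (list, dict, ...)
    raise NotImplementedError("fallback to __deepcopy__/reduce omitted")

assert lookup_copier_sketch("spam") == "spam"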

Contributor Author (eendebakpt):

@serhiy-storchaka That is an interesting suggestion. A microbenchmark shows getting the copier is not much more expensive than the cls in _atomic_types check:

%timeit cls in _atomic_types
43.7 ns ± 0.0706 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
%timeit _deepcopy_dispatch.get(cls)
71.5 ns ± 0.0873 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Using a special value for the copier can be done after the memo check. This is quite close to current main and has the following performance:

deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [v3] 6.36 us +- 0.11 us: 1.08x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [v3] 6.26 us +- 0.05 us: 1.06x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v3] 1.02 us +- 0.01 us: 1.05x faster

Geometric mean: 1.06x faster

(implementation is: main...eendebakpt:deepcopy_atomic_types_v3)

Using the same approach before the memo check has the following performance:

deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [v4] 4.96 us +- 0.08 us: 1.39x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [v4] 5.38 us +- 0.06 us: 1.23x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v4] 931 ns +- 10 ns: 1.15x faster

Geometric mean: 1.25x faster

(implementation: main...eendebakpt:deepcopy_atomic_types_v4)
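For comparison, a self-contained sketch of the v4 ordering (again with illustrative names, not the branch's actual code): the sentinel check happens before the memo is consulted, so atomic values never touch the memo at all.

_ATOMIC_V4 = ...                  # sentinel marking "return the object as-is"
_dispatch_v4 = {int: _ATOMIC_V4, float: _ATOMIC_V4, str: _ATOMIC_V4, bool: _ATOMIC_V4}

def deepcopy_v4_sketch(x, memo=None):
    copier = _dispatch_v4.get(type(x))
    if copier is _ATOMIC_V4:
        return x                  # atomic: skip the memo entirely
    d = id(x)
    if memo is None:
        memo = {}
    elif d in memo:
        return memo[d]
    # non-atomic handling (dispatched copiers, __deepcopy__, reductors) omitted
    raise NotImplementedError

assert deepcopy_v4_sketch(42) == 42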

Member:

But the atomic types are not added to the memo:

Ah, you are right, I missed that. In that case the impact is much smaller than I thought and I think it's okay to skip the memo part.

Member:

I am okay with both solutions, provided the memo change is mentioned (if there is one). Users can use this function in surprising ways, and it's a de facto public API.

Member:

Thank you for satisfying my curiosity. I see that there is a non-trivial benefit from using _atomic_types.

Could you please make benchmarks for a collection containing a large number of non-atomic identical objects, e.g. [[1]]*100, for the different variants of the code? I expect that some variants can have a regression, but if it is not large, we can accept this.

I also wonder whether the same approach (with an _immutable_types set) should be used in copy.copy(). Even if it does not use memo, there may be some benefit, and it could be better for uniformity of the code.
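For context, a rough sketch of what that could look like in copy.copy() (hypothetical: the _immutable_types name and its exact membership are assumptions, not part of this PR):

import types
import weakref

# Hypothetical fast-path set for copy.copy(); membership chosen for illustration.
_immutable_types = {types.NoneType, types.EllipsisType, types.NotImplementedType,
                    int, float, bool, complex, bytes, str, tuple, frozenset, type,
                    range, slice, property, types.BuiltinFunctionType,
                    types.FunctionType, weakref.ref}

def copy_sketch(x):
    if type(x) in _immutable_types:
        return x                  # immutable objects are their own shallow copy
    # dispatch on _copy_dispatch / __copy__ / __reduce_ex__ omitted in this sketch
    raise NotImplementedError

assert copy_sketch((1, 2)) == (1, 2)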

Contributor Author (eendebakpt):

A more extensive benchmark script:

import pyperf
runner = pyperf.Runner()

setup = """
import copy

a = {'list': [1, 2, 3, 43], 't': (1, 2, 3), 'str': 'hello', 'subdict': {'a': True}}

from dataclasses import dataclass

@dataclass
class A:
    a: list
    b: str
    c: bool

dc = A([1, 2, 3], 'hello', True)

@dataclass
class A:
    a: int

dc_small = A(123)

small_tuple = (1, )

l = {'hi': 100}
repeating_atomic = [[1] * 100]
repeating = [dc_small] * 100
"""

runner.timeit(name="deepcopy dict", stmt="b=copy.deepcopy(a)", setup=setup)
runner.timeit(name="deepcopy dataclass", stmt="b=copy.deepcopy(dc)", setup=setup)
runner.timeit(name="deepcopy small dataclass", stmt="b=copy.deepcopy(dc_small)", setup=setup)
runner.timeit(name="deepcopy small tuple", stmt="b=copy.deepcopy(small_tuple)", setup=setup)
runner.timeit(name="deepcopy repeating", stmt="b=copy.deepcopy(repeating)", setup=setup)
runner.timeit(name="deepcopy repeating_atomic", stmt="b=copy.deepcopy(repeating_atomic)", setup=setup)

Comparison of the three alternative versions to main. This PR:

deepcopy dict: Mean +- std dev: [main] 6.82 us +- 0.07 us -> [pr] 4.68 us +- 0.08 us: 1.46x faster
deepcopy dataclass: Mean +- std dev: [main] 6.59 us +- 0.07 us -> [pr] 5.23 us +- 0.07 us: 1.26x faster
deepcopy small dataclass: Mean +- std dev: [main] 4.30 us +- 0.07 us -> [pr] 3.88 us +- 0.08 us: 1.11x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [pr] 925 ns +- 12 ns: 1.16x faster
deepcopy repeating: Mean +- std dev: [main] 21.8 us +- 0.2 us -> [pr] 24.9 us +- 0.4 us: 1.15x slower
deepcopy repeating_atomic: Mean +- std dev: [main] 24.6 us +- 0.5 us -> [pr] 10.1 us +- 0.2 us: 2.44x faster

Geometric mean: 1.31x faster

main...eendebakpt:deepcopy_atomic_types_v3

deepcopy dict: Mean +- std dev: [main] 6.82 us +- 0.07 us -> [v3] 6.34 us +- 0.22 us: 1.08x faster
deepcopy dataclass: Mean +- std dev: [main] 6.59 us +- 0.07 us -> [v3] 6.21 us +- 0.11 us: 1.06x faster
deepcopy small dataclass: Mean +- std dev: [main] 4.30 us +- 0.07 us -> [v3] 4.16 us +- 0.04 us: 1.03x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v3] 1.03 us +- 0.02 us: 1.03x faster
deepcopy repeating: Mean +- std dev: [main] 21.8 us +- 0.2 us -> [v3] 22.0 us +- 0.6 us: 1.01x slower
deepcopy repeating_atomic: Mean +- std dev: [main] 24.6 us +- 0.5 us -> [v3] 21.2 us +- 0.6 us: 1.16x faster

Geometric mean: 1.06x faster

main...eendebakpt:deepcopy_atomic_types_v4

deepcopy dict: Mean +- std dev: [main] 6.82 us +- 0.07 us -> [v4] 4.93 us +- 0.05 us: 1.38x faster
deepcopy dataclass: Mean +- std dev: [main] 6.59 us +- 0.07 us -> [v4] 5.35 us +- 0.06 us: 1.23x faster
deepcopy small dataclass: Mean +- std dev: [main] 4.30 us +- 0.07 us -> [v4] 3.89 us +- 0.05 us: 1.11x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v4] 938 ns +- 25 ns: 1.14x faster
deepcopy repeating: Mean +- std dev: [main] 21.8 us +- 0.2 us -> [v4] 26.4 us +- 0.5 us: 1.21x slower
deepcopy repeating_atomic: Mean +- std dev: [main] 24.6 us +- 0.5 us -> [v4] 12.4 us +- 0.4 us: 1.99x faster

Geometric mean: 1.23x faster

Some notes:

  • In this PR we avoid the cost of looking into the memo for atomic types, which provides the main speedup. This comes at a performance degradation for objects containing many identical (same id) values (see the illustration after these notes). Based on the numbers above there is no performance reason to pick version v4. For clarity or uniformity of the code it would not hurt much, though, to replace this PR with v4. I think one could construct benchmarks where v3 is faster than this PR (thinking about non-atomic types which have a custom __copy__ and recursive calls to deepcopy, such as numpy arrays), but I expect the differences to be small. Version v3 has the same performance on the "deepcopy repeating" benchmark as main (the 1.01x slower is a random fluctuation I believe; on average it will be 1.00x) and a small performance improvement for some other cases.

  • I also considered aligning the implementations of copy.copy and copy.deepcopy (for uniformity of the code), but decided against this initially. My line of thought:

    • There is no memo that can be skipped (which was the main performance benefit)
    • Calls of copy.copy on atomic types should be relatively rare (and cheap anyway), so there is less to gain.
    • For calls on container types (like list, dict or set) there are no recursive calls on the components. This is in contrast with deepcopy, where a call on a dataclass can recurse into atomic types.
  • I have not updated the news entry yet with a description of the behavior changes, because I think they are not really something one can notice with normal use of deepcopy.
    i) The first behavior change is that the number of invocations of memo.get is smaller. But memo.get is a read-only operation (on normal dicts), so this is not visible from the public interface.
    ii) In the example given above the memo passed is memo = {id(s2): s}. This is not really a valid memo, since we require for each key-value pair in the memo that id(value) == key, which is not true in the example.
    If desired I can add this to the news entry though.

  • The results above are only microbenchmarks, and it will depend on the application which version performs best. Applications where I have found deepcopy to take a significant amount of time are lmfit, qiskit and quantify-scheduler, but I cannot run those with current main (I could with 3.12 though).

  • There is another PR improving the speed of deepcopy (gh-72793: C implementation of parts of copy.deepcopy #91610). That approach converts the Python implementation to C, but it is more complex and will take time to review (and might not be accepted).
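To make the "repeating" regression case from the first note concrete, here is a small illustration (the object shape matches dc_small/repeating in the benchmark script above; the assertion only demonstrates that shared identity is preserved through the memo):

import copy
from dataclasses import dataclass

@dataclass
class Small:
    a: int

obj = Small(123)
repeating = [obj] * 100           # 100 references to one non-atomic object

out = copy.deepcopy(repeating)
# After the first element has been copied, every further recursive deepcopy()
# call is answered from the memo; with this PR the extra "cls in _atomic_types"
# membership test is still paid on each of those calls without returning early.
assert all(element is out[0] for element in out)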

     d = id(x)
     if memo is None:
         memo = {}
@@ -129,14 +134,12 @@ def deepcopy(x, memo=None, _nil=[]):
         if y is not _nil:
             return y
 
-    cls = type(x)
-
     copier = _deepcopy_dispatch.get(cls)
     if copier is not None:
         y = copier(x, memo)
     else:
         if issubclass(cls, type):
-            y = _deepcopy_atomic(x, memo)
+            y = x # atomic copy
         else:
             copier = getattr(x, "__deepcopy__", None)
             if copier is not None:
@@ -167,26 +170,12 @@ def deepcopy(x, memo=None, _nil=[]):
         _keep_alive(x, memo) # Make sure x lives at least as long as d
     return y
 
+_atomic_types = {types.NoneType, types.EllipsisType, types.NotImplementedType,
+                 int, float, bool, complex, bytes, str, types.CodeType, type, range,
+                 types.BuiltinFunctionType, types.FunctionType, weakref.ref, property}
+
 _deepcopy_dispatch = d = {}
 
-def _deepcopy_atomic(x, memo):
-    return x
-d[types.NoneType] = _deepcopy_atomic
-d[types.EllipsisType] = _deepcopy_atomic
-d[types.NotImplementedType] = _deepcopy_atomic
-d[int] = _deepcopy_atomic
-d[float] = _deepcopy_atomic
-d[bool] = _deepcopy_atomic
-d[complex] = _deepcopy_atomic
-d[bytes] = _deepcopy_atomic
-d[str] = _deepcopy_atomic
-d[types.CodeType] = _deepcopy_atomic
-d[type] = _deepcopy_atomic
-d[range] = _deepcopy_atomic
-d[types.BuiltinFunctionType] = _deepcopy_atomic
-d[types.FunctionType] = _deepcopy_atomic
-d[weakref.ref] = _deepcopy_atomic
-d[property] = _deepcopy_atomic
 
 def _deepcopy_list(x, memo, deepcopy=deepcopy):
     y = []
NEWS entry:

@@ -0,0 +1 @@
+Improve performance of :func:`copy.deepcopy` by adding a fast path for atomic types.