implement copy on write #597

gdementen · 2018-03-01T13:36:23Z

In Python, b = a does not copy the content of a: both names refer to the same array.

>>> a = ndtest(3)
>>> b = a
>>> b['a1'] = 0
>>> a['a1']
0

The solution is to use b = a.copy(), but if users call it everywhere, even when not strictly necessary (and determining whether it is necessary is not always obvious), it consumes memory and CPU needlessly.

To eliminate this problem, we could tell users to always use .copy(), and make .copy() only flag the resulting array as "must_copy_on_write" without actually copying the data right away. This would not be too hard to implement. Then if (and only if) the user later modifies the copy (using setitem), an actual copy is made and the array is flagged as must_copy_on_write=False before the setitem is done.
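A minimal sketch of the lazy .copy() idea, using a plain Python list as the buffer so it runs without any dependency. The class CowArray and its attributes _data and _must_copy_on_write are hypothetical names for illustration, not larray's API. One assumption goes beyond the description above: the source array is flagged too, because otherwise a later write to the source would show through in the lazy copy.

```python
class CowArray:
    """Hypothetical list-backed stand-in for an array class (not larray's API)."""

    def __init__(self, data):
        self._data = list(data)
        self._must_copy_on_write = False

    def copy(self):
        # Lazy copy: share the underlying buffer instead of duplicating it.
        new = CowArray.__new__(CowArray)
        new._data = self._data
        new._must_copy_on_write = True
        # Assumption (not in the proposal): also flag the source, so that
        # writing to the source cannot leak into the lazy copy.
        self._must_copy_on_write = True
        return new

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if self._must_copy_on_write:
            # First write since the lazy copy: do the real copy now.
            self._data = list(self._data)
            self._must_copy_on_write = False
        self._data[key] = value


a = CowArray([0, 1, 2])
b = a.copy()          # no data copied yet
b[1] = 0              # the actual copy happens here
print(a[1], b[1])     # 1 0
```

With this simple flag, the source may still do one unnecessary copy on its next write even after the other array has already detached. Avoiding that would need some form of shared reference counting, which is left out of the sketch.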

gdementen · 2024-07-03T07:52:48Z

pandas has implemented copy-on-write for a while now. It is currently optional but will be the default in Pandas 3.0.

I bumped the issue priority because a subset sometimes sharing its "parent" data is a not-too-rare source of hard-to-find bugs:

>>> a = ndtest(3)
>>> b = a[:'a1']
>>> b['a1'] = 0
>>> a['a1']
0

So this issue has an impact on performance but it is first and foremost about correctness. We cannot do much about the b = a case above, but we can fix the subset issue.
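The subset case could follow the same pattern: a slice keeps sharing its parent's data until it is first written to. The sketch below is only an illustration under stated assumptions. CowBytes is a hypothetical class built on the standard library's bytearray and memoryview, it uses positional slices rather than larray's label-based ones, and it only protects the parent from writes made through the subset (a write to the parent would still be visible in a subset that has not been modified yet).

```python
class CowBytes:
    """Hypothetical byte array whose slices are copy-on-write views."""

    def __init__(self, data, _must_copy_on_write=False):
        if isinstance(data, memoryview):
            # Internal path used by __getitem__: keep sharing the buffer.
            self._buf = data
        else:
            # Public path: own a fresh buffer.
            self._buf = memoryview(bytearray(data))
        self._must_copy_on_write = _must_copy_on_write

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing a memoryview does not copy: the subset shares the
            # parent's bytes and is flagged so that it detaches on write.
            return CowBytes(self._buf[key], _must_copy_on_write=True)
        return self._buf[key]

    def __setitem__(self, key, value):
        if self._must_copy_on_write:
            # First write to a subset: copy its bytes into its own buffer.
            self._buf = memoryview(bytearray(self._buf.tobytes()))
            self._must_copy_on_write = False
        self._buf[key] = value


a = CowBytes([0, 1, 2])
b = a[:2]             # shares a's data, nothing copied
b[1] = 0              # b detaches from a before the write
print(a[1], b[1])     # 1 0
```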

PS: I am unsure whether Pandas plans to make an explicit .copy() call copy-on-write too, and it is probably a good idea to align our behaviour with what Pandas does.

gdementen added the enhancement and performance labels Mar 1, 2018

alixdamman added this to the nice_to_have milestone Mar 7, 2018

gdementen added the priority: low label Aug 1, 2019

gdementen removed this from the nice_to_have milestone Aug 1, 2019

gdementen added priority: high and removed priority: low labels Jul 3, 2024

gdementen mentioned this issue Jul 25, 2024

Deprecate setting values on a view subset #1114

Open

gdementen added this to the 0.37 milestone Jul 25, 2024
