Skip to content

FUTURE BUG: reconsider how we deep-copy path objects #29157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
tacaswell opened this issue Nov 19, 2024 · 12 comments · May be fixed by #29393
Open

FUTURE BUG: reconsider how we deep-copy path objects #29157

tacaswell opened this issue Nov 19, 2024 · 12 comments · May be fixed by #29393
Labels
Difficulty: Hard https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues
Milestone

Comments

@tacaswell
Copy link
Member

We currently use super() in the __deepcopy__ implementation ofPath

def __deepcopy__(self, memo=None):
"""
Return a deepcopy of the `Path`. The `Path` will not be
readonly, even if the source `Path` is.
"""
# Deepcopying arrays (vertices, codes) strips the writeable=False flag.
p = copy.deepcopy(super(), memo)
p._readonly = False
return p

however, on the main branch of CPython this causes infinite recursion (python/cpython#126817). Although upstream may (or may not) sort out how to fix this, we should see if there is another way we can do what we need to do and avoid any fallout from changes in CPython.


Good first issue because "make sure deepcopy works on Paths" is a narrowly limited task, but hard because it will require understanding why we currently use super() and enough of the deepcopy/pickle protocol details to fix it.

attn @anntzer

@tacaswell tacaswell added Difficulty: Hard https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues Good first issue Open a pull request against these issues if there are no active ones! labels Nov 19, 2024
@tacaswell tacaswell added this to the v3.11.0 milestone Nov 19, 2024
Copy link

Good first issue - notes for new contributors

This issue is suited to new contributors because it does not require understanding of the
Matplotlib internals. To get started, please see our contributing
guide
.

We do not assign issues. Check the Development section in the sidebar for linked pull
requests (PRs). If there are none, feel free to start working on it. If there is an open PR, please
collaborate on the work by reviewing it rather than duplicating it in a competing PR.

If something is unclear, please reach out on any of our communication
channels
.

@kyracho
Copy link
Contributor

kyracho commented Nov 20, 2024

Hi! I'm relatively a beginner but I'd love to take this on as a challenge if it's okay 😊

I was wondering, would it be acceptable to replace

p = copy.deepcopy(super(), memo)

with

p = self.__class__.__new__(self.__class__)

and to manually copy the attributes like

p._vertices = copy.deepcopy(self._vertices, memo)
p._codes = copy.deepcopy(self._codes, memo)
p._interpolation_steps = self._interpolation_steps
p._readonly = False

?

The end result would be like

 def __deepcopy__(self, memo=None):
        """
        Return a deepcopy of the `Path`.  The `Path` will not be
        readonly, even if the source `Path` is.
        """

        if id(self) in memo:
            return memo[id(self)]

        p = self.__class__.__new__(self.__class__)
        memo[id(self)] = p

        p._vertices = copy.deepcopy(self._vertices, memo)
        p._codes = copy.deepcopy(self._codes, memo)
        p._interpolation_steps = self._interpolation_steps
        p._readonly = False

        return p

@saikarna913
Copy link
Contributor

saikarna913 commented Nov 20, 2024

@tacaswell
I want to ask why we are using super() here if there is no parent class for the path.
This can be another possibility, like code calling the super() again and again, i.e. calling the method that we have in the path.
Instead of using:

 cls = self.__class__
    new_instance = cls.__new__(cls)
    
    # Add the new instance to the memo dictionary to handle circular references
    if memo is None:
        memo = {}
    memo[id(self)] = new_instance

    # Deepcopy the attributes explicitly
    new_instance.vertices = copy.deepcopy(self.vertices, memo)
    new_instance.codes = copy.deepcopy(self.codes, memo)
    new_instance._readonly = False

    return new_instance

This is because in the original code, super() begins searching for the method in the method resolution order (MRO), where the current class itself is considered first since there is no superclass. This results in an infinite recursion since copy.deepcopy finds the __deepcopy__ method in the current class, which then calls super() again, leading to a loop.

By explicitly utilizing the memo dictionary and checking if the current object is already present in it, we ensure that the __deepcopy__ process avoids re-processing the current object. This effectively prevents recursion by leveraging the memoization mechanism, which keeps track of objects that have already been deep-copied.

@tacaswell
Copy link
Member Author

@saikarna913 Are you working with @kyracho ? I am a bit confused by you posting an almost identical comment.

Reaching for __new__ is probably not the right answer, can either of you elaborate on why you think this is the right path?

It is probably also worth reading #21280 for the context of why we are using super() (along with the bug it fixed and the alternative PR we declined).

@anntzer
Copy link
Contributor

anntzer commented Nov 20, 2024

Without looking at the proper fix, it feels like we could/should get rid of the TextPath class and just have a Text(...).to_path() -> Path method?

@tacaswell
Copy link
Member Author

TextPath seems to have the ability to change the font size (but not the string?) so a one-off converter may not be entirely sufficient.

@saikarna913
Copy link
Contributor

I apologize for my confusion. I'm trying to clarify something. Since there isn't a deepcopy method in Textpath, it inherits from its parent class(PATH), which calls super() in the deepcopy method. Currently, the method that invoked super() is Textpath, so it will follow the method resolution order (MRO) of Textpath. In this MRO, the parent class is PATH, meaning that super() effectively returns a copy of path to deepcopy. However, this doesn't seem to be creating a true deepcopy of Textpath. Could you help me understand how this code functions? Thank you!

why can't we explicitly define deepcopy for subclasses like Textpath?

@anntzer
Copy link
Contributor

anntzer commented Nov 21, 2024

TextPath seems to have the ability to change the font size (but not the string?) so a one-off converter may not be entirely sufficient.

It seems simple enough to do textobj.set_size(new_size); path = textobj.to_path(), or, if one is really worried about efficiency, one can also just linearly scale the vertices (and translate them wherever needed) themselves.

@tacaswell
Copy link
Member Author

However, this doesn't seem to be creating a true deepcopy of Textpath.

What do you mean by this?

why can't we explicitly define deepcopy for subclasses like Textpath?

There is no technical reason we can not, but the behavior we want is "do the normal deepcopy thing, but set writable" which is a behavior inherited from Path so we should not have to copy that logic into every sub-class (of which we apparently only have one).

@saikarna913
Copy link
Contributor

@tacaswell
I need small help
i am unable to find the _path which is giving me an import error in the path.py file
image

and regarding the deepcopy I think this works

in path.py file

def __deepcopy__(self, memo):
    # Create a shallow copy of the object first.
    new_instance = copy.copy(self)
    
    # Deep copy the _vertices array and set the writeable flag based on the readonly flag
    new_instance._vertices = np.copy(self._vertices)
    new_instance._vertices.flags.writeable = not self._readonly
    
    # Deep copy the _codes array if it exists, and handle the writeable flag
    if self._codes is not None:
        new_instance._codes = np.copy(self._codes)
        new_instance._codes.flags.writeable = not self._readonly
    
    # Deep copy any additional attributes that were added in the class
    for key, value in self.__dict__.items():
        # Skip internal attributes (e.g., _vertices, _codes, _readonly)
        if key in ['_vertices', '_codes', '_readonly']:
            continue
        
        # If the attribute is a mutable object, perform deep copy, otherwise just copy it
        if isinstance(value, (np.ndarray, list, dict)):
            new_instance.__dict__[key] = copy.deepcopy(value, memo)
        else:
            new_instance.__dict__[key] = value

    # Retain the readonly flag
    new_instance._readonly = self._readonly
    
    return new_instance

in textpath.py

  def __deepcopy__(self, memo):
        # Create a shallow copy of the TextPath object
        new_instance = super().__deepcopy__(memo)
        
        # Deep copy text-specific attributes
        new_instance._xy = self._xy
        new_instance._size = self._size
        new_instance._should_simplify = self._should_simplify

        # If cached vertices exist, we need to copy them and ensure the writeable flag
        if self._cached_vertices is not None:
            new_instance._cached_vertices = np.copy(self._cached_vertices)
            new_instance._cached_vertices.flags.writeable = not self._readonly
        
        # Mark the object as invalid since it's a deep copy
        new_instance._invalid = True

        return new_instance

@tacaswell
Copy link
Member Author

_path is a cextension (see src/_path.h and src/_path_wrapper.cpp).

I'm sure with a bit of refinement those methods would work, but I am not sure that mostly re-implementing copy.deepcopy in __deepcopy__ of two of our classes is the correct approach. On one hand it will "work" in that for at least the classes as they are now for the tests we currently have it will pass, but it leaves open a path for getting out-of-sync with upstream in the future and has a good chance of confusing a future maintainer. Before we go that route we should fully rule out using the more "standard" path.


As an implementation detail, can you explain why you went with obj.__dict__[key] = value rather than setattr(obj, key, value)? I am also curious why you copy the cache in both implementations but only bother to invalidate it in textpath?

tacaswell added a commit to tacaswell/matplotlib that referenced this issue Dec 31, 2024
See python/cpython#126817 for upstream discussion.

This works around the change by using (private) methods from the copy module to
re-implement the path though copy/deepcopy that we would like to use but avoid
the special-casing for `super()` objects that is breaking us.

We could vendor the current versions of `_keep_alive` (weakref work to manage
lifecycles) and `_reconstruct` (where the recursion happens) to superficially
avoid using private functions from CPython.  However, if these functions do
change significantly I worry that our copies would not inter-operate anyway.

Closes matplotlib#29157
@tacaswell tacaswell linked a pull request Dec 31, 2024 that will close this issue
2 tasks
@tacaswell tacaswell removed the Good first issue Open a pull request against these issues if there are no active ones! label Dec 31, 2024
@tacaswell
Copy link
Member Author

I've put up a PR that fixes this and a related issue.

tacaswell added a commit to tacaswell/matplotlib that referenced this issue Dec 31, 2024
See python/cpython#126817 for upstream discussion.

This works around the change by using (private) methods from the copy module to
re-implement the path though copy/deepcopy that we would like to use but avoid
the special-casing for `super()` objects that is breaking us.

We could vendor the current versions of `_keep_alive` (weakref work to manage
lifecycles) and `_reconstruct` (where the recursion happens) to superficially
avoid using private functions from CPython.  However, if these functions do
change significantly I worry that our copies would not inter-operate anyway.

Closes matplotlib#29157
tacaswell added a commit to tacaswell/matplotlib that referenced this issue Dec 31, 2024
See python/cpython#126817 for upstream discussion.

This works around the change by using (private) methods from the copy module to
re-implement the path though copy/deepcopy that we would like to use but avoid
the special-casing for `super()` objects that is breaking us.

We could vendor the current versions of `_keep_alive` (weakref work to manage
lifecycles) and `_reconstruct` (where the recursion happens) to superficially
avoid using private functions from CPython.  However, if these functions do
change significantly I worry that our copies would not inter-operate anyway.

Closes matplotlib#29157
@QuLogic QuLogic marked this as a duplicate of #29785 Mar 20, 2025
tacaswell added a commit to tacaswell/matplotlib that referenced this issue Apr 27, 2025
See python/cpython#126817 for upstream discussion.

This works around the change by using (private) methods from the copy module to
re-implement the path though copy/deepcopy that we would like to use but avoid
the special-casing for `super()` objects that is breaking us.

We could vendor the current versions of `_keep_alive` (weakref work to manage
lifecycles) and `_reconstruct` (where the recursion happens) to superficially
avoid using private functions from CPython.  However, if these functions do
change significantly I worry that our copies would not inter-operate anyway.

Closes matplotlib#29157
tacaswell added a commit to tacaswell/matplotlib that referenced this issue Apr 28, 2025
See python/cpython#126817 for upstream discussion.

This works around the change by using (private) methods from the copy module to
re-implement the path though copy/deepcopy that we would like to use but avoid
the special-casing for `super()` objects that is breaking us.

We could vendor the current versions of `_keep_alive` (weakref work to manage
lifecycles) and `_reconstruct` (where the recursion happens) to superficially
avoid using private functions from CPython.  However, if these functions do
change significantly I worry that our copies would not inter-operate anyway.

Closes matplotlib#29157
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Difficulty: Hard https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues
Projects
None yet
4 participants