Skip to content

ENH: Allow for non-normalized and transformed markers #16773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ImportanceOfBeingErnest
Copy link
Member

PR Summary

This PR allows markers to be non-normalized or transformed:

  • MarkerStyle now gets a normalize_path argument, which defaults to True, but can be set False to allow for custom paths that are input to not be normalized to the unit square. That is, the following two will result in the same plot
    p = Path([[-1,-1],[1,-1],[0,1]])
    ax.scatter(...., s=20**2, marker=MarkerStyle(p))
    ax.scatter(...., s=1, marker=MarkerStyle(p.transformed(Affine2D().scale(10)), 
                 normalize_path=False))
    
  • MarkerStyle now gets a set_transform method, that allows to externally set a transform to the marker. I.e., the following two result in the same plot
    ax.scatter(...., s=20**2, marker=MarkerStyle("s"))
    m = MarkerStyle("s")
    m.set_transform(Affine2D().scale(20))
    ax.scatter(...., s=1, marker=m)
    

Why is this useful?

Until now, it is impossible use arbitrary custom markers. Custom markers are always normalized to the unit square. So for a given path, the size and center point of the marker are changed by matplotlib.

Common problems:

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@tacaswell tacaswell added this to the v3.3.0 milestone Mar 15, 2020
@tacaswell
Copy link
Member

👍 Holding off approving for a bit more discussion about the name.

The appveyor failure is not real.

@tacaswell
Copy link
Member

@ImportanceOfBeingErnest Can you please champion your own PR ?

@ImportanceOfBeingErnest
Copy link
Member Author

I think what can be discussed here is the following: Should the same argument that is introduced here be used later on to distinguish between several inbuild marker modifications, for example:

MarkerStyle("v", normalize_path="old-style")
MarkerStyle("v", normalize_path="area")
MarkerStyle("v", normalize_path="unit-square")
MarkerStyle("v", normalize_path="perceptually-uniform-01")

?

@tacaswell
Copy link
Member

One of the places we have historically gotten our selves into trouble is trying to fold too much functionality into to any given layer. From the point of view of MarkerStyle we can't stop normalizing by default (to maintain API), but the simplest extension is to add a "just don't touch it" and rely on the user to do whatever normalization they want. However, I see the argument that making it a bool paints us into a corner we may want to be able to get out of [1].

I propose we start with:

normalization: str, {'unit-scale', 'none'}, optional

In the future we will be able to add more known strings in a safe way and to allow callables to be passed. This rhymes with patterns in our codebase and numpy/scipy.

[1] sometimes it seems that in programming that the cardinality of options seems to be 0 (no option), 2 (yes/no), or infinite

@ImportanceOfBeingErnest
Copy link
Member Author

Ok, let's think this through. normalization="unit-scale" would be the default. It would leave everything as it is currently. One drawback here is that "unit-scale" may create false expectations for some predefined markers, which are not scaled into the unit square (as seen from the examples in #16623).
Equally, normalization="none" does not change the predefined markers, right?

So, for now:

MarkerStyle("D", normalize="unit-scale")        # leaves "D" unchanged, not scaled
MarkerStyle(Path(...), normalize="unit-scale")  # scales path to unit square
MarkerStyle("$f$", normalize="unit-scale")      # scales path to unit square

MarkerStyle("D", normalize="none")        # leaves "D" unchanged, not scaled to unit square
MarkerStyle(Path(...), normalize="none")  # does not scale path
MarkerStyle("$f$", normalize="none")      # scales path to unit square 
                                          # (because there is no other canonical scale)

In addition, in the future

MarkerStyle("D", normalize="area")        # scales "D" to unit area (==leaves it unchanged 
                                                        since it already is 1 unit^2 in size)
MarkerStyle("d", normalize="area")        # scales "d" to unit area 
MarkerStyle(Path(...), normalize="area")  # might try to calculate the area from the path - 
                                                                  what happens if it fails?
MarkerStyle("$f$", normalize="area")      #   same as above

Is this how we want to have it?

@tacaswell
Copy link
Member

Hmm, you are correct that the letter-named markers are not "unit-scale", maybe we should keep the default behavior as "default" and then add "unit-scale" that makes everything fit inside of a unit-box (centered at 0) for both Path and string based input, "none" which imposes no normalization on Paths that are passed in, and lets the string-based do what it does.

In the future we can add "bounding-box-area" and "filled-area" normalization values that apply to everything?

@ImportanceOfBeingErnest
Copy link
Member Author

I now went with

 normalization : str, {'classic', 'none'}, optional, default: "classic"

I think "default" would be fine for now as well, but suppose the defaults are changed in the future, that notion might become ambiguous. In this respect "classic" conveys better what it is... it's not modern, it's not great, but it's classic, like it's always been.

I am not implementing "unit-scale" here in this PR. This would be significantly more work

  • to go over each and every marker and come to a conclusion about whether it should be scaled
  • refactor the complete class to allow for some kind of ._post_transform to be applied to any marker, or change each marker individually.

@brunobeltran
Copy link
Contributor

brunobeltran commented Mar 17, 2020

I know I'm a bit late to the conversation (had to escape the country because of covid), but I agree that something like "classic" would be best to convey the current behavior.

..."unit-scale" that makes everything fit inside of a unit-box (centered at 0) for both Path and string based input...

I haven't settled on good, short names for each of these behaviors yet, but I agree that something like:

  1. linear scaling to fit inside unit square, i.e. "bbox-width"/"width" (synonyms)
  2. area scaling to have filled area same as unit square, i.e. "area"
  3. area scaling so bounding box has same area as unit square, i.e. "bbox-area"
  4. perceptually uniform scaling (fails or fallback to (2) for custom paths)

seem useful enough to consider as future options. Just would like to bring up that if (1)-(3) are on the table, then we probably want to not call any one of them in particular "unit-scale" right now, as this description fits them all three pretty well.

In the future we can add "bounding-box-area" and "filled-area" normalization values that apply to everything?

I can comment on the difficulty of these calculations, having just written code to do the bounding box calculation for custom paths with finite-width stroked edges.

The machinery for "bounding-box-area" is already in place (requires just some simple helpers added to bezier.py in #16607).

The machinery for "filled-area" would need to be added. The best way to do this would be iterative refinement (calculating area of approximating polygon to the curve for increasingly fine discretizations until some tolerance is reached). I'd estimate a very dense collection of 2-4 helper functions would need to be added to path.py/bezier.py to accomplish this. On the other hand, this is code I've written before, and it can be easily tested and would be relatively self-contained.

I think there's a very strong argument to be made that the scaling of the markers for each of these options should not take into account the markeredgewidth at all. Which is good, because while adding finite edges into the bounding box calculation requires no extra work (code already exists in #16607), adding finite edges to the "filled-area" calculation would require implementing some pretty state-of-the-art stuff, that feels pretty out of scope for the MPL library.

@brunobeltran
Copy link
Contributor

Instead of

  • refactor the complete class to allow for some kind of ._post_transform to be applied to any marker, or change each marker individually.

I would propose just adding a "classic scaling" attribute to each "marker" in MarkerStyle.markers. Then the transform property of the marker could be much more neatly folded into __init__, since it would use centralized code determined by the normalization switch.

On the other hand

I am not implementing "unit-scale" here in this PR. This would be significantly more work

Chiming in as outsider (since I've already stuck my nose in here) that I agree this would fit better in a separate PR.

@tacaswell
Copy link
Member

@brunobeltran Is it fair to summarize your position as: a) this is a non-trivial problem b) this lays down the skeleton of the public API we are going to need c) this should go in about as-is so we can build on it?

tacaswell
tacaswell previously approved these changes Mar 18, 2020
Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modulo tests passing and a rebase.

@anntzer
Copy link
Contributor

anntzer commented Mar 18, 2020

Previously, the unscaled paths of the builtin markers were effectively not part of the public API; with this PR, they are. Do we want to change the size of the unscaled versions of the builtin markers while this is still possible?

@ImportanceOfBeingErnest
Copy link
Member Author

Mhh. .get_path and .get_transform have always been public, rigth? So changing those would be a breaking change?

@brunobeltran
Copy link
Contributor

brunobeltran commented Mar 18, 2020

@brunobeltran Is it fair to summarize your position as: a) this is a non-trivial problem b) this lays down the skeleton of the public API we are going to need c) this should go in about as-is so we can build on it?

Yup 👍

Previously, the unscaled paths of the builtin markers were effectively not part of the public API; with this PR, they are. Do we want to change the size of the unscaled versions of the builtin markers while this is still possible?

This doesn't seem true to me? Sure now users might rely on the size of a marker given normalization="none", but we can easily document in the future that normalization="none" will default to "classic" for the case of built-in markers.

I mean, you could argue that the path/transform were always publicly available (MarkerStyle.get_path()/MarkerStyle.get_transform()).

This is especially true since, contrary to what was initially proposed above,

MarkerStyle("D", normalize="unit-scale") # leaves "D" unchanged, not scaled

would not actually leave "D" unchanged, but scale it down by a factor of 1/sqrt(2) to fit into the unit square.

EDIT: @ImportanceOfBeingErnest beat me to this, agree with his comment.

@anntzer
Copy link
Contributor

anntzer commented Mar 18, 2020

Fair enough, they were public if one tried hard enough, but this makes them much more easily accessible.

@ImportanceOfBeingErnest
Copy link
Member Author

By "more easily accessible" you mean "discoverable" or "useful"? Or else, I might not really understand the argument because accessibility does not change here.

Arguably there is a high reduncancy between the path and the transform. So in principle one could define any marker just by a path. In that case, one could deprecate get_transform; is that what you have in mind?

@brunobeltran
Copy link
Contributor

brunobeltran commented Mar 23, 2020

Missed the call today, but I would like to say that I highly prefer the original proposal, for a few reasons:

  1. Allowing a general Transform to be passed in is completely redundant since we're already allowing a general Path to be passed in. If the user wants to do something arbitrarily fancy, let them apply that transform to the path themselves. I generally hate when people say this phrase, but I guess my point is it's not "Pythonic" to give the user two ways to do the same thing unnecessarily.
  2. From the perspective of a new user, this adds undue learning burden. In order to use a custom marker in a reasonable way, I don't want to have to understand what a Transform is as well. Even if we provide some good defaults there will always be the cognitive burden if a user has an issue and they start to ask themselves "is figuring out what this Transform thing is what I need to do to solve my problem?"
  3. I don't understand the use case where someone would do
    MarkerStyle(path, transform=transform instead of MarkerStyle(transform.transform_path(path), transform=IdentityTransform()), since these would become synonymous. I definitely don't understand why you would pass a transform_factory, which presumably would just require a bunch of boilerplate in order to do a call that looks like MarkerStyle(path, transform=factory), when would be exactly the same thing to do transform = factory(path) followed by MarkerStyle(transform.transform_path(path), transform=IdentityTransform()).
    When would someone ever be in possession of such a "transform factory" object that providing thi API would make their lives easier? After all, a given markerstyle can't have more than one transform. The only reason "transform" even existed in the first place was to make the construction of the "build-in" markers easier to read (as code), and maybe to match the style of the rest of the library.
  4. Most importantly to me, it feels like this proposal unnecessarily promotes the .transform property of the MarkerStyle class to be central to its use, whereas I (and I think @anntzer as well) are of the opinion that it should be removed altogether (as it was and should remain an outdated implementation detail).

My alternate proposal (see #16891), would be to instead provide the user with two convenience arguments normalization and centering that basically correspond to the special cases where transform is Affine2D().scale and Affine2D().translate, but with the amount of scaling/transforming chosen to be something generally useful (scale to area=1, center of mass to origin, etc). If the user wants to do something more complicated, it seems like it would be less work for them to just transform the path themselves?

@jklymak
Copy link
Member

jklymak commented Mar 23, 2020

but I would like to say that I highly prefer the original proposal,

Can you clarify what the "original proposal" is?

@brunobeltran
Copy link
Contributor

brunobeltran commented Mar 23, 2020

but I would like to say that I highly prefer the original proposal,

Can you clarify what the "original proposal" is?

Sorry: by "original proposal" I meant @ImportanceOfBeingErnest's proposal to have the user pass in an arbitrary path if they want, and to only provide them with some helpers for rescaling/etc using string inputs (instead of asking them to control the marker's ._transform, which is currently just an implementation detail of the existing built-in marker types).

The built-in paths would by default not be affected but if you requested e.g. normalization='area', then it would work just as well for the built-in paths as for a custom path that way.

Otherwise, (if we instead ask the user to pass in a transform object) in order to say, normalize "*", the user would have to know what transform is currently being applied to it, or otherwise do something that feels silly like ``` star_path = MarkerStyle('*').get_path() normed_star = MarkerStyle(star_path, tranform='unit-area') ```

@ImportanceOfBeingErnest
Copy link
Member Author

I guess adding a transform argument (instead of normalization) is totally possible. I just wonder if that doesn't make things unnecessarily confusing. While we do have transform as argument elsewhere in the library, it's always the actual transform of the artist.
The case here is slightly different, because the transform is just the transform of the path that is applied before going through the rest of the transform system. In particular I see the following confusing cases:

  • transform here would take very different inputs than elsewhere
  • User might try MarkerStyle(..., transform=ax.transData) - leading to nonsensical output.
  • MarkerStyle("d", transform=IdentityTransform()) would result in a unit square instead of a skewed diamond.

@jklymak
Copy link
Member

jklymak commented Mar 25, 2020

I don't think that a transform kwarg is any more confusing than a set_transform method. Allowing transform to take additional string arguments or special transform subclasses for paths that achieve the normalizations doesn't seem too confusing. Its just strange to allow access to a path._transform object, allow the user to path.set_transform, but then make the init kwarg normalization with no direct access to transform.

@brunobeltran
Copy link
Contributor

brunobeltran commented Mar 25, 2020

I just don't think the user should have access to ._transform at all. It's currently (and feels like it should remain) an implementation detail (edit: am I wrong about this?). They can already pass in arbitrary paths, the point of normalization kwarg was just to make their lives easier I think, not to give them a way to tap into MarkerStyle's implementation?

@jklymak
Copy link
Member

jklymak commented Mar 25, 2020

@brunobeltran OK, so by "Original proposal" you didn't mean this PR?

I guess I agree that an expert user can just set a transform before passing to MarkerStyle so its not clear why the set_transform needs to be implemented here. Whether there should be a bunch of normalization and offset options is a different question.

@ImportanceOfBeingErnest
Copy link
Member Author

The transform is not an "implementation detail", because .get_transform is public. As said, one could deprecate it, if that is desired.

Indeed, .set_transform is not strictly needed. But it's a nice to have feature that allows to scale or distort existing markers. I.e.

marker=MarkerStyle("d")
marker.set_transform(marker.get_transform() + Affine2D().rotate_deg(30))

is just quicker / less convoluted than

marker = MarkerStyle("d")
marker = MarkerStyle(marker.get_path().transformed(marker.get_transform()+ Affine2D().rotate_deg(30)), normalization="none")

This assumes .get_transform is not deprecated. If it is, the above is not possible any more and one would need to construct the complete marker path from scratch.

@brunobeltran
Copy link
Contributor

brunobeltran commented Mar 31, 2020

@ImportanceOfBeingErnest You're right, if we get rid of the [get/set]_transform interface, for each new marker you want, you would need to pass the fully transformed Path to MarkerStyle.

But you definitely don't have to "construct the marker from scratch". The convoluted API you're describing is still assuming that ._transform exists in the first place.

Actually deprecating .get_transform() would be quite the pain. It was clearly originally included to maintain a uniform interface with Artist's, and is used throughout the collections and backend code.

On the other hand, we can easily leave it in place and just have it always return IdentityTransform. All we have to do is a really simple code change to have the "old" transforms be applied during marker construction.

Then the Path will be enough, you never need to access the transform at all (although it will still be possible to support all that old backend/collections code that depends on it).

d_path = MarkerStyle("d").get_path()
rot_d = Affine2D().rotate_deg(30).transform_path(d_path)
marker = MarkerStyle(rot_d)

or similar.

@brunobeltran
Copy link
Contributor

@brunobeltran OK, so by "Original proposal" you didn't mean this PR?

I do mean this PR (and its associated discussion), as opposed to the alternate proposal from the call last week that we instead use a Transform kwarg.

Whether there should be a bunch of normalization and offset options is a different question.

Maybe whether there should be "a bunch" of normalization and offset options is a different question (I propose two kwargs, "normalization" and "centering", not "a bunch"). However, this PR adds a normalization kwarg. So that's definitely a question for right now. I am definitely in favor of a "normalization" kwarg instead of forcing a user to pass a Transform, which is currently the only other proposal.

@ImportanceOfBeingErnest
Copy link
Member Author

Note that deprecating .get_transform() is much easier than having it return the identity transform. The latter would be an API change. Also mind that .get_path() suddenly returning a different path would be an API change as well.

The only clean way to do this would be to give both a new kwarg that sets the return, in the sense of
.get_transform(newstyle=False) / .get_path(newstyle=False), then in a subsequent release turning the default to True. This gets particularly ugly, because during transition you need to have both concepts fully implemented internally.

I'm not convinced that this is really needed. Also, having the transform on a generic path is a nice thing and keeps the code short. The transform would also be useful for implementing any of the other types, which mostly are just a scaled or translated version of the existing ones.

@brunobeltran
Copy link
Contributor

I've played around with custom markers more this week, and I am pretty convinced that allowing a user to modify the marker in-place at all is not ideal.

Some issues:

  1. If you create a marker, then plot with it, then modify that marker, then plot again, you affect how the first plot is saved, which MAY be what you want, but I kept finding that it wasn't.
  2. In practice, if I'm trying to modify a marker, I need to get it's "transformed path" in order to figure out how to transform it in the first place (except for rotations, which is the example you provided above). Typically, I can't do something like marker.set_transform(marker.get_transform() + additional_trasnform), because I need to know something about the path, so I still have to do
actual_path = marker.get_transform().transform(marker.get_path())
width = actual_path.get_extents().width  # compute something about path
marker.set_transform(marker.get_transform() + Affine2D().scale(1/width))

and at that point, I might as well (i.e. it's not really extra typing, and definitely feels more intuitive/discoverable) to just do

path = marker.get_path(transformed=True)
width = path.get_extents().width  # compute something about path
new_path = Affine2D().scale(1/width).transform_path(path)
new_marker = MarkerStyle(new_path)

where transformed=True could be added as a temporary kwarg while we phase out the old return values, as you propose above.

That said, I could easily be persuaded if I was given a use case where modifying the marker after creating it is useful, I just wasn't able to find one.

Note that deprecating .get_transform() is much easier than having it return the identity transform. The latter would be an API change. Also mind that .get_path() suddenly returning a different path would be an API change as well.

I probably just don't understand the matplotlib development constraints well enough yet, but these both seem like API changes to me? In one we remove a method, in another we change it's return value? It's just as easy to add a kwarg and a deprecation warning, saying that in the next version, we'll be switching to the transformed=True behavior by default and removing the transformed kwarg.

The development burden to remove .get_transform() is much higher, since it's used in the backend in a duck-type-y way, and so while I could probably remove all internal instances of it, I'm not confident enough in our test coverage to be sure I wouldn't break something.

@QuLogic QuLogic modified the milestones: v3.3.0, v3.4.0 May 5, 2020
@QuLogic QuLogic modified the milestones: v3.4.0, v3.5.0 Jan 22, 2021
@jklymak jklymak marked this pull request as draft May 8, 2021 17:49
@jklymak
Copy link
Member

jklymak commented May 8, 2021

@ImportanceOfBeingErnest @brunobeltran is there interest in pushing this forward, either in the context of this PR or a new one? I guess the original motivation makes it seem that this would be generally helpful, though I'm not sure we agreed on the API.

@QuLogic QuLogic modified the milestones: v3.5.0, v3.6.0 Aug 21, 2021
@timhoffm timhoffm modified the milestones: v3.6.0, unassigned Apr 30, 2022
@story645 story645 modified the milestones: unassigned, needs sorting Oct 6, 2022
@tacaswell
Copy link
Member

This functionality was implemented in #20914 in a way inline with #16773 (comment) (adding a transform keyword but composing it with the default transform so slightly differently parameterized than this proposal, but I think with the same capabilities).

Thank you for your work on this @ImportanceOfBeingErnest and @brunobeltran . I do hope we hear from both of you again!

@tacaswell tacaswell closed this Dec 6, 2022
@QuLogic QuLogic removed this from the future releases milestone Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants