FIX: get_datalim for collection #13642


Merged (2 commits, Jul 19, 2019)

Conversation

@jklymak (Member) commented Mar 10, 2019

PR Summary

EDIT 12 Mar:

Closes #13639

Closes the non-unique auto-limit problem in #7413 for collections. Collections have "offsets" and "paths", and each can be in a different coordinate frame. This PR tackles the autolim problem by consulting self.get_transform() and self.get_offset_transform() for the collection and checking whether these are ax.transData or not.

For Collection.get_datalim, there are two cases that I see in common usage:

A. offsets is None, as for LineCollection. In this case, all the vertices in paths should be included in the autolim if transform == ax.transData (this is the case for streamplot).

B. offsets is not None:

  1. If the offset coordinates (transOffset) are not the axes transData, then don't auto-lim on this collection at all. I don't know offhand of any built-in examples like this, but if the user specifies a bunch of markers in axes coordinates, they shouldn't expect the axes to change its x/y limits.

  2. If the path coordinates (self.get_transform()) are transData, then we use the existing algorithm. This is useful for paths like in quiver, where the offset defines the tail of an arrow and the path defines its length. In that case, the whole arrow is meant to be in data space, and the limits can change size accordingly.

  3. If the path coordinates are not transData, then we don't consult the paths at all, because they are in some other unit that doesn't help set the data limits. E.g. scatter markers are sized in points, and these can be arbitrarily large (even larger than the figure if you want), so setting the axes data limits based on this size is ill-defined. That led to many of the issues in "Autoscaling has fundamental problems" (#7413). Of course the offset limits should still be included (i.e. we want the centres of markers to be within the new x/y limits).
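For orientation, the cases above can be condensed into a tiny decision table. The function below is a hypothetical sketch with boolean flags standing in for the real transform checks (e.g. transform == ax.transData); it is not the actual Collection.get_datalim code:

```python
def autolim_source(has_offsets, offsets_in_data, paths_in_data):
    """Sketch: which geometry should feed the data limits.

    Flags are hypothetical stand-ins for the transform checks in the
    real code; return value names which extents get used.
    """
    if not has_offsets:
        # Case A (e.g. LineCollection/streamplot): use the path
        # vertices, but only if the paths are in data coordinates.
        return "paths" if paths_in_data else "nothing"
    if not offsets_in_data:
        # Case B.1: offsets live in some other frame, so this
        # collection should not influence the limits at all.
        return "nothing"
    if paths_in_data:
        # Case B.2 (e.g. quiver): offsets and paths are both in data
        # space, so the full extents feed the limits.
        return "paths+offsets"
    # Case B.3 (e.g. scatter, markers sized in points): offsets only.
    return "offsets"


print(autolim_source(False, False, True))   # streamplot-like: paths
print(autolim_source(True, False, False))   # offsets in axes coords: nothing
print(autolim_source(True, True, True))     # quiver-like: paths+offsets
print(autolim_source(True, True, False))    # scatter-like: offsets
```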

This required changing 14 image tests, mostly due to small axes-limit differences in scatter tests.

The major exception is test_transforms.py::test_pre_transform_plotting, which highlights a limitation of the consult-the-transforms approach used here. If the user passes a simple transform like transform = mtransforms.Affine2D().scale(10) + ax.transData, the autolimiting will not take place because transform != ax.transData. That's a cost of the approach, but I personally feel users should just transform the data before asking Matplotlib to plot it, or be willing to manually adjust the axes limits.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({'Depth': [0.2, 0.4, 0.4, 0.4, 0.4, 0.4, 0.6, 0.4, 3.2, 2.0],
                   'DateTimeUTC': [pd.Timestamp('2018-03-28 06:25:08'),
                                   pd.Timestamp('2018-03-28 06:25:49'),
                                   pd.Timestamp('2018-03-28 06:27:06'),
                                   pd.Timestamp('2018-03-28 06:32:11'),
                                   pd.Timestamp('2018-03-28 06:32:59'),
                                   pd.Timestamp('2018-03-28 06:34:02'),
                                   pd.Timestamp('2018-03-28 06:35:38'),
                                   pd.Timestamp('2018-03-28 06:37:04'),
                                   pd.Timestamp('2018-03-28 06:39:08'),
                                   pd.Timestamp('2018-03-28 06:40:52')]})

x = np.asarray(df['DateTimeUTC'])
y = np.asarray(df['Depth'])

fig, axs = plt.subplots(2, 1)

ax = axs[0]
#pp = ax.plot(x, y, '.')
ax.scatter(x, y, s=25, marker='d')
ax.set_title("using scatter")

ax = axs[1]
ax.plot(x, y)
ax.set_title("Using plot")

fig.tight_layout()
plt.show()

Before:

[image: Figure_2]

After:

[image: Figure_1]

Big Markers

Note that big markers will be clipped. This is "worse", but it's consistent...

Before:

[image: boo2]

After:

[image: boo]

Cures inconsistency

To see the inconsistency in the old way, note how the xlim changes with each re-lim:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

r = 2 * 72
sc = ax.scatter([0, 1], [0, 1], s=r * r * 3.14)

plt.show()
xmin = []
for i in range(4):
    fig.canvas.draw()
    ax.update_datalim(sc.get_datalim(ax.transData))
    ax.autoscale()
    print(ax.get_xlim())
    xmin += [ax.get_xlim()[0]]
Output before this PR (the xlim grows on every pass):

(-0.5934780784674604, 1.59347807846746)
(-0.9094406858496092, 1.9094406858496091)
(-1.15777773561647, 2.15777773561647)
(-1.3529631431489448, 2.3529631431489455)

Of course the resulting markers are still clipped anyway:

Before:

[image: boo3]

After (the limits are now stable):

(-0.05, 1.05)
(-0.05, 1.05)
(-0.05, 1.05)
(-0.05, 1.05)

[image: boo4]

Failures:

test_axes

Failures are minor: axes limits change slightly.

test_collections.py

test_barb_limits fails because it is constructed as:

    x = np.linspace(-5, 10, 20)
    y = np.linspace(-2, 4, 10)
    y, x = np.meshgrid(y, x)
    trans = mtransforms.Affine2D().translate(25, 32) + ax.transData
    plt.barbs(x, y, np.sin(x), np.cos(y), transform=trans)

and that fails the rubric above because the offset transform is not the data transform. I'm not sure how people feel about this. Personally, I'm not a fan of defining data at one location and then using the transform stack to move the object around instead of just manipulating the original data, and if folks want to do this, I'm OK with auto-limits failing for them. If you are going to manually move things around with a low-level transform stack, then you can also be expected to set your limits accordingly: flexibility comes at the cost of automation.

test_EllipseCollection and test_regularpolycollection_rotate fail somewhat badly because they draw big paths that used to get some space from the existing algorithm and no longer do. scatter_post_alpha is a very minor change.

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@jklymak (Member Author) commented Mar 10, 2019

This is WIP from a still-need-to-fix-tests point of view, but I wanted to have a concrete proposal on the table for fixing part of #7413.

@anntzer (Contributor) commented Mar 10, 2019

I think the general approach is good; I haven't reviewed carefully yet.
Does this close #6915/#11198 too?
In #7413 (comment) @tacaswell argued in favor of accepting clipped markers too. Despite my comments on that thread, I now also think ignoring marker extents in scatter makes complete sense (the "larger-than-figure marker" case is quite convincing).

@jklymak (Member Author) commented Mar 10, 2019

No, it doesn't fix #6915. But I don't think it's incompatible with your PR leaving the autolim to draw time, which makes sense to me.

@jklymak (Member Author) commented Mar 11, 2019

OK, this breaks streamplots, mostly in manageable ways, but there are a couple of failures I can't figure out yet.

@@ -181,7 +181,12 @@ def get_offset_transform(self):
def get_datalim(self, transData):
transform = self.get_transform()
transOffset = self.get_offset_transform()
if not transOffset == transData:
Review comment (Contributor):

transOffset != transData

@jklymak (Member Author) commented Mar 12, 2019

So, this PR breaks autolimiting on constructs like the one in test_transforms:

times10 = mtransforms.Affine2D().scale(10)
ax.scatter(np.linspace(0, 10), np.linspace(10, 0), transform=times10 + ax.transData)

I'm OK with that, but wanted to let folks know... The new test file looks fine, but the limits have changed substantially: https://github.com/matplotlib/matplotlib/blob/d3f8447cc69b21be885c8683c942043e2903ac7f/lib/matplotlib/tests/baseline_images/test_transforms/pre_transform_data.png

@jklymak (Member Author) commented Mar 12, 2019

BTW, is there an easy way to tell if a transform has ax.transData in it? I.e. if someone has done a simple offsetTrans + ax.transData?

@anntzer (Contributor) commented Mar 12, 2019

@jklymak Have a look at contains_branch or contains_branch_separately (not exactly sure about their use, but I think they may be what you want).
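For what it's worth, contains_branch can be demonstrated standalone, with plain Affine2D transforms standing in for ax.transData (no Axes needed). This is a sketch of the behavior, not code from this PR:

```python
import matplotlib.transforms as mtransforms

# Stand-in for ax.transData; any transform works for this demo.
data_like = mtransforms.Affine2D().translate(5, 5)

# A user transform that ends in the "data" transform, analogous to
# transform=Affine2D().scale(10) + ax.transData.
composed = mtransforms.Affine2D().scale(10) + data_like

unrelated = mtransforms.Affine2D().rotate_deg(45)

print(composed == data_like)                 # False: distinct transforms
print(composed.contains_branch(data_like))   # True: its right branch is data_like
print(unrelated.contains_branch(data_like))  # False: no data_like anywhere
```

So a plain equality check misses the offsetTrans + ax.transData case, while contains_branch catches it.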

@jklymak (Member Author) commented Mar 12, 2019

Please see updated description above.

I can make an API change doc etc, but wanted feedback on the approach first. Ping @ImportanceOfBeingErnest as you have been known to make extensive use of the transform machinery. This will break some autolimits for some transforms.

@anntzer (Contributor) commented Mar 12, 2019

Instead of changing the baseline images, can you keep the old ones, add an assert that the x/ylims are what they should be per the new approach, then force the x/ylims to match the old values?

@anntzer (Contributor) commented Mar 12, 2019

I'll wait until you look into contains_branch and possibly don't change the baseline images?
But again, generally speaking I like the approach.

@jklymak (Member Author) commented Mar 12, 2019

Instead of changing the baseline images, can you keep the old ones, add an assert that the x/ylims are what they should be per the new approach, then force the x/ylims to match the old values?

Moved this discussion to gitter 😉

@jklymak (Member Author) commented Mar 12, 2019

I'll wait until you look into contains_branch...

Seems to work! The transform=times10 + ax.transData now works, though the image still changes because the scatter in that test changes its limits....

@anntzer (Contributor) commented Mar 12, 2019

Copy-pasting my reply on gitter:

  • Instead of regenerating the images, it may also be worth checking whether we could do away with some of the tests, in case they're redundant with others.
  • For example, the test that generates scatter.png could become a check_figures_equal test comparing scatter() with manually calling plot() to add one marker at a time; scatter_post_alpha too.
  • I guess colorbar_single_scatter could skip the image check and instead check that the colorbar bounds are correctly set.
  • Also, fewer baseline-image tests help mplcairo :) (as they are renderer-dependent).

@jklymak (Member Author) commented Mar 12, 2019

Fair enough. I'll look at rejiggering the tests after we get some consensus that this is a good way to go, particularly as A/B-ing the tests shows the end effects of this change for users.

@jklymak (Member Author) commented Mar 12, 2019

...also ping @efiring

y, x = np.meshgrid(y, x)
trans = mtransforms.Affine2D().translate(25, 32) + ax.transData
plt.barbs(x, y, np.sin(x), np.cos(y), transform=trans)
# trans = mtransforms.Affine2D().translate(25, 32) + ax.transData
Review comment (Contributor):

just delete this?

Reply (Member Author):

Oops, I meant to reinstate this actually. Thanks.

transform = self.get_transform()
transOffset = self.get_offset_transform()
if (not self._offsetsNone and
        not transOffset.contains_branch(transData)):
Review comment (Contributor):

I guess(?) you could get a few extra points by checking contains_branch_separately and then deciding to use the x or the y data limits if one of them does contain transData; this could be a separate PR too, as the use case is a bit obscure.

Reply (Contributor):

let's just forget this case for now

Reply (Member Author):

As discussed on gitter, I think that's best for simplicity's sake. For instance, I don't know what the datalim handling does with infs, and adding this case is a good chunk of code. As also discussed, I'm very happy to revisit if it turns out to be needed.

@anntzer (Contributor) commented Mar 12, 2019

btw, I don't mind doing the conversion to check_figures_equal if you want.

@ImportanceOfBeingErnest (Member) commented:
I will need to look at this more thoroughly. Based on a quick glance I wonder if we can't keep using mpath.get_path_collection_extents even in the cases you exclude here. It does not seem to give generally bad output, it just needs a good frozen transform to start with (maybe that is the hard part, but maybe not?), and it will then not be 100% accurate, but probably still better than cropping half the marker.

@jklymak jklymak force-pushed the fix-relim-collection branch from 1162001 to 28f63d0 Compare March 12, 2019 22:24
@jklymak (Member Author) commented Mar 12, 2019

I will need to look at this more thoroughly. Based on a quick glance I wonder if we can't keep using mpath.get_path_collection_extents even in the cases you exclude here. It does not seem to give generally bad output, it just needs a good frozen transform to start with (maybe that is the hard part, but maybe not?), and it will then not be 100% accurate, but probably still better than cropping half the marker.

We could heuristically do that when the marker size is not too big (perhaps up to a quarter of the axes size), but you still need to solve the inverse problem, which isn't necessarily trivial, particularly if you need to take into account the scale of the axes, or iterate to a solution as discussed before. Or we accept the present state, which I think we can all agree is broken.

@jklymak (Member Author) commented Mar 13, 2019

One further point: because of the default data margins, this doesn't typically cut off markers, just really big ones... which are also the ones that cause auto-lim instability.

@tacaswell tacaswell added this to the v3.2.0 milestone Mar 17, 2019
@anntzer anntzer added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label May 2, 2019
@jklymak jklymak force-pushed the fix-relim-collection branch 2 times, most recently from 41e75ec to a005562 Compare July 18, 2019 16:22
# (i.e. like scatter). We can't uniquely set limits based on
# those shapes, so we just set the limits based on their
# location.
# Finish the transform:
Review comment (Contributor):

in the same spirit as #14845 this could be

offset_to_data = transOffset - transData
offsets = offset_to_data.transform(offsets)

(you also save a transform by doing so).

Reply (Member Author):

I wasn't paying attention when #14845 went in. This requires remembering that transA - transB is the same as what I wrote here. I don't think it helps the readability/comprehensibility of all the transform magic to have arithmetic operators that do stuff versus just using explicit methods.

Reply (Contributor):

The main operator is trA + trB, which just means "perform trA, then perform trB"; once you know it, it's quite natural that trA - trB means "perform trA, then perform the inverse of trB". You could write mtransforms.composite_transform_factory(trA, trB.inverted()) if you prefer explicit method names, but the point is that it is more efficient to first multiply two small (3x3) matrices and then use the result to transform a big (N, 2) array, rather than to do two transformations in series.

Reply (Member Author):

Good point. Split the difference: (trA + trB.inverted()) seems straightforward enough. I understand the __sub__ convention, I just find it a bit obscure; the __add__ convention is fine.
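For concreteness, a quick numerical check that the three spellings agree. The transforms here are arbitrary Affine2D examples, not anything from this PR:

```python
import numpy as np
import matplotlib.transforms as mtransforms

trA = mtransforms.Affine2D().scale(10).translate(1, 2)
trB = mtransforms.Affine2D().translate(25, 32)
pts = np.array([[0.0, 0.0], [1.0, 1.0], [-3.0, 4.0]])

# "perform trA, then the inverse of trB", spelled three ways:
via_sub = (trA - trB).transform(pts)                 # __sub__ convention
via_add = (trA + trB.inverted()).transform(pts)      # __add__ + inverted()
explicit = trB.inverted().transform(trA.transform(pts))  # two passes

assert np.allclose(via_sub, via_add)
assert np.allclose(via_sub, explicit)
```

The first two forms compose the 3x3 matrices once and transform the (N, 2) array in a single pass, which is the efficiency point made above.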

@jklymak jklymak force-pushed the fix-relim-collection branch from a005562 to f19ae31 Compare July 18, 2019 18:04
@anntzer (Contributor) left a review comment:

modulo CI

@jklymak (Member Author) commented Jul 18, 2019

Thanks for the help and suggestions @anntzer, @QuLogic, and @efiring !

@QuLogic (Member) left a review comment:
Unfortunately, Axes.transData does not link (I think because it's an undocumented instance variable), but there's one more that can be fixed:

Co-Authored-By: Elliott Sales de Andrade <quantum.analyst@gmail.com>
@QuLogic QuLogic merged commit b03370a into matplotlib:master Jul 19, 2019
@jklymak jklymak deleted the fix-relim-collection branch May 8, 2021 15:50
Successfully merging this pull request may close these issues.

scatter and date auto-lim far too wide....
6 participants