Cleanup transforms.py. #7394

anntzer · 2016-11-03T07:24:16Z

Simplify the implementation of the various contains-like methods.

Switch to modern property declarations. Cleanup some docstrings.

tacaswell · 2016-11-03T22:47:48Z

lib/matplotlib/transforms.py


    def get_points(self):
-        return NotImplementedError()
+        raise NotImplementedError


Should this be raise NotImplementedError() ?

https://docs.python.org/3/reference/simple_stmts.html#the-raise-statement:
Otherwise, raise evaluates the first expression as the exception object. It must be either a subclass or an instance of BaseException. If it is a class, the exception instance will be obtained when needed by instantiating the class with no arguments.

huh, never used it like that.

raise ValueError is pretty common? (if you're too lazy (hem hem) to write a proper error message)

tacaswell · 2016-11-03T23:18:08Z

lib/matplotlib/transforms.py

+        """
+        (property) :attr:`xmin` is the left edge of the bounding box.
+        """
+        return np.min(self.get_points()[:, 0])


why np.min over min?

So that nan passes through -- which is probably the right thing to do? (min(np.nan, 1) is 1, np.min(np.nan, 1) is np.nan.) (and in fact needed for the various new implementations of the contains/overlaps methods)

story645 · 2016-11-03T23:32:13Z

I know you're not touching this, but is all the if DEBUG code in there necessary? (And does it ever get run considering DEBUG=false at the top of that code...)

tacaswell · 2016-11-03T23:42:08Z

I am a tad concerned about performance regressions on this.

attn @mdboom as you last touched / wrote this code in 2008.

anntzer · 2016-11-03T23:42:32Z

@story645 There's a bunch of if DEBUG a bit everywhere in the codebase, if you want to clean them all up go ahead :-)

story645 · 2016-11-03T23:59:41Z

git grep says the problem isn't so bad...bunch of .cpp and backend files 😅, but otherwise it's offsetbox, texmanger and transforms.

NelleV · 2016-11-05T00:20:10Z

lib/matplotlib/transforms.py

-        return self.get_points()[:, 1]
-    intervaly = property(_get_intervaly, None, None, """
+    @property
+    def intervaly(self):


Doesn't that make this in the public API?
Can we keep this private?

Never mind… Sorry about the noise.

You bet I would be extra careful not to add any public API :-)
intervaly was a public property; it is still a public property when set like this. (And in both cases the function that implements the getter is available as Bbox.invervaly.fget; it used to also be available as Bbox._get_intervaly but this PR removes that.)

NelleV

Thanks @anntzer for the clean up.
This looks good to me.

NelleV · 2016-11-05T00:21:51Z

lib/matplotlib/transforms.py

-        return ((x0 < x1
-                 and (x >= x0 and x <= x1))
-                or (x >= x1 and x <= x0))
+        return self.xmin <= x <= self.xmax


That's so much better…

QuLogic · 2016-11-05T03:06:30Z

lib/matplotlib/transforms.py

@@ -955,62 +897,62 @@ def update_from_data_xy(self, xy, ignore=None, updatex=True, updatey=True):
    def _set_x0(self, val):
        self._points[0, 0] = val
        self.invalidate()
-    x0 = property(BboxBase._get_x0, _set_x0)
+    x0 = property(BboxBase.x0.fget, _set_x0)


You could write this in the decorator form as:

@BboxBase.x0.setter def x0(self, val): self._points[0, 0] = val self.invalidate()

or at least, it seems to work for me...

No, because this would update the setter of BboxBase too. I could write

@partial(property, BboxBase.x0.fget) def x0(self, val): ...

but that hardly seems like an improvement in readability...

Ah, never mind then...

It does not rewrite the setter of base class for me (on both 2.7 and 3.5).

class A(object): def __init__(self): self._z = None @property def z(self): return self._z @z.setter def z(self, val): print('A') self._z = val class B(A): @A.z.setter def z(self, val): print('B') self._z = val a = A() a.z = 123 print(a.z) b = B() b.z = 456 print(b.z) class C(object): def __init__(self): self._z = None @property def z(self): return self._z class D(A): @C.z.setter def z(self, val): print('D') self._z = val try: c = C() c.z = 123 print('Error') except AttributeError: print('Pass') d = D() d.z = 456 print(d.z)

A 123 B 456 Pass D 456

Oh, indeed prop.setter returns a new property, leaving the previous one untouched. The docs actually explicitly say that:

A property object has getter, setter, and deleter methods usable as decorators that create a copy of the property with the corresponding accessor function set to the decorated function.

(emphasis mine)

One new thing I learned today... and I'll update the PR accordngly.

anntzer · 2016-11-05T05:54:20Z

@tacaswell I think your concerns about performance are very valid. For example

%%timeit np.random.seed(0); N = 100; bbs = [Bbox(t) for t in np.random.rand(N, 2, 2)]; xs = np.random.rand(N)
[b.containsx(x) for b, x in zip(bbs, xs)]

is three times slower.
I'll see what I can do.

anntzer · 2016-11-05T23:17:14Z

I tried adding a _normalized_extents property that would compute all of xmin, xmax, ymin, ymax at the same time (this seemed to be the main performance loss) so that they could be used later, but it seems that that's not enough... basically it seems that something like (xmin, xmax), (ymin, ymax) = np.sort(self.get_points(), axis=1) will never be as fast as the old implementation.

Now the question is, does this actually matter? It would be nice to have some actual performance tests (on "real" plots) to check that.

Kojoley · 2016-11-08T20:19:33Z

@anntzer If I understand you correctly you are talking about fast onepass minmax algorithm. While it makes 25% less comparisons (1.5*n+1 vs 2*n) you will see no difference on CPython because of interpreter overhead. After all len(self.get_points()) is always 2, so there is no point of using it.

anntzer · 2016-11-08T20:33:40Z

The point is that

x1, x2 = ...
return x1 < x < x2 if x1 < x2 else x2 < x < x1

will be faster than anything that goes through a function (/ property) call.

Kojoley · 2016-11-08T21:00:43Z

Ah, so you worry about added function call/property access overhead?

class Base(object):
    intervalx = 123, 456


class A(Base):
    def fully_containsx(self, x):
        x0, x1 = self.intervalx
        return ((x0 < x1
                 and (x > x0 and x < x1))
                or (x > x1 and x < x0))


class B(Base):
    @property
    def xmin(self):
        return min(self.intervalx)

    @property
    def xmax(self):
        return max(self.intervalx)

    def fully_containsx(self, x):
        return self.xmin < x < self.xmax

>python -m timeit "from prop import A; a = A();" "for _ in range(10000): a.fully_containsx(321)"
100 loops, best of 3: 2.77 msec per loop

>python -m timeit "from prop import B; b = B();" "for _ in range(10000): b.fully_containsx(321)"
100 loops, best of 3: 7.96 msec per loop

anntzer · 2016-11-08T21:16:13Z

That's basically the difference I measured, yes.

codecov-io · 2016-12-02T23:57:17Z

Current coverage is 61.86% (diff: 90.62%)

Merging #7394 into master will decrease coverage by 4.56%

@@             master      #7394   diff @@
==========================================
  Files           109        173     +64   
  Lines         46780      56092   +9312   
  Methods           0          0           
  Messages          0          0           
  Branches          0          0           
==========================================
+ Hits          31075      34702   +3627   
- Misses        15705      21390   +5685   
  Partials          0          0

Powered by Codecov. Last update dfd38f7...1859a22

!!

QuLogic · 2017-01-19T09:49:26Z

Any updates? Needs a rebase.

anntzer · 2017-01-19T18:07:07Z

Given the reasonable performance concerns I don't think this can go forward unless we somehow add performance regression tests and show that there won't be a significant slowdown.

NelleV · 2017-01-19T18:21:59Z

I wouldn't add a performance regression tests, but I think it is reasonable to benchmark the two options implementations.

anntzer · 2017-01-19T18:28:21Z

But this is so much down the stack that it could literally affect anything, so I don't think a microbenchmark really makes sense. I have no idea, for example, whether it would make plotting a lot of artists much slower because it may be doing a bunch of bbox comparisons... or perhaps not.

Simplify the implementation of various methods. Note the switch to `np.min` rather than `min`, which avoids silently dropping `nan`s. Switch to modern property declarations. Cleanup some docstrings.

anntzer · 2017-02-21T00:54:48Z

@tacaswell I restored a fast(?) implementation of overlaps/fully_overlaps (and made it faster by not checking for nan anymore, relying on the fact that the comparison operators will return False in presence of nans...). Other than that I don't see any other possible performance issue right now.

tacaswell · 2017-02-21T03:03:36Z

lib/matplotlib/transforms.py

-            y1 = max(y1, np.max(ys))
-
-        return Bbox.from_extents(x0, y0, x1, y1)
+        x0 = min(bbox.xmin for bbox in bboxes)


Can np.nan or np.inf come out of these?

I guess I should use np.min everywhere (which passes nan through, whereas builtin min drops them). (At least this would be consistent with bbox.xmin.)

Added a note to that effect.

tacaswell

Looks good to me

anntzer force-pushed the transforms-cleanup branch 4 times, most recently from e1ce046 to 2def7be Compare November 3, 2016 09:45

tacaswell reviewed Nov 3, 2016

View reviewed changes

tacaswell added this to the 2.1 (next point release) milestone Nov 3, 2016

tacaswell reviewed Nov 3, 2016

View reviewed changes

NelleV reviewed Nov 5, 2016

View reviewed changes

NelleV previously approved these changes Nov 5, 2016

View reviewed changes

NelleV changed the title ~~Cleanup transforms.py.~~ [MRG+1] Cleanup transforms.py. Nov 5, 2016

QuLogic reviewed Nov 5, 2016

View reviewed changes

QuLogic changed the title ~~[MRG+1] Cleanup transforms.py.~~ Cleanup transforms.py. Nov 5, 2016

anntzer force-pushed the transforms-cleanup branch from 2def7be to 8a3bc52 Compare November 8, 2016 20:50

anntzer force-pushed the transforms-cleanup branch 2 times, most recently from 2f8a568 to 1859a22 Compare December 2, 2016 18:10

Cleanup transforms.py.

c2aabe9

Simplify the implementation of various methods. Note the switch to `np.min` rather than `min`, which avoids silently dropping `nan`s. Switch to modern property declarations. Cleanup some docstrings.

anntzer force-pushed the transforms-cleanup branch from 1859a22 to c681756 Compare February 21, 2017 00:53

Restore fast(er) impl. of overlaps.

aa406bf

anntzer force-pushed the transforms-cleanup branch from c681756 to aa406bf Compare February 21, 2017 01:36

tacaswell reviewed Feb 21, 2017

View reviewed changes

tacaswell approved these changes Feb 21, 2017

View reviewed changes

Don't drop nans in bbox computations.

297a753

QuLogic merged commit 2bce760 into matplotlib:master Feb 21, 2017

anntzer deleted the transforms-cleanup branch February 21, 2017 06:01

anntzer mentioned this pull request Feb 22, 2017

Default legend location 'best' is extremely slow #8108

Open

Cleanup transforms.py. #7394

Cleanup transforms.py. #7394

Conversation

anntzer commented Nov 3, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

story645 commented Nov 3, 2016 • edited Loading

tacaswell commented Nov 3, 2016

anntzer commented Nov 3, 2016

story645 commented Nov 3, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NelleV left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Kojoley Nov 8, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anntzer commented Nov 5, 2016

anntzer commented Nov 5, 2016

Kojoley commented Nov 8, 2016

anntzer commented Nov 8, 2016

Kojoley commented Nov 8, 2016

anntzer commented Nov 8, 2016

codecov-io commented Dec 2, 2016

Current coverage is 61.86% (diff: 90.62%)

QuLogic commented Jan 19, 2017

anntzer commented Jan 19, 2017

NelleV commented Jan 19, 2017

anntzer commented Jan 19, 2017

anntzer commented Feb 21, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tacaswell left a comment

Choose a reason for hiding this comment

story645 commented Nov 3, 2016 •

edited

Loading

Kojoley Nov 8, 2016 •

edited

Loading