Skip to content

Cleanup transforms.py. #7394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 21, 2017
Merged

Conversation

anntzer
Copy link
Contributor

@anntzer anntzer commented Nov 3, 2016

Simplify the implementation of the various contains-like methods.

Switch to modern property declarations. Cleanup some docstrings.

@anntzer anntzer force-pushed the transforms-cleanup branch 4 times, most recently from e1ce046 to 2def7be Compare November 3, 2016 09:45

def get_points(self):
return NotImplementedError()
raise NotImplementedError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be raise NotImplementedError() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.python.org/3/reference/simple_stmts.html#the-raise-statement:
Otherwise, raise evaluates the first expression as the exception object. It must be either a subclass or an instance of BaseException. If it is a class, the exception instance will be obtained when needed by instantiating the class with no arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, never used it like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise ValueError is pretty common? (if you're too lazy (hem hem) to write a proper error message)

@tacaswell tacaswell added this to the 2.1 (next point release) milestone Nov 3, 2016
"""
(property) :attr:`xmin` is the left edge of the bounding box.
"""
return np.min(self.get_points()[:, 0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why np.min over min?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that nan passes through -- which is probably the right thing to do? (min(np.nan, 1) is 1, np.min(np.nan, 1) is np.nan.) (and in fact needed for the various new implementations of the contains/overlaps methods)

@story645
Copy link
Member

story645 commented Nov 3, 2016

I know you're not touching this, but is all the if DEBUG code in there necessary? (And does it ever get run considering DEBUG=false at the top of that code...)

@tacaswell
Copy link
Member

I am a tad concerned about performance regressions on this.

attn @mdboom as you last touched / wrote this code in 2008.

@anntzer
Copy link
Contributor Author

anntzer commented Nov 3, 2016

@story645 There's a bunch of if DEBUG a bit everywhere in the codebase, if you want to clean them all up go ahead :-)

@story645
Copy link
Member

story645 commented Nov 3, 2016

git grep says the problem isn't so bad...bunch of .cpp and backend files 😅, but otherwise it's offsetbox, texmanger and transforms.

return self.get_points()[:, 1]
intervaly = property(_get_intervaly, None, None, """
@property
def intervaly(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that make this in the public API?
Can we keep this private?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind… Sorry about the noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You bet I would be extra careful not to add any public API :-)
intervaly was a public property; it is still a public property when set like this. (And in both cases the function that implements the getter is available as Bbox.invervaly.fget; it used to also be available as Bbox._get_intervaly but this PR removes that.)

NelleV
NelleV previously approved these changes Nov 5, 2016
Copy link
Member

@NelleV NelleV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @anntzer for the clean up.
This looks good to me.

return ((x0 < x1
and (x >= x0 and x <= x1))
or (x >= x1 and x <= x0))
return self.xmin <= x <= self.xmax
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's so much better…

@NelleV NelleV changed the title Cleanup transforms.py. [MRG+1] Cleanup transforms.py. Nov 5, 2016
@@ -955,62 +897,62 @@ def update_from_data_xy(self, xy, ignore=None, updatex=True, updatey=True):
def _set_x0(self, val):
self._points[0, 0] = val
self.invalidate()
x0 = property(BboxBase._get_x0, _set_x0)
x0 = property(BboxBase.x0.fget, _set_x0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could write this in the decorator form as:

@BboxBase.x0.setter
def x0(self, val):
    self._points[0, 0] = val
    self.invalidate()

or at least, it seems to work for me...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because this would update the setter of BboxBase too. I could write

@partial(property, BboxBase.x0.fget)
def x0(self, val): ...

but that hardly seems like an improvement in readability...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, never mind then...

Copy link
Member

@Kojoley Kojoley Nov 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not rewrite the setter of base class for me (on both 2.7 and 3.5).

class A(object):
    def __init__(self):
         self._z = None

    @property
    def z(self):
         return self._z

    @z.setter
    def z(self, val):
         print('A')
         self._z = val

class B(A):
    @A.z.setter
    def z(self, val):
         print('B')
         self._z = val

a = A()
a.z = 123
print(a.z)

b = B()
b.z = 456
print(b.z)


class C(object):
    def __init__(self):
         self._z = None

    @property
    def z(self):
         return self._z

class D(A):
    @C.z.setter
    def z(self, val):
         print('D')
         self._z = val

try:
    c = C()
    c.z = 123
    print('Error')
except AttributeError:
    print('Pass')

d = D()
d.z = 456
print(d.z)
A
123
B
456
Pass
D
456

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, indeed prop.setter returns a new property, leaving the previous one untouched. The docs actually explicitly say that:

A property object has getter, setter, and deleter methods usable as decorators that create a copy of the property with the corresponding accessor function set to the decorated function.

(emphasis mine)

One new thing I learned today... and I'll update the PR accordngly.

@anntzer
Copy link
Contributor Author

anntzer commented Nov 5, 2016

@tacaswell I think your concerns about performance are very valid. For example

%%timeit np.random.seed(0); N = 100; bbs = [Bbox(t) for t in np.random.rand(N, 2, 2)]; xs = np.random.rand(N)
[b.containsx(x) for b, x in zip(bbs, xs)]

is three times slower.
I'll see what I can do.

@QuLogic QuLogic changed the title [MRG+1] Cleanup transforms.py. Cleanup transforms.py. Nov 5, 2016
@anntzer
Copy link
Contributor Author

anntzer commented Nov 5, 2016

I tried adding a _normalized_extents property that would compute all of xmin, xmax, ymin, ymax at the same time (this seemed to be the main performance loss) so that they could be used later, but it seems that that's not enough... basically it seems that something like (xmin, xmax), (ymin, ymax) = np.sort(self.get_points(), axis=1) will never be as fast as the old implementation.

Now the question is, does this actually matter? It would be nice to have some actual performance tests (on "real" plots) to check that.

@Kojoley
Copy link
Member

Kojoley commented Nov 8, 2016

@anntzer If I understand you correctly you are talking about fast onepass minmax algorithm. While it makes 25% less comparisons (1.5*n+1 vs 2*n) you will see no difference on CPython because of interpreter overhead. After all len(self.get_points()) is always 2, so there is no point of using it.

@anntzer
Copy link
Contributor Author

anntzer commented Nov 8, 2016

The point is that

x1, x2 = ...
return x1 < x < x2 if x1 < x2 else x2 < x < x1

will be faster than anything that goes through a function (/ property) call.

@anntzer anntzer force-pushed the transforms-cleanup branch from 2def7be to 8a3bc52 Compare November 8, 2016 20:50
@Kojoley
Copy link
Member

Kojoley commented Nov 8, 2016

Ah, so you worry about added function call/property access overhead?

class Base(object):
    intervalx = 123, 456


class A(Base):
    def fully_containsx(self, x):
        x0, x1 = self.intervalx
        return ((x0 < x1
                 and (x > x0 and x < x1))
                or (x > x1 and x < x0))


class B(Base):
    @property
    def xmin(self):
        return min(self.intervalx)

    @property
    def xmax(self):
        return max(self.intervalx)

    def fully_containsx(self, x):
        return self.xmin < x < self.xmax
>python -m timeit "from prop import A; a = A();" "for _ in range(10000): a.fully_containsx(321)"
100 loops, best of 3: 2.77 msec per loop

>python -m timeit "from prop import B; b = B();" "for _ in range(10000): b.fully_containsx(321)"
100 loops, best of 3: 7.96 msec per loop

@anntzer
Copy link
Contributor Author

anntzer commented Nov 8, 2016

That's basically the difference I measured, yes.

@anntzer anntzer force-pushed the transforms-cleanup branch 2 times, most recently from 2f8a568 to 1859a22 Compare December 2, 2016 18:10
@codecov-io
Copy link

Current coverage is 61.86% (diff: 90.62%)

Merging #7394 into master will decrease coverage by 4.56%

@@             master      #7394   diff @@
==========================================
  Files           109        173     +64   
  Lines         46780      56092   +9312   
  Methods           0          0           
  Messages          0          0           
  Branches          0          0           
==========================================
+ Hits          31075      34702   +3627   
- Misses        15705      21390   +5685   
  Partials          0          0           

Powered by Codecov. Last update dfd38f7...1859a22

@QuLogic
Copy link
Member

QuLogic commented Jan 19, 2017

Any updates? Needs a rebase.

@anntzer
Copy link
Contributor Author

anntzer commented Jan 19, 2017

Given the reasonable performance concerns I don't think this can go forward unless we somehow add performance regression tests and show that there won't be a significant slowdown.

@NelleV
Copy link
Member

NelleV commented Jan 19, 2017

I wouldn't add a performance regression tests, but I think it is reasonable to benchmark the two options implementations.

@anntzer
Copy link
Contributor Author

anntzer commented Jan 19, 2017

But this is so much down the stack that it could literally affect anything, so I don't think a microbenchmark really makes sense. I have no idea, for example, whether it would make plotting a lot of artists much slower because it may be doing a bunch of bbox comparisons... or perhaps not.

Simplify the implementation of various methods.  Note the switch to
`np.min` rather than `min`, which avoids silently dropping `nan`s.

Switch to modern property declarations.  Cleanup some docstrings.
@anntzer
Copy link
Contributor Author

anntzer commented Feb 21, 2017

@tacaswell I restored a fast(?) implementation of overlaps/fully_overlaps (and made it faster by not checking for nan anymore, relying on the fact that the comparison operators will return False in presence of nans...). Other than that I don't see any other possible performance issue right now.

y1 = max(y1, np.max(ys))

return Bbox.from_extents(x0, y0, x1, y1)
x0 = min(bbox.xmin for bbox in bboxes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can np.nan or np.inf come out of these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I should use np.min everywhere (which passes nan through, whereas builtin min drops them). (At least this would be consistent with bbox.xmin.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note to that effect.

Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@QuLogic QuLogic merged commit 2bce760 into matplotlib:master Feb 21, 2017
@anntzer anntzer deleted the transforms-cleanup branch February 21, 2017 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants