New C++ contour code with corner_mask kwarg #3874
This is awesome! I will see about at least testing the legacy mode to ensure no regressions. I have a huge archive of products that I can test against.
Wonderful, thank you!
> :func:`~matplotlib.pyplot.contour` as well as
> :func:`~matplotlib.pyplot.contourf`, and it subdivides the domain into
> subdomains of exactly `nchunk` by `nchunk` quads, whereas previously it was
> only roughly `nchunk` by `nchunk` quads.
Given the results of the path effects collection test below (where the labels of the contours have moved), do we want to mention that here? Something like: "Contour labels may appear in different places than in earlier versions of matplotlib".
Yes, good idea.
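For context on the chunking change discussed above, here is a minimal usage sketch of the `nchunk` kwarg (synthetic data and the chunk size are purely illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt

# Synthetic smooth field, purely for illustration.
x, y = np.meshgrid(np.linspace(0, 4, 200), np.linspace(0, 3, 150))
z = np.sin(x) * np.cos(y)

# Subdivide the domain into chunks of exactly 50x50 quads; with the new
# code this applies to contour() as well as contourf().
plt.contourf(x, y, z, 10, nchunk=50)
plt.savefig('chunked_contourf.png')
```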
Thanks for doing this. Personally, I'm excited not so much by the new masking feature, but by having some contour code that is accessible enough to be continually improved over time, rather than the near-dead-end we had before. |
In `src/_contour.h`:

```diff
@@ -0,0 +1,503 @@
+/*
+ * QuadContourGenerator
```
Did you follow a published algorithm when implementing? It would be worth referencing that here.
Nope, all my own work. Contouring has been my default problem to implement when learning new programming languages for some years, so I have tried out many approaches and what you see is the latest evolution that deals with masking in an efficient manner.
@pelson, no, I have never used chunking, and I don't recall its history. I think it was before my time.
First things first: this is awesome! I do a lot with contours so I have a few questions:
Ultimately, matplotlib deals with this kind of Path just fine when rendering it, but I know of other libraries making use of the contours which do struggle with this kind of geometry, and they therefore have to post-process the contour result.
I think that is all. Great work on this!
As a note about the degenerate polygons: they only occur when the value of a point is identical to a contour level. As for CCW and CW geometries, I have a few years' worth of daily shapefiles I can test against. I have been approved to spend a couple of days building up a test rig to at least test the legacy mode.
What do you mean, emulate with some clipping? I read it that the new code with `corner_mask='legacy'` simply falls back to the old C code.
@ianthomas23 - the filled contour code used to give paths which started and terminated at the same location (though annoyingly they didn't use the CLOSEPOLY flag either, which I think they should have). The new contour code, however, neither produces closed paths, nor does it produce paths that look closed by repeating the first point of the geometry. Any chance you could add that here? I have been tempted on several occasions to update the Cntr.c code to do this, but each time I have been put off by the tangled mess (dead-end code, as @mdboom calls it).
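For downstream consumers that need explicitly closed rings, post-processing could look something like the following minimal sketch (the `close_path` helper is hypothetical, not part of matplotlib, and assumes a single-ring path for brevity):

```python
import numpy as np
from matplotlib.path import Path

def close_path(path):
    # Hypothetical helper: append the first vertex again with a CLOSEPOLY
    # code so that downstream geometry libraries see an explicitly closed
    # ring. Assumes the path contains a single ring.
    verts = np.vstack([path.vertices, path.vertices[:1]])
    codes = path.codes
    if codes is None:
        codes = np.full(len(path.vertices), Path.LINETO, dtype=Path.code_type)
        codes[0] = Path.MOVETO
    codes = np.append(codes, Path.CLOSEPOLY)
    return Path(verts, codes)

# Usage sketch on a triangle path:
tri = Path([(0, 0), (1, 0), (0, 1)])
print(close_path(tri).codes)  # MOVETO, LINETO, LINETO, CLOSEPOLY
```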
@pelson, re: clipping: perhaps I am being a bit of a luddite... and thinking about it, emulating the legacy mode would probably be harder than it is worth. I'll re-evaluate my position after I complete my testing and analysis.
FWIW, I've taken note of 556b4d1 in case it comes in useful to compare against future revisions of this PR.
OK, here is my testing plan. Looking over the code, the legacy mode really shouldn't suffer from any regressions. So, I am not going to systematically check the polygons generated from v1.1.1 (our operational version) and this PR using legacy mode. Instead, I am going to generate polygons using both modes in this PR. Because the polygons cannot be directly compared between the two algorithms (not the same exact points/spacing), I will then rasterize the polygons (I have code for this using GDAL), resulting in numpy arrays that can be compared. The evaluation will be that the arrays should be equal anyplace the legacy grid is not the background value. In addition, all polygons generated will get validated by GDAL and a few additional checks that I have developed over the past couple of years. I haven't yet thought about how to analyze the portions of the quad-corners grid that are masked out in the legacy grid. Perhaps a count of the number of such grid points would be sufficient? Suggestions would be welcomed! I do, however, see some subtle differences between the paths produced using v1.1.1 and those produced in legacy mode -- I suspect this might be due to some fixes in Polygons for v1.4. I'll double-check that.
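A minimal sketch of the comparison step described above (the rasters are assumed to come from an external rasterizer such as GDAL; the helper name and `background` value are illustrative, not from this PR):

```python
import numpy as np

def compare_rasters(legacy_grid, new_grid, background=0):
    """Compare two rasterized contourf results, but only where the
    legacy raster holds real (non-background) data."""
    valid = legacy_grid != background
    mismatches = np.count_nonzero(legacy_grid[valid] != new_grid[valid])
    return mismatches, np.count_nonzero(valid)

# Hypothetical usage: both grids rasterized onto the same geotransform.
legacy = np.array([[0, 1, 1], [0, 2, 2], [0, 0, 2]])
new = np.array([[0, 1, 1], [1, 2, 2], [0, 0, 2]])
print(compare_rasters(legacy, new))  # (0, 5): grids agree wherever legacy has data
```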
Oh, and is there any reason for me to also test contour()? Right now, I only plan on testing contourf().
@WeatherGod: I would recommend sticking with testing contourf only, not contour. The contourf code is much more complicated and more likely to have bugs.
OK, there appears to be a definite difference in the paths produced by the legacy mode in this PR and in v1.4.2. The paths produced in v1.4.2 are identical to the ones produced in v1.1.1. This is only one file, but a visual inspection does not yield any differences. Now I am going to test against master.
And master is identical to v1.4.2. Something changed here... now I am going to make sure that the legacy mode is actually being used.
- Orientation of polygons: for filled contours this is covered in the fourth paragraph of 'Filled Contours' near the top of the new source file.
- Degenerate polygons: correctly explained by @WeatherGod as only occurring at points which have a z-value identical to a contour level. My thoughts are this is fine as it is, as it renders correctly.
- Performance: I've done some basic (non-rigorous) profiling for 'reasonable' cases, calculating the contours and rendering to png. contourf is ~5% faster for corner_mask=False or True compared to legacy; contour is ~1% faster for True and ~6% slower for False than legacy. Given how similar these are to legacy I haven't bothered investigating further.
- CLOSEPOLY flag: the polygons generated by the new code are purposefully the same as the legacy code in that filled polygons do not repeat the first point and do not have a CLOSEPOLY. It would not be difficult to change this, but I do not want to do so in this PR. I would like to leave the output of the new code as similar as possible to the legacy code to allow others to compare the two.
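A quick way to check the CLOSEPOLY behaviour described above is to inspect the codes of the returned paths. This is a minimal sketch using the 1.4-era `ContourSet.collections` API, with synthetic data:

```python
import numpy as np
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
from matplotlib.path import Path

z = np.random.RandomState(0).rand(20, 20)
cs = plt.contourf(z, 5)
for coll in cs.collections:
    for path in coll.get_paths():
        # Both legacy and new code emit rings that start with MOVETO and
        # end without a CLOSEPOLY, so this should print False throughout.
        print(path.codes is not None and Path.CLOSEPOLY in path.codes)
```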
Gah! OK, I need to improve my reading comprehension. I was thinking that the new contouring code always did the corner stuff, and that a corner_mask value of False or 'legacy' were equivalent. In my initial tests, I simply set it to False. I'll update my testing plan to ensure that the rasterized results from the corner_mask=False and corner_mask='legacy' runs are identical, and then compare the True and False versions against each other wherever the False version is not equal to the background value. On a side note, this makes transitioning people over from the legacy implementation to the new contouring (with corner_mask=False) very easy. It also makes a good case for removing the legacy contouring code sooner.
@ianthomas23 This needs a rebase. Once that is done I am in favor of going in with this despite @WeatherGod's linking issues.
The problem isn't linking. There is a potential serious performance regression. Unfortunately, I haven't had time to investigate this further, but I really would like to get to the bottom of it.
@WeatherGod: You misunderstand entirely. I cannot reproduce your poor performance figures under any circumstances. So let me be completely clear:
I have no explanation whatsoever for your figures. I cannot reproduce them. Hence your request for me to "retry testing the optimisation flags" is absurd. Only you can do this, as I have already stated. Unless you are intending to ship the offending machine to the UK for me to look at?
Rebased, changed dates in CHANGELOG and api_changes, and squashed.
If I get a chance today, how about I build some binary eggs, one with and one without the optimisation flags, so we can test on each other's machines?
Also, have we had anybody test this on Windows and Macs yet?
I am happy to test it out on a Mac in the next few days.
@WeatherGod: No, that would not help. I have already spent many hours helping you with this issue, and when the going got tough you steadfastly refused to put any effort into it yourself. I have yet to see any evidence that your issue is not due to either a bizarre problem in the setup of your test system or operator error. Until there is evidence that it is something else I am not going to spend any further time on it. So I will say it for the third time: if you think there is a problem here then you need to put time into investigating it yourself.
@ianthomas23, I think you misunderstand. I see on my machine a difference in performance. On your machine, you don't. The question is: is it due to compilers, or something else? One way to diagnose that is to take my binaries and try them on your VM to see if you can reproduce my results. If yes, then it is likely something weird with compilers somehow. If no, then the problem is something else entirely. The other way to go about this is to do the opposite (you send me your binaries), but I am a little hesitant to do that as my machine isn't a sandboxed VM.

As for time spent on this issue, I was the one who managed to get my employer to have me spend two days analyzing this PR and its impacts upon our processing. At the moment, for my employer's sake, I cannot recommend this mode due to the potential performance regressions that are not exhibited by the old code. As we head into severe storm season, I am doubtful that I am going to be able to get any additional time to analyze this problem further.
Can we please lower the level of aggression in this thread?
Time to cool this down, please. First, I would like a little bit of summary information, since this set of review comments has gotten too long to follow. @WeatherGod, for realistic cases that you are concerned about, how large is the performance difference you are seeing?
Second, my inclination is to simply merge this. It is adding functionality that I know I want, and I'm sure others will also appreciate. Even if there is a speed reduction with the new algorithm on some systems, it sounds to me like it will be acceptable in many, perhaps most, cases. So long as "legacy" is left in place, nothing is lost. Merging it will enormously facilitate additional testing. There will still be time for tweaks, bug fixes if necessary, and maybe refinements in compiler flag recommendations.
Sorry, I am usually the one to keep my cool. I think the snow is getting to me. I am not against merging this PR in principle (modulo confirmation that the performance regression isn't real). I do apologize if my previous comment was out of line. That was not my intention.
Clearly I am the one being aggressive and I apologise for that. @WeatherGod has nothing to apologise for; all the fault is mine and I am grateful for the help with testing.

I should explain where I am coming from. I've been honing this code for a number of years, and I originally wrote it at @efiring's request and as an intellectual challenge. But I don't particularly care about it being in mpl. I hardly ever use mpl any more as even I find it cumbersome and dated. When I need to do some charting I use my own bespoke software with modified versions of the various contour generation algorithms behind the scenes. Whether mpl has corner_mask contouring or not will not affect my quality of life.

This PR has been immensely frustrating. My intention was that a few other developers would be sufficiently interested in the new code to test it out before it could be merged. I've asked three times for help with testing but only @WeatherGod has helped so far. It could simply be that when others see that one developer is doing some testing they think that it is covered and they can ignore it. But I don't take it that way. I take it to indicate that there is mostly indifference to including this PR. Now that is fine, but I don't want to take on the burden of supporting functionality for the next n years that nobody else is really that bothered about. Hence my inclination at the moment is to withdraw this PR and step back from my already minimal mpl development. But I haven't quite managed to click on the close button yet, so evidently my mind is not completely decided. I have at least learned to be more selfish and only make contributions that I really care about.

Given my indifference, the two and a half months since I issued this PR have been frustrating. I've had to spend a lot of time dealing with what I see as minor issues whilst the major issues, namely whether this functionality is desirable and whether it does what it is supposed to do, have mostly been sidelined. Now that we are finally getting to the chase I have almost zero time to spend on this. My primary job no longer involves sitting in front of a computer; instead I work outside. When I issued the PR back in December it was winter and I had plenty of time to spare. Now spring is almost here and as I write it is a clear dry day outside and I really should be out there earning a living rather than having to come inside, turn on a PC and respond to mpl comments, sometimes exactly the same comments repeated. I hope that at least partially explains the shortness in some of my responses.

@WeatherGod: I sometimes find in your communications that you haven't quite grasped the gist of what I have been saying. But that is unfair of me; of course you haven't invested all the time and detail into this PR that I have, so it shouldn't be as familiar to you. Firstly, the difference you observe cannot be due to compilers, as my test install of CentOS used exactly the same versions of gcc and python that yours did, which was pretty much the whole point of me trying that. Secondly, the performance difference isn't due to asserts. Asserts are no-ops in default mpl builds, which can be confirmed by the presence of the compiler option `-DNDEBUG` in the build output. Thirdly, and most importantly, why am I so confident that this is not a real problem? Historically I have written a lot of HPC code and subsequent to that I was a commercial C++ developer for many years. Although I don't do either now, I cannot recall a single case of a linker optimisation flag making a measurable performance difference.
Of course I cannot be completely sure that it is not a problem, but my experience says otherwise. Unfortunately we are left with the situation that we have one tester who observes performance poor enough to cause concern. I have attempted to replicate these results using the same OS, gcc and python, but to no avail. The situation is similar to a number of reports we get periodically on the mailing list along the lines of "I tried out mpl, the performance wasn't good enough". We try to replicate the results and when we cannot we sideline the issue because we don't believe it. The obvious solution is to get more data points by others performing some testing...
I wonder if someone over at scikit-image might be able to help us out with testing. I do very much appreciate the effort that you have put into this PR. If I get a chance today, I'll try and read over the build logs in more detail.
It compiles and runs on OSX Mavericks via a standard build of mpl (no surprise). In a few quick tests on both linux (Ubuntu 14.04) and OSX, it is roughly 10% slower than legacy; I can try to do some more systematic comparisons this weekend.
Actually, I might be able to answer some questions about the legacy code. Indeed, not having a publication isn't a blocker, just an "it would be nice". Given that this was able to compile and build on a CentOS6 machine, I don't anticipate build problems on other platforms.
@jbmohler, if I remember correctly, you run Windows; can you try building and running this? |
@WeatherGod, Yes, I know it came from GIST, and we acknowledge that in the comment (which I wrote) at the top of cntr.c; but we were never able to track down the author. If we had, I would have added that to the comment. |
I'll reiterate. I tested this on several occasions on a RHEL6 box and was satisfied that the resulting contours were as good as (and in some cases better than) those produced by the original contouring algorithm. Performance was not something I looked at; it was good enough for me to not have concerns, but I can go back and put together some numbers if that is of use.
FWIW, there is no reason that this piece of code couldn't become an independent "libcontour" with mpl as its first user. You probably know that there are already several projects who have lifted the mpl cntr source verbatim. I don't propose that you do that (because I'd like to have it merged here), but it is always an option.
Below is a script I used for a timing test. The results vary quite a bit depending on the character of the input data as well as the size of the array. The worst cases for the new algorithm are those with the smoothest data, yielding contour lines that are nearly horizontal or nearly vertical. I suspect the variation in performance has quite a bit to do with the pattern of memory access. In any case, as @ianthomas23's earlier testing showed, the new algorithm can be either faster or slower than the old one, depending on the input.

```python
import timeit

import numpy as np
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt


def make_fake(shape, style):
    nx, ny = shape
    if style == 'rampH':
        z = np.arange(nx * ny, dtype=float).reshape(shape)
    elif style == 'rampV':
        z = np.arange(nx * ny, dtype=float).reshape(shape, order='F')
    elif style == 'sinusoidal':
        x = np.linspace(0, 2 * np.pi, nx)
        y = np.linspace(0, 2 * np.pi, ny)
        z = np.sin(10 * x)[:, np.newaxis] * np.cos(10 * y)
    elif style == 'random':
        np.random.seed(123)
        z = np.random.randn(nx, ny)
        z[1:-1] = 0.25 * (z[:-2] + z[2:]) + 0.5 * z[1:-1]
        z[:, 1:-1] = 0.25 * (z[:, :-2] + z[:, 2:]) + 0.5 * z[:, 1:-1]
    return z

setuplines = ["from __main__ import make_fake, shape, style, nb, method, plt",
              "z = make_fake(shape, style)"]

shapes = ((50, 100), (200, 400), (500, 1000), (2000, 1000))
styles = ('rampH', 'rampV', 'sinusoidal', 'random')
nbounds = (3, 30, 100)[:2]
methods = ('legacy', True)

tdict = {}
for shape in shapes:
    for style in styles:
        for nb in nbounds:
            print("")
            npts = shape[0] * shape[1]
            number = 10
            if npts * nb > 200 * 400 * 30:
                number = 2
            for method in methods:
                print(shape, style, nb, method)
                if method == 'legacy':
                    stmt = "plt.contourf(z, nb, corner_mask='legacy')"
                else:
                    stmt = "plt.contourf(z, nb, corner_mask=True)"
                t = timeit.timeit(stmt=stmt,
                                  setup=';'.join(setuplines),
                                  number=number)
                print("    %.4f" % (t / number))
                tdict[method] = t
            ratio = tdict[True] / tdict['legacy']
            print("    Ratio new/legacy (percent) %.0f" % (ratio * 100))
            z = make_fake(shape, style)
            plt.contourf(z, nb)
            plt.savefig("contour_timer_%s_%s_%s.png" % (shape, style, nb))
            plt.close('all')
```
@cgohlke, might you be able to test this on Windows soon? |
This builds OK on Windows, Python 2.7, 64 bit. The few test failures seem unrelated.
New C++ contour code with corner_mask kwarg
This is a new C++ extension module that calculates contours, adding support for different ways of dealing with masked arrays via a new `corner_mask` keyword argument. It was originally suggested by @efiring a number of years (!) ago. The current behaviour, which is now obtained using `corner_mask=False`, was for a single masked point to completely mask out all 4 quads touching that point. The new behaviour, obtained using `corner_mask=True`, masks the corners of those quads touching the point, but any triangular corners comprising three unmasked points on the opposite sides of those quads are contoured as usual. The diagram below should make this clearer; red circles indicate masked points.

[diagram omitted]

I intend this code to eventually replace the existing C contouring module (cntr.c), if/when it is proven to be robust enough. I did not attempt to add the new functionality to cntr.c as it is a little too cryptic for my taste and I could not see how to shoehorn the new functionality into it. I have kept the existing code, however, and it can be accessed using `corner_mask='legacy'`. This allows the results of the old and new codes to be compared, and allows us to easily fall back to the old code if I have messed up.

If `corner_mask` is not specified in a `contour` or `contourf` call, then the new rcParam `contour.corner_mask` is used instead. This defaults to `True`, on the basis that if we have improved functionality it should be the default, and is therefore an API change.

I have modified the chunking behaviour (obtained using the kwarg `nchunk`). Previously it only applied to `contourf`; in the new extension it applies to `contour` as well. Also, previously it divided the domain into roughly nchunk-by-nchunk quads; now it is exactly nchunk-by-nchunk quads. I prefer this as the chunks are at predictable, regularly-spaced intervals, making it easier to e.g. align them with grid lines to minimise the effects of rendering artifacts that chunking can give you. This is another API change, although not a significant one as I believe chunking is rarely used.

In the absence of chunking I have tried to make the outputs of `corner_mask=False` and `corner_mask='legacy'` as similar as possible. I've been mostly successful, as image diffs of standard situations usually turn out completely black. But this does not always work, as (1) the contoured polygons occur in a different order, and (2) the polygons' points are determined using floating-point math so will not necessarily be exactly the same.

What I would like is for other developers/users who regularly generate contour plots (especially of masked arrays) to try out the new functionality. Code of this complexity will invariably have some bugs in it, i.e. I won't have done a perfect job! As well as picking holes in the implementation, there are a few obvious decisions I have made that others may question:

- The name of the `corner_mask` keyword argument.
- The default of the new rcParam `contour.corner_mask=True`.

Finally, I've added a couple of image tests and an example. I have made the minimum number of changes to the python code, although I have added code to `contour.py` to remove a mask that is all `False`. I have had to change the images generated by test_patheffects.test_collection as some default label positions have changed, presumably because the contoured polygons are in a different order. And I've fixed a couple of typos in `_tri_wrapper.cpp` that I put in as part of the CXX removal.
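A minimal sketch of trying out the three modes side by side (synthetic masked data; purely illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt

# Synthetic field with a circular masked region in the middle.
x, y = np.meshgrid(np.linspace(0, 3, 50), np.linspace(0, 2, 40))
z = np.ma.masked_where(np.hypot(x - 1.5, y - 1.0) < 0.5,
                       np.sin(x) * np.cos(y))

# Compare the new corner masking, the plain quad masking, and the old C code.
fig, axes = plt.subplots(1, 3)
for ax, cm in zip(axes, [True, False, 'legacy']):
    ax.contourf(x, y, z, 10, corner_mask=cm)
    ax.set_title('corner_mask=%r' % cm)
fig.savefig('corner_mask_demo.png')
```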