Automatic downsampling of images. #5602
This patch adds automatic downsampling of images with dimensions larger than the physical number of pixels that will be displayed, in order to allow displaying very large images. The idea is to patch AxesImage to replace calls to `self._A[x0:x1, y0:y1]` with calls to `self._A[x0:x1:xstep, y0:y1:ystep]` with a well-chosen step. Because this directly patches AxesImage's code, it is simpler than e.g. @ChrisBeaumont's ModestImage, which must modify `self._A` to "pretend" the image is something other than what it really is, and which doesn't handle the "extent" kwarg.

What still needs to be done:
- agree on a kwarg/rcParam to control the behavior, and possibly the downsampling level (perhaps one wants e.g. up to 5 data points per pixel instead of a single one),
- handle image caches: currently I just disabled rgbacache, which should be fine as to_rgba (which is what hung matplotlib on huge images so far) will only be called on reasonably-sized arrays, but perhaps there's a better option?
- tests.
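For readers skimming the thread, here is a minimal, hypothetical sketch of the slicing idea described above (the function name, array sizes, and pixel counts are mine, not from the patch): pick a stride so that at most a given number of data points survive per displayed pixel, then slice the array with that stride.

```python
import numpy as np

def decimate_for_display(data, pixels_shown, max_points_per_pixel=1):
    """Illustrative sketch (not the actual AxesImage patch): return a strided
    view of *data* with roughly *max_points_per_pixel* data points per
    displayed pixel along each axis."""
    ny, nx = data.shape[:2]
    py, px = pixels_shown  # screen pixels covered by the image (height, width)
    ystep = max(1, ny // (py * max_points_per_pixel))
    xstep = max(1, nx // (px * max_points_per_pixel))
    # Plain slicing skips samples; it never averages, so this is decimation
    # rather than true downsampling.
    return data[::ystep, ::xstep]

big = np.zeros((4000, 4000), dtype=np.float32)
small = decimate_for_display(big, (400, 400))
print(small.shape)  # (400, 400)
```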
First, a big thank you for taking this on. This is an important part of matplotlib that needs improvement.

One concern: this is effectively decimating (i.e. ignoring data), not downsampling (i.e. averaging across neighboring values), correct? I understand that might be acceptable in some domains, but probably not in others (in astronomy in particular, I know this would almost never be the right thing). That's not a reason to exclude it, but it should be documented clearly that this is what the feature does.

The bigger problem matplotlib currently has is that images are processed by colormapping the data to RGBA first and only then interpolating/resampling the full-size RGBA image. It's done this way historically because we're using Agg, which has a bunch of interpolation algorithms in place that were easy to grab and use, and these only supported RGB. With the latest SVN version of Agg this is reportedly no longer the case, but I spent some time a few months ago looking into it and it seems not to work as advertised. But assuming someone can do the work so that we instead resample the data first and then colormap the much smaller result, what you've done here just becomes one of many interpolation algorithms (and probably the fastest one). Are you interested in helping with this?
With some code cleanup thrown in at the same time, but it should be clear what the interesting part is. Compare `x, y = np.ogrid[:10000, :10000]; plt.imshow(np.sin(x + y), interpolation="none")` with the downsampling and averaging methods to see the difference (one shows severe aliasing, the other doesn't). In either case one can zoom in interactively to check that at high zooms, individual pixels are correctly plotted.
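The comparison above can be reproduced without the patch by decimating and block-averaging by hand; the sketch below (smaller array, helper code, and figure layout are my own, not the PR's) shows the aliasing difference between the two reductions:

```python
import numpy as np
import matplotlib.pyplot as plt

n, factor = 4000, 10
x, y = np.ogrid[:n, :n]
img = np.sin(x + y)  # rapidly oscillating pattern, prone to aliasing

# Decimation: keep every `factor`-th sample.
decimated = img[::factor, ::factor]

# Block averaging: mean over factor-by-factor blocks.
averaged = img.reshape(n // factor, factor, n // factor, factor).mean(axis=(1, 3))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.imshow(decimated, interpolation="none", cmap="gray")
ax1.set_title("decimated (aliased)")
ax2.imshow(averaged, interpolation="none", cmap="gray")
ax2.set_title("block-averaged")
plt.show()
```

The decimated panel keeps a spurious high-frequency pattern (aliasing), while the averaged panel smooths it out.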
See latest commit, which allows the choice of downsampling method. The commit message includes an example where decimation vs. e.g. averaging produces significantly different results. (I wrote the decimating code first mostly for simplicity (with speed as a secondary benefit) but
I think long term we really want to do proper interpolation on the input (greyscale) image, and this is just a half measure. I've already solved the C++ side of it in matplotlib/aggravate; now we just need to make the Python side use it. I think that will ultimately be more flexible than this and also be the same speed when `interpolation == "nearest"`. Would you like to help with that effort?
To be honest I don't care much, or at least have very little intuition, about interpolation (I only believe in interpolation="none" :-)), but more importantly I still fail to see the connection between the two cases -- other than in the case where you have too many data points per pixel in one direction (you need to downsample there) and not enough in the other (you need to interpolate there), but then you can always do the two parts sequentially.
Nope, they address both. It's really just generic "resampling", where each pixel in the output is projected onto the input and the average of that rectangle is taken (modulo the filtering kernel).
But do we always want to use averaging? I'd say this is pretty application-dependent, and certainly at least averaging, geometric averaging, median, and possibly decimation all have legitimate uses there.
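The patch exposes a choice of method; purely as a generic illustration of the point being made here (this is not the patch's actual API, and the names are mine), a per-block reduction with a pluggable reducer could look like:

```python
import numpy as np

def block_reduce(data, factor, reducer=np.mean):
    """Downsample 2-D *data* by *factor* along each axis, reducing each
    factor-by-factor block with *reducer* (np.mean, np.median, ...)."""
    ny, nx = data.shape
    ny_trim, nx_trim = ny - ny % factor, nx - nx % factor  # drop ragged edges
    blocks = data[:ny_trim, :nx_trim].reshape(
        ny_trim // factor, factor, nx_trim // factor, factor)
    return reducer(blocks, axis=(1, 3))

rng = np.random.default_rng(0)
data = rng.uniform(0.1, 1.0, size=(1000, 1000))

mean_ds = block_reduce(data, 10, np.mean)
median_ds = block_reduce(data, 10, np.median)
geo_ds = np.exp(block_reduce(np.log(data), 10, np.mean))  # geometric mean
decimated = data[::10, ::10]                               # decimation, for comparison
```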
How does this connect with the new image infrastructure in #5718?
This is a special case of #5718 (see my pathological example).
This is basically a performance improvement for plotting very large images through decimation (which only really does the right thing if

To test, I'm using the following:
(If you want to reproduce the numbers, make sure you run it twice and ignore the first run to remove the effects of regenerating the font cache). I get:
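The benchmark script and the timings themselves aren't shown here; purely as an illustration of the kind of measurement being discussed (array size, backend, and file names are my assumptions, not the original script), one could time drawing and saving a very large image like this:

```python
import time
import numpy as np
import matplotlib
matplotlib.use("agg")  # non-interactive backend (assumption)
import matplotlib.pyplot as plt

data = np.random.rand(8000, 8000)  # "very large" image; the size is an assumption

start = time.perf_counter()
fig, ax = plt.subplots()
ax.imshow(data, interpolation="none")
fig.savefig("big_image.png", dpi=100)
print(f"draw + save took {time.perf_counter() - start:.2f} s")
```

As noted above, run it twice and keep the second timing to avoid measuring the font-cache rebuild.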
So, #5718 is much better than master already, by virtue of downsampling first, and then colormapping something much smaller. I'm a bit surprised by the remaining advantage of this over #5718, since it should boil down to effectively the same thing computationally... I'll need to do some further investigation. I'd still like to find a more general solution than this if possible.
As best I can tell from profiling, the difference in time is mostly in
Two quick notes:
@mdboom, @anntzer, I'm not following this in detail, but here are a couple thoughts that might be relevant to the discussion:
Yeah -- this seems like something that Numpy really should grow. I wonder if they'd be amenable to a pull request on that. We'd still have to have our own answer in the meantime, though.
In the common case of single-plane data that is color-mapped, we color-map to 8-bit ints (the colormap code can also output floats, but in practice that code path is rarely used). #5718 improves things further when the input data is already color: in that case, if the input data is 8-bit int, we no longer require passing through float, but can preserve it as 8-bit all the way down.
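The two colormap output paths mentioned here (float vs. 8-bit) are visible through the public Colormap call interface; for example:

```python
import numpy as np
import matplotlib.pyplot as plt

cmap = plt.get_cmap("viridis")
values = np.linspace(0, 1, 5)

rgba_float = cmap(values)              # float RGBA in [0, 1]
rgba_uint8 = cmap(values, bytes=True)  # uint8 RGBA in [0, 255]

print(rgba_float.dtype, rgba_uint8.dtype)  # float64 uint8
```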
There has been talk of an np.minmax() function for years, but it has never materialized. Something else that is even worse about np.ma.min() and such... they make
The following shows where some of the major slowdowns are coming from: the masked array support. It certainly would be nice to get all that sped up within numpy...
It looks like the implementation of
Yeah -- in my test case, there's nothing masked out...
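The profiling output itself isn't shown above, but the general point -- that going through numpy's masked-array machinery costs real time even when nothing is actually masked -- can be illustrated with a quick, hypothetical timing (the sizes and the use of masked_invalid are my choices for illustration):

```python
import timeit
import numpy as np

data = np.random.rand(4000, 4000)

t_plain = timeit.timeit(lambda: data.min(), number=10)
t_masked = timeit.timeit(lambda: np.ma.masked_invalid(data).min(), number=10)

# masked_invalid builds a full boolean mask (and, by default, copies the
# data), so the masked path is much slower even with nothing masked out.
print(f"ndarray.min:               {t_plain:.3f} s")
print(f"masked_invalid(...).min(): {t_masked:.3f} s")
```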
Yes -- there may be a way to take advantage of that in the context of #5718. The tricky bit is that normalization happens in Python/Numpy (and through a nice plug-in API where different normalization functions can be written and interchanged), and resampling happens in C++. Plugging normalization into the resampling step might be tricky, though. #5718 already cuts the run time in half -- we know from this PR that there's more that could probably be done with respect to speed, but I'm not certain we should let that hold up #5718.
How does the new image machinery play with this? I have noticed it seems to perform quite well even with very large images. Should this PR be closed?
Thanks for pinging this. Indeed, the image refactor was in part a response to this -- to try to do this faster without resorting to decimation. I think we should probably close this unless further testing reveals new problems with the new code.
Reopening per the discussion at https://gitter.im/matplotlib/matplotlib?at=5c92437e49fdaa0d81e71ec9.
Image taken from http://www.imagemagick.org/Usage/filter/nicolas/rings_sm_orig.gif. The article is also quite useful: http://www.imagemagick.org/Usage/filter/nicolas/

```python
import matplotlib.pyplot as plt
import numpy as np

a = plt.imread('rings_sm_orig.gif')
print(np.shape(a))
# 200x200

fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(a, cmap='gray')
ax.set_position([0, 0, 1, 1])
fig.savefig('dpi50.png', dpi=50)
fig.savefig('dpi30.png', dpi=30)

fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(a, cmap='gray', interpolation='lanczos', filterrad=(2*50/30))
ax.set_position([0, 0, 1, 1])
fig.savefig('filtdpi50.png', dpi=50)
fig.savefig('filtdpi30.png', dpi=30)
```

Outputs: dpi50.png, dpi30.png; with filter: filtdpi50.png, filtdpi30.png.

Conclusion: Agg seems to do the right thing if we give it the appropriate filter. The question is whether we should apply this filter automatically when we know we are downsampling...
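One hypothetical way to "apply this filter automatically" would be to compare the image size against the axes size in pixels and switch to a smoothing filter when the data outnumbers the pixels. Everything below (the helper name, the filterrad heuristic, the thresholds) is an assumption of mine, not matplotlib behavior:

```python
import matplotlib.pyplot as plt

def imshow_autofilter(ax, data, **kwargs):
    """Hypothetical helper: pick a smoothing filter when *data* has more
    samples than the axes has pixels (i.e. when we are downsampling)."""
    fig = ax.figure
    pos = ax.get_position()  # axes extent in figure-relative coordinates
    ax_px_w = pos.width * fig.get_figwidth() * fig.dpi
    ax_px_h = pos.height * fig.get_figheight() * fig.dpi
    if data.shape[0] > ax_px_h or data.shape[1] > ax_px_w:
        factor = max(data.shape[0] / ax_px_h, data.shape[1] / ax_px_w)
        kwargs.setdefault("interpolation", "lanczos")
        kwargs.setdefault("filterrad", 2 * factor)  # heuristic, by analogy with 2*50/30 above
    return ax.imshow(data, **kwargs)

a = plt.imread('rings_sm_orig.gif')
fig, ax = plt.subplots(figsize=(4, 4))
ax.set_position([0, 0, 1, 1])
imshow_autofilter(ax, a, cmap='gray')
# Note: savefig(dpi=...) changes the output pixel size after the fact, so a
# real implementation would have to make this decision at draw time instead.
fig.savefig('filt_auto_dpi30.png', dpi=30)
```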
The issue is better described in #13724 now, so I'll close this.