Skip to content

Simplify normalization of multiple images #11380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

timhoffm
Copy link
Member

@timhoffm timhoffm commented Jun 5, 2018

PR Summary

In response to #11377. This adds two simpler ways for creating a common normalization for images.

In particular the linked example is quite verbose. That (at least the first) half gets significantly easier with this PR.

Pre-imshow normalization

if the data are known beforehand, one can create a Normalize instance scaled to the global min/max of the data:

import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

fields = [np.random.randn(10, 10) + 2*i for i in range(3)]

_, axes = plt.subplots(ncols=3)
norm = Normalize(data=fields)                     # New API: create Normalize
for ax, field in zip(axes, fields):
    im = ax.imshow(field, norm=norm, cmap="RdBu_r")    # use Normalize
plt.colorbar(im, ax=axes)

grafik

Post-imshow normalization

Alternatively, one can collect the images and create a Normalize instance from their values and apply the norm to the images:

_, axes = plt.subplots(ncols=3)
images = []
for ax, field in zip(axes, fields):
    images.append(ax.imshow(field, cmap="RdBu_r"))
Normalize.apply_autoscale(images)                       # New API
plt.colorbar(im, ax=axes)

(withdrawn after discussion)

This is a proof of concept. Documentation, tests and examples are not yet complete. Feedback on the general API is much apreciated.

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)

@ImportanceOfBeingErnest
Copy link
Member

Why is this restricted to images?
Why is this restricted to Normalize? (What about PowerNorm, LogNorm etc.?)

@timhoffm
Copy link
Member Author

timhoffm commented Jun 6, 2018

It probably shouldn't be restricted to images or Normalize. This is just a proof of concept quickly hacked together. So far, I didn't bother to check if it's straight forward to extend to other cases.

@tacaswell
Copy link
Member

I think data is better than autoscale for the kwarg nam. I agree that this signature should be propagated to all of the Norm sub-classes.

I am less convinced by the class method.

@tacaswell tacaswell added this to the v3.0 milestone Jun 6, 2018
@jklymak
Copy link
Member

jklymak commented Jun 6, 2018

How does auto normalization work now ( I’m away from my computer) If this goes in then it might be nice to make sure we don’t have multiple auto scaling algorithms.

Overall I’m not really sure this is needed but maybe has a small enough footprint to be useful.

@timhoffm
Copy link
Member Author

timhoffm commented Jun 6, 2018

  • Parameter name autoscale changed to data.
  • Added data parameters for Normalize subclasses (where it makes sense)

@timhoffm
Copy link
Member Author

timhoffm commented Jun 6, 2018

@jklymak auto normalization works via self.norm.auto_scale_None(self._A) in AxesImage. It's called when the image is created. By using the _None variant of the function we ensure that the scaling is determined by the first image that is connected to the Normalize instance. So the __init__() methods use exactly the same auto scaling algorithm.

As for the Normalize.apply_autoscale() class method, I use the global min/max of all supplied images as vmin/vmax. For the case of one supplied image, this is equivalent to giving the image data to the constructor. The generalization is straight forward. We don't have to bother with the None part vmin/vmax cannot be supplied externally (the image data are the only criterion here).

@timhoffm
Copy link
Member Author

timhoffm commented Jun 6, 2018

Remark on if this is needed

Normalize(data=...):

Probably I want to normalize to data more often than to vmin/vmax. One could argue, that I can calculate the respective values and just pass them to the existing parameters. From a minimalistic API point of view that is correct. However, matplotlib has a lot of redundant ways of doing things. Most of them for convenience (e.g. set line properties via kwargs, fmt-string, rc_context, on the created Line2D object).

I argue that this is such a case where convenience beats minimalism. In particular, normalizing to data feels quite straight forward.

Normalize.apply_autoscale():

I'm a bit divided if this method is necessary. On the one hand one can argue that Normalize(data=...) is very convenient and thus sufficient. The only disadvantage is that all the data need to be there before the first imshow so that I can construct the norm beforehand and pass it in. If that's not the case, it gets really cumbersome (5 lines of code just to get all images to the same scale).

So the question is: Are there significant real world cases in which I cannot use Normalize(data=...)? Or is it just the example which generates the data on the fly? If there are real-world cases (and I can imagine, people are loading and plotting data within a single loop), I would welcome a function that can autoscale and apply a norm to a number of images.

"""
self.vmin = _sanitize_extrema(vmin)
self.vmax = _sanitize_extrema(vmax)
self.clip = clip
if data is not None:
self.autoscale_None(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean self.apply_autoscale here or am I misunderstanding? I'm not seeing how data can be a list of np arrays if you call autoscale_None. I don't see that you have a test of this below, i.e. you only call apply_autoscale.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I mean autoscale_None. autoscale_None internally calls np.asanyarray() on the data, so a list of arrays is fine. I don't think we really need to test this, but if you want I can add a test.

norm = Normalize(data=data) is really just a shortcut for

norm = Normalize()
norm.autoscale(data)

i.e. it scales to the data. The autoscale_None variant is used so that you can consistently do Normalize(vmin=0, data=data) in which case only vmax will be taken from data.

apply_autoscale() serves a different purpose. While the above works on data, apply_autoscale() works on AxesImages. It creates a norm based on the data in the images and applies the norm to the images. Maybe we should better call it autoscale_images()?

@jklymak
Copy link
Member

jklymak commented Jun 19, 2018

Probably I want to normalize to data more often than to vmin/vmax. One could argue, that I can calculate the respective values and just pass them to the existing parameters. From a minimalistic API point of view that is correct. However, matplotlib has a lot of redundant ways of doing things. Most of them for convenience (e.g. set line properties via kwargs, fmt-string, rc_context, on the created Line2D object).

I rarely want to normalize automatically by data (except for initial exploration), and if I do, I don't find doing it by hand too onerous. I don't see what's better about the list comprehension gather versus just a loop that checks for vmin/vmax. Worse, the gather seems memory intensive to me if I have a big data set.

Yes, we have lots of redundant ways to do the same thing. I'm not sure thats always a good thing 😉

Not 100% against this if there is really an audience, but it seems crufty to me.

@timhoffm
Copy link
Member Author

It probably depends on your application if you want to normalize automatically. No point in arguing individual cases here (neither mine nor yours).

The "list comprehension" is actually a generator expression, and if anything less memory consuming and faster than a loop. That's actually what users should use if they have to do this themselves. 😉

Any additional feedback (positive or negative) on Normalize.apply_autoscale() from other devs would be helpful. If negative, I could remove that function to bring the merge of the data attribute part forward.

@ImportanceOfBeingErnest
Copy link
Member

I think my very first comment still applies. Normalize is a general purpose class to normalize data. I'm not sure if it should handle AxesImages specifically. Or if it does, it should handle every mappable that has a get_array method (Why only images?).

Second, the norm is not updated when the images change. In that sense apply_autoscale is a one time only method. It may be more confusing than helping. Partially this may also be due to a classmethod named apply... (applying something should happen to the instance, but the instance does not store the images to apply anything.).

The new data argument seems sufficiently useful to devote an example to it (or change extend the existing one on that subject).

@timhoffm
Copy link
Member Author

Ok, will remove apply_autoscale() later.

Unfortunately, the existing example does not benefit from the data argument, because the image data is generated on the fly while plotting. That was actually the motivation to have the apply_autoscale() class method.

The workaround would be to split the loop and first generate all the data, then create the norm and then plot all the data. But that's not an easy and natural workflow for the example either.

@ImportanceOfBeingErnest
Copy link
Member

Maybe it's worth looking into why the following does not work:

import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2,2)

data = []
norm = plt.Normalize(None,None)
for i, ax in enumerate(axes.flat):
    arr = np.random.rand(10,10)+i
    ax.imshow(arr, norm=norm)
    data.append(arr)
    
norm.autoscale_None(data)

plt.show()

If it was working as expected, there wouldn't be any need for the apply_autoscale() method I guess.

@timhoffm
Copy link
Member Author

The docstring says

If vmin or vmax is not given, they are initialized from the minimum and maximum value respectively of the first input processed.

Probably, this is done when I pass the norm to Imshow. (Didn‘t check in the code)

@timhoffm
Copy link
Member Author

timhoffm commented Jun 20, 2018

  • Removed Normalize.apply_autoscale()
  • Rewrote the example to
    a) Start with a simple version using Normalize(data=...)
    b) Explain the more involved version when applying the norm after creating the images

Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a whats_new entry.

I am a bit concerned about the API growth

@tacaswell tacaswell modified the milestones: v3.0, v3.1 Jul 7, 2018
@timhoffm
Copy link
Member Author

timhoffm commented Jul 9, 2018

What's new added.

Copy link
Contributor

@anntzer anntzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather document norm.autoscale(data) better.
The main issue with the PR, IMO, is that we again end up with "partially overlapping kwargs", namely the behavior when both vmin/vmax and data are passed is less than optimal (IMO).

@jklymak jklymak modified the milestones: v3.1.0, v3.2.0 Feb 5, 2019
@timhoffm timhoffm modified the milestones: v3.2.0, v3.3.0 Aug 7, 2019
@QuLogic QuLogic added the status: needs comment/discussion needs consensus on next step label Apr 1, 2020
@efiring
Copy link
Member

efiring commented Apr 7, 2020

I'm not seeing enough user-level simplification here to justify the PR. The example can be simplified a bit even without the PR, but I also find there is a bug in part of the example that is not related to the PR: the colorbar is not responding properly to changes in the clim unless applied to images[0]. I will make a new issue for that.

@efiring
Copy link
Member

efiring commented Apr 7, 2020

See #17052 for the bug report promised above.

@tacaswell tacaswell dismissed their stale review April 13, 2020 21:31

Very old and out of date.

@QuLogic QuLogic modified the milestones: v3.3.0, v3.4.0 May 4, 2020
@timhoffm
Copy link
Member Author

Closing because there seems little approval of the idea. I'm not going to fight over it.

@timhoffm timhoffm closed this Jul 14, 2020
@QuLogic QuLogic removed this from the v3.4.0 milestone Mar 16, 2021
@timhoffm timhoffm deleted the normalize branch June 10, 2022 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants