[WIP] Masking invalid x and/or weights in hist (#6483) #7133

TrishGillett · 2016-09-18T15:59:17Z

I'm having trouble testing locally, so this is a WIP PR so I can see how this code does in the CI.

Figure out if the AppVeyor error is a problem or a false alarm
Make an image comparison test to make sure the masking is working
Clean up the commit history

NelleV · 2016-09-18T16:10:46Z

Hi @TrishGillett
Thanks for your patch!
I think I know why you are having problems running the tests locally… Do you have a traceback available so that I can confirm?

TrishGillett · 2016-09-18T17:12:18Z

Hi @NelleV, unfortunately there's no traceback to show, it's a matter of struggling to get the right environment setup. I wanted to manually try out some things, but I was having trouble making a conda environment which:
a) would install matplotlib from my local branch, and
b) would let me actually plot stuff.

Right when I got the former figured out, I hit trouble with the latter: I couldn't open a jupyter notebook from inside the env, and was having backend troubles trying to create plots from a console. If you have any advice, I'd be really grateful. For now I'm going to look at the Travis results and see what bugs it caught. :)

tacaswell · 2016-09-18T20:16:35Z

My tactic for this is something along the lines of

conda create -n mpl_dev python=3.5 ipython jupyter matplotlib
source activate mpl_dev
conda remove matplotlib
cd path/to/mpl_source
pip install -v -e .
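After an editable install like the one above, it can help to confirm which copy of a package Python would actually import. The following is a minimal sketch using only the standard library; `module_origin` is a hypothetical helper name, not something from matplotlib or conda.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    Hypothetical helper for illustration only.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# For an editable install this path should point into the source checkout
# rather than into site-packages; it prints None if nothing is installed.
print(module_origin("matplotlib"))
```

If the printed path is not inside `path/to/mpl_source`, the environment is probably still picking up a different installation.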

TrishGillett · 2016-09-18T21:02:27Z

Works like a charm, @tacaswell! Thanks so much!

For this PR, here are some tasks that still need to be done:

Figure out if the AppVeyor error is a problem or a false alarm
Make an image comparison test to make sure the masking is working
Clean up the commit history

NelleV · 2016-09-19T01:25:15Z

lib/matplotlib/axes/_axes.py

+            for i in range(len(x)):
+                mask_i = x[i].mask
+                if w[i] is not None:
+                    mask_i = mask_i | w[i].mask


I would raise an error if the weight matrix contained mask elements that weren't masked in the data x itself. Unless I am not seeing a use case for that?

The question of whether to mask invalid weights was raised on #6483 but not settled. I don't know of a use case offhand and am okay with throwing an error instead if the feeling is that this is overbuilding. @efiring Do you think there could be a use case?

Using histograms with weights is a reasonable way to do spatial integration of images, e.g. histogram(convert_pixel_locations_to_1D, bins, weights=image.ravel()). If you are looping over a bunch of images, some of which have artifacts that you need to mask out, being able to do this loop without having to re-mask the pixel position values would be nice.
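To make the use case concrete, here is a small sketch with plain NumPy, assuming a made-up 4x4 image whose pixel position is reduced to its column index. The array names are illustrative only. The mask lives on the weights, and the matching positions are dropped by hand, which is the step the comment above would like `hist` to take care of.

```python
import numpy as np

# Hypothetical 4x4 image with pixel values 0..15.
image = np.arange(16, dtype=float).reshape(4, 4)

# 1-D position of every raveled pixel: its column index.
columns = np.tile(np.arange(4), 4)

# Treat pixel 5 (row 1, column 1) as an artifact and mask it in the weights only.
weights = np.ma.masked_array(image.ravel(), mask=(np.arange(16) == 5))

# Drop the masked pixel from both the positions and the weights before binning.
keep = ~np.ma.getmaskarray(weights)
sums, edges = np.histogram(columns[keep], bins=4, range=(0, 4),
                           weights=weights.compressed())

print(sums)  # per-column sums of the unmasked pixel values
```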

That said, this is something that should be pushed down into numpy?

But that explains the mask on x, not on the weights. I am also fine with a mask on the weights as long as it is identical to the one on x. But different masks on the weights and on x seem like they may be an error on the user's side, hence throwing a ValueError.

I mean the other way around

xx = convert_to_1D(imgs.shape[:-1])
for im in imgs:
    m = compute_mask(im)
    w = ma.masked_array(im, m)
    ax.hist(xx, bins, weights=w)

Basically, if we are going to allow masking on either, masking on both and taking the union seems like the best course.

I agree; combining independent masks on both via a union is not difficult, it is easy to explain, and it handles every situation that might arise.
See cbook.delete_masked_points; you could use this as soon as you have the list of sequences of values and the matching list of sequences of weights or of Nones. You would apply it to each matching sequence of values and its corresponding sequence of weights, or None. Maybe it's overkill, but it was designed for this sort of thing.
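The union approach described above can be sketched with NumPy alone. This is not the PR's code and it does not use `cbook.delete_masked_points`; `drop_masked` is a hypothetical helper that treats NaN and inf as masked, combines the masks of the values and the weights, and drops every point masked in either.

```python
import numpy as np


def drop_masked(x, w=None):
    """Drop points that are masked or non-finite in x or in w.

    Hypothetical helper for illustration; w may be None.
    """
    # masked_invalid keeps any existing mask and additionally masks NaN/inf.
    x = np.ma.masked_invalid(np.ma.asarray(x, dtype=float))
    mask = np.ma.getmaskarray(x)
    if w is None:
        return np.ma.getdata(x)[~mask], None
    w = np.ma.masked_invalid(np.ma.asarray(w, dtype=float))
    # Union of the two masks: a point is dropped if either side is masked.
    mask = mask | np.ma.getmaskarray(w)
    return np.ma.getdata(x)[~mask], np.ma.getdata(w)[~mask]


x = np.ma.masked_array([1.0, 2.0, np.nan, 4.0, 5.0],
                       mask=[False, True, False, False, False])
w = [10.0, 20.0, 30.0, np.nan, 50.0]
x_clean, w_clean = drop_masked(x, w)
```

Because both outputs are indexed with the same combined mask, the values and weights stay aligned and have equal length.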

@efiring Thanks for the cbook tip again, between that and a realization that masks seem to stack, I believe I can rewrite this piece as:

if not input_empty:
    for i in range(len(w)):
        if w[i] is not None:
            x[i] = np.ma.masked_array(x[i], mask=w[i].mask)
            w[i] = np.ma.masked_array(w[i], mask=x[i].mask)
    x = cbook.delete_masked_points(x)
    w = cbook.delete_masked_points(w)

Not quite what I had in mind...

xclean, wclean = [], []
for xi, wi in zip(x, w):
    xiclean, wiclean = cbook.delete_masked_points(xi, wi)
    xclean.append(xiclean)
    wclean.append(wiclean)

This works regardless of whether wi is None or a sequence the length of xi. Using cbook.delete_masked_points is only worthwhile if you are giving it more than one argument. Deleting the masked points from a single 1-D masked array can be done simply by calling its compressed() method, so that's all you need in your version above. You would call w[i] = w[i].compressed() inside the inner if, and x[i] = x[i].compressed() immediately after it. There are many variations on the theme; I'm not sure which is best.
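The "masks stack" observation and the `compressed()` suggestion can be illustrated with a short NumPy sketch. It relies on `np.ma.masked_array` using `keep_mask=True` by default, so a mask passed in is combined with the mask the data already carries. The sample arrays are made up for illustration.

```python
import numpy as np

x = np.ma.masked_array([1.0, 2.0, 3.0, 4.0],
                       mask=[True, False, False, False])
w = np.ma.masked_array([0.1, 0.2, 0.3, 0.4],
                       mask=[False, False, True, False])

# keep_mask=True (the default) ORs the new mask with the existing one.
x_both = np.ma.masked_array(x, mask=w.mask)
w_both = np.ma.masked_array(w, mask=x_both.mask)

# compressed() returns the unmasked entries as a plain 1-D ndarray.
x_clean = x_both.compressed()
w_clean = w_both.compressed()
```

After the two re-wrapping steps both arrays carry the same combined mask, so the compressed results line up element by element.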

Ahh, that's pretty nice, I didn't read the docs carefully enough to see the part about any points masked in one arg getting masked and deleted in the others. I'm not sure I understand what 'All input arguments that are not passed unchanged' means though...

tacaswell · 2016-09-19T01:47:18Z

lib/matplotlib/axes/_axes.py

                if inp.ndim == 2:
                    # 2-D input with columns as datasets; switch to rows
                    inp = inp.T
+
+                    if inp.shape[1] < inp.shape[0]:


I am 👎 on this warning, I think it is assuming too much about how users will be using this method.

I now see that this is just moving this warning around. I would be in favor of removing it, but if it is already in the code base, mostly ignore my previous comment.

This is actually just migrated up from where it was before (old line 6068). I don't care for it either but left it alone thinking it probably got debated when it was added. I'm more than happy to remove.

Gotcha. I'm good either way.

Absent a compelling reason I err on the side of not removing things (which may not be a fully defensible position, but it is a safe, conservative one)

Chasing this back through git blame, the warning came into the code base in
214976f in 2008.

I am also in favor of removing it (I am biased as I am in a field where nsamples << nvariables )

Oh, wow, that I can see being a problem.
Maybe throw an error (or raise a warning) if x.shape[0] == 1?
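To see why this check is contentious, here is a sketch of the same shape test outside of matplotlib. `rows_as_datasets` is a hypothetical helper that mirrors only the two lines visible in the diff above (transpose, then compare the two dimensions); it is not the library's implementation.

```python
import numpy as np


def rows_as_datasets(inp):
    """Transpose 2-D input so each row is one dataset and apply the shape test.

    Hypothetical helper for illustration only.
    """
    inp = np.asarray(inp)
    if inp.ndim != 2:
        raise ValueError("this sketch only handles 2-D input")
    inp = inp.T  # columns were datasets; now rows are
    fewer_points_than_datasets = inp.shape[1] < inp.shape[0]
    return inp, fewer_points_than_datasets


tall = np.zeros((1000, 2))   # 1000 samples of 2 variables
wide = np.zeros((3, 1000))   # 3 samples of 1000 variables

_, flag_tall = rows_as_datasets(tall)
_, flag_wide = rows_as_datasets(wide)
print(flag_tall, flag_wide)
```

The second case is the nsamples << nvariables situation mentioned above: the input is intentional, yet the test flags it.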

NelleV · 2016-09-30T16:47:08Z

Hi @TrishGillett
Can you remove the WIP in the title once this is ready to be reviewed and merged?

TrishGillett · 2016-09-30T18:27:05Z

Sorry this has been on pause, the checklist here is still valid and ongoing. I just started a new job and haven't had time to make a VM with the right setup for image comparison tests, and I also haven't decided on a good minimal set of tests for this change (help welcome).

NelleV · 2016-09-30T18:50:17Z

I took the liberty of moving the checklist to the description comment. This way, GitHub interprets it properly.

tacaswell · 2016-10-29T01:23:00Z

@TrishGillett See http://matplotlib.org/devdocs/devel/testing.html#building-matplotlib-for-image-comparison-tests. With the right env, mpl will build its own copy of the right version of freetype.

tritemio · 2017-06-06T22:02:26Z

Any plan for finalizing this PR? I think it is a pretty important fix. With the current bug in numpy.histogram using plt.hist is becoming quite cumbersome when input has NaNs.
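Until something like this PR lands, a common workaround is to filter non-finite values by hand before binning. The sketch below uses made-up data and plain `np.histogram`; the same filtered array could be passed to `plt.hist`.

```python
import numpy as np

data = np.array([0.5, 1.5, np.nan, 2.5, np.nan, 1.2])

# Keep only finite entries (drops NaN and inf) before histogramming.
finite = data[np.isfinite(data)]
counts, edges = np.histogram(finite, bins=3, range=(0.0, 3.0))

print(counts)
```

When weights are involved, the same boolean index has to be applied to the weights array as well, which is the bookkeeping this PR aims to remove.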

TrishGillett · 2017-06-07T11:30:34Z

Hey @tritemio, thanks for the report. I'll see if I can revive this.

dstansby · 2017-11-01T11:50:21Z

I've been stumbling upon the issue fixed by this PR a lot recently! @TrishGillett do you mind if I pick this up and try and finish the pull request?

TrishGillett · 2017-11-01T23:17:21Z

@dstansby You'd be more than welcome! I tried to pick this PR back up recently but found that it will need some nontrivial work to be brought up to date, made more difficult for me by my loss of context on this code over time. 😦

klapo · 2017-11-06T17:30:48Z

@TrishGillett @dstansby I took a look through the test errors and it seems possible that this PR failed due to a flaw in the test itself that was cleaned up a couple weeks after you submitted the PR.

The test that failed in this PR was disabled due to a missing fonts issue. Below is the image diff from the test. Very clearly a font issue that should be unrelated to masking nans when plotting histograms.

Is it possible to try re-testing?

jklymak · 2017-11-06T18:00:30Z

You'll need to follow the instructions at http://matplotlib.org/devel/gitwash/development_workflow.html Under "rewriting commit history" to 000000. Make a backup branch (Definitely do this!) 1. squash your commits, 2. rebase to master, 3. Force push back to origin...

dstansby · 2017-11-06T21:21:00Z

In order to not mess up your branches @TrishGillett, I've made a new PR with exactly the same code here: #9706. Let's see what happens with CI!

tacaswell · 2019-03-31T19:15:16Z

Found this while seeing what #8638 was related to.

This should be revived, but I am not sure where things landed in light of #9706 being closed.

jklymak · 2020-07-14T15:09:47Z

This seemed mostly there. If anyone takes it up, please read the history carefully to make sure all opinions are taken into account.

lucyleeow · 2020-10-25T23:29:20Z

Is there still interest in continuing this PR? From #9706 (review) it seems like there was discussion about how to deal with missing values but no link so not sure how that discussion evolved?

jklymak · 2020-10-26T03:57:31Z

I don't remember the details, but you are welcome to take another try at it if you have a reasonable idea how to proceed. Note that some changes wrt NaN handling have already gone in....

lucyleeow · 2020-10-26T07:13:42Z

Thanks @jklymak, looking at it more, I don't think I am familiar enough with the code base to tackle this.

jklymak · 2021-06-14T18:58:22Z

I'll close this as abandoned; thanks for the PR, and feel free to re-open if you want to come back to this!

Make _normalize_input always output a list of arrays.

1d9783b

mdboom added the status: needs review label Sep 18, 2016

TrishGillett force-pushed the hist-masking branch from da16535 to 471100f Compare September 19, 2016 01:19

TrishGillett added 2 commits September 18, 2016 21:21

Tidying up the data prep work in hist.

b16144f

Mask invalid entries in x and weights before plotting histograms.

bf8d2cc

TrishGillett force-pushed the hist-masking branch from 471100f to bf8d2cc Compare September 19, 2016 01:21

NelleV reviewed Sep 19, 2016

tacaswell reviewed Sep 19, 2016

tacaswell added this to the 2.1 (next point release) milestone Sep 19, 2016

NelleV removed the status: needs review label Sep 30, 2016

tacaswell modified the milestones: 2.1 (next point release), 2.2 (next next feature release) Aug 29, 2017

dstansby mentioned this pull request Nov 6, 2017

Masking invalid x and/or weights in hist #9706

Closed

tacaswell modified the milestones: needs sorting, v3.2.0 Mar 31, 2019

dstansby modified the milestones: v3.2.0, v3.3.0 Aug 25, 2019

QuLogic modified the milestones: v3.3.0, v3.4.0 May 27, 2020

jklymak added status: orphaned PR Good first issue Open a pull request against these issues if there are no active ones! labels Jul 14, 2020

jklymak added the status: needs rebase label Jul 14, 2020

jklymak marked this pull request as draft July 23, 2020 16:46

QuLogic modified the milestones: v3.4.0, unassigned Jan 21, 2021

jklymak closed this Jun 14, 2021
