-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
[WIP] Masking invalid x and/or weights in hist (#6483) #7133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @TrishGillett |
Hi @NelleV, unfortunately there's no traceback to show, it's a matter of struggling to get the right environment setup. I wanted to manually try out some things, but I was having trouble making a conda environment which: Right when I got the former figured out, I hit trouble with the latter: I couldn't open a jupyter notebook from inside the env, and was having backend troubles trying to create plots from a console. If you have any advice, I'd be really grateful. For now I'm going to look at the Travis results and see what bugs it caught. :) |
My tactic for this is something along the lines of
|
Works like a charm, @tacaswell! Thanks so much! For this PR, here are some tasks that still need to be done:
|
da16535
to
471100f
Compare
471100f
to
bf8d2cc
Compare
for i in range(len(x)): | ||
mask_i = x[i].mask | ||
if w[i] is not None: | ||
mask_i = mask_i | w[i].mask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would raise an error if the weight matrix contained mask elements that weren't masked in the data x
itself. Unless I am not seeing a use case for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using histograms with weights is reasonable way to do spatial integration of images, ex histogram(convert_pixel_locations_to_1D, bins, weights=image.ravel())
. If you are looping over a bunch of images, some of which have artifacts that you need to mask out, being able to do this loop with out having to re-mask the pixel position values would be nice.
That said, this is something that should be pushed down into numpy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that explains the mask on x, not on the weights. I am also fine with a mask on the weights as long as it is identical as the one on x. But masks different on the weights and on x seem like it may be an error on the users side, hence throwing a ValueError.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the other way around
xx = convent_to_1D(imgs.shape[:-1])
for im in imgs:
m = compute_mask(im)
w = ma.masked_array(im, m)
ax.hist(m, bins, weight=w)
Basically, if we are going to allow masking on either, masking on both and taking the union seems like the best course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree; combining independent masks on both via a union is not difficult, it is easy to explain, and it handles every situation that might arise.
See cbook.delete_masked_points
; you could use this as soon as you have the list of sequences of values and the matching list of sequences of weights or of Nones. You would apply it to each matching sequence of values and its corresponding sequence of weights, or None. Maybe it's overkill, but it was designed for this sort of thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efiring Thanks for the cbook tip again, between that and a realization that masks seem to stack, I believe I can rewrite this piece as:
if not input_empty:
for i in range(len(w)):
if w[i] is not None:
x[i] = np.ma.masked_array(x[i], mask=w[i].mask)
w[i] = np.ma.masked_array(w[i], mask=x[i].mask)
x = cbook.delete_masked_points(x)
w = cbook.delete_masked_points(w)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite what I had in mind...
xclean, wclean = [], []
for xi, wi in zip(x, w):
xiclean, wiclean = cbook.delete_masked_points(xi, wi)
xclean.append(xiclean)
wclean.append(wiclean)
This works regardless of whether wi is None or a sequence the length of xi. Using cbook.delete_masked_points
is only worthwhile if you are giving it more than one argument. Deleting the masked points from a single 1-D masked array can be done simply by calling it's compressed()
method, so that's all you need in your version above. You would call w[i] = w[i].compressed()
inside the inner if
, and x[i]
= x[i].compressed()
immediately after it. There are many variations on the theme; I'm not sure which is best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, that's pretty nice, I didn't read the docs carefully enough to see the part about any points masked in one arg getting masked and deleted in the others. I'm not sure I understand what 'All input arguments that are not passed unchanged' means though...
if inp.ndim == 2: | ||
# 2-D input with columns as datasets; switch to rows | ||
inp = inp.T | ||
|
||
if inp.shape[1] < inp.shape[0]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am 👎 on this warning, I think it is assuming too much about how users will be using this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now see that this is just moving this warning around. I would be in favor of removing it, but if it is already in the code base, mostly ignore my previous comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually just migrated up from where it was before (old line 6068). I don't care for it either but left it alone thinking it probably got debated when it was added. I'm more than happy to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. I'm good either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absent a compelling reason I err on the side of not removing things (which may not be a fully defend able position, but it is a safe conservative one)
Chasing this back through git blame, the warning came into the code base in
214976f in 2008.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also in favor of removing it (I am biased as I am in a field where nsamples << nvariables )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wow, that I can see being a problem.
Maybe throw an error (or raise a warning) if x.shape[0] == 1?
Hi @TrishGillett |
Sorry this has been on pause, the checklist here is still valid and ongoing. I just started a new job and haven't had time to make a VM with the right setup for image comparison tests, and I also haven't decided on a good minimal set of tests for this change (help welcome). |
I took the liberty to move the check list to description of the comment. This way, github interprets it properly. |
@TrishGillett See http://matplotlib.org/devdocs/devel/testing.html#building-matplotlib-for-image-comparison-tests with the right env mpl will build it's own copy of the right version of freetype. |
Any plan for finalizing this PR? I think it is a pretty important fix. With the current bug in numpy.histogram using |
Hey @tritemio, thanks for the report. I'll see if I can revive this. |
I've been stumbling upon the issue fixed by this PR a lot recently! @TrishGillett do you mind if I pick this up and try and finish the pull request? |
@dstansby You'd be more than welcome! I tried to pick this PR back up recently but found that it will need some nontrivial work to be brought up to date, made more difficult for me by my loss of context on this code over time. 😦 |
@TrishGillett @dstansby I took a look through the test errors and it seems possible that this PR failed due to a flaw in the test itself that was cleaned up a couple weeks after you submitted the PR. The test that failed in this PR was disabled due to a missing fonts issue. Below is the image diff from the test. Very clearly a font issue that should be unrelated to masking nans when plotting histograms. Is it possible to try re-testing? |
You'll need to follow the instructions at |
In order to not mess up your branches @TrishGillett , I've made a new PR with exactly the same code here: #9706. Lets see what happens with CI! |
This seemed mostly there. If anyone takes it up, please read the history carefully to make sure all opinions are taken into account. |
Is there still interest in continuing this PR? From #9706 (review) it seems like there was discussion about how to deal with missing values but no link so not sure how that discussion evolved? |
I don't remember the details, but you are welcome to take another try at it if you have a reasonable idea how to proceed. Note that some changes wrt NaN handling have already gone in.... |
Thanks @jklymak, looking at it more, I don't think I am familiar enough with the code base to tackle this. |
I'll closed as abandoned; thanks for the PR, and feel free to re-open if you want to come back to this! |
I'm having trouble testing locally, so this is a WIP PR so I can see how this code does in the CI.
See #6483 and #6992.