[WIP] Masking invalid x and/or weights in hist (#6483) #7133
@@ -5871,7 +5871,7 @@ def hist(self, x, bins=None, range=None, normed=False, weights=None,
         Parameters
         ----------
         x : (n,) array or sequence of (n,) arrays
-            Input values, this takes either a single array or a sequency of
+            Input values, this takes either a single array or a sequence of
             arrays which are not required to be of the same length

         bins : integer or array_like or 'auto', optional
@@ -6042,8 +6042,8 @@ def hist(self, x, bins=None, range=None, normed=False, weights=None,

        """
        def _normalize_input(inp, ename='input'):
-           """Normalize 1 or 2d input into list of np.ndarray or
-           a single 2D np.ndarray.
+           """Normalize 1 or 2d input into a list of one or more np.ndarrays
+           which have potentially different lengths.

            Parameters
            ----------
@@ -6054,25 +6054,30 @@ def _normalize_input(inp, ename='input'):
            """
            if (isinstance(x, np.ndarray) or
                    not iterable(cbook.safe_first_element(inp))):
-               # TODO: support masked arrays;
-               inp = np.asarray(inp)
+               inp = cbook.safe_masked_invalid(inp)
                if inp.ndim == 2:
                    # 2-D input with columns as datasets; switch to rows
                    inp = inp.T
+
+                   if inp.shape[1] < inp.shape[0]:
+                       warnings.warn(
+                           '2D hist input should be nsamples x nvariables;\n '
+                           'this looks transposed '
+                           '(shape is %d x %d)' % inp.shape[::-1])
+
+                   # Change to a list of arrays
+                   inp = [inp[i, :] for i in range(inp.shape[0])]
+
                elif inp.ndim == 1:
-                   # new view, single row
-                   inp = inp.reshape(1, inp.shape[0])
+                   # Change to a list with a single array
+                   inp = [inp.reshape(1, inp.shape[0])]
                else:
                    raise ValueError(
                        "{ename} must be 1D or 2D".format(ename=ename))
-               if inp.shape[1] < inp.shape[0]:
-                   warnings.warn(
-                       '2D hist input should be nsamples x nvariables;\n '
-                       'this looks transposed '
-                       '(shape is %d x %d)' % inp.shape[::-1])
            else:
-               # multiple hist with data of different length
-               inp = [np.asarray(xi) for xi in inp]
+               # Change to a list of arrays
+               inp = [cbook.safe_masked_invalid(arr) for arr in inp]

            return inp
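The core of this hunk is the switch from np.asarray to cbook.safe_masked_invalid, so every dataset comes back as a masked array with non-finite entries flagged. A minimal standalone sketch of that helper's behavior (my example, not part of the diff):

    import numpy as np
    from matplotlib import cbook

    # safe_masked_invalid wraps np.ma.masked_invalid: NaN and inf entries come
    # back masked; inputs whose dtype cannot be checked are returned untouched.
    a = np.array([1.0, np.nan, 3.0, np.inf])
    m = cbook.safe_masked_invalid(a)
    print(m)        # [1.0 -- 3.0 --]
    print(m.mask)   # [False  True False  True]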
@@ -6117,23 +6122,23 @@ def _normalize_input(inp, ename='input'):
        binsgiven = (cbook.iterable(bins) or bin_range is not None)

        # basic input validation
-       flat = np.ravel(x)
-
-       input_empty = len(flat) == 0
+       input_empty = len(np.ravel(x)) == 0

-       # Massage 'x' for processing.
+       # Massage shape of 'x' to be a list of arrays
        if input_empty:
-           x = np.array([[]])
+           x = [np.array([[]])]
        else:
            x = _normalize_input(x, 'x')
        nx = len(x)  # number of datasets

-       # We need to do to 'weights' what was done to 'x'
-       if weights is not None:
-           w = _normalize_input(weights, 'weights')
-       else:
+       # Massage shape of 'weights' to be a list, where each element
+       # weights[i] is either None or an array with the same shape as x[i]
+       if weights is None:
            w = [None]*nx
+       else:
+           w = _normalize_input(weights, 'weights')

+       # Comparing shape of weights vs. x
        if len(w) != nx:
            raise ValueError('weights should have the same shape as x')
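To make the target shapes concrete, here is an illustrative sketch (not library code) of what x and w look like after this massaging: x is a list of nx one-dimensional arrays and w is a parallel list of the same length holding either None or an array shaped like the matching x[i].

    import numpy as np

    # Hypothetical post-massaging state for two datasets of different lengths:
    x = [np.ma.masked_invalid([1.0, np.nan, 3.0]),   # dataset 0, one entry invalid
         np.ma.masked_invalid([4.0, 5.0])]           # dataset 1
    w = [None,                                       # no weights for dataset 0
         np.ma.masked_invalid([0.5, 0.5])]           # weights matching x[1]

    nx = len(x)
    assert len(w) == nx
    for xi, wi in zip(x, w):
        assert wi is None or wi.shape == xi.shape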
@@ -6142,6 +6147,16 @@ def _normalize_input(inp, ename='input'):
                raise ValueError(
                    'weights should have the same shape as x')

+       # Combine the masks from x[i] and w[i] (if applicable) into a single
+       # mask and apply it to both.
+       if not input_empty:
+           for i in range(len(x)):
+               mask_i = x[i].mask
+               if w[i] is not None:
+                   mask_i = mask_i | w[i].mask
[Inline review thread attached to the mask-combination lines above]

Comment: I would raise an error if the weight matrix contained mask elements that weren't masked in the data.

Comment: Using histograms with weights is a reasonable way to do spatial integration of images, for example. That said, this is something that should be pushed down into numpy?

Comment: But that explains the mask on x, not on the weights. I am also fine with a mask on the weights as long as it is identical to the one on x. Masks that differ between the weights and x seem like they may be an error on the user's side, hence throwing a ValueError.

Comment: I mean the other way around:

    xx = convert_to_1D(imgs.shape[:-1])
    for im in imgs:
        m = compute_mask(im)
        w = ma.masked_array(im, m)
        ax.hist(xx, bins, weights=w)

Basically, if we are going to allow masking on either, masking on both and taking the union seems like the best course.

Comment: I agree; combining independent masks on both via a union is not difficult, it is easy to explain, and it handles every situation that might arise.

Comment: @efiring Thanks for the cbook tip again; between that and the realization that masks seem to stack, I believe I can rewrite this piece as:

Comment: Not quite what I had in mind...

    xclean, wclean = [], []
    for xi, wi in zip(x, w):
        xiclean, wiclean = cbook.delete_masked_points(xi, wi)
        xclean.append(xiclean)
        wclean.append(wiclean)

This works regardless of whether wi is None or a sequence the length of xi.

Comment: Ahh, that's pretty nice; I didn't read the docs carefully enough to see the part about any points masked in one arg getting masked and deleted in the others. I'm not sure I understand what "All input arguments that are not passed unchanged" means, though...
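For reference, a small standalone example (mine, not from the thread) of the cbook.delete_masked_points behavior described above: any point that is masked or non-finite in one argument is dropped from all of the array arguments.

    import numpy as np
    from matplotlib import cbook

    x = np.ma.masked_invalid([1.0, np.nan, 3.0, 4.0])
    w = np.ma.masked_array([10.0, 20.0, 30.0, 40.0],
                           mask=[False, False, True, False])

    # Index 1 is invalid in x and index 2 is masked in w; both indices are
    # removed from both outputs, which come back as plain ndarrays.
    xc, wc = cbook.delete_masked_points(x, w)
    print(xc)   # [ 1.  4.]
    print(wc)   # [ 10.  40.]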
+               w[i] = np.ma.masked_array(w[i], mask=mask_i).compressed()
+               x[i] = np.ma.masked_array(x[i], mask=mask_i).compressed()
+
        if color is None:
            color = [self._get_lines.get_next_color() for i in xrange(nx)]
        else:
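Taken together, the added block unions the mask on x[i] with the mask on w[i] (when weights are given) and compresses both down to the surviving points. Below is a standalone sketch of that logic, assuming each x[i] (and any non-None w[i]) is already a masked array; the function name is hypothetical, and the None-weight case is skipped explicitly.

    import numpy as np

    def combine_masks_and_compress(x, w):
        # For each dataset, union the masks on the data and optional weights,
        # then keep only the points that are valid in both.
        for i in range(len(x)):
            mask_i = np.ma.getmaskarray(x[i])
            if w[i] is not None:
                mask_i = mask_i | np.ma.getmaskarray(w[i])
                w[i] = np.ma.masked_array(w[i], mask=mask_i).compressed()
            x[i] = np.ma.masked_array(x[i], mask=mask_i).compressed()
        return x, w

    x = [np.ma.masked_invalid([1.0, np.nan, 3.0])]
    w = [np.ma.masked_array([5.0, 6.0, 7.0], mask=[False, False, True])]
    x, w = combine_masks_and_compress(x, w)
    print(x[0])   # [ 1.]  only index 0 is valid in both x[0] and w[0]
    print(w[0])   # [ 5.]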
[Inline review thread attached to the "this looks transposed" warning]

Comment: I am 👎 on this warning; I think it is assuming too much about how users will be using this method.

Comment: I now see that this is just moving the warning around. I would be in favor of removing it, but since it is already in the code base, mostly ignore my previous comment.

Comment: This is actually just migrated up from where it was before (old line 6068). I don't care for it either, but I left it alone thinking it probably got debated when it was added. I'm more than happy to remove it.

Comment: Gotcha. I'm good either way.

Comment: Absent a compelling reason, I err on the side of not removing things (which may not be a fully defensible position, but it is a safe, conservative one). Chasing this back through git blame, the warning came into the code base in 214976f in 2008.

Comment: I am also in favor of removing it (I am biased, as I am in a field where nsamples << nvariables).

Comment: Oh, wow, that I can see being a problem. Maybe throw an error (or raise a warning) if x.shape[0] == 1?
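For context on what the warning checks (an illustrative example, not from the thread): hist reads a 2-D array as nsamples x nvariables, one dataset per column, and the warning fires when there are more datasets than samples per dataset, which usually means the array was passed transposed.

    import numpy as np
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()

    # (1000, 3): three datasets of 1000 samples each, the expected orientation,
    # so no warning is raised and three histograms are drawn.
    data = np.random.randn(1000, 3)
    ax.hist(data, bins=30)

    # A (3, 1000) array would be read as 1000 datasets of 3 samples each, which
    # is exactly what the "this looks transposed" warning flags; in fields where
    # nsamples < nvariables that shape can be legitimate, hence the discussion.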