Skip to content

Masking invalid x and/or weights in hist #9706

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

dstansby
Copy link
Member

@dstansby dstansby commented Nov 6, 2017

A rebase of #7133, fixes #6483.

@dstansby dstansby added this to the v2.2 milestone Nov 6, 2017
@@ -6061,6 +6061,46 @@ def hist(self, x, bins=None, range=None, density=None, weights=None,
bin_range = range
del range
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a namespace conflict here. range is both the range of histogram and the python function. If I do just do a quick and dirty example:

del range
range(np.arange(100))

I get the below error message:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-21-0eb074616bf8> in <module>()
----> 1 del range
      2 range(np.arange(100))
NameError: name 'range' is not defined

This seems similar to the errors generated in the tests, which comes from using range to refer to the built-in python function on line 6171.

@@ -6061,6 +6061,46 @@ def hist(self, x, bins=None, range=None, density=None, weights=None,
bin_range = range
del range

def _normalize_input(inp, ename='input'):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is that this function has disappeared on the master branch, so I think another approach to the problem is needed.

@dstansby
Copy link
Member Author

dstansby commented Nov 7, 2017

I've tidied this up a bit - using masked arrays is not working at all with the rest of the ax.hist function though... should I be converting them back to normal arrays with the masked values just removed?

@dopplershift
Copy link
Contributor

Yeah, I think cbook.delete_masked_points is the usual way to go here.

@dstansby
Copy link
Member Author

dstansby commented Nov 8, 2017

🤦‍♀️ turns out the issue was PEBKAC and not the masked arrays... should be fine now

@dstansby
Copy link
Member Author

dstansby commented Nov 8, 2017

(this still needs a test and a rebase, but I think the code is fine now)

@efiring
Copy link
Member

efiring commented Nov 11, 2017

Looks like you also need to modify the docstring to note the support for masked arrays and invalid values, and also note the support for masked arrays in the whats_new entry.

TrishGillett and others added 4 commits November 14, 2017 20:10
Tidying up the data prep work in hist.

Mask invalid entries in x and weights before plotting histograms.

Clean up hist masking code

Fix masking
jklymak
jklymak previously approved these changes Dec 2, 2017
Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I'm a little curious what masked values do to density, but I guess thats really in np.histogram, not here.

@dstansby dstansby added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Jan 22, 2018
efiring
efiring previously approved these changes Jan 29, 2018
@tacaswell tacaswell modified the milestones: v2.2, v3.0 Jan 29, 2018
@tacaswell tacaswell removed the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Jan 29, 2018
@efiring efiring dismissed their stale review January 29, 2018 20:57

It looks like this is actually not supporting masked arrays as it says it is.

@tacaswell
Copy link
Member

Due to some on-going discussion about what to do in the case of missing data (and how the handle the averaging).

@jklymak jklymak dismissed their stale review January 29, 2018 20:58

As above - this needs a bit more discussion

Copy link
Member

@efiring efiring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. the docstring is contradictory.
  2. masked arrays and nans are being passed to np.histogram
  3. there are arguments as to how bad values should be handles

@dstansby
Copy link
Member Author

Going to close this since it's been a while - I'll leave my branch around for a bit if anyone thinks it's worthwhile picking up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Range determination for data with NaNs
7 participants