plt.hist fails when data contains nan values #6992

Closed
@maahn

I noticed that plt.hist fails when the data includes NaN values:

import numpy as np
import matplotlib.pyplot as plt
data = np.random.random(100)
data[10] = np.nan
plt.hist(data)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-bf65746c0a8a> in <module>()
      3 data = np.random.random(100)
      4 data[10] = np.nan
----> 5 plt.hist(data)

//anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, hold, data, **kwargs)
   2955                       histtype=histtype, align=align, orientation=orientation,
   2956                       rwidth=rwidth, log=log, color=color, label=label,
-> 2957                       stacked=stacked, data=data, **kwargs)
   2958     finally:
   2959         ax.hold(washold)

//anaconda/lib/python2.7/site-packages/matplotlib/__init__.pyc in inner(ax, *args, **kwargs)
   1817                     warnings.warn(msg % (label_namer, func.__name__),
   1818                                   RuntimeWarning, stacklevel=2)
-> 1819             return func(ax, *args, **kwargs)
   1820         pre_doc = inner.__doc__
   1821         if pre_doc is None:

//anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
   5983             # this will automatically overwrite bins,
   5984             # so that each histogram uses the same bins
-> 5985             m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
   5986             m = m.astype(float)  # causes problems later if it's an int
   5987             if mlast is None:

//anaconda/lib/python2.7/site-packages/numpy/lib/function_base.pyc in histogram(a, bins, range, normed, weights, density)
    500     if mn > mx:
    501         raise ValueError(
--> 502             'max must be larger than min in range parameter.')
    503     if not np.all(np.isfinite([mn, mx])):
    504         raise ValueError(

ValueError: max must be larger than min in range parameter.

Or is this intended behavior, to make sure people notice that they have faulty values in their data?

It appears to be related to how the bin range is determined, so it would be very easy to fix:

--- a/lib/matplotlib/axes/_axes.py
+++ b/lib/matplotlib/axes/_axes.py
@@ -6150,8 +6150,8 @@ class Axes(_AxesBase):
             xmax = -np.inf
             for xi in x:
                 if len(xi) > 0:
-                    xmin = min(xmin, xi.min())
-                    xmax = max(xmax, xi.max())
+                    xmin = min(xmin, np.nanmin(xi))
+                    xmax = max(xmax, np.nanmax(xi))
             bin_range = (xmin, xmax)

With this fix, one gets a warning when plotting data that contains NaNs, but that is already addressed in #6483.
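
For illustration, here is a minimal NumPy-only sketch of why the range determination appears to go wrong, and how the NaN-aware reductions change it. It assumes the loop starts from xmin = np.inf (only xmax = -np.inf is visible in the diff context), and it uses np.histogram directly instead of plt.hist. The variable names bad_range, good_range and finite are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(100)
data[10] = np.nan

# Assumed starting values, mirroring the loop in the diff
# (xmin = np.inf is inferred; only xmax = -np.inf is shown there).
xmin, xmax = np.inf, -np.inf

# data.min() and data.max() return nan. Every comparison with nan is
# False, so the builtin min/max keep their first argument and the
# range stays inverted at (inf, -inf).
bad_range = (min(xmin, data.min()), max(xmax, data.max()))

# NaN-aware reductions, as in the proposed patch, give a finite range.
good_range = (min(xmin, np.nanmin(data)), max(xmax, np.nanmax(data)))

# Caller-side workaround: drop NaN values before binning.
finite = data[~np.isnan(data)]
counts, edges = np.histogram(finite, bins=10, range=good_range)
```

An inverted range like bad_range would explain the "max must be larger than min" message in the traceback. Until a fix lands, passing the filtered array (finite above) to plt.hist should sidestep the error.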

Max
