Description
I think that #6100 may have introduced a bug in some edge cases. Here is a case found through fuzz testing:
In [35]: import numpy as np
In [36]: np.__version__
Out[36]: '1.10.4'
In [37]: arr = np.array([337, 404, 739, 806, 1007, 1811, 2012])
In [39]: hist, edges = np.histogram(arr, bins=8296, range=(2, 2280))
In [40]: mask = hist > 0
In [41]: left_edges = edges[:-1][mask]
In [42]: right_edges = edges[1:][mask]
In [43]: zip(arr, left_edges, right_edges)
Out[43]:
[(337, 337.00000000000006, 337.27459016393448),
 (404, 404.00000000000006, 404.27459016393448),
 (739, 739.00000000000011, 739.27459016393448),
 (806, 806.00000000000011, 806.27459016393448),
 (1007, 1007.0000000000001, 1007.2745901639345),
 (1811, 1811.0000000000002, 1811.2745901639346),
 (2012, 2012.0000000000002, 2012.2745901639346)]
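To state the inconsistency explicitly: each sample should satisfy left <= x < right for the half-open bin it was counted in, but in the output above every left edge is strictly greater than the corresponding sample. A minimal check (mine, not part of the original fuzz harness):

```python
import numpy as np

arr = np.array([337, 404, 739, 806, 1007, 1811, 2012])
hist, edges = np.histogram(arr, bins=8296, range=(2, 2280))

left_edges = edges[:-1][hist > 0]
right_edges = edges[1:][hist > 0]

# Every sample should lie inside the half-open bin that counted it.
for x, left, right in zip(arr, left_edges, right_edges):
    print(x, left, right, "ok" if left <= x < right else "OUTSIDE ITS BIN")
```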
This was found through fuzz testing an internal accelerated histogram routine (which takes roughly the same strategy as #6100, but implemented in Cython) and comparing its results to np.histogram(). Against numpy 1.9.2 the two agreed on everything; at least as of numpy 1.10.4, edge cases like the one above started showing up.
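For context, the fuzz comparison is essentially of the following shape (a minimal sketch; accelerated_histogram is a stand-in for the internal Cython routine and is not shown here, and the input distribution is illustrative):

```python
import numpy as np

def fuzz_compare(accelerated_histogram, trials=100000, seed=0):
    """Feed random small datasets to both routines and report any mismatch."""
    rng = np.random.RandomState(seed)
    for _ in range(trials):
        data = rng.randint(0, 3000, size=rng.randint(1, 20)).astype(np.float64)
        nbins = int(rng.randint(1, 10000))
        dmin, dmax = int(data.min()), int(data.max())
        lo = float(rng.randint(0, dmin + 1))
        hi = float(rng.randint(dmax + 1, dmax + 500))
        expected, _ = np.histogram(data, bins=nbins, range=(lo, hi))
        got, _ = accelerated_histogram(data, bins=nbins, range=(lo, hi))
        if not np.array_equal(expected, got):
            print("mismatch for", data, "bins =", nbins, "range =", (lo, hi))
```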
I believe the main difference between #6100 and our routine is that we do not precompute the histogram scaling factor bins / (mx - mn) (https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L638); we compute ((tmp_a - mn) * bins) / (mx - mn) directly. The comments in our routine cite floating-point problems as the reason for deliberately avoiding that precomputation.
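As a sketch of why the two forms can disagree (illustrative only; mn, mx, and bins are taken from the session above, and the searchsorted line approximates binning directly against the returned edges, which is roughly what numpy did before #6100):

```python
import numpy as np

mn, mx, bins = 2.0, 2280.0, 8296
x = 337.0  # a sample that lands exactly on a mathematical bin boundary

# Index with the precomputed scale factor, as in the #6100 fast path.
norm = bins / (mx - mn)
idx_precomputed = int((x - mn) * norm)

# Index computed without precomputing the factor, as our routine does.
idx_direct = int(((x - mn) * bins) / (mx - mn))

# Index implied by the edges that np.histogram actually returns.
edges = np.linspace(mn, mx, bins + 1)
idx_from_edges = int(np.searchsorted(edges, x, side='right')) - 1

# For boundary samples like this one, rounding in the three expressions
# can put x into different bins, which is how left edges end up above x.
print(idx_precomputed, idx_direct, idx_from_edges)
```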