np.histogram() inconsistent results #7628

Closed
@rkern

Description

I think that #6100 may have introduced a bug in some edge cases. Here is a case found through fuzz testing, in which each value is counted in a bin whose left edge is slightly greater than the value itself:

|35> import numpy as np
|36> np.__version__
'1.10.4'
|37> arr = np.array([337, 404, 739, 806, 1007, 1811, 2012])
|39> hist, edges = np.histogram(arr, bins=8296, range=(2, 2280))
|40> mask = hist > 0
|41> left_edges = edges[:-1][mask]
|42> right_edges = edges[1:][mask]
|43> zip(arr, left_edges, right_edges)
[(337, 337.00000000000006, 337.27459016393448),
 (404, 404.00000000000006, 404.27459016393448),
 (739, 739.00000000000011, 739.27459016393448),
 (806, 806.00000000000011, 806.27459016393448),
 (1007, 1007.0000000000001, 1007.2745901639345),
 (1811, 1811.0000000000002, 1811.2745901639346),
 (2012, 2012.0000000000002, 2012.2745901639346)]

This was found by fuzz testing an internal accelerated histogram routine (which takes roughly the same strategy as #6100 but is implemented in Cython) and comparing its results to np.histogram(). Against numpy 1.9.2, the two agreed on every case. By numpy 1.10.4 at the latest, edge cases like the one above started to appear.
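The kind of comparison described above can be sketched as a consistency check between the counts and the edges that np.histogram() returns. This is only an illustration, not the internal fuzz test: the helper names bins_from_edges and histogram_matches_edges are hypothetical, and the sketch assumes every value lies inside the requested range.

```python
import numpy as np


def bins_from_edges(values, edges):
    """Assign each value to a bin by searching the edges directly.

    Bins are half-open [left, right), except the last one, which also
    includes its right edge. Assumes all values lie within the edges.
    """
    idx = np.searchsorted(edges, values, side="right") - 1
    # A value equal to the final edge belongs to the last bin.
    idx[values == edges[-1]] = len(edges) - 2
    return idx


def histogram_matches_edges(values, bins, value_range):
    """Return True if the counts agree with the edges returned alongside them."""
    hist, edges = np.histogram(values, bins=bins, range=value_range)
    expected = np.bincount(bins_from_edges(values, edges), minlength=bins)
    return np.array_equal(hist, expected)


arr = np.array([337, 404, 739, 806, 1007, 1811, 2012])
print(np.__version__, histogram_matches_edges(arr, 8296, (2, 2280)))
```

Given the edges shown in the session above, this check should report a mismatch on numpy 1.10.4, since each value sits just below the left edge of the bin it was counted in. The result on other versions may differ.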

I believe the main difference between #6100 and our routine is that we do not precompute the histogram scaling factor bins / (mx - mn): https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L638
We instead compute ((tmp_a - mn) * bins) / (mx - mn), multiplying before dividing. The comments in our routine cite floating point problems as the reason for deliberately avoiding the precomputation.
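The two ways of computing the bin index described above can be put side by side in a minimal sketch. The function names are hypothetical and neither function is the actual code from numpy or from the Cython routine. Both simply truncate to an integer and do not handle values equal to mx.

```python
import numpy as np


def indices_precomputed(a, bins, mn, mx):
    """Bin index using a scale factor computed once up front."""
    norm = bins / (mx - mn)  # rounded once, then reused for every value
    return ((a - mn) * norm).astype(np.intp)


def indices_divide_last(a, bins, mn, mx):
    """Bin index computed by multiplying first and dividing last."""
    return (((a - mn) * bins) / (mx - mn)).astype(np.intp)


arr = np.array([337, 404, 739, 806, 1007, 1811, 2012], dtype=np.float64)
bins, mn, mx = 8296, 2.0, 2280.0

pre = indices_precomputed(arr, bins, mn, mx)
last = indices_divide_last(arr, bins, mn, mx)

print(last)        # indices from the divide-last form
print(pre - last)  # nonzero entries mark values where the two forms disagree
```

For this particular input, (x - mn) * bins is an exact multiple of (mx - mn) for every value, so the divide-last form yields exact integers. Whether the precomputed form lands on the same index depends on how the rounding of bins / (mx - mn) falls, so the sketch prints the difference rather than assuming it.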
