np.histogram() inconsistent results

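I think that #6100 may have introduced a bug in some edge cases. Here is a case found through fuzz testing, in which every value is counted in a bin whose left edge is slightly greater than the value itself: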

```python
import numpy as np

print(np.__version__)  # the report used '1.10.4'
arr = np.array([337, 404, 739, 806, 1007, 1811, 2012])
hist, edges = np.histogram(arr, bins=8296, range=(2, 2280))
mask = hist > 0                  # keep only the non-empty bins
left_edges = edges[:-1][mask]    # left edge of each non-empty bin
right_edges = edges[1:][mask]    # right edge of each non-empty bin
print(list(zip(arr, left_edges, right_edges)))
# Output with numpy 1.10.4:
# [(337, 337.00000000000006, 337.27459016393448),
#  (404, 404.00000000000006, 404.27459016393448),
#  (739, 739.00000000000011, 739.27459016393448),
#  (806, 806.00000000000011, 806.27459016393448),
#  (1007, 1007.0000000000001, 1007.2745901639345),
#  (1811, 1811.0000000000002, 1811.2745901639346),
#  (2012, 2012.0000000000002, 2012.2745901639346)]
```
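One way to make this kind of inconsistency checkable is to recount the values directly against the returned `edges`, using the rule that a value `x` belongs to bin `i` when `edges[i] <= x < edges[i + 1]`, with the last bin closed on the right as the `np.histogram()` documentation describes. The helper names below (`reference_counts`, `mismatched_bins`) are hypothetical; this is a minimal sketch built on `np.searchsorted`, not numpy's own implementation.

```python
import numpy as np


def reference_counts(values, edges):
    """Count values per bin using only comparisons against `edges`.

    Hypothetical helper: bin i holds edges[i] <= x < edges[i + 1], and the
    last bin also holds x == edges[-1]. Values outside the edges are dropped.
    """
    values = np.asarray(values, dtype=np.float64)
    edges = np.asarray(edges, dtype=np.float64)
    nbins = len(edges) - 1
    # side='right' returns the index of the first edge strictly greater than x,
    # so subtracting 1 gives the i with edges[i] <= x < edges[i + 1].
    idx = np.searchsorted(edges, values, side="right") - 1
    # The last bin is closed on the right.
    idx[values == edges[-1]] = nbins - 1
    keep = (idx >= 0) & (idx < nbins)
    return np.bincount(idx[keep], minlength=nbins)


def mismatched_bins(values, counts, edges):
    """Return indices of bins where `counts` disagrees with the edge-based recount."""
    expected = reference_counts(values, edges)
    return np.flatnonzero(np.asarray(counts) != expected)
```

Applied to the reproduction above, a non-empty result from `mismatched_bins(arr, hist, edges)` means `np.histogram()` counted some value in a bin whose own edges do not contain it.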

This was found by fuzz testing an internal accelerated histogram routine (which takes roughly the same approach as #6100, but is implemented in Cython) against `np.histogram()`. With numpy 1.9.2, the two produced identical results. With numpy 1.10.4, however, some edge cases produce mismatches like the one above.
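A differential fuzz test of this kind can be sketched as below. The internal Cython routine is not shown in this issue, so `candidate` is a hypothetical stand-in for any function with the signature `candidate(arr, bins, range) -> counts`; `fuzz_compare` is likewise an illustrative name. The integer inputs mirror the integer data in the reproduction, and the value and bin ranges are arbitrary assumptions.

```python
import numpy as np


def fuzz_compare(candidate, n_trials=1000, seed=0):
    """Return the (arr, bins, range) cases where `candidate` disagrees with np.histogram.

    Hypothetical harness: `candidate(arr, bins, range)` must return an array
    of per-bin counts, like the first element of np.histogram's result.
    """
    rng = np.random.default_rng(seed)
    failures = []
    for _ in range(n_trials):
        # Random integer range, bin count, and data inside the range (inclusive).
        lo = int(rng.integers(0, 1000))
        hi = lo + int(rng.integers(1, 5000))
        bins = int(rng.integers(1, 10000))
        arr = rng.integers(lo, hi + 1, size=int(rng.integers(1, 20)))
        expected, _ = np.histogram(arr, bins=bins, range=(lo, hi))
        got = candidate(arr, bins, (lo, hi))
        if not np.array_equal(expected, got):
            failures.append((arr, bins, (lo, hi)))
    return failures
```

Fixing `seed` keeps any failing case reproducible, so it can be pasted into a bug report like this one.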

I believe the main difference between #6100 and our routine is that we do not precompute the histogram scaling factor `bins / (mx - mn)` (https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L638). Instead, we compute `((tmp_a - mn) * bins) / (mx - mn)`. The comments in our routine cite floating-point problems as the reason for deliberately avoiding the precomputation.
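The two orderings described above can be written side by side to compare the bin indices they produce. This is a minimal sketch: the function names are hypothetical, inputs are assumed to lie in `[mn, mx]`, and `floor` plus a clamp of `x == mx` into the last bin are simplifying assumptions rather than a copy of either implementation.

```python
import numpy as np


def bin_index_precomputed(a, mn, mx, bins):
    """Scale factor computed once, then multiplied in (the #6100-style ordering)."""
    norm = bins / (mx - mn)
    idx = np.floor((np.asarray(a, dtype=np.float64) - mn) * norm).astype(np.intp)
    return np.minimum(idx, bins - 1)  # x == mx belongs to the last bin


def bin_index_deferred(a, mn, mx, bins):
    """Multiply by `bins` first and divide by the range last (the ordering described above)."""
    idx = np.floor(((np.asarray(a, dtype=np.float64) - mn) * bins) / (mx - mn)).astype(np.intp)
    return np.minimum(idx, bins - 1)  # x == mx belongs to the last bin


def count_disagreements(a, mn, mx, bins):
    """Number of inputs for which the two orderings pick different bins."""
    return int(np.count_nonzero(
        bin_index_precomputed(a, mn, mx, bins) != bin_index_deferred(a, mn, mx, bins)
    ))
```

For example, `count_disagreements(np.arange(2, 2281), 2, 2280, 8296)` counts how many integer inputs in the reproduction's range are assigned different bins by the two orderings.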

np.histogram() inconsistent results #7628