Describe the issue:
`numpy.histogram` returns negative counts for some bins when the data is provided as a list of numeric strings.
Reproduce the code example:
```python
import numpy as np
data = [100]*10 + [200]*14
bins = list(range(0, 2050, 50))
# the string version
print(list(np.histogram([str(x) for x in data], bins=bins)[0]))
# [24, -24, 10, 0, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14]
# the int version
print(list(np.histogram([int(x) for x in data], bins=bins)[0]))
# [0, 0, 10, 0, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
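One plausible explanation that is consistent with the string-version output above, assuming that (a) for explicit bin edges the histogram sorts the data, locates each edge with a sorted search, and takes differences of the cumulative counts, and (b) with string data the integer edges end up being compared as strings. Under lexicographic ordering the edges are no longer sorted ("1000" < "200" < "50"), so the cumulative counts are not monotonic and some differences turn negative. The sketch below does not call NumPy at all; it only models those two assumptions with Python's `bisect`, and it reproduces the reported string-version counts.

```python
import bisect

data = [100] * 10 + [200] * 14
bins = list(range(0, 2050, 50))

# Assumption: values and bin edges are all compared as strings,
# so ordering is lexicographic, not numeric.
sorted_strs = sorted(str(x) for x in data)
edges = [str(b) for b in bins]

# Cumulative count of elements strictly below each edge ("left" search),
# with the final edge inclusive ("right" search).
cum = [bisect.bisect_left(sorted_strs, e) for e in edges[:-1]]
cum.append(bisect.bisect_right(sorted_strs, edges[-1]))

# Per-bin counts are differences of consecutive cumulative counts.
# Because the string edges are not in lexicographic order, `cum` is not
# monotonic and some of these differences come out negative.
counts = [hi - lo for lo, hi in zip(cum, cum[1:])]
print(counts)
```

Because the differences telescope, the counts still sum to 24 (the last cumulative count minus the first), which is part of why the faulty result looks plausible.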
Error message:
No response
Runtime information:
1.21.5
3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
Context for the issue:
I am filing this issue to point out that the underlying problem may lie deeper within NumPy and cause other bugs, even though `numpy.histogram` was never specifically designed to work with a list of strings.
In my opinion this matters because the result looks plausible but is wrong (the string-version counts above even sum to 24, the correct total), instead of raising an error or being obvious garbage: the output risks being trusted without anyone recognizing it as erroneous.
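Until the underlying behavior is clarified, a caller can guard against this silently wrong result by converting the strings explicitly and rejecting non-numeric input. The sketch below uses a hypothetical wrapper, `checked_histogram`, which is not part of NumPy; it only inspects `dtype.kind` of the input before delegating to `np.histogram`.

```python
import numpy as np


def checked_histogram(a, bins=10, **kwargs):
    """Hypothetical wrapper: refuse non-numeric input instead of histogramming it."""
    arr = np.asarray(a)
    # dtype.kind: 'i' signed int, 'u' unsigned int, 'f' float
    if arr.dtype.kind not in "iuf":
        raise TypeError(f"histogram input must be numeric, got dtype {arr.dtype}")
    return np.histogram(arr, bins=bins, **kwargs)


data = [100] * 10 + [200] * 14
bins = list(range(0, 2050, 50))
str_data = [str(x) for x in data]

# Explicit conversion makes the intent visible and yields the expected counts.
counts, _ = checked_histogram(np.asarray(str_data, dtype=float), bins=bins)
print(counts.tolist())

# Passing the strings directly now fails loudly instead of returning bad counts.
error_message = None
try:
    checked_histogram(str_data, bins=bins)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Checking the dtype up front keeps the decision in the caller's hands: parsing is explicit, and anything that is not already numeric raises rather than producing counts that look reasonable.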