BUG: numpy.histogram should raise an error for string arrays #24032

@Semnodime

Describe the issue:

numpy.histogram returns negative counts for some bins when the data is provided as a list of numeric strings.

Reproduce the code example:

import numpy as np
data = [100] * 10 + [200] * 14
bins = list(range(0, 2050, 50))

# the string version
print(list(np.histogram([str(x) for x in data], bins=bins)[0]))
# [24, -24, 10, 0, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14]

# the int version
print(list(np.histogram([int(x) for x in data], bins=bins)[0]))
# [0, 0, 10, 0, 14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Error message:

No response

Runtime information:

1.21.5
3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]

Context for the issue:

I am posting this issue to highlight that the underlying problem might lie deeper within numpy and cause other bugs, even if numpy.histogram specifically was never designed to work with a list of strings.
This is significant IMHO because the result looks plausible yet is wrong, rather than raising an error or being total garbage: there is a risk that the resulting data is trusted and not recognized as erroneous.
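
Until numpy itself rejects such input, a caller-side guard is one possible stopgap. The sketch below is only an illustration of the behaviour requested in the title: `checked_histogram` is a hypothetical helper name, not part of numpy, and the choice to accept only integer and float dtypes is an assumption.

```python
import numpy as np


def checked_histogram(data, bins=10, **kwargs):
    """Hypothetical wrapper: refuse non-numeric input before calling np.histogram."""
    arr = np.asarray(data)
    # dtype.kind is "i"/"u" for signed/unsigned integers and "f" for floats;
    # a list of strings becomes kind "U", which is rejected here.
    if arr.dtype.kind not in "iuf":
        raise TypeError(
            f"histogram input must be integer or float data, got dtype {arr.dtype}"
        )
    return np.histogram(arr, bins=bins, **kwargs)


data = [100] * 10 + [200] * 14
bins = list(range(0, 2050, 50))

# Numeric input is passed through unchanged.
counts, edges = checked_histogram(data, bins=bins)

# String input raises instead of silently producing counts.
try:
    checked_histogram([str(x) for x in data], bins=bins)
except TypeError as exc:
    print("rejected:", exc)
```

With the data from the reproduction above, the numeric call yields the same counts as the int version, and the string call fails loudly rather than returning plausible-looking numbers.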
