Description
I am using NumPy to count the occurrences of each value in a uint8 ndarray, but I ran into a strange issue:
import numpy as np
a = np.random.randint(0, 256, (10980, 10980)).astype(np.uint8)
s_values, s_idx, s_counts = np.unique(a, return_inverse=True, return_counts=True)
print(len(s_values))  # 256
print(s_counts)
[470996 472336 472608 469978 471579 470698 470149 471167 470199 471153
471447 471972 470542 471369 470223 472189 470976 470971 471516 472120
470907 470808 470535 471825 470549 470794 471589 471267 470354 471701
472085 471199 469698 471675 470855 471282 470304 470156 471347 469881
470653 470901 470820 470253 470350 471043 470839 470262 470185 471895
470491 470297 471471 472009 471284 470593 470489 470297 470743 470461
472091 470997 472254 470388 470334 471444 470368 470054 468965 471678
470659 471519 470368 471808 470282 470356 471117 470904 470146 471554
470983 470734 471132 472531 471347 471159 471958 470035 470082 470979
471059 471713 472054 472314 470982 470142 471811 469707 471153 470774
470882 469754 470347 471465 471326 470490 470157 470703 470851 471749
470820 471016 472073 471125 470411 471177 470608 472016 470410 470624
470940 471711 471198 471620 470899 470480 471047 471037 470763 469869
471405 471485 470928 470446 470314 469986 471456 470344 469462 471189
471236 470927 470971 470620 471029 470045 470194 471149 472302 470903
470800 471068 471584 469641 471862 471931 471446 471432 469624 471306
470597 470624 471715 470632 470675 469995 472048 472247 470595 470474
470176 470209 469369 470637 471426 471391 470602 471379 469996 471050
470192 470801 470168 470905 471115 471436 471910 471125 469920 470043
470541 470743 471300 471162 471920 472646 471269 471604 469770 470841
470523 471890 470018 470805 470178 471287 470340 470491 470361 470354
470911 469871 470247 471402 470242 470931 471327 471024 472331 470700
471708 470661 470969 471026 471450 471053 470415 470623 470546 470612
470266 470994 471355 470044 470713 471846 471249 471964 470706 469506
470391 471127 471511 472138 471170 471721 471438 471965 471573 471211
471939 470819 469529 470699 470280 471779]
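For comparison, np.bincount gives the same per-value counts without going through np.unique. This is a minimal sketch. It uses a smaller 1000 × 1000 array and np.random.default_rng with a fixed seed only so that it runs quickly. Both choices are mine and are not part of the original reproduction.

```python
import numpy as np

# Smaller array than in the report, only to keep this check fast.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(1000, 1000), dtype=np.uint8)

# Counts via np.unique (only values that actually occur are returned).
s_values, s_counts = np.unique(a, return_counts=True)

# Counts via np.bincount: index i holds the number of elements equal to i.
counts = np.bincount(a.ravel(), minlength=256)

# Both methods should agree for every value that occurs.
print(np.array_equal(counts[s_values], s_counts))
```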
When the array is not masked, the lengths of s_values and s_counts are correct (256), and the count for the value 0 is 470996. But if I mask the 0 elements, the result changes:
# mask all elements equal to 0
y = np.ma.masked_equal(a, 0)
s_values2, s_idx2, s_counts2 = np.unique(y, return_inverse=True, return_counts=True)
print(len(s_values2)) # 471534
I would expect s_values2 to have only 256 entries, but it has 471534. As I understand it, np.ma.masked_equal does not change the data; it only marks the elements equal to 0 as masked. So np.unique should still return 256 entries: the values 1 to 255 plus a single masked entry standing in for 0. That is not what happens, and I don't know why.
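To illustrate what np.ma.masked_equal does, here is a small sketch on a five-element array. The values are my own choice. The zeros stay in the underlying data, and only the mask marks them. The sketch also shows one way to get per-value counts that ignores the masked entries: count y.compressed() (the unmasked values) and count the masked entries separately with np.ma.count_masked. This is a workaround sketch, not an explanation of the bug.

```python
import numpy as np

a = np.array([0, 1, 2, 0, 255], dtype=np.uint8)
y = np.ma.masked_equal(a, 0)

# masked_equal does not change the data; it only sets the mask.
print(y.data)   # underlying values, zeros included
print(y.mask)   # True where the element equals 0

# Workaround sketch: count only the unmasked values...
values, counts = np.unique(y.compressed(), return_counts=True)
# ...and count the masked entries on their own.
n_masked = np.ma.count_masked(y)

print(values, counts, n_masked)
```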
Note that this only happens when the dtype is uint8. With float64 the result is correct whether or not the array is masked. The result is also correct for a smaller uint8 array, e.g. 1000 × 1000.
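The following may explain the dtype dependence, but it is a guess on my part and I have not verified it against the np.unique source. As far as I can tell, when a masked array is sorted with the default endwith=True, NumPy treats masked entries as the largest values. It does this by temporarily filling them with a large value. For integer dtypes, that value is the dtype's maximum, as returned by np.ma.minimum_fill_value. For uint8 this is 255, which also occurs in the data, so during sorting the masked zeros are indistinguishable from the genuine 255s. The MaskedArray.sort documentation says the ordering of masked values relative to unmasked values at the same dtype extreme is undefined. Float dtypes have a fill value available (inf or nan) that does not occur in integer-valued data. The five-element arrays below are my own example.

```python
import numpy as np

a = np.array([0, 255, 1, 0, 255], dtype=np.uint8)
y = np.ma.masked_equal(a, 0)

# Assumption: this is the value used to push masked entries to the end
# when sorting integer data, i.e. the largest uint8 value (255).
fill = np.ma.minimum_fill_value(y)
filled = y.filled(fill)
print(fill, filled)  # the masked zeros now look exactly like the real 255s

# Same data as float64: the fill value is inf, which does not occur here.
yf = np.ma.masked_equal(a.astype(np.float64), 0)
print(np.ma.minimum_fill_value(yf))
```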