Using numpy.unique with a masked array gives unexpected results #14804

I am using numpy to count the occurrences of each element in a uint8 ndarray, but I ran into a strange issue, like this:
    
    import numpy as np

    a = np.random.randint(0, 256, (10980, 10980)).astype(np.uint8)
    s_values, s_idx, s_counts = np.unique(a, return_inverse=True, return_counts=True)
    print(len(s_values))  # 256
    print(s_counts)
    
    [470996 472336 472608 469978 471579 470698 470149 471167 470199 471153
    471447 471972 470542 471369 470223 472189 470976 470971 471516 472120
    470907 470808 470535 471825 470549 470794 471589 471267 470354 471701
    472085 471199 469698 471675 470855 471282 470304 470156 471347 469881
    470653 470901 470820 470253 470350 471043 470839 470262 470185 471895
    470491 470297 471471 472009 471284 470593 470489 470297 470743 470461
    472091 470997 472254 470388 470334 471444 470368 470054 468965 471678
    470659 471519 470368 471808 470282 470356 471117 470904 470146 471554
    470983 470734 471132 472531 471347 471159 471958 470035 470082 470979
    471059 471713 472054 472314 470982 470142 471811 469707 471153 470774
    470882 469754 470347 471465 471326 470490 470157 470703 470851 471749
    470820 471016 472073 471125 470411 471177 470608 472016 470410 470624
    470940 471711 471198 471620 470899 470480 471047 471037 470763 469869
    471405 471485 470928 470446 470314 469986 471456 470344 469462 471189
    471236 470927 470971 470620 471029 470045 470194 471149 472302 470903
    470800 471068 471584 469641 471862 471931 471446 471432 469624 471306
    470597 470624 471715 470632 470675 469995 472048 472247 470595 470474
    470176 470209 469369 470637 471426 471391 470602 471379 469996 471050
    470192 470801 470168 470905 471115 471436 471910 471125 469920 470043
    470541 470743 471300 471162 471920 472646 471269 471604 469770 470841
    470523 471890 470018 470805 470178 471287 470340 470491 470361 470354
    470911 469871 470247 471402 470242 470931 471327 471024 472331 470700
    471708 470661 470969 471026 471450 471053 470415 470623 470546 470612
    470266 470994 471355 470044 470713 471846 471249 471964 470706 469506
    470391 471127 471511 472138 471170 471721 471438 471965 471573 471211
    471939 470819 469529 470699 470280 471779]

When the array is not masked, the lengths of s_values and s_counts are correct, and we can see that the count of element 0 is 470996. But if I mask the 0 elements, things change.
    
    # mask 0 value
    y = np.ma.masked_equal(a, 0)

    s_values2, s_idx2, s_counts2 = np.unique(y, return_inverse=True, return_counts=True)

    print(len(s_values2))  # 471534

Normally s_values2 should have only 256 values, but in my case it has 471534. If np.ma.masked_equal works correctly, I would expect this procedure to only convert element 0 into a value that is not within the range [0, 256), so that it is recognized as a masked value. The length of s_values2 should then still be 256: the values 1 to 255 plus one masked value (the former 0). That is not what I see, and I don't know why this is happening.
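For reference, here is a minimal sketch of what np.ma.masked_equal does on a tiny array. The array `small` is made up for illustration and is not the array from the report. It suggests that the underlying data is left unchanged and the zeros are only flagged in the boolean mask, rather than being replaced by an out-of-range value:

```python
import numpy as np

# Hypothetical small input, chosen only to make the mask easy to read
small = np.array([0, 1, 2, 0, 3], dtype=np.uint8)
masked = np.ma.masked_equal(small, 0)

print(masked.mask)          # True where the element equals 0
print(masked.data)          # the underlying uint8 data, still containing the zeros
print(masked.compressed())  # 1-D ndarray holding only the unmasked elements
```

This does not explain the 471534 result above; it only shows how the mask and the data are stored.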
    
Note: this issue only happens when the dtype is uint8. If the dtype is float64, it does not happen, whether the array is masked or not. Also, if the array is not that big, e.g. a 1000*1000 uint8 ndarray, it works correctly.
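A possible way to sidestep the problem while counting, sketched here on a made-up small array, is to drop the masked elements first with `compressed()` and call np.unique (or np.bincount) on the resulting plain ndarray. This is only a workaround under the assumption that the counts of unmasked values are all that is needed; it does not give a `return_inverse` index in the original array's shape, and it is not a fix for the behaviour reported above:

```python
import numpy as np

# Hypothetical small input standing in for the large array in the report
small = np.array([0, 1, 2, 0, 3, 3], dtype=np.uint8)
masked = np.ma.masked_equal(small, 0)

# Keep only the unmasked elements as an ordinary 1-D ndarray
valid = masked.compressed()

# Count the unmasked values with np.unique on the plain array
values, counts = np.unique(valid, return_counts=True)
print(values)
print(counts)

# Alternative for uint8 data: one bin per possible value 0..255
bin_counts = np.bincount(valid, minlength=256)
print(bin_counts[:4])
```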
