.itemsize vs .dtype.itemsize on np.unicode_ objects #8901
Comments
This is problematic with something like this:
>>> x.view((np.byte, x.nbytes))  # this should usually work
ValueError: new type not compatible with array.
>>> x.view((np.byte, x.nbytes * 2))  # ???
array([116, 0, 0, 0, 101, 0, 0, 0, 115, 0, 0, 0, 116,
       0, 0, 0], dtype=int8)
Although it seems that it is sometimes correct:
>>> b = buffer(x)
>>> len(b) == x.nbytes
True
>>> list(b)
>>> list(buffer(x))
['t', '\x00', 'e', '\x00', 's', '\x00', 't', '\x00']
Why does unicode seem to have different binary representations depending on how you look at it? Aren't both |
Python 2.7 can be compiled with either 16-bit or 32-bit unicode, whereas NumPy always uses 32-bit unicode when storing unicode strings, in order to avoid problems. |
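The two byte sequences in the transcript above look like the 2-byte and 4-byte little-endian encodings of the same four characters. As a rough sketch using only the standard-library codecs (the variable names `ucs2` and `ucs4` are illustrative, and this reproduces the byte layouts rather than calling NumPy or the Python 2 `buffer` builtin):

```python
# Encode the same text with 2 and 4 bytes per code point (little-endian).
text = "test"

ucs2 = text.encode("utf-16-le")  # layout of a narrow (16-bit) unicode build
ucs4 = text.encode("utf-32-le")  # layout NumPy uses for unicode array storage

print(len(ucs2), list(ucs2))
print(len(ucs4), list(ucs4))
```

The 8-byte form matches what `list(buffer(x))` showed, and the 16-byte form matches the `int8` view of length `x.nbytes * 2`.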
Except in |
I'll guess that
and
I believe Microsoft itself uses wide characters (16 bits) for unicode. |
If you mean
Indeed - I'm only seeing this for that reason, but I think the difference in storage between arrays and numpy scalars is screwy |
The numpy scalar type looks to subclass
|
See
|
Yep, you're right, that is the reason |
Pretty sure this was fixed by #15385; now all unicode objects in NumPy are UCS4. |
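One way to check the array-side storage on a current install is sketched below. It assumes Python 3 and a reasonably recent NumPy; the explicit `"<U4"` dtype is chosen here only to make the byte order deterministic. The scalar's `.itemsize` is printed rather than asserted, since its value is the thing that differed between versions.

```python
import numpy as np

# A one-element unicode array with explicit little-endian, 4-character dtype.
x = np.array(["test"], dtype="<U4")

# Raw storage of the array, one uint8 per byte.
raw = np.frombuffer(x.tobytes(), dtype=np.uint8)

print(x.dtype.itemsize, x.nbytes)   # bytes per element, total bytes
print(raw.tolist())                 # 4 bytes per character (UCS4)

# The matching scalar type; compare its itemsize with the dtype's by eye.
s = np.str_("test")
print(s.itemsize, s.dtype.itemsize)
```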
Was slightly surprised today to find that these are not equal
One of these is not like the others. Why?