BUG: np.unicode_ scalars misbehave on narrow builds #3258

Closed
pavle opened this issue Apr 17, 2013 · 7 comments · Fixed by #15385

pavle commented Apr 17, 2013

Follows on from #1123

I'm assigning a numpy.unicode_ to a slice of array of numpy.unicode_ and see the following:

In [85]: l = np.array([u'blah',u'blah',u'blah'])

In [87]: type(l[0])
Out[87]: numpy.unicode_

In [88]: v = np.unicode_('bubu')

In [89]: l[0] = v

In [90]: l
Out[90]: 
array([u'bubu', u'blah', u'blah'], 
      dtype='<U4')

In [91]: l[1:3] = v

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
In [92]: l
Out[92]: 
array([u'bubu', u'\uf500\udc62\uf500\udc62\u8d00\udc00',
       u'\uf500\udc62\uf500\udc62\u8d00\udc00'], 
      dtype='<U4')
<<<<<<<<<<<< I consider this a problem

In [93]: {k: v for k, v in sysconfig.get_config_vars().items() if 'unicode' in k.lower()}
Out[93]: 
{'PY_UNICODE_TYPE': 'unsigned short',
 'Py_UNICODE_SIZE': 2,
 'Py_USING_UNICODE': 1,
 'UNICODE_OBJS': 'Objects/unicodeobject.o Objects/unicodectype.o'}

In [94]: np.__version__
Out[94]: '1.7.1'
$ uname -a
Linux ... 2.6.32-279.19.1.el6.x86_64 #1 SMP Wed Dec 19 07:05:20 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

On a wide unicode build I can do the same with no problem:

In [81]: l = np.array([u'blah',u'blah',u'blah'])

In [82]: type(l[0])
Out[82]: numpy.unicode_

In [83]: v = np.unicode_('bubu')

In [84]: l[0] = v

In [85]: l
Out[85]: 
array([u'bubu', u'blah', u'blah'], 
      dtype='<U4')

In [86]: l[1:3] = v

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
In [87]: l
Out[87]: 
array([u'bubu', u'bubu', u'bubu'], 
      dtype='<U4')
<<<<<<<<<<<<<<<<<<<<<<<<<<<< this works fine

In [88]: {k: v for k, v in sysconfig.get_config_vars().items() if 'unicode' in k.lower()}
Out[88]: 
{'PY_UNICODE_TYPE': 0,
 'Py_UNICODE_SIZE': 4,
 'Py_USING_UNICODE': 1,
 'UNICODE_OBJS': 'Objects/unicodeobject.o Objects/unicodectype.o'}

In [89]: np.__version__
Out[89]: '1.7.1'
(my27py)pavel@pavel-ThinkPad-T520:~$ uname -a
Linux pavel-ThinkPad-T520 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
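
For anyone trying to tell which kind of build they are on without reading the sysconfig variables, a minimal sketch follows. It assumes only the standard `sys.maxunicode` attribute; the helper name `unicode_build_kind` is made up for illustration and is not part of NumPy or Python.

```python
import sys

def unicode_build_kind():
    """Classify the running interpreter as a 'narrow' or 'wide' unicode build.

    Hypothetical helper for illustration only.
    """
    # Narrow (UCS2) builds cap sys.maxunicode at 0xFFFF. Wide (UCS4) builds,
    # and every CPython from 3.3 onward, report 0x10FFFF.
    return "narrow" if sys.maxunicode == 0xFFFF else "wide"

print(unicode_build_kind(), hex(sys.maxunicode))
```

On the two machines above this should report "narrow" for the Py_UNICODE_SIZE 2 build and "wide" for the Py_UNICODE_SIZE 4 build.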

charris (Member) commented Feb 22, 2014

Need to track this down with a narrow build.

eric-wieser (Member) commented Apr 7, 2017

To avoid having two issues about "that unicode scalar bug": #7227 notes that the same bug occurs with l.fill(v).

eric-wieser (Member) commented Nov 13, 2017

Another example from #10015:

>>> a = np.zeros(2, dtype='U3')
>>> a[:] = np.unicode_('xxx')
>>> a.view(np.void)
array([b'\x78\x00\x78\x00\x78\x00\x00\x00\xF0\xE6\xD4\x68',
       b'\x78\x00\x78\x00\x78\x00\x00\x00\xF0\xE6\xD4\x68'], dtype='|V12')
# '\x78' == 'x'
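
To make the byte dump above easier to read, here is a small standard-library sketch comparing it with the two candidate layouts. It assumes a little-endian machine (as in the dump) and treats the trailing four bytes as leftover memory, which is an interpretation rather than something the thread states.

```python
# What a 'U3' element is expected to hold: three 4-byte (UCS4) code units.
expected_ucs4 = "xxx".encode("utf-32-le")

# What a narrow-build scalar holds internally: three 2-byte (UCS2) code units.
narrow_ucs2 = "xxx".encode("utf-16-le")

# The 12 bytes shown in the a.view(np.void) output above.
observed = b"\x78\x00\x78\x00\x78\x00\x00\x00\xF0\xE6\xD4\x68"

print(observed == expected_ucs4)    # False: not the UCS4 layout
print(observed[:6] == narrow_ucs2)  # True: the UCS2 bytes were copied verbatim
print(observed[6:8])                # two NUL bytes, consistent with a UCS2 terminator
```

The first six bytes match the 2-byte encoding exactly, which is consistent with the scalar's narrow buffer being copied into a slot that expects 4-byte units.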

eric-wieser changed the title from "appranent inconsistency in numpy.unicode_ on narrow builds (follow up to dated Trac #525)" to "BUG: np.unicode_ scalars misbehave on narrow builds (follow up to dated Trac #525)" on Nov 13, 2017
eric-wieser changed the title from "BUG: np.unicode_ scalars misbehave on narrow builds (follow up to dated Trac #525)" to "BUG: np.unicode_ scalars misbehave on narrow builds" on Nov 13, 2017

eric-wieser (Member) commented Nov 13, 2017

The problem seems to be within PyArray_CopyObject.

Specifically, scalar_value is not usable on np.unicode_ objects, since the return value depends on how the code points are stored (UCS8, UCS16, UCS32, etc.).
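
A rough sketch of why the storage width matters, using the 'bubu' example from the original report. It assumes the 2-byte buffer is read as little-endian 4-byte units and that out-of-range values are then converted back with ordinary UTF-16 surrogate arithmetic truncated to 16 bits. Both helper names are hypothetical, and this is not NumPy's actual code.

```python
import struct

def as_ucs4_units(text):
    """Reinterpret the UTF-16-LE bytes of `text` as little-endian 32-bit units."""
    raw = text.encode("utf-16-le")
    raw += b"\x00" * (-len(raw) % 4)  # pad to a whole number of 4-byte units
    return list(struct.unpack("<%dI" % (len(raw) // 4), raw))

def to_surrogate_pair(unit):
    """Apply UTF-16 surrogate arithmetic, keeping only the low 16 bits of each half."""
    offset = unit - 0x10000
    high = (0xD800 + (offset >> 10)) & 0xFFFF
    low = (0xDC00 + (offset & 0x3FF)) & 0xFFFF
    return high, low

units = as_ucs4_units("bubu")
pairs = [to_surrogate_pair(u) for u in units]
print([hex(u) for u in units])               # ['0x750062', '0x750062']
print([(hex(h), hex(l)) for h, l in pairs])  # [('0xf500', '0xdc62'), ('0xf500', '0xdc62')]
```

Under these assumptions, each pair of 2-byte characters ('b', 'u') is fused into one value above 0x10FFFF, and the resulting halves match the \uf500\udc62\uf500\udc62 prefix in the narrow-build output of the original report.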

OmerJog commented Jun 17, 2019

What's the status of this? Any plans to fix it?

seberg (Member) commented Jun 17, 2019

Nothing specific right now, unfortunately. @OmerJog, if you have some time to look into things, that is always appreciated!

eric-wieser (Member) commented

#15363 is related

eric-wieser added a commit to eric-wieser/numpy that referenced this issue Feb 8, 2020
These APIs work with either UCS2 or UCS4, depending on the value of `Py_UNICODE_WIDE`.
After Python 3.3, there's a better way to handle this type of thing, which means we no longer have to care about this.

Fixes gh-3258
Fixes gh-15363
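
As a rough Python-level illustration of the idea in the commit message: since Python 3.3 a string can be converted to fixed-width 4-byte code units regardless of how the interpreter stores it internally (the C API offers helpers such as `PyUnicode_AsUCS4` for this). The sketch below is an assumption about the general approach, not the code from the referenced commit, and `to_ucs4_buffer` is a hypothetical name.

```python
def to_ucs4_buffer(text, width):
    """Encode `text` as `width` little-endian UCS4 units, truncating or NUL-padding.

    Hypothetical helper mimicking how a fixed-width 'U<width>' slot is filled.
    """
    encoded = text[:width].encode("utf-32-le")  # always 4 bytes per code point
    return encoded.ljust(4 * width, b"\x00")

print(to_ucs4_buffer("xxx", 3))
print(to_ucs4_buffer("x\u20ac\U0001F600", 4))
```

Because the conversion goes through code points rather than the raw internal buffer, ASCII, BMP, and astral characters all come out as one 4-byte unit each.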