BUG: np.unicode_ scalars misbehave on narrow builds #3258

pavle · 2013-04-17T23:52:35Z

Follows on from #1123

I'm assigning a numpy.unicode_ scalar to a slice of an array of numpy.unicode_ and see the following:

In [85]: l = np.array([u'blah',u'blah',u'blah'])

In [87]: type(l[0])
Out[87]: numpy.unicode_

In [88]: v = np.unicode_('bubu')

In [89]: l[0] = v

In [90]: l
Out[90]: 
array([u'bubu', u'blah', u'blah'], 
      dtype='<U4')

In [91]: l[1:3] = v

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
In [92]: l
Out[92]: 
array([u'bubu', u'\uf500\udc62\uf500\udc62\u8d00\udc00',
       u'\uf500\udc62\uf500\udc62\u8d00\udc00'], 
      dtype='<U4')
<<<<<<<<<<<< I consider this a problem

In [93]: {k: v for k, v in sysconfig.get_config_vars().items() if 'unicode' in k.lower()}
Out[93]: 
{'PY_UNICODE_TYPE': 'unsigned short',
 'Py_UNICODE_SIZE': 2,
 'Py_USING_UNICODE': 1,
 'UNICODE_OBJS': 'Objects/unicodeobject.o Objects/unicodectype.o'}

In [94]: np.__version__
Out[94]: '1.7.1'

$ uname -a
Linux ... 2.6.32-279.19.1.el6.x86_64 #1 SMP Wed Dec 19 07:05:20 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
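For anyone trying to reproduce this, a minimal sketch of how to tell the two builds apart from Python itself, as an alternative to reading the sysconfig variables. The helper name `unicode_build` is made up for illustration. Note that it only reflects the Python-level build; it says nothing about the width of the C-level `Py_UNICODE` type.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Classify the interpreter's unicode build from sys.maxunicode.

    Hypothetical helper, not part of numpy or the standard library.
    """
    # Narrow (UCS2) builds report 0xFFFF. Wide (UCS4) builds, and every
    # CPython from 3.3 onward, report 0x10FFFF.
    return "narrow" if maxunicode == 0xFFFF else "wide"


print(unicode_build())
```
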

On a wide unicode build I can do the same with no problem:

In [81]: l = np.array([u'blah',u'blah',u'blah'])

In [82]: type(l[0])
Out[82]: numpy.unicode_

In [83]: v = np.unicode_('bubu')

In [84]: l[0] = v

In [85]: l
Out[85]: 
array([u'bubu', u'blah', u'blah'], 
      dtype='<U4')

In [86]: l[1:3] = v

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
In [87]: l
Out[87]: 
array([u'bubu', u'bubu', u'bubu'], 
      dtype='<U4')
<<<<<<<<<<<<<<<<<<<<<<<<<<<< this works fine

In [88]: {k: v for k, v in sysconfig.get_config_vars().items() if 'unicode' in k.lower()}
Out[88]: 
{'PY_UNICODE_TYPE': 0,
 'Py_UNICODE_SIZE': 4,
 'Py_USING_UNICODE': 1,
 'UNICODE_OBJS': 'Objects/unicodeobject.o Objects/unicodectype.o'}

In [89]: np.__version__
Out[89]: '1.7.1'

(my27py)pavel@pavel-ThinkPad-T520:~$ uname -a
Linux pavel-ThinkPad-T520 3.2.0-40-generic #64-Ubuntu SMP Mon Mar 25 21:22:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


charris · 2014-02-22T21:56:24Z

Need to track this down with a narrow build.

eric-wieser · 2017-04-07T09:02:29Z

To avoid having two issues about "that unicode scalar bug": #7227 notes that the same bug occurs with l.fill(v).

eric-wieser · 2017-11-13T01:19:44Z

Another example from #10015:

>>> a = np.zeros(2, dtype='U3')
>>> a[:] = np.unicode_('xxx')
>>> a.view(np.void)
array([b'\x78\x00\x78\x00\x78\x00\x00\x00\xF0\xE6\xD4\x68',
       b'\x78\x00\x78\x00\x78\x00\x00\x00\xF0\xE6\xD4\x68'], dtype='|V12')
# '\x78' == 'x'
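
The bytes above can be inspected without numpy. This is a rough sketch that only uses the bytes printed in the output; the variable names are mine. It suggests that the first six bytes are 'xxx' in 2-byte code units, whereas a 'U3' element should hold three 4-byte code points.

```python
# Bytes of one element, copied from the a.view(np.void) output above.
raw = b'\x78\x00\x78\x00\x78\x00\x00\x00\xF0\xE6\xD4\x68'

# What a correct 'U3' element should contain: 3 code points * 4 bytes.
expected = 'xxx'.encode('utf-32-le')

# The same text in 2-byte code units (what a narrow Py_UNICODE buffer holds).
utf16 = 'xxx'.encode('utf-16-le')

print(len(raw), len(expected), len(utf16))
print(raw == expected)               # the element is not valid UCS4 'xxx'
print(raw[:len(utf16)] == utf16)     # but it starts with the 2-byte form
print(raw[:len(utf16)].decode('utf-16-le'))
```

The remaining six bytes are not text at all; they look like whatever happened to follow the 2-byte buffer in memory, though that is my reading of the dump rather than something confirmed here.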

eric-wieser · 2017-11-13T07:27:51Z

The problem seems to be within PyArray_CopyObject.

Specifically, scalar_value is not usable on np.unicode_ objects, since the return value depends on how the codepoints are stored (UCS1, UCS2, UCS4, etc.).
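
To illustrate why the storage width matters, here is a pure-Python sketch. It is not the numpy C code; `storage` and `read_as_ucs4` are hypothetical helpers that mimic reading a buffer of 2-byte code units as if it held 4-byte code points.

```python
def storage(text, width):
    """Encode text as fixed-width little-endian code units (1, 2 or 4 bytes)."""
    codec = {1: 'latin-1', 2: 'utf-16-le', 4: 'utf-32-le'}[width]
    return text.encode(codec)


def read_as_ucs4(buf):
    """Interpret a buffer as consecutive 4-byte little-endian code points."""
    return [int.from_bytes(buf[i:i + 4], 'little') for i in range(0, len(buf), 4)]


text = 'bubu'

# Correct case: the buffer really holds 4-byte code points.
good = read_as_ucs4(storage(text, 4))
print([hex(cp) for cp in good])

# Mismatch: the buffer holds 2-byte units but is read as 4-byte code points.
bad = read_as_ucs4(storage(text, 2))
print([hex(cp) for cp in bad])
```

In the mismatched case each pair of characters is fused into a single value that lies outside the Unicode range, and only half as many "code points" come out, so a copy that expects four of them would read past the end of the buffer.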

OmerJog · 2019-06-17T18:48:11Z

What's the status of this? Any plans to fix it?

seberg · 2019-06-17T18:50:33Z

Nothing specific right now, unfortunately. @OmerJog, if you have some time to look into things, that is always appreciated!

eric-wieser · 2020-02-04T13:52:38Z

#15363 is related

The `Py_UNICODE` APIs work with either UCS2 or UCS4, depending on the value of `Py_UNICODE_WIDE`. Since Python 3.3 there is a better way to handle this, which means we no longer have to care about the build width. Fixes gh-3258. Fixes gh-15363.
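
As a Python-level analogy only (the actual C change is in #15385 and is not reproduced in this thread), the width-independent idea can be sketched as: always convert the string to 4-byte code points explicitly, then pad or truncate to the element size. The helper name `to_ucs4_element` is hypothetical.

```python
def to_ucs4_element(text, nchars):
    """Return text as a fixed-size little-endian UCS4 buffer of nchars code points.

    Hypothetical helper mimicking the layout of a '<U{nchars}' array element.
    """
    # Explicit conversion: the result does not depend on how the
    # interpreter stores the string internally.
    data = text[:nchars].encode('utf-32-le')
    # Shorter strings are padded with NUL code points.
    return data.ljust(4 * nchars, b'\x00')


element = to_ucs4_element('bubu', 4)
print(len(element))
print(element.decode('utf-32-le'))
```

Because the conversion is explicit, the same bytes come out regardless of whether the source string was stored in 1, 2 or 4 bytes per character.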

charris added Defect labels Feb 22, 2014

jreback mentioned this issue May 27, 2014

HDFStore still corrupted reads with utf8 pandas-dev/pandas#7244

Open

eric-wieser mentioned this issue Apr 6, 2017

.fill called with unicode scalar Python3 on Windows yields UnicodeDecodeError when accessing the array values #7227

Closed

eric-wieser mentioned this issue Nov 13, 2017

UnicodeDecodeError on Windows #10015

Closed

eric-wieser changed the title ~~appranent inconsistency in numpy.unicode_ on narrow builds (follow up to dated Trac #525)~~ BUG: np.unicode_ scalars misbehave on narrow builds (follow up to dated Trac #525) Nov 13, 2017

eric-wieser changed the title ~~BUG: np.unicode_ scalars misbehave on narrow builds (follow up to dated Trac #525)~~ BUG: np.unicode_ scalars misbehave on narrow builds Nov 13, 2017

mattip removed the priority: normal label Oct 21, 2018

eric-wieser mentioned this issue Jan 11, 2019

Problem to accessing column items assigned from single numpy.str_ value in structured array #12670

Closed

eric-wieser mentioned this issue Feb 4, 2020

BUG, MAINT: Stop using the error-prone deprecated Py_UNICODE apis #15385

Merged

seberg closed this as completed in #15385 Feb 14, 2020
