Constructing object array from void array leads to use-after-free #8129

Closed
jzwinck opened this issue Oct 10, 2016 · 4 comments · Fixed by #10044

Comments

@jzwinck
Contributor

jzwinck commented Oct 10, 2016

If you do this:

np.array(np.zeros(3, 'i4'), object)

You get an array of 3 objects, where each is a (reference to an) integer. However, if you do this:

np.array(np.zeros(3, 'V4'), object)

You get an array of 3 read-write buffer ptr objects, for example:

array([<read-write buffer ptr 0x19965d0, size 4 at 0x7f556a3568f0>,
       <read-write buffer ptr 0x19965d4, size 4 at 0x7f556a356930>,
       <read-write buffer ptr 0x19965d8, size 4 at 0x7f556a3568b0>], dtype=object)

And while these buffers do let you access the void data, they do not hold any reference to the array that owns it. This leads to use-after-free memory access violations, like this:

import numpy as np

va = np.zeros(300000, 'V4')
oa = np.array(va, object)
del va                       # frees the memory the buffers in oa still point into
print(oa[0][0], oa[-1][-1])  # reads freed memory

That often causes a segmentation fault on my machine (Linux).

This is also the root cause of a crash in Pandas: pandas-dev/pandas#14349

I'm using NumPy 1.10.2 and 1.11.1 on Python 3.5.
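A minimal pure-Python sketch of per-element objects that do keep their owner alive, assuming a bytearray as a stand-in for the 'V4' array (no NumPy involved; split_into_views is a hypothetical helper, not a NumPy function). Each element is a memoryview slice, and every slice references the exporting object, so dropping the original name does not free the memory.

```python
ITEMSIZE = 4  # bytes per element, like 'V4'


def split_into_views(buf, itemsize):
    """Hypothetical helper: return one memoryview slice per element of `buf`.

    All slices reference the same underlying buffer export, so `buf`
    stays alive for as long as any slice does.
    """
    whole = memoryview(buf)
    return [whole[i:i + itemsize] for i in range(0, len(whole), itemsize)]


data = bytearray(3 * ITEMSIZE)            # stand-in for np.zeros(3, 'V4')
views = split_into_views(data, ITEMSIZE)
del data                                  # drop our only name for the buffer
print(bytes(views[0]), bytes(views[-1]))  # still safe to read
print(type(views[0].obj).__name__)        # the owner is still referenced
```

Unlike the read-write buffer objects above, nothing here can outlive the memory it reads.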

@ahaldane
Member

Right. The problem occurs at the end of VOID_getitem, where it returns the result of PyBuffer_FromReadWriteMemory, which does not hold any reference to the original array. Presumably we can also get segfaults by doing x = va[0].item(); del va; x[0], although I can't get it to happen here.

I would have to read up on it, but it looks like the "old-style" PyBufferObject being used here has a base field, since PyBuffer_FromObject says it keeps a ref to the base object. Maybe there is a way to finagle it so the base gets set. It might also be worth upgrading that code to use memoryviews.
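A small CPython sketch of the difference between the current buffer and one whose base is set. Owner, BorrowedView, BasedView and owner_survives are hypothetical names. A weakref stands in for the raw pointer, which is only an approximation: a real pointer dangles silently instead of going dead. The result also relies on CPython freeing an object as soon as its reference count drops to zero.

```python
import weakref


class Owner:
    """Hypothetical stand-in for the array that owns the void data."""
    def __init__(self, nbytes):
        self.data = bytearray(nbytes)


class BorrowedView:
    """Like the current buffer: knows where the data is, not who owns it.

    The weak reference plays the role of the raw pointer; it does not
    keep the owner alive.
    """
    def __init__(self, owner):
        self._where = weakref.ref(owner)


class BasedView:
    """Like a buffer with its base field set: holds a strong reference."""
    def __init__(self, owner):
        self.base = owner


def owner_survives(make_view):
    """Create an owner, wrap it, drop our own reference, report whether it lived."""
    owner = Owner(4)
    probe = weakref.ref(owner)  # observe the owner without keeping it alive
    view = make_view(owner)     # keep the view alive until we return
    del owner                   # CPython frees the owner here if nothing else holds it
    return probe() is not None


print(owner_survives(BorrowedView))  # the owner was freed
print(owner_survives(BasedView))     # view.base kept the owner alive
```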

@charris
Member

charris commented Oct 13, 2016

Moving to memoryviews would be good. They are available in Python 2.7, so Python version dependency is not a problem.
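To illustrate why memoryviews fit better (a pure-Python sketch of the buffer protocol, not NumPy's C code): a memoryview registers an export with the object it wraps, so the exporter knows its memory is in use. A bytearray, for example, refuses to resize while a view is outstanding. try_resize is a hypothetical helper.

```python
def try_resize(buf):
    """Hypothetical helper: return True if `buf` could be grown, False if refused."""
    try:
        buf.extend(b"\x00" * 4)  # growing may move the memory to a new address
        return True
    except BufferError:          # raised while buffer exports are outstanding
        return False


buf = bytearray(4)               # stand-in for the array's data
view = memoryview(buf)           # registers an export on buf
refused_while_exported = not try_resize(buf)
view.release()                   # end the export explicitly
allowed_after_release = try_resize(buf)
print(refused_while_exported, allowed_after_release, len(buf))
```

The old buffer returned by PyBuffer_FromReadWriteMemory has no such link back to the array, which is exactly what the crash above exploits.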

@ahaldane
Member

ahaldane commented Oct 13, 2016

Yeah, that would be good because it would unify the py2 and py3 code paths, which are totally separate right now.

Also, I was debugging this a little more and found a simpler (py2) example that segfaults:

>>> va = np.zeros(300000, 'V4')
>>> x = va[:1].item()
>>> del va
>>> x[0]
zsh: segmentation fault (core dumped)

The difference between va[0].item() and va[:1].item() is that the former calls item on a void scalar, and it turns out that unstructured void scalars make a copy (unlike structured void scalars, which are views), so x does not point into va's memory. The latter, with [:1], calls item on a size-1 array, which ends up returning a view and so can segfault.
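A stdlib analogy for the view/copy distinction (an illustration only, not NumPy's actual code path): a memoryview slice points into the owner's memory, like va[:1].item(), while bytes(...) makes an independent copy, like va[0].item().

```python
owner = bytearray(8)             # stand-in for va's memory
as_view = memoryview(owner)[:4]  # analogous to va[:1].item(): points into owner
as_copy = bytes(owner[:4])       # analogous to va[0].item(): independent copy

owner[0] = 0xFF                  # mutate the original storage in place
print(as_view[0], as_copy[0])    # the view sees the change, the copy does not
```

Only the view depends on the owner's memory, which is why only the [:1] case can read freed memory once the owner is gone.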

Also, because the py3 code is totally different, it does not have this problem. However, I can get some pretty strange things to happen:

>>> va = np.zeros(300000, 'V4')
>>> va[0].item().base
array(array([0, 0, 0, 0], dtype=int8), 
      dtype='|V4')

Not sure why it prints that way!

@ahaldane
Member

Oops, this shouldn't be closed yet. I accidentally used the phrase "fixes" in the PR description of #10044.

This will be fixed in the future by #8157.

@ahaldane ahaldane reopened this Nov 26, 2017