Skip to content

Conversation

ziyan
Copy link
Contributor

@ziyan ziyan commented Nov 8, 2017

GIL unnecessary when numpy floating point error handling is set to ignore.

Fixes an issue where numpy might deadlock when computing a**2 where a is tiny, e.g. -2.3693744349064819e-197:

#0  0x00007f6ff8c5b536 in do_futex_wait.constprop () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f6ff8c5b5e4 in __new_sem_wait_slow.constprop.0 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f6ff20b6768 in PyThread_acquire_lock (lock=0x55e085e26020, waitflag=<optimized out>) at ../Python/thread_pthread.h:324
#3  0x00007f6ff2028556 in PyEval_RestoreThread (tstate=0x7f6fe11adf40) at ../Python/ceval.c:359
#4  0x00007f6ff20e0b96 in PyGILState_Ensure () at ../Python/pystate.c:611
#5  0x00007f6fcb880dd6 in _error_handler (method=method@entry=0, errobj=errobj@entry=('double_scalars', None), errtype=errtype@entry=0x7f6fcb8d28c1 "underflow", retstatus=retstatus@entry=4, first=first@entry=0x7f6fef64c7b0) at numpy/core/src/umath/ufunc_object.c:119
#6  0x00007f6fcb8872ff in PyUFunc_handlefperr (errmask=521, errobj=('double_scalars', None), retstatus=retstatus@entry=4, first=first@entry=0x7f6fef64c7b0) at numpy/core/src/umath/ufunc_object.c:209
#7  0x00007f6fcb894f08 in double_power (a=<optimized out>, b=2, __NPY_UNUSED_TAGGEDc=<optimized out>) at numpy/core/src/umath/scalarmath.c.src:1168
#8  0x00007f6ff2083a17 in ternary_op.isra.5 (v=<optimized out>, w=<optimized out>, z=None, op_slot=48) at ../Objects/abstract.c:1065
#9  0x00007f6ff2029e0a in PyEval_EvalFrameEx

P.S. the stack is coming from numpy 1.11, but same problem seems to exist on master.

GIL unnecessary when numpy floating point error handling is set to ignore.
@njsmith
Copy link
Member

njsmith commented Nov 8, 2017 via email

@ziyan
Copy link
Contributor Author

ziyan commented Nov 8, 2017

There is no bug open, because I have not found a way to reproduce the deadlock with simple script. I will get back to you on that.

@ziyan
Copy link
Contributor Author

ziyan commented Nov 8, 2017

I added a simple reproduction in this repository:
https://github.com/ziyan/numpy-wsgi-deadlock

You will need docker to reproduce it. But it is a simple apache2 wsgi running a simple python script hello.wsgi.

@eric-wieser
Copy link
Member

eric-wieser commented Nov 8, 2017

Diff looks fine, but if there's a deadlock here, can't we still hit it in the other modes anyway? This doesn't seem to solve the underlying problem to me.

@ziyan
Copy link
Contributor Author

ziyan commented Nov 8, 2017

You are right. It does not. I think the deadlock is an instance of gh-8559. And the conclusion there is that wsgi under default settings is not supported by numpy because it uses multiple sub interpreters. I tried WSGIApplicationGroup %{GLOBAL} and the problem seems to go away.

@eric-wieser
Copy link
Member

Sounds pretty convincing to me. The patch looks uncontroversial, so I'll put it in. Thanks!

@eric-wieser eric-wieser merged commit 3d0c041 into numpy:master Nov 8, 2017
@eric-wieser
Copy link
Member

Yeah, this sure looks like gh-5856 to me

@ziyan
Copy link
Contributor Author

ziyan commented Nov 8, 2017

Thank you for the quick response!

@eric-wieser eric-wieser added 01 - Enhancement component: numpy._core Embedded Issues regarding embedded python interpreters labels Nov 8, 2017
@twmr
Copy link

twmr commented Dec 1, 2017

Does this PR fix the deadlock in the following simple program ?

//
// gcc pyinterptest.c -I/usr/include/python2.7 -Wall -lpython2.7 -o pyinterptest && ./pyinterptest
//
#include <Python.h>


void run_numpy_code(void){
    PyRun_SimpleString("import numpy as np");
    PyRun_SimpleString("print(3)");
    PyRun_SimpleString("print(np.array([1.23434e-312])**2)");
    /* PyRun_SimpleString("print(np.array([1.23434e-312])**1.)"); */
}


void foo1(void) {
    // does not hang
    PyThreadState* mainThread = PyEval_SaveThread(); // REL GIL
    PyThreadState_New(mainThread->interp);
    PyEval_RestoreThread(mainThread); // ACC GIL

    run_numpy_code(); // does not hang
}

void foo2(void) {
    PyThreadState* mainThread = PyEval_SaveThread(); // REL GIL
    PyThreadState* t1 = PyThreadState_New(mainThread->interp);
    PyEval_RestoreThread(t1); // ACC GIL

    run_numpy_code(); // hangs
}

void foo3(void) {
    PyThreadState* mainThread = PyEval_SaveThread(); // REL GIL
    PyThreadState* t1  = PyThreadState_New(mainThread->interp);
    PyEval_RestoreThread(t1);
    PyEval_SaveThread(); // REL GIL
    PyEval_RestoreThread(mainThread); // ACC GIL

    run_numpy_code(); // does not hang
}

int main(int argc, char *argv[]) {
    Py_Initialize();
    PyEval_InitThreads(); // ACC GIL

    foo2();
    Py_Finalize();
    return 0;
}

@twmr
Copy link

twmr commented Dec 2, 2017

I've just compiled numpy@master locally and can verify that the above program does not hang anymore. 👍

Do we want to add a stripped down version of the above program to the numpy unit tests? @eric-wieser

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement component: numpy._core Embedded Issues regarding embedded python interpreters
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants