
numpy.linalg and multiprocessing crash (Trac #2201) #654

Closed
numpy-gitbot opened this issue Oct 19, 2012 · 12 comments · Fixed by #4194
Labels
00 - Bug component: numpy.linalg Priority: high High priority, also add milestones for urgent issues

Comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/2201 on 2012-08-14 by trac user agchang, assigned to @pv.

There seems to be an issue when numpy.linalg is used together with Python's multiprocessing module. When a numpy.linalg function is called from within a subprocess, e.g., through a worker pool and its map() method, the program hangs and ignores all interrupt signals.

The code below demonstrates the problem:

import numpy as np
import multiprocessing as mp

def foo(x):
    print(np.linalg.inv([[2, 3], [2, 2]]))  # this call hangs in the worker
    # print(np.dot([[1, 2], [3, 4]], [[1, 2], [3, 4]]))  # this works fine

def test():
    print("running...")
    print(np.__version__)
    # multiprocessing.__version__ exists on Python 2 but not on recent Python 3
    print(getattr(mp, "__version__", "n/a"))
    vals = [1, 2, 3, 4]
    pool = mp.Pool(1)  # this has an issue
    # pool = mp.Pool(mp.cpu_count())  # this has an issue too
    pool.map(foo, vals)
    pool.close()
    pool.join()

if __name__ == "__main__":
    test()
    # foo(1)  # calling it directly in the parent works fine

By hang I mean the program becomes unresponsive and does not respond to interrupts (Ctrl-C).

I took a look using pdb and it seems to hang after the call to waiter.acquire() in threading.py (the Python standard-library module), so I suspect some sort of deadlock?

multiprocessing.__version__ is 0.70a1
numpy.__version__ is 1.6.1

Potentially related packages:

ii libblas3gf 1.2.20110419-2 Basic Linear Algebra Reference implementatio
ii liblapack3gf 3.3.1-1 library of linear algebra routines 3 - share
ii python 2.7.3-0ubuntu2 interactive high-level object-oriented langu
ii python-numpy 1:1.6.1-6ubunt Numerical Python adds a fast array facility

My kernel version (uname -a):

Linux agc 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Let me know if you need any additional information.

Thanks.

@numpy-gitbot
Author

@pv wrote on 2012-08-14

I cannot reproduce this (Ubuntu x86, OpenSUSE x86_64, both with Numpy 1.6.1).

@numpy-gitbot
Author

trac user agchang wrote on 2012-08-14

Sorry, I forgot to mention something very relevant. This bug seems to occur only on specific hardware, i.e., a MacBook Pro (Mid-2010). I also ran it on a non-Macintosh machine, where it worked. Also, I'm not sure how relevant it is, but this ticket seems to describe a similar problem:

http://projects.scipy.org/numpy/ticket/2091

However, that reporter uses OS X, so it is presumably on Macintosh hardware as well.

@numpy-gitbot
Author

trac user agchang wrote on 2012-08-14

The machine it works on (the non-Macintosh one) has the kernel version:

Linux byrd 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

@numpy-gitbot
Author

trac user agchang wrote on 2012-08-15

It also runs fine under Mac OS X Snow Leopard on my machine.

@numpy-gitbot
Author

@pv wrote on 2012-08-15

This is probably relevant:
http://mail.scipy.org/pipermail/numpy-discussion/2012-August/063589.html

If this is indeed the origin of this crash, then it's an Apple issue with multiprocessing, and not fixable by us (except by shipping binaries with Accelerate disabled, but that's a workaround).

@numpy-gitbot
Author

trac user agchang wrote on 2012-08-22

Hm. Sounds like it. But would that still explain why it works under Mac OS X Snow Leopard on my machine but not under Ubuntu? Maybe Ubuntu uses the Accelerate framework while Mac OS X does not, by default?

@numpy-gitbot
Author

@pv wrote on 2012-08-22

I must have misread this bug report originally. Accelerate is Apple software that comes with OS X, so I don't think running Ubuntu on the hardware uses it. So this seems to be a different issue from #2683.

@iskandr

iskandr commented Jan 18, 2013

I'm experiencing the same bug on Ubuntu 12.04 with NumPy 1.6.1 and Python 2.7.

I just upgraded to NumPy 1.6.2 and the problem has gone away.

@francisquintallauzon

I get the same issue on:

Ubuntu 13.04 running kernel version 3.8.0-30-generic
Python version = 2.7.4
numpy version = 1.7.1
multiprocessing version = 0.70a1

In an experiment very similar to the one in the bug description by @numpy-gitbot, the code hangs when executing np.linalg.eig (the same happens with np.linalg.inv):

from multiprocessing import Process

def eigen_values():
    import numpy as np
    print('Test eig with multiprocessing...')
    data = np.random.normal(0, 1, (2, 2))
    cov = np.cov(data, rowvar=False)

    # Execution hangs when executing the following
    l, w = np.linalg.eig(cov)

if __name__ == "__main__":
    # The guard keeps child processes from re-running this block
    p = Process(target=eigen_values)
    p.start()
    p.join()

@francisquintallauzon

Update: This seems to be an issue with OpenBLAS and multiprocessing. Setting the environment variable OPENBLAS_NUM_THREADS=1 works around it.

I got this from :
http://numpy-discussion.10968.n7.nabble.com/svd-multiprocessing-hangs-tc34252.html#none
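A minimal sketch of that workaround, assuming Python 3 and a NumPy build linked against OpenBLAS. The variable has to be in the environment before NumPy is first imported, because OpenBLAS reads it when it is loaded; exporting it in the shell (`OPENBLAS_NUM_THREADS=1 python script.py`) has the same effect. The `invert` helper is hypothetical, a stand-in for the `np.linalg.inv` call from the original report.

```python
import os

# Must happen before NumPy (and therefore OpenBLAS) is imported for the
# first time in this process; setting it later has no effect.
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import multiprocessing as mp

import numpy as np


def invert(_):
    # Hypothetical worker: the same call that hung in the report.
    return np.linalg.inv([[2.0, 3.0], [2.0, 2.0]])


if __name__ == "__main__":
    pool = mp.Pool(2)
    try:
        results = pool.map(invert, range(4))
    finally:
        pool.close()
        pool.join()
    print(results[0])
```

The trade-off is that single-threaded OpenBLAS gives up BLAS-level parallelism inside each worker, which is usually acceptable when the parallelism comes from the process pool instead.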

@SmokinCaterpillar

I'm having the very same issue on Ubuntu 12.04 with NumPy 1.8.0, Python 2.7 and multiprocessing 0.70a1.

The following snippet (from above) does not terminate:

import numpy as np
import multiprocessing as mp

def foo(x):
    print(np.linalg.inv([[2, 3], [2, 2]]))  # this call hangs in the worker
    # print(np.dot([[1, 2], [3, 4]], [[1, 2], [3, 4]]))  # this works fine

def test():
    print("running...")
    print(np.__version__)
    # multiprocessing.__version__ exists on Python 2 but not on recent Python 3
    print(getattr(mp, "__version__", "n/a"))
    vals = [1, 2, 3, 4]
    pool = mp.Pool(1)  # this has an issue
    # pool = mp.Pool(mp.cpu_count())  # this has an issue too
    pool.map(foo, vals)
    pool.close()
    pool.join()

if __name__ == "__main__":
    test()
    # foo(1)  # calling it directly in the parent works fine

EDIT:
In fact, setting OPENBLAS_NUM_THREADS=1 did solve the problem!

@larsmans
Contributor

This should really be documented somewhere; I don't think it can be fixed by NumPy. The problem is that multiprocessing's fork-based start on POSIX systems is unreliable with multithreaded libraries such as OpenBLAS: after fork() in a multithreaded process, POSIX only guarantees that the child may call async-signal-safe functions until it calls exec(), and multiprocessing never calls exec(). Python 3.4 addresses this with a new multiprocessing start method called forkserver, but it only helps if it is selected explicitly at program startup.
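A minimal sketch of selecting the forkserver start method, assuming Python 3.4 or later on a POSIX system (`forkserver` is not available on Windows, where `spawn` is already the default). Workers are then forked from a small, freshly started server process rather than from the parent, so they do not inherit a copy of the parent's OpenBLAS threads. Using `mp.get_context` keeps the choice local to one pool; calling `mp.set_start_method("forkserver")` once inside the `__main__` guard would set it globally instead. `invert` and `run` are hypothetical helper names.

```python
import multiprocessing as mp

import numpy as np


def invert(_):
    # Same call as in the original report; .tolist() keeps the result
    # a plain Python object when it is pickled back to the parent.
    return np.linalg.inv([[2.0, 3.0], [2.0, 2.0]]).tolist()


def run(method="forkserver", workers=2, n_tasks=4):
    # A context applies the start method to this pool only.
    ctx = mp.get_context(method)
    with ctx.Pool(workers) as pool:
        return pool.map(invert, range(n_tasks))


if __name__ == "__main__":
    # forkserver and spawn re-import this module in the workers, so the
    # pool must only be created under the __main__ guard.
    print(run("forkserver"))
```

Passing `"spawn"` instead also avoids the problem, at the cost of starting a full interpreter per worker.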

larsmans added a commit to larsmans/numpy that referenced this issue Jan 14, 2014
Fixes numpy#654 by not fixing it; I don't think NumPy *can* actually fix
the problem as it's a design flaw in Python's multiprocessing. Listed
various alternatives (Python 3.4 forkserver, single-threaded OpenBLAS,
Python threading).
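The threading alternative listed in that commit message can be sketched with `concurrent.futures.ThreadPoolExecutor` (standard library in Python 3; on Python 2 it needs the `futures` backport). Nothing is forked, so OpenBLAS's threads are never copied into a child process. Whether threads actually run the LAPACK work in parallel depends on NumPy releasing the GIL during the call, which is an assumption to measure for your matrix sizes. `invert` and `run_threaded` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def invert(_):
    # The same linalg call that hung under a forked multiprocessing.Pool.
    return np.linalg.inv([[2.0, 3.0], [2.0, 2.0]])


def run_threaded(n_tasks=4, workers=2):
    # All threads live in one process, so no fork() is involved.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(invert, range(n_tasks)))


if __name__ == "__main__":
    for result in run_threaded():
        print(result)
```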