Parallel K-Means hangs on Mac OS X Lion #636
Under Linux or Windows?

Mac OS X Lion (10.7), can anybody reproduce?

And …
Not in the same session, because the process needs to be killed, but it works in an identical IPython session in the same virtualenv. I don't really know how to debug this; I'll try some prints to see where it hangs.

Vlad

On Feb 16, 2012, at 12:15, Olivier Grisel wrote:
After a heavy tracing session I (strangely) found the hanging line to be a call to … The exact same code runs when … The interesting part (though I haven't checked this extensively) is that this happens in all 3 of my virtualenvs: stable numpy & scipy, git head numpy & scipy, and scipy-superpack. Any ideas?

On Feb 16, 2012, at 13:15, Olivier Grisel wrote:
Can you try to attach gdb to the running Python processes and get a backtrace?
I'm trying to run scikit-learn for the first time on my Mac and am getting a hang on Mac OS X Lion (10.7), Python 2.7, latest scikit-learn master from GitHub, virtualenv, numpy==1.6.1, scipy==0.10.1. I haven't tried gdb or anything yet. I'll be at the PyData Workshop tomorrow (Saturday) in case someone is there and wants to poke around on my computer.
I did a Git bisect to narrow it down:
So far I have been unable to get gdb working with Python. This probably isn't useful, but here's what I get when I Ctrl-C it after it hangs:
OK, so the git bisect tells us that this bug was introduced at the same time as the parallel k-means feature itself. The Python stack trace might tell us that there is a race condition in multiprocessing: this might be related to http://bugs.python.org/issue9205, which has been fixed in Python 3 but maybe not backported to Python 2.7. By the way, @vene and @njwilson, which versions of Python are you using? Edit: I mean exactly, such as …
I'm using Python 2.7.2
Same here... I could never reproduce this race condition. So this might not be linked to an already backported Python bugfix.
How many cores / CPUs do you have on your machine? I am running a dual core machine.
Same. Dual core.
@njwilson can you try to checkout and build this branch and see if you can still reproduce the issue? https://github.com/ogrisel/scikit-learn/compare/no-kmeans-profile Cython profiling seems to be messing around with Python threads, and I have heard recently that Python forking and Python threads are not good friends: http://bugs.python.org/issue6721
I tried the no-kmeans-profile branch and the problem still exists. |
@cournape and I tracked the bug down to a segfault in Grand Central Dispatch on the BLAS DGEMM function (only after a fork): https://gist.github.com/2027412 I have just reported the issue to Apple. Hopefully it will be fixed at some point. In the meantime we will have to skip this test on OSX, I think.
Wow, great work! Let's hope they are responsive.

On Mar 13, 2012, at 09:51, Olivier Grisel wrote:
There is a bug that occurs in the BLAS DGEMM function after a fork on Mac OS X Lion that causes this test to hang. See issue scikit-learn#636 for details. This commit causes the test_k_means_plus_plus_init_2_jobs test to be skipped on this platform. The Mac version is determined from platform.mac_ver() similar to how it is described in the following link. I couldn't find a 'cleaner' way. http://stackoverflow.com/questions/1777344/how-to-detect-mac-os-version-using-python
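A minimal sketch of the kind of version check this commit describes; the helper name is made up for illustration, and the idea is simply that `platform.mac_ver()` returns a release string like `'10.7.5'` on OS X and an empty string elsewhere:

```python
import platform

def mac_release_at_least(major, minor, release=None):
    """Return True if the OS X release is at least `major.minor`.

    `release` is read from platform.mac_ver() when not given; it is an
    empty string on non-Mac platforms, in which case we return False.
    """
    if release is None:
        release = platform.mac_ver()[0]  # e.g. '10.7.5' on Lion
    if not release:
        return False
    parts = release.split(".")
    found = (int(parts[0]), int(parts[1]) if len(parts) > 1 else 0)
    return found >= (major, minor)

# A test could then be skipped on Lion and later, e.g.:
# if mac_release_at_least(10, 7):
#     raise SkipTest("known to hang on OS X >= 10.7, see issue #636")
```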
Now (on Mountain Lion) I don't get hanging, but I get random segfaults in the …
I assume this is the same bug @ogrisel and @cournape found, that Apple confirmed as expected behaviour (you can't call Accelerate in subprocesses). Can I compile my own parallel, optimized BLAS and LAPACK then? Did anybody do this on Lion or Mountain Lion? I can't find much about this endeavour.
Accelerate can be called after a fork, but you need to execv the subprocess with the Python binary (which multiprocessing does not do under POSIX). I think @cournape is working on a numpy build that uses OpenBLAS, which could serve as a good alternative to Accelerate (even though early benchmarks show a slight decrease in performance).
So is there any way we can avoid these problems on OS X? Can we check the BLAS before trying to do something in parallel?
I think @cournape would like to make it possible to switch the BLAS / LAPACK implementation for numpy at runtime, but that might be pretty tricky to implement and will obviously only be available in some undetermined future version of numpy, if ever. I don't know if we can detect whether the BLAS in use is linked to Accelerate and issue a warning in joblib prior to forking in that case.
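A rough sketch of what such a detection heuristic might look like; the function name is hypothetical, and in practice the dict would come from `numpy.distutils.system_info.get_info('blas_opt')` (the shape assumed here matches the output posted later in this thread):

```python
import warnings

def warn_if_accelerate(blas_info):
    """Heuristic: warn when numpy's BLAS looks like OSX Accelerate/vecLib.

    `blas_info` is expected to be shaped like the dict returned by
    numpy.distutils.system_info.get_info('blas_opt').
    """
    flags = " ".join(blas_info.get('extra_link_args', [])
                     + blas_info.get('extra_compile_args', []))
    if 'Accelerate' in flags or 'vecLib' in flags:
        warnings.warn("numpy appears to be linked against Accelerate/vecLib, "
                      "which is not fork-safe; n_jobs > 1 may hang or crash.")
        return True
    return False
```

Such a check could only ever be a warning, since the link flags are recorded at numpy build time and do not prove which library is loaded at runtime.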
Hm... OK, probably there is no chance really... should we just add this to the docs as a known issue on OS X?
+1, it never hurts to document stuff.
Does anybody know if EPD on OS X produces the same bug?
I use EPD with MKL on my Mac and I've never had a problem, so I would …
Which version of scikit-learn? We believe that we have fixed this in …
Hey @GaelVaroquaux, I'm using scikit-learn 0.14.1 np17py27_1 from Anaconda.
@kevindavenport can you please give us the output of:

```python
from numpy.distutils.system_info import get_info
print(get_info('blas_opt'))
```

If you have gdb installed, can you also try?
@ogrisel Thanks for your help. The dedication you guys have to this project is incredible! I found an article stating how I can install gdb with Homebrew, but I also read: "gdb has been replaced by lldb, and is no longer supported. gcc and llvm-gcc are also gone, replaced by clang."
Alright: Anaconda is apparently linked against OS X vecLib instead of MKL or ATLAS, which causes the issue. Therefore this is the same issue as previously. We don't have any clean solution for this in the short term besides advising you to build ATLAS from source and use that instead of vecLib.
Peter Wang from Continuum states that they build against the Apple Accelerate framework. Do any of you guys have experience with pointing my environment to ATLAS instead?
Maybe …
Numpy developers started working on a binary package for numpy on OSX that builds against an embedded ATLAS that does not suffer from this bug; see the comments in numpy/numpy#4007. If this experiment is validated and such numpy whl packages get published on PyPI, we can close this bug and re-enable the skipped tests for future versions of numpy.
A few months ago I decided to just conda install MKL https://store.continuum.io/cshop/mkl-optimizations/ which would make my Python install use the Intel MKL optimizations over the built-in Accelerate framework. This let me use the parallel features of joblib in scikit-learn, but recently the Anaconda 2.0 and/or numpy upgrade broke it again. Everything checks out here though:

```python
from numpy.distutils.system_info import get_info
print(get_info('blas_opt'))
print(get_info('lapack_opt'))
```
This is bad news. Can you confirm that the following breaks with numpy + MKL?

```python
import numpy as np
from joblib import Parallel, delayed

a = np.random.randn(3000, 3000)

# Force initialization of multithreaded BLAS context in main process
x = np.dot(a, a.T)

if __name__ == "__main__":
    # Force usage of multithreaded BLAS in fork-child processes
    Parallel(n_jobs=2)(delayed(np.dot)(a, a.T) for _ in range(3))
    print('ok')
```
If you get a segfault with the OS X process crash report, please report the stack trace here.
I am closing this issue as the original bug has now been fixed by the new numpy wheel package on PyPI. @kevindavenport could you please open a new issue if you can reproduce the crash with numpy + MKL and the code snippet provided in my earlier comment?
I am still getting this bug with the latest versions of everything on a Mac. numpy + MKL breaks.
MKL, not Accelerate? I didn't realize that was a problem, too.
I'm getting this with Python 2.7.10, Anaconda 2.3, OS X 10.10.5, scikit-learn 0.16.1. Not sure if this will help:

```
Process: python [1515]
Date/Time: 2015-08-24 13:52:01.586 -0500
Time Awake Since Boot: 1500 seconds
Crashed Thread: 0  Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Model: MacBookPro11,2, 4 processors, Intel Core i7, 2.2 GHz, 16 GB
```
Can you give us:

```python
from numpy.distutils.system_info import get_info
print(get_info('blas_opt'))
```
```
{'extra_link_args': ['-Wl,-framework', '-Wl,Accelerate'],
 'extra_compile_args': ['-msse3', '-DAPPLE_ACCELERATE_SGEMV_PATCH',
                        '-I/System/Library/Frameworks/vecLib.framework/Headers'],
 'define_macros': [('NO_ATLAS_INFO', 3)]}
```
@ogrisel can you remind me of the details with Accelerate? @AWNystrom you are using Apple's Accelerate BLAS, which is known to not cooperate with multiprocessing. Did you install numpy via conda?
The problem is that multiprocessing does a fork without an exec. Many libraries, like (some versions of) Accelerate / vecLib, (some versions of) MKL, the OpenMP runtime of GCC, nvidia's CUDA (and probably many others), manage their own internal thread pool. Upon a syscall to fork, the thread pool state in the child process is corrupted: the thread pool thinks it has many threads while only the main thread state has been forked. It's possible to change the libraries to make them detect when a fork happens and reinitialize the thread pool in that case: we did that for OpenBLAS (merged upstream in master since 0.2.9) and we contributed a patch (not yet reviewed) to GCC's OpenMP runtime.

In the end the real culprit is Python's multiprocessing, which does fork without exec (to reduce the overhead of starting and using new Python processes for parallel computing; it's kind of a hack). This is a violation of the POSIX standard, and therefore organizations like Apple refuse to consider the lack of fork-safety in Accelerate / vecLib as a bug.

In Python 3.4+ it's now possible to configure multiprocessing to use the 'forkserver' or 'spawn' start methods (instead of the default 'fork') to manage the process pools. This should make it possible to not be subject to this issue anymore. We don't use it by default in joblib because it causes some overhead and would make the default behavior slightly different between Python 2.7 and Python 3.4+. Maybe we should change the default to 'forkserver' under POSIX to have this problem disappear for Python 3.4+ users.
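To illustrate the last paragraph, here is a minimal, self-contained sketch (not joblib's actual code) of opting into the 'forkserver' start method via `multiprocessing.get_context`:

```python
import multiprocessing as mp

def square(x):
    return x * x

def parallel_map(func, items, start_method="forkserver"):
    # get_context() selects a start method locally without changing the
    # global default. 'forkserver' (POSIX, Python 3.4+) forks workers from
    # a clean server process instead of the main process, so thread pools
    # set up by a multithreaded BLAS in the parent are never inherited in
    # a corrupted state.
    ctx = mp.get_context(start_method)
    with ctx.Pool(2) as pool:
        return pool.map(func, list(items))

if __name__ == "__main__":
    print(parallel_map(square, range(4)))  # [0, 1, 4, 9]
```

Note the `if __name__ == "__main__"` guard: both 'forkserver' and 'spawn' re-import the main module in helper processes, so unguarded process creation at module level would fail.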
Thank you for the great explanation. Should we add this, or a short version of it, to the FAQ? And open a dedicated fork-safety issue to keep track?
Thanks, Andreas. I have some latent memory of having to change numpy versions to get something to work and doing it via pip.
Good idea, I am on it.
In addition to documenting the problem in the scikit-learn FAQ, I went ahead and made joblib use the forkserver start method by default under POSIX systems with Python 3.4 or later: joblib/joblib#232. Feedback appreciated.
I first noticed this when running 'make test' hung. I tried with stable and bleeding-edge scipy (I initially thought it was something arpack-related).

The test `sklearn.cluster.tests.test_k_means.test_k_means_plus_plus_init_2_jobs` hangs the process. Running in IPython something like

```python
KMeans(init='k-means++', n_jobs=2).fit(np.random.randn(100, 100))
```

hangs as well. I thought maybe there was something wrong with my setup, but `cross_val_score` works OK with `n_jobs=2`.