-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Unstable LogisticRegression with saga solver l1 penalty under macOS #9351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
It could also be the case that our Cython code for saga is not numerically stable when compiled with clang. |
I have built scikit-learn with clang under linux and I cannot reproduce. Here is the clang version I have:
so the failure is probably not related to the compiler. |
Building scikit-learn macOS wheels that embed openblas is possible but not trivial. @matthew-brett I wonder what are the plans for numpy and scipy. Otherwise we can probably change our setup.py to rely upon the atlas source fragments that we already embed in scikit-learn to ease the windows build. |
Actually I relaunched the tests and the failure is random both with OpenBLAS and Accelerate... |
I'm generating this error on some runs on my up-to-date Sierra instance with Anaconda (py=3.5.2, np=1.11.3). |
The randomness of the error and the range of values it can fail with makes me wonder if it's an uninitialised array rather than an overflow; the fit doesn't appear to be running to convergence. |
I can also replicate on my mac.
|
At least one of the failing assertions, line 961 in |
I'm trying to debug that test with random_state=1 (not that this is my expertise, but I might as well give it a go). |
Curiously, when I run a truncated
So some runs fail 1/3 of the time. In these runs, a ConvergenceWarning is raised (but that's no surprise). |
It seems https://github.com/scikit-learn/scikit-learn/blob/92e38f2/sklearn/linear_model/sag_fast.pyx#L618 is the issue: accessing the -1th element of a C array. |
So I suppose I was correct in my first assumption that this was random memory access rather than overflow. |
I hope #9376 is the right (and only) fix. |
Should Cython have found this for me if I set boundscheck=True? It didn't, but maybe that's just because it was failing to raise the exception with the gil off. Should we be regularly running all our Cython tests with boundscheck enabled and gil fixed to on? |
I woke up this morning realising why boundscheck wouldn't work. It's a pointer, not an array. |
numpy and scipy are imported from the following wheels from PyPI:
numpy-1.11.3-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (at buil time)
numpy-1.13.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (at test time)
scipy-0.19.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Cython-0.25.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
The full log of the build is here:
https://s3.amazonaws.com/archive.travis-ci.org/jobs/253231280/log.txt?X-Amz-Expires=30&X-Amz-Date=20170713T160513Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJRYRXRSVGNKPKO5A/20170713/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=06e6866cac7abfc10cc67aa061a0e45b8db234c71a5997c67d8cc63828838df2
The configuration scripts of this build and test run can be found here:
https://github.com/MacPython/scikit-learn-wheels
A similar problem happened with Python 2.7 on macOS:
https://travis-ci.org/MacPython/scikit-learn-wheels/jobs/253231277
but not with python 3.4 for some reason:
https://travis-ci.org/MacPython/scikit-learn-wheels/builds/253231260
The same tests pass under Linux and Windows both 64 and 32 bit for all supported versions of Python.
Maybe this is a problem caused by the Apple Accelerate implementation of BLAS.
The text was updated successfully, but these errors were encountered: