Skip to content

Unstable LogisticRegression with saga solver l1 penalty under macOS #9351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ogrisel opened this issue Jul 13, 2017 · 15 comments · Fixed by #9376
Closed

Unstable LogisticRegression with saga solver l1 penalty under macOS #9351

ogrisel opened this issue Jul 13, 2017 · 15 comments · Fixed by #9376
Milestone

Comments

@ogrisel
Copy link
Member

ogrisel commented Jul 13, 2017

======================================================================
FAIL: sklearn.linear_model.tests.test_logistic.test_logreg_l1_sparse_data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/sklearn/linear_model/tests/test_logistic.py", line 961, in test_logreg_l1_sparse_data
    assert_array_almost_equal(lr_saga.coef_, lr_liblinear.coef_)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/numpy/testing/utils.py", line 962, in assert_array_almost_equal
    precision=decimal)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/numpy/testing/utils.py", line 778, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 6 decimals
(mismatch 64.0%)
 x: array([[ -128.209824,  -990.760917,     0.      ,     0.      ,
            0.      ,   783.90525 ,  1345.642841,     0.      ,
           34.880916,  -287.813868,  -364.684306,  -668.47532 ,...
 y: array([[-0.764085, -0.29863 ,  0.      ,  0.269297,  0.      ,  0.      ,
         2.629204,  0.      ,  0.      ,  0.      , -0.450364, -0.61554 ,
         0.      , -0.262512,  0.161514,  0.      ,  0.314769,  0.      ,...
======================================================================
FAIL: sklearn.linear_model.tests.test_logistic.test_saga_vs_liblinear
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/sklearn/linear_model/tests/test_logistic.py", line 1144, in test_saga_vs_liblinear
    assert_array_almost_equal(saga.coef_, liblinear.coef_, 3)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/numpy/testing/utils.py", line 962, in assert_array_almost_equal
    precision=decimal)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/numpy/testing/utils.py", line 778, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 3 decimals
(mismatch 100.0%)
 x: array([[ -105.601, -1199.326,  2377.574,   479.408]])
 y: array([[ 0.   , -1.06 ,  1.222,  0.   ]])

numpy and scipy are imported from the following wheels from PyPI:

  • numpy-1.11.3-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (at buil time)

  • numpy-1.13.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (at test time)

  • scipy-0.19.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl

  • Cython-0.25.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl

The full log of the build is here:

https://s3.amazonaws.com/archive.travis-ci.org/jobs/253231280/log.txt?X-Amz-Expires=30&X-Amz-Date=20170713T160513Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJRYRXRSVGNKPKO5A/20170713/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=06e6866cac7abfc10cc67aa061a0e45b8db234c71a5997c67d8cc63828838df2

The configuration scripts of this build and test run can be found here:

https://github.com/MacPython/scikit-learn-wheels

A similar problem happened with Python 2.7 on macOS:

https://travis-ci.org/MacPython/scikit-learn-wheels/jobs/253231277

but not with python 3.4 for some reason:

https://travis-ci.org/MacPython/scikit-learn-wheels/builds/253231260

The same tests pass under Linux and Windows both 64 and 32 bit for all supported versions of Python.

Maybe this is a problem caused by the Apple Accelerate implementation of BLAS.

@ogrisel
Copy link
Member Author

ogrisel commented Jul 13, 2017

It could also be the case that our Cython code for saga is not numerically stable when compiled with clang.

@ogrisel
Copy link
Member Author

ogrisel commented Jul 14, 2017

I have built scikit-learn with clang under linux and I cannot reproduce. Here is the clang version I have:

lang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

so the failure is probably not related to the compiler.

@ogrisel
Copy link
Member Author

ogrisel commented Jul 14, 2017

I built OpenBLAS 0.2.19 from source on my oldish mac and built scikit-learn against it and I got the test_logistic.py to pass. So this is a confirmation that the saga solver tests in test_logistic.py trigger a numerical instability in Accelerate...

Building scikit-learn macOS wheels that embed openblas is possible but not trivial. @matthew-brett I wonder what are the plans for numpy and scipy.

Otherwise we can probably change our setup.py to rely upon the atlas source fragments that we already embed in scikit-learn to ease the windows build.

@ogrisel
Copy link
Member Author

ogrisel commented Jul 14, 2017

Actually I relaunched the tests and the failure is random both with OpenBLAS and Accelerate...

@jnothman
Copy link
Member

I'm generating this error on some runs on my up-to-date Sierra instance with Anaconda (py=3.5.2, np=1.11.3).

@jnothman
Copy link
Member

The randomness of the error and the range of values it can fail with makes me wonder if it's an uninitialised array rather than an overflow; the fit doesn't appear to be running to convergence.

@agramfort
Copy link
Member

agramfort commented Jul 16, 2017 via email

@jnothman
Copy link
Member

jnothman commented Jul 16, 2017

At least one of the failing assertions, line 961 in test_logreg_l1_sparse_data does not have a fixed random state for SAGA. But it still randomly fails with a fixed random state. However, when it fails, so far, the failing coef is identical every time. Suggesting it is not mere lack of initialisation.

@jnothman
Copy link
Member

I'm trying to debug that test with random_state=1 (not that this is my expertise, but I might as well give it a go).

@jnothman
Copy link
Member

Curiously, when I run a truncated test_logreg_l1_sparse_data in a loop (each run of 1000 trials in a separate Python process) I get either:

  • 1000 of 1000 successes
  • roughly 666 of 1000 successes

So some runs fail 1/3 of the time. In these runs, a ConvergenceWarning is raised (but that's no surprise).

@jnothman
Copy link
Member

It seems https://github.com/scikit-learn/scikit-learn/blob/92e38f2/sklearn/linear_model/sag_fast.pyx#L618 is the issue: accessing the -1th element of a C array.

@jnothman
Copy link
Member

So I suppose I was correct in my first assumption that this was random memory access rather than overflow.

@jnothman
Copy link
Member

I hope #9376 is the right (and only) fix.

@jnothman
Copy link
Member

Should Cython have found this for me if I set boundscheck=True? It didn't, but maybe that's just because it was failing to raise the exception with the gil off. Should we be regularly running all our Cython tests with boundscheck enabled and gil fixed to on?

@jnothman
Copy link
Member

I woke up this morning realising why boundscheck wouldn't work. It's a pointer, not an array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants