Unstable LogisticRegression with saga solver l1 penalty under macOS #9351

ogrisel · 2017-07-13T16:22:41Z

======================================================================
FAIL: sklearn.linear_model.tests.test_logistic.test_logreg_l1_sparse_data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/sklearn/linear_model/tests/test_logistic.py", line 961, in test_logreg_l1_sparse_data
    assert_array_almost_equal(lr_saga.coef_, lr_liblinear.coef_)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/numpy/testing/utils.py", line 962, in assert_array_almost_equal
    precision=decimal)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/numpy/testing/utils.py", line 778, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 6 decimals
(mismatch 64.0%)
 x: array([[ -128.209824,  -990.760917,     0.      ,     0.      ,
            0.      ,   783.90525 ,  1345.642841,     0.      ,
           34.880916,  -287.813868,  -364.684306,  -668.47532 ,...
 y: array([[-0.764085, -0.29863 ,  0.      ,  0.269297,  0.      ,  0.      ,
         2.629204,  0.      ,  0.      ,  0.      , -0.450364, -0.61554 ,
         0.      , -0.262512,  0.161514,  0.      ,  0.314769,  0.      ,...
======================================================================
FAIL: sklearn.linear_model.tests.test_logistic.test_saga_vs_liblinear
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/sklearn/linear_model/tests/test_logistic.py", line 1144, in test_saga_vs_liblinear
    assert_array_almost_equal(saga.coef_, liblinear.coef_, 3)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/numpy/testing/utils.py", line 962, in assert_array_almost_equal
    precision=decimal)
  File "/Users/travis/build/MacPython/scikit-learn-wheels/venv/lib/python3.6/site-packages/numpy/testing/utils.py", line 778, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 3 decimals
(mismatch 100.0%)
 x: array([[ -105.601, -1199.326,  2377.574,   479.408]])
 y: array([[ 0.   , -1.06 ,  1.222,  0.   ]])

numpy and scipy are imported from the following wheels from PyPI:

numpy-1.11.3-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (at buil time)
numpy-1.13.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (at test time)
scipy-0.19.1-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Cython-0.25.2-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl

The full log of the build is here:

https://s3.amazonaws.com/archive.travis-ci.org/jobs/253231280/log.txt?X-Amz-Expires=30&X-Amz-Date=20170713T160513Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJRYRXRSVGNKPKO5A/20170713/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=06e6866cac7abfc10cc67aa061a0e45b8db234c71a5997c67d8cc63828838df2

The configuration scripts of this build and test run can be found here:

https://github.com/MacPython/scikit-learn-wheels

A similar problem happened with Python 2.7 on macOS:

https://travis-ci.org/MacPython/scikit-learn-wheels/jobs/253231277

but not with python 3.4 for some reason:

https://travis-ci.org/MacPython/scikit-learn-wheels/builds/253231260

The same tests pass under Linux and Windows both 64 and 32 bit for all supported versions of Python.

Maybe this is a problem caused by the Apple Accelerate implementation of BLAS.

The text was updated successfully, but these errors were encountered:

ogrisel · 2017-07-13T16:24:05Z

It could also be the case that our Cython code for saga is not numerically stable when compiled with clang.

ogrisel · 2017-07-14T16:47:57Z

I have built scikit-learn with clang under linux and I cannot reproduce. Here is the clang version I have:

lang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

so the failure is probably not related to the compiler.

ogrisel · 2017-07-14T17:29:53Z

I built OpenBLAS 0.2.19 from source on my oldish mac and built scikit-learn against it and I got the test_logistic.py to pass. So this is a confirmation that the saga solver tests in test_logistic.py trigger a numerical instability in Accelerate...

Building scikit-learn macOS wheels that embed openblas is possible but not trivial. @matthew-brett I wonder what are the plans for numpy and scipy.

Otherwise we can probably change our setup.py to rely upon the atlas source fragments that we already embed in scikit-learn to ease the windows build.

ogrisel · 2017-07-14T18:27:12Z

Actually I relaunched the tests and the failure is random both with OpenBLAS and Accelerate...

jnothman · 2017-07-16T12:11:17Z

I'm generating this error on some runs on my up-to-date Sierra instance with Anaconda (py=3.5.2, np=1.11.3).

jnothman · 2017-07-16T12:12:48Z

The randomness of the error and the range of values it can fail with makes me wonder if it's an uninitialised array rather than an overflow; the fit doesn't appear to be running to convergence.

agramfort · 2017-07-16T12:34:20Z

I can also replicate on my mac.

jnothman · 2017-07-16T12:37:40Z

At least one of the failing assertions, line 961 in test_logreg_l1_sparse_data does not have a fixed random state for SAGA. But it still randomly fails with a fixed random state. However, when it fails, so far, the failing coef is identical every time. Suggesting it is not mere lack of initialisation.

jnothman · 2017-07-16T12:38:51Z

I'm trying to debug that test with random_state=1 (not that this is my expertise, but I might as well give it a go).

jnothman · 2017-07-16T12:51:08Z

Curiously, when I run a truncated test_logreg_l1_sparse_data in a loop (each run of 1000 trials in a separate Python process) I get either:

1000 of 1000 successes
roughly 666 of 1000 successes

So some runs fail 1/3 of the time. In these runs, a ConvergenceWarning is raised (but that's no surprise).

jnothman · 2017-07-16T13:39:34Z

It seems https://github.com/scikit-learn/scikit-learn/blob/92e38f2/sklearn/linear_model/sag_fast.pyx#L618 is the issue: accessing the -1th element of a C array.

jnothman · 2017-07-16T13:40:40Z

So I suppose I was correct in my first assumption that this was random memory access rather than overflow.

jnothman · 2017-07-16T13:50:30Z

I hope #9376 is the right (and only) fix.

jnothman · 2017-07-16T13:53:21Z

Should Cython have found this for me if I set boundscheck=True? It didn't, but maybe that's just because it was failing to raise the exception with the gil off. Should we be regularly running all our Cython tests with boundscheck enabled and gil fixed to on?

jnothman · 2017-07-16T21:25:18Z

I woke up this morning realising why boundscheck wouldn't work. It's a pointer, not an array.

ogrisel added Blocker Bug Need Contributor labels Jul 13, 2017

ogrisel added this to the 0.19 milestone Jul 13, 2017

jnothman mentioned this issue Jul 16, 2017

[MRG+1] FIX out of bounds array access in SAGA #9376

Merged

jnothman removed the Need Contributor label Jul 16, 2017

ogrisel closed this as completed in #9376 Jul 17, 2017

TomDLT mentioned this issue Jul 17, 2017

Flaky tests in logistic regression #8879

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unstable LogisticRegression with saga solver l1 penalty under macOS #9351

Unstable LogisticRegression with saga solver l1 penalty under macOS #9351

ogrisel commented Jul 13, 2017

ogrisel commented Jul 13, 2017

ogrisel commented Jul 14, 2017

ogrisel commented Jul 14, 2017 •

edited

Loading

ogrisel commented Jul 14, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

agramfort commented Jul 16, 2017 via email

jnothman commented Jul 16, 2017 •

edited

Loading

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

Unstable LogisticRegression with saga solver l1 penalty under macOS #9351

Unstable LogisticRegression with saga solver l1 penalty under macOS #9351

Comments

ogrisel commented Jul 13, 2017

ogrisel commented Jul 13, 2017

ogrisel commented Jul 14, 2017

ogrisel commented Jul 14, 2017 • edited Loading

ogrisel commented Jul 14, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

agramfort commented Jul 16, 2017 via email

jnothman commented Jul 16, 2017 • edited Loading

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

jnothman commented Jul 16, 2017

ogrisel commented Jul 14, 2017 •

edited

Loading

jnothman commented Jul 16, 2017 •

edited

Loading