test_fit_csr_matrix failing on Linux py35_conda_openblas #14168

Closed
jnothman opened this issue Jun 23, 2019 · 4 comments · Fixed by #14171 or #15158

Comments

@jnothman
Member

Azure is choking on this tSNE test. I don't think we've changed anything here recently.

_____________________________ test_fit_csr_matrix ______________________________

    def test_fit_csr_matrix():
        # X can be a sparse matrix.
        random_state = check_random_state(0)
        X = random_state.randn(50, 2)
        X[(np.random.randint(0, 50, 25), np.random.randint(0, 2, 25))] = 0.0
        X_csr = sp.csr_matrix(X)
        tsne = TSNE(n_components=2, perplexity=10, learning_rate=100.0,
                    random_state=0, method='exact', n_iter=500)
        X_embedded = tsne.fit_transform(X_csr)
        assert_almost_equal(trustworthiness(X_csr, X_embedded, n_neighbors=1), 1.0,
>                           decimal=1)
E       AssertionError: 
E       Arrays are not almost equal to 1 decimals
E        ACTUAL: 0.93583333333333329
E        DESIRED: 1.0

X          = array([[ 0.        ,  0.        ],
       [ 0.97873798,  0.        ],
       [ 1.86755799, -0.97727788],
       [ 0.95... 0.97663904],
       [ 0.3563664 ,  0.70657317],
       [ 0.01050002,  1.78587049],
       [ 0.12691209,  0.40198936]])
X_csr      = <50x2 sparse matrix of type '<class 'numpy.float64'>'
	with 78 stored elements in Compressed Sparse Row format>
X_embedded = array([[ -28.21473122,   24.69126129],
       [ -43.97845459,  -39.56318283],
       [ -82.35324097,  -55.03578568],
 ...44132996,   48.34313583],
       [  98.00850677,   70.21777344],
       [  17.60460854,   52.21545029]], dtype=float32)
random_state = <mtrand.RandomState object at 0x7f0d05283ea0>
tsne       = TSNE(angle=0.5, early_exaggeration=12.0, init='random', learning_rate=100.0,
     method='exact', metric='euclidean', ...orm=1e-07, n_components=2,
     n_iter=500, n_iter_without_progress=300, perplexity=10, random_state=0,
     verbose=0)

../1/s/sklearn/manifold/tests/test_t_sne.py:280: AssertionError

Versions:


Package         Version  
--------------- ---------
atomicwrites    1.3.0    
attrs           19.1.0   
certifi         2018.8.24
chardet         3.0.4    
codecov         2.0.15   
coverage        4.5.1    
cycler          0.10.0   
Cython          0.28.5   
idna            2.8      
joblib          0.12.3   
matplotlib      1.5.1    
more-itertools  4.3.0    
numpy           1.11.0   
olefile         0.46     
pathlib2        2.3.2    
Pillow          4.0.0    
pip             10.0.1   
pluggy          0.11.0   
py              1.6.0    
pyparsing       2.4.0    
pytest          3.8.1    
pytest-cov      2.7.1    
python-dateutil 2.7.3    
pytz            2019.1   
requests        2.22.0   
scipy           0.17.0   
setuptools      40.2.0   
six             1.11.0   
urllib3         1.25.3   
wheel           0.31.1 
@rth
Member

rth commented Jun 23, 2019

My fault, looks like the change in #14136 was a bit too brittle:

def test_fit_csr_matrix():
    # X can be a sparse matrix.
    random_state = check_random_state(0)
-    X = random_state.randn(100, 2)
-    X[(np.random.randint(0, 100, 50), np.random.randint(0, 2, 50))] = 0.0
+    X = random_state.randn(50, 2)
+    X[(np.random.randint(0, 50, 25), np.random.randint(0, 2, 25))] = 0.0
    X_csr = sp.csr_matrix(X)
    tsne = TSNE(n_components=2, perplexity=10, learning_rate=100.0,
-                random_state=0, method='exact')
+                random_state=0, method='exact', n_iter=500)

That's the unfortunate risk that comes with test optimizations. Will make a PR.

@rth
Member

rth commented Jun 23, 2019

Interestingly, it also doesn't seem to be fully deterministic.
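
One possible source of the run-to-run variation, judging from the test body in the traceback: the entries to zero out are chosen with the global `np.random.randint`, not with the seeded `random_state`, so the input matrix itself can differ between runs. A minimal sketch of building the same input from a single seeded generator follows. The helper name `make_test_input` is hypothetical and not part of scikit-learn, and `np.random.RandomState(seed)` stands in for `check_random_state(seed)`.

```python
import numpy as np


def make_test_input(n_samples=50, n_zeros=25, seed=0):
    """Build the dense test input using one seeded RandomState throughout."""
    rng = np.random.RandomState(seed)
    X = rng.randn(n_samples, 2)
    # Draw the zeroed positions from the same seeded generator,
    # instead of the unseeded global np.random state.
    rows = rng.randint(0, n_samples, n_zeros)
    cols = rng.randint(0, 2, n_zeros)
    X[rows, cols] = 0.0
    return X


X_a = make_test_input()
X_b = make_test_input()
# Both calls yield the same matrix because nothing depends on global state.
print(np.array_equal(X_a, X_b))
```

This would be consistent with the two tracebacks in this thread, which report different `X` values and a different number of stored elements (78 and 77) despite `random_state=0`. Whether it fully explains the failures is an assumption.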

@qinhanmin2014
Member

ping @rth @thomasjpfan
https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=5363

_____________________________ test_fit_csr_matrix ______________________________

    def test_fit_csr_matrix():
        # X can be a sparse matrix.
        random_state = check_random_state(0)
        X = random_state.randn(50, 2)
        X[(np.random.randint(0, 50, 25), np.random.randint(0, 2, 25))] = 0.0
        X_csr = sp.csr_matrix(X)
        tsne = TSNE(n_components=2, perplexity=10, learning_rate=100.0,
                    random_state=0, method='exact', n_iter=750)
        X_embedded = tsne.fit_transform(X_csr)
        assert_allclose(trustworthiness(X_csr, X_embedded, n_neighbors=1),
>                       1.0, rtol=1.1e-1)
E       AssertionError: 
E       Not equal to tolerance rtol=0.11, atol=0
E       
E       (mismatch 100.0%)
E        x: array(0.8895833333333333)
E        y: array(1.0)

X          = array([[ 1.76405235,  0.40015721],
       [ 0.97873798,  2.2408932 ],
       [ 0.        , -0.97727788],
       [ 0.95... 0.        ],
       [ 0.        ,  0.        ],
       [ 0.01050002,  1.78587049],
       [ 0.12691209,  0.40198936]])
X_csr      = <50x2 sparse matrix of type '<class 'numpy.float64'>'
	with 77 stored elements in Compressed Sparse Row format>
X_embedded = array([[  9.97755432e+01,  -2.06562851e+02],
       [  1.56048294e+02,   1.87234314e+02],
       [ -1.60145798e+02,  -...1466187e+02],
       [ -1.12968300e+02,   2.12564621e+02],
       [  1.31453125e+02,   3.10321541e+01]], dtype=float32)
random_state = <mtrand.RandomState object at 0x7f2f2bc9f168>
tsne       = TSNE(angle=0.5, early_exaggeration=12.0, init='random', learning_rate=100.0,
     method='exact', metric='euclidean', ...orm=1e-07, n_components=2,
     n_iter=750, n_iter_without_progress=300, perplexity=10, random_state=0,
     verbose=0)

../1/s/sklearn/manifold/tests/test_t_sne.py:276: AssertionError

rtol=0.11? that's surprising.

@rth
Member

rth commented Oct 7, 2019

> rtol=0.11? that's surprising.

Yeah, it's a lot. It was there before.

When I run this locally with Python 3.7 and the latest numpy, the required tolerance is rtol=2e-2.

I'm not quite sure why this would fail with py35_conda_openblas. Note that it hasn't failed for a while, and just now we got another failure on master.

I don't think that increasing the tolerance further is a solution. Increasing the data size might be one, or maybe we could side-step this by dropping Python 3.5 in #15106.
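
For context on the tolerance figures above, here is a small sketch of what `rtol` means in `numpy.testing.assert_allclose`, which passes when `|actual - desired| <= atol + rtol * |desired|`. The value of `actual` is copied from the failing run quoted above. The helper `passes` is only an illustration and not part of the test suite.

```python
from numpy.testing import assert_allclose

actual = 0.8895833333333333  # trustworthiness reported in the failing run
desired = 1.0

# Absolute gap that rtol * |desired| has to cover (atol defaults to 0).
gap = abs(actual - desired)


def passes(rtol):
    """Return True if assert_allclose accepts `actual` at this rtol."""
    try:
        assert_allclose(actual, desired, rtol=rtol)
        return True
    except AssertionError:
        return False


print(gap)
print(passes(1.1e-1), passes(2e-2))
```

With `desired=1.0`, `rtol=1.1e-1` accepts any trustworthiness of at least 0.89, so the failing value of about 0.8896 misses the threshold only narrowly, while `rtol=2e-2` would require at least 0.98.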
