
eigen_tol in _spectral_embedding.py does not propagate to solvers other than arpack #21243


Closed
lobpcg opened this issue Oct 5, 2021 · 8 comments · Fixed by #23210

@lobpcg
Contributor

lobpcg commented Oct 5, 2021

Describe the bug

In _spectral_embedding.py the option

    eigen_tol : float, default=0.0
        Stopping criterion for eigendecomposition of the Laplacian matrix
        when ``eigen_solver='arpack'``.

does not appear to propagate to the 'amg' and 'lobpcg' solvers. For example, adding eigen_tol and changing its value in examples/cluster/plot_coin_segmentation.py affects the outcome only for eigen_solver='arpack' (the default), but not for the other two solvers, 'lobpcg' and 'amg'.
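For context, both of scipy's underlying backends already accept a stopping tolerance, so eigen_tol could in principle be forwarded to all solvers. A minimal sketch using scipy directly (the test matrix and tolerance values are illustrative; this is not sklearn's internal code):

```python
import numpy as np
from scipy.sparse.linalg import eigsh, lobpcg

# Symmetric test matrix with a known spectrum (eigenvalues 1..50)
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
L = Q @ np.diag(np.arange(1.0, 51.0)) @ Q.T

# ARPACK honors a stopping tolerance via `tol`
# (shift-invert around 0 targets the smallest eigenvalues)...
vals_arpack, _ = eigsh(L, k=3, sigma=0, which="LM", tol=1e-6)

# ...and so does LOBPCG, so eigen_tol could be forwarded to both.
X0 = rng.standard_normal((50, 3))
vals_lobpcg, _ = lobpcg(L, X0, tol=1e-8, maxiter=200, largest=False)
```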

Steps/Code to Reproduce

See above

Expected Results

eigen_tol works for all supported solvers

Actual Results

eigen_tol works for `eigen_solver='arpack'` only and is ignored otherwise

While at it

Also introduce and pass a max_iter parameter to all eigensolvers, although maybe it should be named eigen_max_iter to avoid confusion.

Versions

System:
python: 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\Name\AppData\Local\Programs\Python\Python37\python.exe
machine: Windows-10-10.0.19041-SP0

Python dependencies:
pip: 21.2.4
setuptools: 58.1.0
sklearn: 1.0
numpy: 1.21.2
scipy: 1.7.1
Cython: 0.29.24
pandas: 1.3.3
matplotlib: 3.4.3
joblib: 1.0.1
threadpoolctl: 2.2.0

@thomasjpfan
Member

thomasjpfan commented Oct 6, 2021

Currently, the default eigen_tol differs between solvers: lobpcg uses 1e-5 and arpack uses 0.0. Should the solvers have different default tolerances?

@lobpcg
Contributor Author

lobpcg commented Oct 6, 2021

  1. The solvers should ideally have the same default accuracy, but arpack and lobpcg likely use different stopping criteria, so some testing may be needed to figure out which default numeric tolerance values make the resulting accuracy the same.
  2. arpack formally uses 0.0, but mathematically the tolerance cannot be zero - the iterations would never stop due to numeric errors. Thus, arpack probably treats 0.0 as None, i.e., sets some tolerance >0 internally.
  3. At least amg is just lobpcg with some extra tricks, so tolerances in amg and lobpcg should be the same.
  4. Figuring out and setting default tolerances is a different issue, unrelated to the reported bug; e.g., see FIX: tol in _spectral_embedding.py #21194
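Regarding point 2 above: scipy's `eigsh` documents that `tol=0` implies machine precision, so the call still terminates. A small sketch confirming this (illustrative test matrix, not sklearn code):

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 40))
L = A @ A.T  # symmetric positive semi-definite test matrix

# tol=0 does not mean "iterate forever": ARPACK falls back to
# machine precision internally, so this call terminates normally.
vals, _ = eigsh(L, k=2, which="LM", tol=0)
```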

@thomasjpfan
Member

To be backward compatible, we can have eigen_tol=None and pass in 0 to arpack, which is machine precision internally. For lobpcg and amg, we continue to pass in 1e-5 when eigen_tol=None. If eigen_tol is a float, we can pass it in directly.

Does this sound reasonable?
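A minimal sketch of this proposal (the helper name `resolve_eigen_tol` is hypothetical, not sklearn's API; the defaults are the historical per-solver values mentioned above):

```python
# Hypothetical helper sketching the proposed backward-compatible
# resolution of eigen_tol before dispatching to a solver.
def resolve_eigen_tol(eigen_tol, eigen_solver):
    if eigen_tol is None:
        # Preserve each solver's historical default.
        return 0.0 if eigen_solver == "arpack" else 1e-5
    # An explicit float is forwarded to every solver unchanged.
    return float(eigen_tol)
```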

@lobpcg
Contributor Author

lobpcg commented Oct 8, 2021

> To be backward compatible, we can have eigen_tol=None and pass in 0 to arpack, which is machine precision internally. For lobpcg and amg, we continue to pass in 1e-5 when eigen_tol=None. If eigen_tol is a float, we can pass it in directly.
>
> Does this sound reasonable?

@thomasjpfan Yes.

In a future PR, maybe someone should also introduce a maxit parameter, pass it to all solvers to override the internal defaults, and set some defaults in spectral_clustering, just as for eigen_tol. That would be a desirable enhancement.

On a side note, I am unsure that 0 in arpack is indeed machine precision internally. But if it is 0, I am confident that it's not a good default for arpack here. E.g., something like 1e-8 seems to work just fine in #21148 plot_coin_segmentation.py while cutting the compute time in half. But that's also not relevant to the present issue and should be addressed separately.

@ogrisel
Member

ogrisel commented Oct 12, 2021

It would indeed be interesting to be able to pass a max_iter param to the eigen solvers, although maybe it should be named eigen_max_iter to avoid confusion with any outer-level max_iter loop in the estimator itself.

Having eigen_max_iter exposed would probably better help study the influence of the eigensolver convergence on the outer statistical estimation problem.
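Both scipy backends already expose an iteration cap, so a hypothetical eigen_max_iter could be forwarded to either solver as-is. A sketch under that assumption (matrix and values illustrative):

```python
import numpy as np
from scipy.sparse.linalg import eigsh, lobpcg

# Symmetric test matrix with a known spectrum (eigenvalues 1..40)
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((40, 40)))
L = Q @ np.diag(np.arange(1.0, 41.0)) @ Q.T

# ARPACK caps iterations via `maxiter`...
vals_arpack, _ = eigsh(L, k=2, which="LM", maxiter=500)

# ...and LOBPCG does too, so one eigen_max_iter could serve both.
X0 = rng.standard_normal((40, 2))
vals_lobpcg, _ = lobpcg(L, X0, tol=1e-8, maxiter=500)
```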

> On a side note, I am unsure that 0 in arpack is indeed machine precision internally. But if it is 0, I am confident that it's not a good default for arpack here. E.g., something like 1e-8 seems to work just fine in #21148 plot_coin_segmentation.py while cutting the compute time in half. But that's also not relevant to the present issue and should be addressed separately.

The meaning of tol=0 in ARPACK would deserve more investigation. To move forward, I like the plan suggested by @thomasjpfan in #21243 (comment): it moves the backward-compatibility problems out of the way and decouples the discussion of the default value for the arpack solver from the fix of the original problem reported here (namely, eigen_tol propagation to the other solvers).

@glemaitre
Member

In addition to spectral_embedding, the same changes need to be done in SpectralEmbedding, spectral_clustering, and SpectralClustering.

The PR already did those changes: #11968
It did most of the backward compatibility things that are discussed here. The tricky one is the max_iter=2000 in the second call to lobpcg.

My concern is about the tests. I don't really know how to trigger some convergence issue to be sure that we properly handle the exception.

@lobpcg
Contributor Author

lobpcg commented Dec 17, 2021

> I don't really know how to trigger some convergence issue

@glemaitre The theory tells us that convergence of any eigensolver will be slow for a random initialization if there is only a small separation between the wanted and unwanted eigenvalues, e.g., finding the single smallest eigenvalue when the second smallest is very close to it. In spectral clustering that corresponds to determining two clusters when there are three or more nearly identical clusters. E.g., make a Gaussian point cloud (a blob) in the positive quadrant in 2D, far enough from the origin, then flip the signs of the two components of all the points to create symmetric copies of the blob in every quadrant. Now you have 4 identical clusters by symmetry; just perturb them slightly and use any distance to generate the graph adjacency matrix. As you try to partition this union of 4 nearly identical clusters into two clusters, where do you cut, vertically or horizontally? This ambiguity should manifest itself as slow convergence in the solver.

I have not tried such an example, but the theory predicts that it should work. It could actually make a good new demo...
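The construction described above can be sketched as follows (a hedged sketch with numpy; the blob location, spread, and perturbation scale are illustrative choices, not prescribed in the discussion):

```python
import numpy as np

# One Gaussian blob in the positive quadrant, far from the origin...
rng = np.random.default_rng(42)
blob = rng.normal(loc=5.0, scale=0.3, size=(100, 2))

# ...mirrored into all four quadrants by flipping component signs,
# giving four nearly identical clusters whose 2-way split is ambiguous.
signs = np.array([[1, 1], [-1, 1], [1, -1], [-1, -1]])
X = np.vstack([blob * s for s in signs])
X += rng.normal(scale=1e-3, size=X.shape)  # slight perturbation

# X can now feed e.g. a k-nearest-neighbors affinity; the near-symmetry
# should slow eigensolver convergence when asking for two clusters.
```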

@lobpcg
Contributor Author

lobpcg commented Dec 17, 2021

While at it, maybe also fix #14713?
