ENH propagate eigen_tol to all eigen solvers #11968
Conversation
This is not how the default should be done, but it gives an idea.
I ran a few tests using the new code and it now seems to work as I expected. Thanks!
IIRC, that made pep8 scream
On Mon, Sep 17, 2018, 12:58 Joel Nothman wrote:

> Joel Nothman approved this pull request.
>
> In sklearn/cluster/tests/test_spectral.py (#11968 (comment)):
> @@ -205,3 +205,20 @@ def test_spectral_clustering_with_arpack_amg_solvers():
>      assert_raises(
>          ValueError, spectral_clustering,
>          graph, n_clusters=2, eigen_solver='amg', random_state=0)
> +
> +
> +def test_6489_regression():
> +    X = np.ones((50, 50))
> +    X[[3, 3, 3, 3, 3, 3, 8, 8, 8, 8, 8, 8, 17, 17, 17, 17, 17,
>
> put a space before that first 3 so things line up
ping: @rth
Maybe add a note to the docstring of eigen_tol saying that for the amg solver it will always be set greater than or equal to 1e-12, and document the behavior of this parameter for lobpcg. Also see the comment below. Otherwise LGTM.
@@ -303,6 +305,10 @@ def spectral_embedding(adjacency, n_components=8, eigen_solver=None,
            raise ValueError

    elif eigen_solver == "lobpcg":

        if eigen_tol == 0:
            eigen_tol = 1e-15
Would there be a disadvantage of doing eigen_tol = max(1e-15, eigen_tol), same as for amg? I find it strange that eigen_tol is not monotonic in this case: e.g. 0 will produce 1e-15 but 1e-16 will produce 1e-16. Or are there use cases for tolerances below 1e-15? cc @lobpcg
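To make the non-monotonicity concrete, here is a minimal sketch contrasting the two defaulting strategies; the helper names are invented for illustration and do not appear in the patch:

    def tol_replace_zero(eigen_tol):
        # Behavior in the diff above: only an exact 0 is replaced.
        return 1e-15 if eigen_tol == 0 else eigen_tol

    def tol_clip(eigen_tol):
        # Monotonic alternative suggested here, for parity with "amg".
        return max(1e-15, eigen_tol)

    assert tol_replace_zero(0) == 1e-15
    assert tol_replace_zero(1e-16) == 1e-16  # smaller request, smaller tol: non-monotonic
    assert tol_clip(0) == 1e-15
    assert tol_clip(1e-16) == 1e-15          # anything below the floor is raised to it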
- The "amg" solver actually is the same lobpcg, only with an extra option, so indeed the default eigen_tol should be the same for "amg" and "lobpcg". My recommendation is to set eigen_tol=None in both cases and thus let the lobpcg code choose its default.

- 1e-15 is way too small for the default eigen_tol. The default eigen_tol in https://github.com/scipy/scipy/blob/v1.1.0/scipy/sparse/linalg/eigen/lobpcg/lobpcg.py lines 307-308 is np.sqrt(1e-15) * n, where n is the size of the matrix (see the sketch after this list). The clean fix may be NOT to specify ANY default eigen_tol in "spectral_embedding" but rather let lobpcg do it, as I suggest above.

- I do not like, in lines 136-138 of "spectral_embedding":

      def spectral_embedding(adjacency, n_components=8, eigen_solver=None,
                             random_state=None, eigen_tol=0.0,

  Why not eigen_tol=None? As far as I can see, eigen_tol=0.0 makes little sense, while eigen_tol=None would be passed to "lobpcg" and let it choose its default, as I propose. I do not know how "arpack" reacts to eigen_tol=None, however.

- Yes, there may be use cases for tolerances below 1e-15 in general, because the stopping criterion currently used in "lobpcg" is not scale invariant: finding eigenvectors of the matrix 1e-10*A with the same accuracy as finding the eigenvectors of the matrix A would require scaling the eigen_tol by the same magnitude, 1e-10. I have not actually tested it, but this is how the code is written. The default eigen_tol in "lobpcg", np.sqrt(1e-15) * n, implicitly assumes that the matrix A is reasonably scaled.
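For reference, a small sketch of the scipy default described in the second point; the formula is taken from the lobpcg.py source linked above, while the helper name is mine:

    import numpy as np

    def scipy_lobpcg_default_tol(n):
        # Residual tolerance lobpcg falls back to when tol=None is passed
        # (scipy 1.1.0, lobpcg.py lines 307-308): np.sqrt(1e-15) * n.
        return np.sqrt(1e-15) * n

    # For a 100 x 100 Laplacian this is ~3.2e-6, many orders of magnitude
    # looser than the 1e-15 hard-coded in the diff above.
    print(scipy_lobpcg_default_tol(100))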
What's the status of this?
This PR has been updated taking into account the changes in #13707 that were merged in master. The original issue in #6489 is already fixed in master, probably by #13707. But the tolerance given in the function call was still not propagated to all solvers. This PR proposes to unify them. The PR changes the default to […]. @lobpcg, do you think that using […]?
Maybe it would also be nice to catch if […]
Propagating eigen_tol is a good idea. There are only two solvers involved: arpack and lobpcg. AMG is also lobpcg, just with an extra option.
I also advocate propagating eigen_tol.
Any updates on this?
Ok, sorry for the misunderstanding. […] Am I understanding correctly?
Yes. And of course unit tests checking that each one actually works for all (both) implemented solvers.
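A sketch of what such a parametrized test could look like; the test name, dataset, and affinity construction are invented here, and only the idea of exercising eigen_tol on both solvers comes from this thread:

    import pytest
    from sklearn.datasets import make_blobs
    from sklearn.manifold import spectral_embedding
    from sklearn.metrics.pairwise import rbf_kernel

    @pytest.mark.parametrize("eigen_solver", ["arpack", "lobpcg"])
    def test_eigen_tol_reaches_solver(eigen_solver):
        # A user-supplied eigen_tol should be accepted by every solver.
        X, _ = make_blobs(n_samples=50, random_state=0)
        affinity = rbf_kernel(X)  # dense, symmetric adjacency matrix
        embedding = spectral_embedding(
            affinity,
            n_components=2,
            eigen_solver=eigen_solver,
            eigen_tol=1e-5,
            random_state=0,
        )
        assert embedding.shape == (50, 2)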
Take
@glemaitre sorry, I just saw the label "help wanted", but afterwards I realized you are the assignee. I guess you are already working on this...
Here are some suggestions. It will also require a what's new entry, and some tests as well.
@@ -8,7 +8,7 @@ doctest_optionflags = NORMALIZE_WHITESPACE ELLIPSIS
 testpaths = sklearn
 addopts =
     --doctest-modules
-    --disable-pytest-warnings
+    # --disable-pytest-warnings
Suggested change:
-    # --disable-pytest-warnings
+    --disable-pytest-warnings
@@ -339,7 +355,13 @@ def spectral_embedding(
     X = random_state.randn(laplacian.shape[0], n_components + 1)
     X[:, 0] = dd.ravel()
     X = X.astype(laplacian.dtype)
-    _, diffusion_map = lobpcg(laplacian, X, M=M, tol=1.0e-5, largest=False)
+    # Until scikit-learn minimum scipy dependency <1.4.0 we require high
Suggested change:
-    # Until scikit-learn minimum scipy dependency <1.4.0 we require high
+    # As long as scikit-learn has minimum scipy dependency <1.4.0 we require high
-    eigen_tol : float, default=0.0
-        Stopping criterion for eigendecomposition of the Laplacian matrix
-        when using arpack eigen_solver.
+    eigen_tol : float or None, default=None
Suggested change:
-    eigen_tol : float or None, default=None
+    eigen_tol : float, default=None
-    eigen_tol : float, default=0.0
-        Stopping criterion for eigendecomposition of the Laplacian matrix
-        when ``eigen_solver='arpack'``.
+    eigen_tol : float or None, default=None
Suggested change:
-    eigen_tol : float or None, default=None
+    eigen_tol : float, default=None
-    eigen_tol : float, default=0.0
-        Stopping criterion for eigendecomposition of the Laplacian matrix
-        when using arpack eigen_solver.
+    eigen_tol : float or None, default=None
Suggested change:
-    eigen_tol : float or None, default=None
+    eigen_tol : float, default=None
@@ -443,6 +470,21 @@ class SpectralEmbedding(BaseEstimator):
         to be installed. It can be faster on very large, sparse problems.
         If None, then ``'arpack'`` is used.

+    eigen_tol : float or None, default=None
Suggested change:
-    eigen_tol : float or None, default=None
+    eigen_tol : float, default=None
Recently, we have been using "auto" to change behavior depending on another parameter. @jeremiedbb What do you think of eigen_tol="auto"?
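A rough sketch of what an "auto" default could do; the helper is hypothetical, and the per-solver values follow the defaults discussed in this thread ('arpack' => 0, otherwise None):

    def _resolve_eigen_tol(eigen_tol, eigen_solver):
        # Hypothetical helper: "auto" defers to each solver's own default,
        # i.e. tol=0 for arpack and tol=None (scipy decides) for lobpcg/amg.
        if eigen_tol == "auto":
            return 0 if eigen_solver == "arpack" else None
        return eigen_tol

    assert _resolve_eigen_tol("auto", "arpack") == 0
    assert _resolve_eigen_tol("auto", "lobpcg") is None
    assert _resolve_eigen_tol(1e-7, "amg") == 1e-7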
# Until scikit-learn minimum scipy dependency <1.4.0 we require high
# tolerance as explained in:
# https://github.com/scikit-learn/scikit-learn/pull/13707#discussion_r314028509
tol = max(1e-5, 1e-5 if eigen_tol is None else eigen_tol)
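Spelled out, the clipping in the quoted line maps the user's request to an effective tolerance as follows (the input values are just illustrations):

    for eigen_tol in (None, 1e-8, 1e-3):
        tol = max(1e-5, 1e-5 if eigen_tol is None else eigen_tol)
        print(eigen_tol, "->", tol)
    # None  -> 1e-05  (no request: use the floor)
    # 1e-08 -> 1e-05  (requests below the floor are silently raised)
    # 0.001 -> 0.001  (looser tolerances pass through unchanged)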
May we update the docstrings to explain this clipping behavior for eigen_tol + lobpcg?
From an IRL discussion with @ogrisel and @glemaitre, we don't think clipping is a good solution here. We should let the user choose the tol they want. Instead, we can document in the parameter description that it's advised not to use too low a tolerance when scipy < 1.4.
Makes sense.
Is someone still actively working on this PR? @massich, are you still involved with this one?
Reference Issues/PRs

closes #21243
closes #6489

What does this implement/fix? Explain your changes.

Propagate the eigen_tol for the 'arpack', 'amg', and 'lobpcg' solvers. By default, we are using the scipy default ('arpack' => 0, otherwise None).

Any other comments?
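Sketched usage of the unified parameter; the estimator and parameter names match the diffs above, while the data and tolerance value are illustrative only:

    import numpy as np
    from sklearn.manifold import SpectralEmbedding

    X = np.random.RandomState(0).rand(40, 5)

    # The same tolerance now reaches whichever solver is selected.
    for solver in ("arpack", "lobpcg"):
        emb = SpectralEmbedding(
            n_components=2, eigen_solver=solver, eigen_tol=1e-6, random_state=0
        )
        assert emb.fit_transform(X).shape == (40, 2)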