
Conversation

@pierreablin (Contributor) commented Aug 20, 2018

In most use cases, ICA is performed on arrays with p << n. In this case, using linalg.svd can be orders of magnitude slower than just computing the eigendecomposition of X.dot(X.T):

In [1]: import numpy as np

In [2]: X = np.random.randn(10, 1000)

In [3]: %timeit np.linalg.svd(X)
36.9 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [4]: %timeit np.linalg.eigh(X.dot(X.T))
73.7 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

And then, SVD becomes the bottleneck of ICA!
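For reference, here is a minimal sketch (mine, not the PR's code) of why the two are interchangeable for whitening: the eigendecomposition of X.dot(X.T) yields the same singular values and, up to column signs, the same left singular vectors:

import numpy as np

X = np.random.randn(10, 1000)  # p << n, as in the timings above

# Reference quantities from the SVD.
u_svd, d_svd, _ = np.linalg.svd(X, full_matrices=False)

# Same quantities from the small p x p Gram matrix: eigh returns
# eigenvalues in ascending order, so reverse both outputs to match
# the SVD convention.
e, u = np.linalg.eigh(X.dot(X.T))
d = np.sqrt(e[::-1])  # singular values are square roots of the eigenvalues
u = u[:, ::-1]

print(np.allclose(d, d_svd))  # True; u matches u_svd up to column signs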

Now, this PR breaks one test because of numerical issues (where the problem is degenerate), and I don't really know how to fix this. I do not think this problem is likely to happen in any real-world application, though.

(Note that this does not slow things down when p and n are similar; it is only slower when p is much larger than n, which hardly ever occurs in practice.)

@amueller (Member)

It might make sense to decide how to solve it depending on p / n, right?

@pierreablin (Contributor, Author)

Yes, this is correct, but I think the region boundary is hard to determine perfectly.

There are actually quite a lot of calls to linalg.svd in scikit-learn where you do not need all three outputs, but I don't know if it's the same story as for ICA, where in 99% of cases it is much cheaper to simply do eigh(X.dot(X.T)).

@amueller (Member)

It's possible to at least roughly determine a good cut-off for the heuristic, while "people usually use this with n >> p" is a hard-to-support statement, and we'd do a disservice to everybody using it in a different setting, even if that's a small group.

@pierreablin (Contributor, Author) commented Aug 21, 2018

OK, I have pushed a new commit where the heuristic is simply n < p. On my machine, eigh is faster by a factor of ~2 when n = p.

PS: I just figured out that what I called n in the above conversation is actually p in the code, and vice versa.
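To make the heuristic concrete, an illustrative sketch in the conversation's convention (X of shape (p, n) with p features and n samples; the function name and the clipping are mine, not the PR's code):

import numpy as np

def left_singular(X):
    p, n = X.shape
    if p < n:  # the cheap path: eigh on the small p x p Gram matrix
        e, u = np.linalg.eigh(X.dot(X.T))
        # Clip tiny negative eigenvalues from round-off before the square
        # root; degenerate inputs are where this path gets fragile.
        return u[:, ::-1], np.sqrt(np.clip(e[::-1], 0, None))
    u, d, _ = np.linalg.svd(X, full_matrices=False)
    return u, d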

@pierreablin (Contributor, Author) commented Aug 22, 2018

After discussing with @agramfort, I have instead added an option to choose between different solvers for the SVD. Indeed, linalg.svd is numerically more stable than the eigh method, so the latter might be a problem when the input data are degenerate.

Here is a simple example to convince oneself that it accelerates fastica:

import numpy as np
from sklearn.decomposition import fastica


p, n = 100, 50000  # Typical EEG size problem
n_components = 10


S = np.random.laplace(size=(p, n))  # Generate sources
A = np.random.randn(p, p)  # Generate mixing matrix
X = np.dot(A, S).T  # Generate signals

%timeit fastica(X, n_components=n_components, svd_solver='svd')
1.02 s ± 55.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit fastica(X, n_components=n_components, svd_solver='eigh')
191 ms ± 68.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@@ -203,6 +203,11 @@ def my_g(x):
Initial un-mixing array of dimension (n.comp,n.comp).
If None (default) then an array of normal r.v.'s is used.

svd_solver : str, optional
Member:

I would call this whitening_solver

u, d, _ = linalg.svd(X, full_matrices=False)

del _
if svd_solver == 'eigh' and n < p:
Member:

With n < p you change the behavior of existing code. Does it affect any test?

_, _, sources_fun = fastica(m.T, fun=nl, algorithm=algo, random_state=0,
svd_solver='eigh')
ica = FastICA(fun=nl, algorithm=algo, random_state=0,
svd_solver='eigh')
Member:

Is the svd method still tested?

You should loop over the solver options.
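For instance, a sketch of such a parametrized test (illustrative, not the PR's actual test code; svd_solver is the parameter added in this PR):

import numpy as np
import pytest
from sklearn.decomposition import FastICA

@pytest.mark.parametrize('solver', ['svd', 'eigh'])
def test_fastica_svd_solver(solver):
    # Smoke test: both solvers fit and return sources of the expected shape.
    rng = np.random.RandomState(0)
    X = rng.randn(100, 10)
    ica = FastICA(n_components=3, svd_solver=solver, random_state=0)
    sources = ica.fit_transform(X)
    assert sources.shape == (100, 3)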

@ivirshup mentioned this pull request Nov 30, 2019
Base automatically changed from master to main January 22, 2021 10:50
@Micky774 (Contributor)

Continuing in #22527 if anyone is interested!

@cmarmo added the Superseded (PR has been replaced by a newer PR) label and removed the Stalled label on Mar 10, 2022
@lesteve closed this on Mar 11, 2022
Labels: module:decomposition, Performance, Superseded (PR has been replaced by a newer PR)