[MRG] Add sparse efficiency warning to randomized_svd (fixes randomized_svd is slow for dok_matrix and lil_matrix #11262) #11264

scottgigante · 2018-06-14T18:10:41Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Adds a scipy.sparse.SparseEfficiencyWarning when randomized_svd is run on lil_matrix or dok_matrix.
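For readers who want to see the behavior in isolation, here is a minimal sketch of such a check as a standalone helper. The name `warn_if_inefficient_format` is hypothetical and is not part of scikit-learn; only `SparseEfficiencyWarning`, `lil_matrix`, and `dok_matrix` come from `scipy.sparse`.

```python
import warnings

from scipy import sparse


def warn_if_inefficient_format(M):
    """Hypothetical helper: warn when M is stored as LIL or DOK."""
    if isinstance(M, (sparse.lil_matrix, sparse.dok_matrix)):
        warnings.warn(
            "Calculating SVD of a {} is expensive. "
            "csr_matrix is more efficient.".format(type(M).__name__),
            sparse.SparseEfficiencyWarning,
        )


# Record warnings so the effect of each call can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_inefficient_format(sparse.lil_matrix((5, 5)))
    warn_if_inefficient_format(sparse.csr_matrix((5, 5)))

# Only the lil_matrix call should have produced a warning.
print([str(w.message) for w in caught])
```

Passing the warning category explicitly lets callers filter or silence it with the standard `warnings` machinery.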

Any other comments?

coo_matrix is about 1.5x slower than csr_matrix. Maybe it is worth also warning about this, but that's a judgement call. Also worth considering whether it is better to simply coerce the matrix to a better data type, rather than printing a warning.

lil_matrix is 4x slower than csr_matrix, dok_matrix is ~50x slower.
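The comparison above could be reproduced roughly as follows. This is only a sketch under stated assumptions: a sparse-times-dense product stands in for the matrix products that randomized_svd performs, the matrix size and density are arbitrary, and `time_product` is a hypothetical helper. It is not the benchmark behind the figures quoted in this PR, and the ratios it prints will vary by machine and SciPy version. The last line shows the coercion alternative mentioned above.

```python
import time

import numpy as np
from scipy import sparse

rng = np.random.RandomState(0)
# Random sparse matrix; shape and density are arbitrary illustrative choices.
X_csr = sparse.random(2000, 500, density=0.01, format="csr", random_state=rng)
# Dense block standing in for the random projection used by randomized SVD.
Q = rng.normal(size=(X_csr.shape[1], 20))


def time_product(M, n_repeats=5):
    """Return the best wall-clock time of M @ Q over n_repeats runs."""
    best = float("inf")
    for _ in range(n_repeats):
        start = time.perf_counter()
        M @ Q
        best = min(best, time.perf_counter() - start)
    return best


timings = {}
for fmt in ("csr", "coo", "lil", "dok"):
    timings[fmt] = time_product(X_csr.asformat(fmt))
    print("{}: {:.6f} s".format(fmt, timings[fmt]))

# The coercion alternative: convert once up front, then work with CSR.
X_coerced = X_csr.asformat("dok").tocsr()
```

Coercing silently is more convenient for callers, while a warning leaves the conversion cost visible and under their control.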

rth

Thanks for this PR @scottgigante !

Minor comment below, otherwise LGTM.

rth · 2018-06-14T18:48:39Z

sklearn/utils/extmath.py

@@ -322,6 +321,15 @@ def randomized_svd(M, n_components, n_oversamples=10, n_iter='auto',
        # this implementation is a bit faster with smaller shape[1]
        M = M.T

+    if isinstance(M, sparse.lil_matrix):


Maybe

    if isinstance(M, (sparse.lil_matrix, sparse.dok_matrix)):
        warnings.warn(("Calculating SVD of a %s is expensive. "
                       "csr_matrix is more efficient.") % type(M).__name__)

scottgigante · 2018-06-14T18:58:30Z

Thanks for the review @rth ! Unapologetically used the modern format method introduced in Python 2.6, but otherwise that change is in ;)

rth · 2018-06-14T22:51:29Z

sklearn/utils/extmath.py

+    if isinstance(M, (sparse.lil_matrix, sparse.dok_matrix)):
+        warnings.warn("Calculating SVD of a {} is expensive. "
+                      "csr_matrix is more efficient.".format(
+                          type(M).__name__))


Actually, adding a unit test under sklearn/utils/tests/test_extmath.py to ensure this warning is raised would be useful. Also, an earlier version raised SparseEfficiencyWarning; I didn't mean to suggest removing that.

scottgigante · 2018-06-14

You caught me right in the middle of writing the test! Also, my apologies for dropping the warning type; that's my poor attention span.
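A test along the lines discussed here might look like the sketch below. It is not the test added in this PR: it uses only the standard `warnings` module rather than scikit-learn's testing helpers, the helper name `collect_efficiency_warnings` is hypothetical, and the 50 x 20 matrix size is an arbitrary small choice. It assumes an installed scikit-learn version that includes the warning.

```python
import warnings

import numpy as np
from scipy import sparse
from sklearn.utils.extmath import randomized_svd


def collect_efficiency_warnings(matrix_cls):
    """Run randomized_svd on a small matrix and return matching warnings."""
    rng = np.random.RandomState(42)
    # A small dense matrix converted to the target sparse format.
    X = matrix_cls(rng.rand(50, 20))
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        randomized_svd(X, n_components=5, n_iter=1, random_state=0)
    # Keep only the category under test; other warnings are ignored.
    return [w for w in caught
            if issubclass(w.category, sparse.SparseEfficiencyWarning)]


for cls in (sparse.lil_matrix, sparse.dok_matrix):
    hits = collect_efficiency_warnings(cls)
    assert hits, "expected a SparseEfficiencyWarning for " + cls.__name__
    assert cls.__name__ in str(hits[0].message)
```

Filtering by category keeps the check robust if unrelated warnings (for example, deprecations) are emitted during the call.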

qinhanmin2014

Otherwise LGTM

qinhanmin2014 · 2018-06-16T12:13:24Z

sklearn/utils/tests/test_extmath.py

@@ -365,6 +365,21 @@ def test_randomized_svd_power_iteration_normalizer():
            assert_greater(15, np.abs(error_2 - error))


+def test_randomized_svd_sparse_warnings():


Can you figure out a way to make the test less time-consuming (e.g., with a smaller matrix)? Currently, it's among the most time-consuming tests (see the Travis log).

qinhanmin2014 · 2018-06-16T14:32:59Z

Another minor thing (feel free to ignore it and keep the current version): would it be better to raise the warning at the beginning of the function? That might make the code friendlier.

scottgigante · 2018-06-17T13:33:00Z

Thanks for the review, @qinhanmin2014 . I've moved the warning to the start of the function and made the test matrix much smaller.

rth · 2018-06-17T14:25:40Z

Thanks @scottgigante !

Add sparse efficiency warning to randomized_svd

cd69c1a

lil_matrix is 4x slower than csr_matrix, dok_matrix is ~50x slower.

rth approved these changes Jun 14, 2018

compact sparseefficiencywarning code to address comment from @rth

52d0fc9

scottgigante changed the title ~~Add sparse efficiency warning to randomized_svd (fixes randomized_svd is slow for dok_matrix and lil_matrix #11262)~~ [MRG] Add sparse efficiency warning to randomized_svd (fixes randomized_svd is slow for dok_matrix and lil_matrix #11262) Jun 14, 2018

rth reviewed Jun 14, 2018

test for sparseefficiencywarning as suggested by @rth

c36ec92

qinhanmin2014 approved these changes Jun 16, 2018

scottgigante added 2 commits June 17, 2018 09:28

move warning to start of randomized_svd function

aaa489c

make test_randomized_svd_sparse_warnings run faster

2442b6f

rth merged commit 56dc374 into scikit-learn:master Jun 17, 2018

georgipeev pushed a commit to georgipeev/scikit-learn that referenced this pull request Jun 20, 2018

Add sparse efficiency warning to randomized_svd for dok_matrix / lil_…

cb5ec0a

…matrix (scikit-learn#11264)
