-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG] Add sparse efficiency warning to randomized_svd (fixes randomized_svd is slow for dok_matrix and lil_matrix #11262) #11264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Add sparse efficiency warning to randomized_svd (fixes randomized_svd is slow for dok_matrix and lil_matrix #11262) #11264
Conversation
lil_matrix is 4x slower than csr_matrix, dok_matrix is ~50x slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @scottgigante !
Minor comment below, otherwise LGTM.
sklearn/utils/extmath.py
Outdated
@@ -322,6 +321,15 @@ def randomized_svd(M, n_components, n_oversamples=10, n_iter='auto', | |||
# this implementation is a bit faster with smaller shape[1] | |||
M = M.T | |||
|
|||
if isinstance(M, sparse.lil_matrix): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
if isinstance(M, (sparse.lil_matrix, sparse.dok_matrix)):
warnings.warn(("Calculating SVD of a %s is expensive. "
"csr_matrix is more efficient.")
% type(M).__name__))
Thanks for the review @rth ! Unapologetically used the modern |
sklearn/utils/extmath.py
Outdated
if isinstance(M, (sparse.lil_matrix, sparse.dok_matrix)): | ||
warnings.warn("Calculating SVD of a {} is expensive. " | ||
"csr_matrix is more efficient.".format( | ||
type(M).__name__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually adding a unit test under sklearn/utils/tests/test_extmath.py
to ensure this warning is raised would be useful. Also in earlier version this raised SparseEfficiencyWarning
I didn't mean to suggest to remove that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You caught me right in the middle of writing the test! Also my apologies for dropping off the warning type, that's my poor attention span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM
@@ -365,6 +365,21 @@ def test_randomized_svd_power_iteration_normalizer(): | |||
assert_greater(15, np.abs(error_2 - error)) | |||
|
|||
|
|||
def test_randomized_svd_sparse_warnings(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you figure out a way to make the test less time-consuming (e.g., smaller matrix?). Currently, it's among the most time-consuming tests (See Travis log).
Another minor thing (you might ignore and keep current version), will it be better to raise the warning at the beginning of the function? This might makes the code more friendly. |
Thanks for the review, @qinhanmin2014 . I've moved the warning to the start of the function and made the test matrix much smaller. |
Thanks @scottgigante ! |
Reference Issues/PRs
Fixes #11262
What does this implement/fix? Explain your changes.
Adds a
scipy.sparse.SparseEfficiencyWarning
whenrandomized_svd
is run onlil_matrix
ordok_matrix
.Any other comments?
coo_matrix
is about 1.5x slower thancsr_matrix
. Maybe it is worth also warning about this, but that's a judgement call. Also worth considering whether it is better to simply coerce the matrix to a better data type, rather than printing a warning.