Avoid extra copy when using astype in sparsefuncs_fast #11966
Thanks @massich!
Waiting for CI.
Actually, the list of files can be ignored for now; we can handle it in a subsequent PR.
For a random sparse CSR array of shape (5000, 40000) with density 0.01, this reduces the runtime of
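To isolate the cost this PR targets, here is a minimal sketch. It assumes the saving comes from `astype` copying `X.data` even when the data is already `float64`. The sketch times a default `astype` call against `astype(..., copy=False)` on a matrix of the same shape and density. The helper name `time_astype` is hypothetical and is not part of scikit-learn.

```python
import timeit

import numpy as np
from scipy.sparse import random


def time_astype(n_rows=5000, n_cols=40000, density=0.01, number=20):
    """Hypothetical helper: return (seconds_with_copy, seconds_without_copy)
    for casting the nonzero values of a random CSR matrix to float64."""
    X = random(n_rows, n_cols, density=density, format="csr")
    data = X.data  # scipy's default dtype is float64, so no cast is needed

    # Default astype allocates and fills a new array on every call.
    with_copy = timeit.timeit(lambda: data.astype(np.float64), number=number)
    # copy=False returns `data` itself because the dtype already matches.
    without_copy = timeit.timeit(
        lambda: data.astype(np.float64, copy=False), number=number
    )
    return with_copy, without_copy


if __name__ == "__main__":
    copied, not_copied = time_astype()
    print(f"astype (copy):       {copied:.4f} s")
    print(f"astype (copy=False): {not_copied:.4f} s")
```

Absolute timings depend on the machine, so no numbers are given here. The point is only that the default call pays for an `nnz`-length allocation and copy every time, while `copy=False` skips it for data that already has the target dtype.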
I timed the modified functions following @rth's comment. Here are the results:

```python
# Run in IPython/Jupyter: %timeit is an IPython magic command.
import numpy as np
from scipy.sparse import random
from sklearn.utils.sparsefuncs_fast import (
    csr_row_norms, csr_mean_variance_axis0,
    csc_mean_variance_axis0, incr_mean_variance_axis0)

# Random CSR matrix with density 0.01 (scipy's default), plus a CSC copy.
csr = random(5000, 40000, density=0.01, format='csr')
csc = csr.tocsc()

print('csr_row_norms')
%timeit csr_row_norms(csr)
print('csr_mean_variance_axis0')
%timeit csr_mean_variance_axis0(csr)
print('csc_mean_variance_axis0')
%timeit csc_mean_variance_axis0(csc)
print('incr_mean_variance_axis0')
%timeit incr_mean_variance_axis0(csr, np.zeros(40000), np.ones(40000), np.array([1]))
```

Master:
This PR:
Reference Issues/PRs
What does this implement/fix? Explain your changes.
It avoids getting an extra copy of the data in the following case:
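The semantics behind this change can be sketched with NumPy alone. The sketch assumes the fix passes `copy=False` to `astype` on the sparse matrix's `data` array. `cast_data` is a hypothetical helper that mirrors that pattern; it is not the actual Cython code in `sparsefuncs_fast`.

```python
import numpy as np
from scipy.sparse import random


def cast_data(X, dtype=np.float64):
    # Hypothetical helper illustrating the pattern: with copy=False, astype
    # returns X.data itself when it already has the requested dtype, and only
    # allocates a new array when a conversion is actually needed.
    return X.data.astype(dtype, copy=False)


X64 = random(10, 20, density=0.1, format="csr", dtype=np.float64)
X32 = random(10, 20, density=0.1, format="csr", dtype=np.float32)

# Default astype always copies, even for a no-op cast.
print(np.shares_memory(X64.data.astype(np.float64), X64.data))  # False
# copy=False reuses the existing buffer when the dtype already matches.
print(cast_data(X64) is X64.data)  # True
# A genuine float32 -> float64 conversion still produces a new array.
print(cast_data(X32).dtype, np.shares_memory(cast_data(X32), X32.data))  # float64 False
```

This design keeps the cast for inputs of other dtypes while making it free for the common `float64` case.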
Files to review:
Any other comments?