Revert change in sklearn.extmath.util and fix randomized_svd benchmark #23421
Changes from all commits: af36339, b016b73, cbf4906, 4641cd1, 3e2aec8
@@ -107,7 +107,7 @@

 # Determine when to switch to batch computation for matrix norms,
 # in case the reconstructed (dense) matrix is too large
-MAX_MEMORY = int(2e9)
+MAX_MEMORY = int(4e9)

 # The following datasets can be downloaded manually from:
 # CIFAR 10: https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
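For context, this threshold guards the dense reconstruction used when computing norms. A minimal sketch of the byte-based check introduced here; the fits_in_memory helper is illustrative, not part of the benchmark:

    import numpy as np
    import scipy.sparse as sp

    MAX_MEMORY = int(4e9)  # threshold in bytes, as in the patched benchmark

    def fits_in_memory(X):
        # Illustrative helper (not in the benchmark): estimate the bytes a
        # dense reconstruction of X would occupy and compare to the threshold.
        return X.shape[0] * X.shape[1] * X.dtype.itemsize < MAX_MEMORY

    X = sp.random(10_000, 5_000, density=0.01, format="csr", dtype=np.float64)
    print(fits_in_memory(X))  # 10_000 * 5_000 * 8 bytes = 4e8 < 4e9 -> True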
@@ -323,8 +323,11 @@ def norm_diff(A, norm=2, msg=True, random_state=None):


 def scalable_frobenius_norm_discrepancy(X, U, s, V):
-    # if the input is not too big, just call scipy
-    if X.shape[0] * X.shape[1] < MAX_MEMORY:
+    if not sp.sparse.issparse(X) or (
+        X.shape[0] * X.shape[1] * X.dtype.itemsize < MAX_MEMORY
+    ):
+        # if the input is not sparse or sparse but not too big,
+        # U.dot(np.diag(s).dot(V)) will fit in RAM
         A = X - U.dot(np.diag(s).dot(V))
         return norm_diff(A, norm="fro")


Review thread on lines +326 to +328:

Small nit:
Suggested change: …

I thought the same at one point, but …

On closer inspection, the … I'm guessing we want: …

If … The comment "call scipy" is slightly misleading; I think the potential blocker is creating the dense matrix …

I pushed a commit tweaking the comment and using …
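For readers without the rest of the file: a self-contained sketch of how the function can fall back to batched computation when the dense reconstruction would not fit in RAM. The batched branch below is an illustration assuming sklearn.utils.gen_batches and a simplified norm_diff, not a verbatim copy of the benchmark:

    import numpy as np
    import scipy as sp
    import scipy.sparse
    from sklearn.utils import gen_batches

    MAX_MEMORY = int(4e9)  # bytes

    def norm_diff(A, norm="fro"):
        # Simplified stand-in for the benchmark's norm_diff helper.
        return np.linalg.norm(np.asarray(A), ord=norm)

    def scalable_frobenius_norm_discrepancy(X, U, s, V):
        if not sp.sparse.issparse(X) or (
            X.shape[0] * X.shape[1] * X.dtype.itemsize < MAX_MEMORY
        ):
            # if the input is not sparse, or sparse but small enough,
            # the dense reconstruction U.dot(np.diag(s).dot(V)) fits in RAM
            A = X - U.dot(np.diag(s).dot(V))
            return norm_diff(A, norm="fro")

        # Otherwise accumulate the squared Frobenius norm over row batches,
        # never materializing the full dense difference at once.
        Vhat = np.diag(s).dot(V)
        cum_sq = 0.0
        for batch in gen_batches(X.shape[0], 1000):
            M = X[batch].toarray() - U[batch].dot(Vhat)
            cum_sq += norm_diff(M, norm="fro") ** 2
        return np.sqrt(cum_sq)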
@@ -498,7 +501,7 @@ def bench_c(datasets, n_comps):

 if __name__ == "__main__":
     random_state = check_random_state(1234)

-    power_iter = np.linspace(0, 6, 7, dtype=int)
+    power_iter = np.arange(0, 6)
     n_comps = 50

     for dataset_name in datasets:
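The two expressions are not equivalent, which is easy to verify with plain NumPy:

    import numpy as np

    print(np.linspace(0, 6, 7, dtype=int))  # [0 1 2 3 4 5 6] -- 7 values, endpoint included
    print(np.arange(0, 6))                  # [0 1 2 3 4 5]   -- 6 values, endpoint excluded

So the patched benchmark sweeps power-iteration counts 0 through 5 rather than 0 through 6.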
Note: despite the name, MAX_MEMORY was actually a maximum element count (a MAX_ELEMENTS) before; I now multiply by X.dtype.itemsize so the threshold is expressed in bytes of memory.
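A quick arithmetic check of that distinction, assuming float64 data (itemsize 8):

    import numpy as np

    itemsize = np.dtype(np.float64).itemsize  # 8 bytes
    # Old check: element count < 2e9, i.e. up to ~16 GB of float64 data.
    print(int(2e9) * itemsize)  # 16000000000 bytes
    # New check: bytes < 4e9, i.e. up to ~4 GB regardless of dtype.
    print(int(4e9))             # 4000000000 bytes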