FIX Update randomized SVD benchmark #23373
Conversation
@lorentzenchr I think this is fine. The benchmark was used at some point to check the optimal values of the parameters, so in this regard your changes make it possible to come back to this experiment. I would be +1 for merging.
Oh, I see that I did not use the right GitHub handle. Sorry @lorentzbao
@@ -243,6 +240,11 @@ def randomized_range_finder(
    # Sample the range of A using by linear projection of Q
    # Extract an orthonormal basis
    Q, _ = linalg.qr(safe_sparse_dot(A, Q), mode="economic")
Hmmm, so with this change the dot product between A and Q will now use float64, right? So it will use twice as much memory.
This seems like too wide-reaching a change just to make the benchmark run ...
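For illustration, here is a minimal sketch of the dtype promotion behind the memory concern above (the shapes are made up): the product of A and Q takes the wider dtype, so promoting Q to float64 doubles the size of the projected matrix.

```python
import numpy as np

# Made-up shapes, only to illustrate the memory impact of the dtype of Q.
A = np.random.rand(10_000, 1_000).astype(np.float32)
Q32 = np.random.rand(1_000, 100).astype(np.float32)
Q64 = Q32.astype(np.float64)

# NumPy promotes the product to the wider operand dtype, so a float64 Q
# doubles the memory of the projected matrix (and of the later QR factors).
print((A @ Q32).nbytes)  # 4_000_000 bytes (float32 result)
print((A @ Q64).nbytes)  # 8_000_000 bytes (float64 result)
```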
Oops, I completely missed that we changed something in extmath.py.
Sorry for merging this one.
We need to revert it.
Sorry for the bad PR. I overlooked the downsides of using the float64 matrix. Do you have other ideas on how to fix the benchmark? Thanks for the follow-up FIX in #23421. I should have taken a closer look at the documentation, which clearly indicates that a large n_iter might cause numeric instability when power_iteration_normalizer='none'. @lesteve @glemaitre
power_iteration_normalizer : {'auto', 'QR', 'LU', 'none'}, default='auto'
Whether the power iterations are normalized with step-by-step
QR factorization (the slowest but most accurate), 'none'
(the fastest but numerically unstable when `n_iter` is large, e.g.
typically 5 or larger), or 'LU' factorization (numerically stable
but can lose slightly in accuracy). The 'auto' mode applies no
normalization if `n_iter` <= 2 and switches to LU otherwise.
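For reference, a small sketch of what this parameter controls (the input matrix below is made up, not one of the benchmark datasets): requesting 'LU' or 'QR' explicitly keeps the power iterations numerically stable even when n_iter is large.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

# Made-up input matrix; the benchmark datasets would be used in practice.
rng = np.random.RandomState(0)
X = rng.standard_normal((2000, 500)).astype(np.float32)

# With n_iter=5, 'auto' already switches to LU normalization; passing 'LU'
# (or 'QR') explicitly avoids the overflow that 'none' can hit in float32.
U, S, Vt = randomized_svd(
    X, n_components=50, n_iter=5, power_iteration_normalizer="LU", random_state=0
)
print(U.shape, S.shape, Vt.shape)  # (2000, 50) (50,) (50, 500)
```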
No worries, things like that happen! Thanks a lot for this PR, we are now very close to having a working benchmark 😉
This reverts commit 6ab4ebd.
Reference Issues/PRs
Fixes #23262
What does this implement/fix? Explain your changes.
Try to fix #23262. After some debugging, I found that the randomized SVD benchmark is broken because the `float32` matrix `Q` causes numeric overflow. Simply removing the following lines of code solves the error. I am not familiar with the history and background behind these lines of code, so I would appreciate any extra information.
Also, replace `skip` with `0` to match the return value of `handle_missing_dataset`, which returns `0` when the dataset is missing.
Any other comments?
In particular, the lfw_people dataset seems to fail when using the `float32` matrix `Q`.
I would appreciate any extra information and comments.
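As a rough illustration of the failure mode described above (the matrix is made up and deliberately scaled large; it is not the lfw_people data), un-normalized power iterations overflow quickly in float32, whereas float64 would still have headroom here.

```python
import numpy as np

# Made-up matrix, scaled up only to trigger the overflow after a few
# iterations; this is not the actual lfw_people data used by the benchmark.
rng = np.random.RandomState(0)
A = rng.standard_normal((500, 500)).astype(np.float32) * 100

Q = rng.standard_normal((500, 10)).astype(np.float32)
for _ in range(7):
    # Un-normalized power iteration (power_iteration_normalizer='none'):
    # the iterate grows roughly like sigma_max ** (2 * n_iter), which quickly
    # exceeds the float32 maximum (~3.4e38), while the float64 maximum
    # (~1.8e308) would not be reached here.
    Q = A @ (A.T @ Q)

print(np.isfinite(Q).all())  # False: the float32 iterate has overflowed
```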