FIX Update randomized SVD benchmark #23373

lorentzbao · 2022-05-15T15:02:35Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Tries to fix #23262. After some debugging, I found that the randomized SVD benchmark is broken because the float32 matrix Q causes a numeric overflow. Simply removing the following lines of code fixes the error. I am not familiar with the history and background of these lines, so I would appreciate any extra information.

    if A.dtype.kind == "f":
        # Ensure f32 is preserved as f32
        Q = Q.astype(A.dtype, copy=False)
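A minimal NumPy sketch of what these lines do, assuming (as in the snippet) a float32 data matrix A and a Gaussian test matrix Q that NumPy draws in float64 by default. Plain `@` on dense arrays stands in for `safe_sparse_dot`; the shapes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30)).astype(np.float32)  # float32 data matrix
# Gaussian test matrix; NumPy draws it in float64 by default
Q = rng.standard_normal((A.shape[1], 5))

# Mixed float32 @ float64 is promoted to float64
Y_without_cast = A @ Q

# The cast from the snippet keeps the product in A's dtype
if A.dtype.kind == "f":
    Q = Q.astype(A.dtype, copy=False)
Y_with_cast = A @ Q

print(Y_without_cast.dtype, Y_with_cast.dtype)  # float64 float32
```

Removing the cast therefore does not change the math, only the precision (and memory footprint) of every product computed from Q.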

Also, replaces "skip" with 0 to match the return value of handle_missing_dataset, which returns 0 when the dataset is missing.
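A sketch of why the sentinel has to match, using a hypothetical re-creation of handle_missing_dataset that only mirrors the behavior described above (return 0 when the folder is missing); the helper names is_missing_old and is_missing are made up for illustration and are not the benchmark's code.

```python
import os

import numpy as np


def handle_missing_dataset(file_folder):
    # Hypothetical re-creation: returns 0 when the dataset folder is missing,
    # as described in the PR. Not the benchmark's actual implementation.
    if not os.path.isdir(file_folder):
        print(f"{file_folder} file folder not found. Test skipped.")
        return 0
    return None  # assumption: the real loader would continue from here


def is_missing_old(X):
    # Check against the string "skip": never true, because the helper returns 0
    return isinstance(X, str) and X == "skip"


def is_missing(X):
    # Compare against the value the helper actually returns.
    # The isinstance guard avoids an element-wise comparison when X is an array.
    return isinstance(X, int) and X == 0


X = handle_missing_dataset("/hypothetical/missing/folder")
print(is_missing_old(X), is_missing(X))  # False True
print(is_missing(np.zeros((3, 3))))  # False
```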

Any other comments?

In particular, the lfw_people dataset seems to fail when Q is a float32 matrix.
I would appreciate any extra information and comments.

glemaitre · 2022-05-17T08:44:48Z

@lorentzenchr I think this is fine. The benchmark was used at some point to check the optimal values of the parameter, so your changes make it possible to go back to this experiment.

I would be +1 for merging.

glemaitre · 2022-05-19T08:24:45Z

Oh, I see that I did not use the right GitHub handle. Sorry @lorentzbao

lesteve · 2022-05-19T09:33:33Z

sklearn/utils/extmath.py

@@ -243,6 +240,11 @@ def randomized_range_finder(
    # Sample the range of A using by linear projection of Q
    # Extract an orthonormal basis
    Q, _ = linalg.qr(safe_sparse_dot(A, Q), mode="economic")


Hmmm, so with this change the dot product between A and Q now uses float64, right? That means it will use twice as much memory.

This seems too wide-reaching a change just to make the benchmark run...
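A quick NumPy check of the memory point, assuming dense arrays and plain `@` in place of `safe_sparse_dot`; the sizes below are made up for illustration and do not come from the benchmark.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from the benchmark)
n_samples, n_features, k = 2000, 300, 20

rng = np.random.default_rng(0)
A = rng.standard_normal((n_samples, n_features)).astype(np.float32)
Q64 = rng.standard_normal((n_features, k))  # float64 test matrix
Q32 = Q64.astype(np.float32)                # cast to A's dtype

Y64 = A @ Q64  # promoted to float64: 8 bytes per entry
Y32 = A @ Q32  # stays float32: 4 bytes per entry

print(Y64.nbytes, Y32.nbytes)  # 320000 160000
```

Each product of shape (n_samples, k) costs n_samples * k * itemsize bytes, so every float64 intermediate is twice the size of its float32 counterpart.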

glemaitre · 2022-05-19

Oops, I completely missed that we changed something in extmath.py.
Sorry for merging this one.

We need to revert it.

lorentzbao · 2022-05-20

Sorry for the bad PR. I overlooked the downsides of using a float64 matrix. ~~Do you have other ideas on how to fix the benchmark?~~ Thanks for the follow-up FIX in #23421. I should have taken a closer look at the documentation, which clearly states that a large n_iter can cause numerical instability when power_iteration_normalizer='none'. @lesteve @glemaitre

power_iteration_normalizer : {'auto', 'QR', 'LU', 'none'}, default='auto' Whether the power iterations are normalized with step-by-step QR factorization (the slowest but most accurate), 'none' (the fastest but numerically unstable when `n_iter` is large, e.g. typically 5 or larger), or 'LU' factorization (numerically stable but can lose slightly in accuracy). The 'auto' mode applies no normalization if `n_iter` <= 2 and switches to LU otherwise.
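To illustrate the docstring, here is a NumPy-only sketch of a power-iteration loop in the spirit of randomized_range_finder. It is a simplification, not scikit-learn's code: the function name power_iterate is hypothetical, only 'none' and 'QR' are covered ('LU' and 'auto' are left out to avoid a SciPy dependency), and the matrix size, scale, and n_iter=20 are arbitrary demo choices.

```python
import numpy as np


def power_iterate(A, size, n_iter, normalizer, seed=0):
    """Hypothetical, simplified power-iteration loop of a randomized range finder.

    normalizer: "none" or "QR". Returns the (n_features, size) matrix Q.
    """
    rng = np.random.default_rng(seed)
    # Gaussian test matrix, cast to A's dtype so float32 input stays float32
    Q = rng.standard_normal((A.shape[1], size)).astype(A.dtype, copy=False)
    # Overflow is the expected outcome of the "none" branch for large n_iter,
    # so silence NumPy's floating-point warnings instead of printing them
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_iter):
            if normalizer == "none":
                # No normalization: magnitudes grow roughly like sigma_max**2 per iteration
                Q = A.T @ (A @ Q)
            elif normalizer == "QR":
                # Re-orthonormalize after each product so the entries stay bounded
                Q, _ = np.linalg.qr(A @ Q)
                Q, _ = np.linalg.qr(A.T @ Q)
            else:
                raise ValueError(f"unsupported normalizer: {normalizer!r}")
    return Q


rng = np.random.default_rng(0)
A = (10 * rng.standard_normal((200, 100))).astype(np.float32)
for name in ("none", "QR"):
    Q = power_iterate(A, size=10, n_iter=20, normalizer=name)
    print(name, Q.dtype, bool(np.isfinite(Q).all()))
```

In this setup the float32 'none' run ends up with inf/NaN entries while the 'QR' run stays finite and orthonormal. The same unnormalized loop in float64 stays finite for these sizes only because float64 has a much wider range, so it postpones the instability rather than removing it.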

lesteve · 2022-05-20

No worries, things like that happen! Thanks a lot for this PR, we are now very close to having a working benchmark 😉


fix benchmark randonmizd svd

06e9171

github-actions bot added the module:utils label May 15, 2022

lorentzbao marked this pull request as draft May 15, 2022 15:53

addback type casting

0c9d48a

glemaitre approved these changes May 17, 2022


lorentzbao marked this pull request as ready for review May 17, 2022 10:51

glemaitre changed the title ~~Fix: Randomized SVD benchmark is broken~~ FIX Update randomized SVD benchmark May 19, 2022

glemaitre merged commit 6ab4ebd into scikit-learn:main May 19, 2022

lesteve reviewed May 19, 2022


glemaitre added a commit that referenced this pull request May 19, 2022

Revert "FIX Update randomized SVD benchmark (#23373)"

73e56f5

This reverts commit 6ab4ebd.

glemaitre mentioned this pull request May 19, 2022

Revert "FIX Update randomized SVD benchmark" #23418

Closed

lesteve mentioned this pull request May 19, 2022

Revert change in sklearn.extmath.util and fix randomized_svd benchmark #23421

Merged

lesteve pushed a commit to lesteve/scikit-learn that referenced this pull request May 19, 2022

FIX Update randomized SVD benchmark (scikit-learn#23373)

de87490

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Aug 4, 2022

FIX Update randomized SVD benchmark (scikit-learn#23373)

d18be5b

glemaitre pushed a commit that referenced this pull request Aug 5, 2022

FIX Update randomized SVD benchmark (#23373)

2b7a272
