Skip to content

FIX randomized_svd for complex valued input #30737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

clane9
Copy link
Contributor

@clane9 clane9 commented Jan 30, 2025

Reference Issues/PRs

Fixes #30736

What does this implement/fix? Explain your changes.

The current implementation of randomized_svd accepts complex valued inputs, but the results are incorrect. Replacing transpose with conjugate transpose in the power iteration makes the results now consistent with scipy.linalg.svd. I also added a test to check consistency with scipy.linalg.svd on complex inputs.

Any other comments?

I feel that a randomized_svd function that supports complex inputs is a nice feature. Personally, I would like it so that I can implement complex PCA.

Copy link

github-actions bot commented Jan 30, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: c40b721. Link to the linter CI: here

@clane9
Copy link
Contributor Author

clane9 commented Jan 30, 2025

I see that there are failing tests due to .conj() not being part of the array API. Could possibly fix this by replacing with xp.conj().

@adrinjalali
Copy link
Member

adrinjalali commented Feb 4, 2025

Not sure how this would play with array API (cc @OmarManzoor on that), maybe @antoinebaker could check the implementation?

This would also need a changelog.

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @clane9

"""Return the transpose of A, or conjugate transpose for complex input."""
xp, _ = get_namespace(A, xp=xp)

return xp.conj(A).T if _iscomplexobj(A, xp=xp) else A.T
Copy link
Contributor

@OmarManzoor OmarManzoor Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks fine with respect to the array api from a general look. However these two functions need to move into the _array_api utils as that is where we are placing all such helper functions.
I'll also run the CUDA CI to ensure there are no errors but in case we want the 'complex' updates to run on a GPU we will need a test which covers the complex case with the array api namespaces and device combinations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @OmarManzoor! I moved the helper functions to _array_api. I could also add tests of these cases for randomized_svd, but was not sure if it would be in scope of this PR, since extmath as a whole does not seem to have tests of Array API compliance.

Also, I considered adding tests for these two new utils in test_array_api, but decided they were small enough as to not need it. Lmk if you think otherwise and I will add.

@clane9
Copy link
Contributor Author

clane9 commented Feb 4, 2025

Thanks for the comment @adrinjalali! I added a changelog entry. Lmk if you need anything else.

@jeremiedbb
Copy link
Member

I'm not sure that we should include this fix. We've never officially supported complex numbers in any estimator or function. I understand that the fix is small but it would open a door that I don't think we want to open and we've rejected several small fixes like this one in the past.

In addition, note that the fact that it almost worked is the result of randomized_svd not properly validating its input. It's not using check_array as it should imo, and if it did, the function would error early on complex input.

@clane9
Copy link
Contributor Author

clane9 commented Feb 4, 2025

Thanks for the comment @jeremiedbb. I think your point is fair. I get that general support for complex valued data across scikit-learn is out of scope. E.g. #10422 related to complex pca. My argument to include the fix is that it is small, it applies only to a relatively low level utility function (i.e. not a major estimator like PCA), and it makes the behavior of the function more consistent with the standard reference scipy.linalg.svd. Also, I intentionally did not edit the docs to advertise the complex data support, to reduce the risk of expanding the scope more in the future.

At the same time, I get that including the fix creates more surface area for testing, and sets a potential precedent for people to request more complex data support later. So I totally get if you all decide not to include it.

@antoinebaker
Copy link
Contributor

Thanks for the PR @clane9.

The implementation seems okay but I agree with @jeremiedbb that we should probably not support complex inputs in randomized_svd, perhaps raise an error and update the docstring.

@ogrisel
Copy link
Member

ogrisel commented Feb 5, 2025

I also agree, but since randomized_svd is part of the public API, we should do proper input validation to reject this kind of data explicitly (while also introducing a private version to avoid double input validation).

@clane9
Copy link
Contributor Author

clane9 commented Feb 11, 2025

Ok, sounds good, looks like the consensus is to not include this fix. No worries. My thought then is to close this PR and open a new one with added input validation to reject complex valued data. Thoughts?

@antoinebaker
Copy link
Contributor

Thanks again @clane9. If you want to, you can still publish ComplexPCA and other estimators with complex input support as a scikit-learn-contrib project.

@clane9
Copy link
Contributor Author

clane9 commented Feb 12, 2025

For sure, I will look into that! Thanks for the pointer @antoinebaker. But ok, going ahead and closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

randomized_svd incorrect for complex valued matrices
6 participants