-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX randomized_svd
for complex valued input
#30737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I see that there are failing tests due to |
Not sure how this would play with array API (cc @OmarManzoor on that), maybe @antoinebaker could check the implementation? This would also need a changelog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @clane9
sklearn/utils/extmath.py
Outdated
"""Return the transpose of A, or conjugate transpose for complex input.""" | ||
xp, _ = get_namespace(A, xp=xp) | ||
|
||
return xp.conj(A).T if _iscomplexobj(A, xp=xp) else A.T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks fine with respect to the array api from a general look. However these two functions need to move into the _array_api utils as that is where we are placing all such helper functions.
I'll also run the CUDA CI to ensure there are no errors but in case we want the 'complex' updates to run on a GPU we will need a test which covers the complex case with the array api namespaces and device combinations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @OmarManzoor! I moved the helper functions to _array_api
. I could also add tests of these cases for randomized_svd
, but was not sure if it would be in scope of this PR, since extmath
as a whole does not seem to have tests of Array API compliance.
Also, I considered adding tests for these two new utils in test_array_api
, but decided they were small enough as to not need it. Lmk if you think otherwise and I will add.
Thanks for the comment @adrinjalali! I added a changelog entry. Lmk if you need anything else. |
I'm not sure that we should include this fix. We've never officially supported complex numbers in any estimator or function. I understand that the fix is small but it would open a door that I don't think we want to open and we've rejected several small fixes like this one in the past. In addition, note that the fact that it almost worked is the result of |
Thanks for the comment @jeremiedbb. I think your point is fair. I get that general support for complex valued data across scikit-learn is out of scope. E.g. #10422 related to complex pca. My argument to include the fix is that it is small, it applies only to a relatively low level utility function (i.e. not a major estimator like At the same time, I get that including the fix creates more surface area for testing, and sets a potential precedent for people to request more complex data support later. So I totally get if you all decide not to include it. |
Thanks for the PR @clane9. The implementation seems okay but I agree with @jeremiedbb that we should probably not support complex inputs in |
I also agree, but since |
Ok, sounds good, looks like the consensus is to not include this fix. No worries. My thought then is to close this PR and open a new one with added input validation to reject complex valued data. Thoughts? |
Thanks again @clane9. If you want to, you can still publish |
For sure, I will look into that! Thanks for the pointer @antoinebaker. But ok, going ahead and closing. |
Reference Issues/PRs
Fixes #30736
What does this implement/fix? Explain your changes.
The current implementation of
randomized_svd
accepts complex valued inputs, but the results are incorrect. Replacing transpose with conjugate transpose in the power iteration makes the results now consistent withscipy.linalg.svd
. I also added a test to check consistency withscipy.linalg.svd
on complex inputs.Any other comments?
I feel that a
randomized_svd
function that supports complex inputs is a nice feature. Personally, I would like it so that I can implement complex PCA.