-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Adress decomposition.PCA mle option problem #16224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
You can check why codecov is complaining; |
I'm a bit stuck due to my limited knowledge of the linear algebra behind this. Two questions occur to me:
I appreciate any input :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good
I can confirm that the following new tests fail at master:
- test_infer_dim_bad_spec
- test_assess_dimension_same_n_rank_and_features
- test_assess_dimension_small_eigenvalues
- test_infer_dim_mle
|
||
|
||
def test_assess_dimension_same_n_rank_and_features(): | ||
# Test that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please finish this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an entry to the change log at doc/whats_new/v0.23.rst under bug fixes. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:
I agree. I would remove the parameter because it is never used elsewhere. So let's internally create the threshold. In addition, could you rename:
They are internal functions so we can change them. |
Thanks for the input! Unfortunately I'm having some trouble writing sensible tests for this and I'm starting to understand what is wrong. While I'm a bit reluctant to get into the off by one error problem, I think it might not be avoidable. So the way I understand it the variable The problem, which is probably the same as the off-by-one problem we mentioned previously, is that in In my understanding, if we find 3 eigenvalues, we should be looking at the LL for the cases rank=0, rank=1, rank=2, and rank=3. Currently we look at ranks 0,1, and 2. That was also why the code coverage was bad before, because the Still To Do:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that this now fixes the small eigenvalue problem, but whenever there is only one component, our n_components_ is 0?
It seems on master:
In [...]: X, _ = datasets.make_classification(n_informative=1, n_repeated=0,
...: n_redundant=0, n_clusters_per_class=1,
...: random_state=42)
...: pca = PCA(n_components='mle').fit(X)
In [...]: pca.n_components_
Out[...]: 0
That makes me think this PR is kinda complete as is, and the off by one issue can have its own issue+pr.
You could also add a test with a large cut-off to check that the passed value always works.
Okay, so then let's work on the off by one error in a separate issue. I left TODOs in the code, where I found the code coverage never reaches the lines on account of this. |
The CI issue is not related, I opened #16545 to figure it out. You can safely ignore it (I think) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to have this in and work on #16546 on a different PR. Thanks @lschwetlick
@lschwetlick minimal dependencies have been updated in the meanwhile. Could you please synchronise with upstream? this will update the build and (hopefully) green-lights all the tests. Thanks! |
Please change the title from WiP to MRG if this is awaiting review but otherwise ready for merge |
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…to _infer_dimension
I rebased the PR onto the current upstream master. I hope that was the right way to synchronise it? |
Looks good though merging master into your branch is cleaner :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise lgtm
Someone available to merge this twice approved all green PR? :) |
Thanks @lschwetlick (and @cmarmo for the reminder) ! |
Yay! About time this got closed! Thanks @lschwetlick
|
Reference Issues/PRs
closes #10359
closes #4441
This PR uses the proposed solutions from PR #10359 to issue #4441 and translates it to the current version of the decomposition package. We adressed 2 of the 3 issues raised by the reviewer:
We reverted changes relating to the off-by-one error. Maybe someone with more experience in linear algebra can weigh in and we can fix it as a seperate issue. It was discussed here and here.
other comments
Contributing authors were @gelavizh1, @marijavlajic and @lschwetlick at #Wimlds sprint Berlin :)
CC: @adrinjalali @noatamir