[MRG+1] LDA refactoring #3523
Conversation
This PR will need to be rebased on top of the current master, and the input validation needs to be updated to use the new utilities.

Also, as @larsmans said in the previous PR, the narrative documentation needs to be updated.

Finally, it would be great to write an example.

Once all these changes are made, please change the title of this PR from [WIP] to [MRG].
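(For reference, a minimal sketch of what the validation update might look like, assuming the "new utilities" here are the consolidated helpers such as check_X_y; this is illustrative, not the PR's actual diff.)

from sklearn.utils.validation import check_X_y

def fit(self, X, y):
    # Replace ad-hoc input checks with the shared validation helper
    # (assuming check_X_y; the exact utility meant in the comment is not shown).
    X, y = check_X_y(X, y)
    ...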
@cle1109 I've updated
OK, I think this should be it; everything should be done.
Shrinkage LDA can be used by setting the ``use_covariance`` parameter of the
:class:`lda.LDA` class to 'ledoitwolf'. This automatically determines the
optimal shrinkage parameter in an analytic way, following the lemma introduced
by Ledoit and Wolf.
add ref here
Forget it, it's right below. My bad.
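(For illustration, using the option described in the doc snippet above might look like the following sketch, written against the API as it is named at this stage of the PR; the toy data and the use_covariance parameter value are taken from the snippet, everything else is made up.)

import numpy as np
from sklearn.lda import LDA  # import path at the time of this PR

rng = np.random.RandomState(0)
# Toy problem: two classes in a fairly high-dimensional space, where
# the plain empirical covariance estimate is noisy and shrinkage helps.
X = np.vstack([rng.randn(20, 50), rng.randn(20, 50) + 0.5])
y = np.array([0] * 20 + [1] * 20)

# 'ledoitwolf' selects the shrinkage intensity analytically,
# following the Ledoit-Wolf lemma mentioned in the doc above.
clf = LDA(use_covariance='ledoitwolf')
clf.fit(X, y)
print(clf.predict(X[:5]))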
# centered group data
Xgc = Xg - meang
Xc.append(Xgc)
if self.use_covariance is None:
I would extract the two solvers as private functions (or private methods if you need access to the object).
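(A rough sketch of the suggested extraction; the method names, the constructor, and the dispatch condition are illustrative, not the PR's actual code.)

from sklearn.base import BaseEstimator

class LDA(BaseEstimator):
    """Sketch only: shows the suggested solver split, not the real estimator."""

    def __init__(self, use_covariance=None):
        self.use_covariance = use_covariance

    def _solve_svd(self, X, y):
        # the existing SVD-based solver would move here unchanged
        ...

    def _solve_covariance(self, X, y):
        # the covariance/shrinkage-based solver would move here,
        # including the group-centering code from the hunk above
        ...

    def fit(self, X, y):
        # fit() then reduces to dispatching between the two private solvers
        if self.use_covariance is None:
            self._solve_svd(X, y)
        else:
            self._solve_covariance(X, y)
        return self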
OK, so here's my take on the discussions in #3500 and #3105 (might I suggest moving them here, because here's where our latest code is). I really like the idea of introducing

I support the idea of introducing the possibility to optimize the shrinkage parameter via cross-validation.

The other question is how close to the textbook we want to be in the

So in principle we could go ahead and start implementing it according to all the suggestions here. The only thing that needs to be clarified first is how we do both classification and transform in the
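(Optimizing the shrinkage parameter by cross-validation, as proposed above, could then rely on the standard grid-search machinery. The parameter name and value grid below are hypothetical; whether the estimator accepts a float there is exactly what is being discussed.)

import numpy as np
from sklearn.grid_search import GridSearchCV  # import path of that era
from sklearn.lda import LDA

rng = np.random.RandomState(42)
X = np.vstack([rng.randn(30, 10), rng.randn(30, 10) + 1.0])
y = np.array([0] * 30 + [1] * 30)

# Hypothetical: search candidate shrinkage intensities by cross-validation
# instead of using the analytic Ledoit-Wolf value.
grid = GridSearchCV(LDA(), {'use_covariance': np.linspace(0.0, 1.0, 11)}, cv=5)
grid.fit(X, y)
print(grid.best_params_)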
Instead of introducing a

We could also add a

(I took the liberty of changing the title of this PR and adding a TODO list.)
Actually, we can drop the
Just
OK! (I deleted my comment because I saw it in the TODO list at the top.)
I got a comment on one of the items in the TODO list:

Numerical errors can blow up and cause big differences in classification results if the feature space is high-dimensional. That is because these errors introduce a small amount of uncertainty in the orientation of the hyperplane, and the volume of this uncertainty increases exponentially with dimensionality. The bigger this volume, the higher the chance of unseen samples lying on opposite sides of the hyperplane in different implementations. In short, classification results can differ noticeably due to numerical errors alone. Something to keep in mind when designing the tests.
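(One way to account for this when writing the tests is to compare hyperplane orientations with a tolerance instead of requiring identical predictions; a minimal sketch follows, with a made-up helper name.)

import numpy as np

def assert_same_hyperplane(w_a, w_b, tol=1e-3):
    """Check that two weight vectors define nearly the same hyperplane,
    allowing for arbitrary scaling and a small angular error from numerics."""
    cos = abs(np.dot(w_a, w_b)) / (np.linalg.norm(w_a) * np.linalg.norm(w_b))
    assert cos > 1 - tol, "hyperplanes differ: |cos(angle)| = %g" % cos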
Parameters
----------
-n_components: int
+n_components : int
    Number of components (< n_classes - 1) for dimensionality reduction

priors : array, optional, shape = [n_classes]
    Priors on classes
Docstring for `intercept_` is missing. This is an array of size `n_classes`.
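(The missing entry could read roughly like this, following the numpydoc style of the surrounding parameters; the wording is only a suggestion.)

intercept_ : array, shape = [n_classes]
    Intercept term.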
Cleaned up code and added references.
@agramfort, I am very confident now that the code does what it is supposed to do :-). The minor issue of differently scaled
I think we should merge this. Maybe squash the commits; there are quite a few.
Do you want me to squash the commits (to only one?), or will you do that?
I don't mind pushing these commits as they are... let's merge this ASAP.
+1
OK, it looks like we don't do squashing any more... I felt that was helpful for bugfix releases, but OK.
I agree with you. I think that we should encourage it. But I don't feel
🍻 @cle1109!
😄 - awesome! Thank you guys, I really learned a lot while working on this PR. Looking forward to contributing more soon!
This fixes #1649, right? Or only for the new solvers?
I think so.
@agramfort, is that in reply to "this fixes #1649" or to "only for the new solvers"?
OK, here is a new version of the LDA class with (optional) shrinkage. The one in #3105 was already very messy, so I thought I'd start from scratch. By default, nothing has changed and the old SVD-based code is used.