Implemented shrinkage LDA classifier. #3105
Conversation
The duplicate LDA (lda.LDA and slda.LDA) caused problems with some automated tests.
sLDA: example, tests and fixes
Does shrinkage mean regularization? (Statistics terminology is driving me crazy...)
@@ -0,0 +1,235 @@
"""
The :mod:`sklearn.slda` module implements Shrinkage LDA.
If it's an LDA variant, it should go in `sklearn.lda`. Also, why is this a separate estimator? Can't this feature be added to the existing `lda.LDA`?
Yes, you are correct, shrinkage means regularization (we apply it to the covariance estimate). I can put it in `sklearn.lda`.

Concerning creating a separate estimator: I did not want to change the original class to avoid breaking anything. There are also some problems when we want to support the LDA transformation in our implementation. To use shrinkage, we have to work with covariance estimates, whereas the original LDA class uses SVD, which does have some advantages. For example, to compute the transformation we would have to solve a generalized eigenvalue problem. In the underdetermined case this works well with shrinkage (that's what shrinkage is for), but it fails without shrinkage (`eigh` raises an exception, I believe), whereas the current SVD implementation handles that case fine.

In addition, I'm not sure how to handle issue #1649 with our code. It looks like our implementation isn't robust to scaling either (same as the SVD implementation). However, with our implementation this problem is not limited to the underdetermined case, but also seems to occur in the overdetermined case. This is not to say that the algorithm does not work; after all, it's the standard LDA algorithm described in Duda & Hart, which everyone seems to be using.
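For concreteness, here is a minimal sketch of the covariance/eigenvalue route described in that comment. This is an editorial illustration, not the PR's actual code: `lda_eigen_transform` is a made-up name, and Ledoit-Wolf shrinkage from `sklearn.covariance` stands in for whatever estimator the PR uses.

```python
# Illustrative sketch (not the PR's code): LDA transform via the
# generalized eigenvalue problem S_b w = lambda * S_w w, where the
# within-class scatter S_w is regularized by Ledoit-Wolf shrinkage.
import numpy as np
from scipy.linalg import eigh
from sklearn.covariance import ledoit_wolf

def lda_eigen_transform(X, y, n_components):
    classes = np.unique(y)
    mean = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))
    Sb = np.zeros_like(Sw)
    for c in classes:
        Xc = X[y == c]
        # Shrunk within-class covariance; without shrinkage Sw is
        # singular when n_samples < n_features and eigh() fails.
        Sw += ledoit_wolf(Xc)[0]
        d = (Xc.mean(axis=0) - mean)[:, None]
        Sb += Xc.shape[0] * (d @ d.T)
    evals, evecs = eigh(Sb, Sw)  # generalized eigenvalue problem
    return evecs[:, np.argsort(evals)[::-1][:n_components]]
```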
You can avoid breaking working code by hardening the tests. I prefer new features to be added to existing estimators, as long as the classes don't become too heavy. Could you try to add the functionality to the existing class to see if it works?
Maybe a stupid question, but could you explain why it is not possible to use shrinkage in the SVD implementation by @ksemb?
Because we shrink the covariance matrices used in the generalized eigenvalue problem. SVD does not operate on covariance matrices, but decomposes the data matrix directly. It would be great if we could use shrinkage with SVD, but I believe this is not possible (correct me if I'm wrong).
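In other words, shrinkage is defined directly on the covariance matrix. A one-liner makes this concrete; the standard diagonal-target form is shown here for illustration and is not taken from the PR:

```python
# Diagonal-target shrinkage acts on the covariance matrix itself:
#   Sigma_shrunk = (1 - alpha) * Sigma + alpha * (trace(Sigma) / p) * I
# The SVD solver never forms Sigma, so there is nothing to shrink.
import numpy as np

def shrunk_covariance(S, alpha):
    """Convex combination of S and a scaled identity target."""
    p = S.shape[0]
    return (1.0 - alpha) * S + alpha * (np.trace(S) / p) * np.eye(p)
```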
The reason we put the implementation in its own class and module (for now) is that we made two fundamental changes to the original LDA.
Once we know how to tackle these differences, we can merge the different LDAs according to your guidelines. It would be great to know if SVD can be shrunk somehow, but I don't believe that's possible either.
Still, we could put the new class in `sklearn.lda` for now. How would you like us to proceed?
@ksemb, memory consumption could be an issue. What is a typical ballpark number of features in your field? Concerning your second argument, I don't think that non-diagonal regularization is limited to SVD. In the case of Ledoit-Wolf shrinkage we only modify the diagonal, but we could use other regularization methods (such as Tikhonov regularization).

Do you have a suggestion on how we could move forward with the two LDA classifier implementations? I am kind of confused, because there seem to be two issues at the moment:

1. how to add shrinkage LDA to scikit-learn (this PR), and
2. the scaling/robustness problems discussed in #1649.
I think we should focus on the first point here, and discuss the second issue separately.
…sldac

Conflicts:
	sklearn/tests/test_lda.py
I just wanted to briefly point out that I'm very glad someone wrote code for shrinkage LDA in scikit-learn. Before I found this thread, I tried implementing it myself, but I was not familiar with the implementation of LDA via SVD instead of explicit covariance matrices. If it isn't possible to perform shrinkage in the SVD scenario, I'd strongly favor the algorithm that calculates covariance matrices (because, from a pragmatic point of view, if the data are big, there is no benefit in computational efficiency if the estimators are unstable).
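As a quick numerical illustration of this instability argument (an editorial sketch, not from the thread): with fewer samples than features the empirical covariance is singular, while the Ledoit-Wolf estimate stays invertible.

```python
import numpy as np
from sklearn.covariance import ledoit_wolf

rng = np.random.RandomState(0)
X = rng.randn(20, 50)                    # n_samples < n_features
emp = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(emp))        # at most 19 < 50 -> singular
lw, alpha = ledoit_wolf(X)
print(np.linalg.matrix_rank(lw), alpha)  # full rank 50
```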
I think automatic selection of the regularization constant by cross-validation would also be useful. What you really want is to maximize classification accuracy on unseen data (generalization performance). I would thus add an option to select the shrinkage parameter by cross-validation.
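A sketch of that suggestion in today's scikit-learn API, which postdates this discussion (`LinearDiscriminantAnalysis` later replaced `lda.LDA`; the grid below is illustrative):

```python
# Select the shrinkage intensity by cross-validated accuracy rather
# than an analytic formula. Modern scikit-learn API, which did not
# exist when this comment was written.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(
    LinearDiscriminantAnalysis(solver="lsqr"),   # covariance-based solver
    {"shrinkage": np.linspace(0.0, 1.0, 11)},
    cv=5, scoring="accuracy",
)
# search.fit(X, y); search.best_params_["shrinkage"] is the CV choice.
```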
This would certainly make sense. Note that we no longer work on this PR, but continue in #3523. Could you move your suggestion over there, please? By the way, I vaguely remember that Blankertz proved in his shrinkage LDA paper that the Ledoit-Wolf solution is optimal. If that is the case, cross-validation would probably not be required.
I'd guess it is optimal on training but not on test data, isn't it? CV uses validation data, so the result would be different.
Well, in that case 100% overfitting would be optimal :) Sorry, I don't remember the details, but I'm sure that guy knew what he was doing. Anyway, I might be wrong. In that case a numerical parameter would certainly be useful.
What is the title of the paper?
Blankertz et al., "Single-Trial Analysis and Classification of ERP Components - a Tutorial", NeuroImage, 2010. I looked at the paper again and have to admit I was mistaken. There is no proof, and they even state that cross-validation could give better results :) Sorry for the fuss.
I made a new comment in #3523; please continue the discussion over there.
@kazemakase and I implemented shrinkage LDA (see also my comment in #1649). Note that unlike `lda.LDA`, our implementation only does classification, not transformation (dimensionality reduction). We are not using the existing implementation because shrinkage is not possible with SVD. We would be happy if someone could review the code.
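For readers arriving today: the shrinkage functionality that eventually landed in scikit-learn (after the follow-up PR) is exposed roughly as follows; this uses the modern API, well after this PR, and is shown for orientation only.

```python
# How shrinkage LDA is used in current scikit-learn.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

clf = LinearDiscriminantAnalysis(solver="eigen", shrinkage="auto")
# clf.fit(X, y); 'auto' picks the Ledoit-Wolf shrinkage intensity,
# and solver='eigen' also supports transform() for dim. reduction.
```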