Implemented shrinkage LDA classifier. #3105

Status: Closed (wants to merge 39 commits into scikit-learn:master from cle1109:sldac)
Conversation

@cbrnr (Contributor) commented Apr 24, 2014

@kazemakase and I implemented shrinkage LDA (see also my comment in #1649). Note that unlike lda.LDA, our implementation only does classification, not transformation (dimensionality reduction). We are not reusing the existing implementation because shrinkage is not possible with SVD.

We would be happy if someone could review the code.
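
For readers who want to see the idea concretely: below is a minimal sketch of a covariance-based shrinkage LDA classifier, assuming Ledoit-Wolf shrinkage and equal class priors. The class name `ShrinkageLDA` and every detail are illustrative only, not the code in this PR.

```python
import numpy as np
from sklearn.covariance import ledoit_wolf


class ShrinkageLDA:
    """Minimal covariance-based LDA classifier with Ledoit-Wolf shrinkage.

    Illustrative sketch only (equal class priors assumed), not this PR's code.
    """

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        means, covs = [], []
        for c in self.classes_:
            Xc = X[y == c]
            means.append(Xc.mean(axis=0))
            covs.append(ledoit_wolf(Xc)[0])  # shrunk, well-conditioned covariance
        self.means_ = np.asarray(means)
        cov = np.mean(covs, axis=0)          # pooled within-class covariance
        # Linear discriminant: w_k = cov^{-1} mu_k, b_k = -0.5 * mu_k^T cov^{-1} mu_k
        self.coef_ = np.linalg.solve(cov, self.means_.T).T
        self.intercept_ = -0.5 * np.einsum("ij,ij->i", self.means_, self.coef_)
        return self

    def decision_function(self, X):
        return X @ self.coef_.T + self.intercept_

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]
```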

@mbillingr (Contributor)

Here is an example of the shrinkage LDA in action:
[image: slda]
(20 training samples, 100 test samples, 50 repetitions, normally distributed features)

@cle1109 the lambda function makes the build fail on python 2.
Any idea what's causing the TypeError: unorderable types: type() < type() on python 3?

@coveralls

Coverage Status

Coverage remained the same when pulling 2edb383 on cle1109:sldac into 463cea5 on scikit-learn:master.

@larsmans (Member)

Does shrinkage mean regularization? (Statistics terminology is driving me crazy...)

@@ -0,0 +1,235 @@
"""
The :mod:`sklearn.slda` module implements Shrinkage LDA.

Review comment (Member):

If it's an LDA variant, it should go in sklearn.lda.

@larsmans (Member)

Also, why is this a separate estimator? Can't this feature be added to the existing LDA class?

@cbrnr (Contributor, Author) commented Apr 28, 2014

Yes, you are correct, shrinkage means regularization (we apply it to the covariance estimate). I can put it in sklearn.lda. I will fix the init function, and this should also be done for the original class in sklearn.lda (I took the code from there).

Concerning the separate estimator: I did not want to change the original class to avoid breaking anything. There are also some problems if we want our implementation to support the LDA transformation. To use shrinkage, we have to work with covariance estimates, whereas the original LDA class uses SVD, which has some advantages. For example, to compute the transformation we would have to solve a generalized eigenvalue problem. In the underdetermined case this works well (that's what shrinkage is for), but it fails without shrinkage (eigh raises an exception, I believe), whereas the current SVD implementation handles this case.
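
To make the transformation point concrete: the covariance-based route would obtain the projection from the generalized eigenvalue problem `S_b w = lambda * S_w w`, with the within-class scatter built from shrunk class covariances so that it stays positive definite even when n_samples < n_features. The following is a rough sketch under those assumptions, not code from this PR.

```python
import numpy as np
from scipy import linalg
from sklearn.covariance import ledoit_wolf


def lda_projection(X, y):
    """Sketch: LDA transform via a generalized eigenvalue problem with a
    shrunk within-class scatter (illustration only)."""
    classes = np.unique(y)
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((n_features, n_features))  # within-class scatter
    Sb = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        Sw += len(Xc) * ledoit_wolf(Xc)[0]   # shrunk class covariance
        d = (Xc.mean(axis=0) - overall_mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # scipy.linalg.eigh solves S_b w = lambda * S_w w; it requires S_w to be
    # positive definite, which the shrinkage guarantees.
    evals, evecs = linalg.eigh(Sb, Sw)
    order = np.argsort(evals)[::-1]
    return evecs[:, order[: len(classes) - 1]]  # top discriminant directions
```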

In addition, I'm not sure how to handle issue #1649 with our code. It looks like our implementation isn't robust to scaling either (same as the SVD implementation). However, with our implementation this problem is not limited to the underdetermined case, but also seems to occur in the overdetermined case. This is not to say that the algorithm does not work; after all, it's the standard LDA algorithm described in Duda & Hart, which everyone seems to be using.

@larsmans (Member)

You can avoid breaking working code by hardening the tests--I prefer new features to be added to existing estimators, as long as the classes don't become too heavy. Could you try to add the functionality to the existing class to see if it works?

@amueller (Member)

Maybe a stupid question, but could you explain why it is not possible to use shrinkage in the SVD implementation by @ksemb?

@cbrnr (Contributor, Author) commented Apr 29, 2014

Because we shrink covariance matrices used in the generalized eigenvalue problem. SVD does not work on covariance matrices, but instead decomposes the data directly into singular values.

It would be great if we could use shrinkage with SVD, but I believe this is not possible (correct me if I'm wrong).
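
To spell out what "shrinking the covariance" means here: the shrunk estimate blends the empirical covariance with a scaled identity target, `shrunk = (1 - lam) * emp + lam * (trace(emp) / p) * I`, where the intensity `lam` can be chosen analytically (Ledoit-Wolf). A minimal sketch using scikit-learn's covariance module, purely for illustration:

```python
import numpy as np
from sklearn.covariance import LedoitWolf, empirical_covariance

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))   # underdetermined: fewer samples than features

emp = empirical_covariance(X)   # singular when n_samples < n_features
lw = LedoitWolf().fit(X)
lam = lw.shrinkage_             # analytically chosen shrinkage intensity
mu = np.trace(emp) / emp.shape[0]
shrunk = (1 - lam) * emp + lam * mu * np.eye(emp.shape[0])

# The shrunk estimate matches lw.covariance_ and, unlike emp, is well
# conditioned and invertible.
print(np.allclose(shrunk, lw.covariance_), np.linalg.cond(shrunk))
```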

@mbillingr (Contributor)

The reason we put the implementation in its own class and module (for now) is that we made two fundamental changes to the original LDA:

  1. Different API: lda.LDA is both a classifier and a transformer, while slda.SLDA is a classifier only.
  2. Different implementation: SVD vs. covariance estimation

Once we know how to tackle these differences, we can merge the different LDAs according to your guidelines.

It would be great to know if SVD can be shrunk somehow, but I don't believe that's possible either.

@cbrnr (Contributor, Author) commented Apr 29, 2014

Still, we could put the SLDA class in the lda module. However, because of these issues it would be very cumbersome to merge everything into one class. Even creating a base LDA class would not be feasible, because we couldn't reuse anything.

How would you like us to proceed?

@cbrnr (Contributor, Author) commented Jun 3, 2014

@ksemb, memory consumption could be an issue. What is a typical ballpark number of features in your field?

Concerning your second argument, I don't think that non-diagonal regularization is limited to SVD. In the case of Ledoit-Wolf shrinkage, we only modify the diagonal, but we could use other regularization methods (such as Tikhonov regularization).
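
As an illustration of such an alternative (not something implemented in this PR), Tikhonov-style regularization would simply add a ridge term to the empirical covariance instead of using the analytic Ledoit-Wolf blend:

```python
import numpy as np


def tikhonov_regularized_cov(X, alpha=1e-3):
    """Sketch: ridge-regularized covariance, an alternative to Ledoit-Wolf."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    return cov + alpha * np.eye(cov.shape[0])
```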

Do you have a suggestion on how we could move forward with the two LDA classifier implementations? I am kind of confused, because there seem to be two issues at the moment:

  1. SVD vs. covariance-based algorithms
  2. Issue #1649 (univariate scaling in LDA)

I think we should focus on the first point here, and discuss the second issue separately.

@mmbannert

I just wanted to briefly point out that I'm very glad someone wrote code for carrying out shrinkage LDA in scikit-learn. Before I found this thread, I tried implementing it myself, but I was not familiar with the implementation of LDA via SVD instead of explicit covariance matrices. If it isn't possible to perform shrinkage in the SVD scenario, I'd strongly favor the algorithm that calculates covariance matrices (from a pragmatic point of view, if the data are big, there is no benefit in computational efficiency if the estimators are unstable).

@mblondel (Member)

I think automatic selection of the regularization constant by cross-validation is also useful. What you really want is to maximize classification accuracy on unseen data (generalization performance). I would thus add an alpha parameter (like other regularized classifiers) which can take a numerical value (e.g. alpha=1e-3) or a string (e.g. alpha="ledoit-wolf").
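
For illustration, cross-validated selection could look roughly like the sketch below. `SLDA` is this PR's estimator and `alpha` is the parameter proposed above, so those lines are left as comments; only the grid definition uses released scikit-learn code.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in 2014 releases

# Candidate values: the analytic Ledoit-Wolf solution plus a numeric grid.
param_grid = {"alpha": ["ledoit-wolf"] + list(np.logspace(-4, 0, 9))}

# Hypothetical usage, assuming this PR's SLDA estimator grows an `alpha` parameter:
# from sklearn.slda import SLDA
# search = GridSearchCV(SLDA(), param_grid, cv=5)
# search.fit(X_train, y_train)
# print(search.best_params_["alpha"], search.best_score_)
```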

@mbillingr (Contributor)

This would certainly make sense. Note that we no longer work on this PR, but continue in #3523. Could you move your suggestion over there, please?

By the way, I vaguely remember that Blankertz proved in his shrinkage-LDA paper that the Ledoit-Wolf solution is optimal. If that is the case, cross-validation would probably not be required.

@mblondel (Member)

I'd guess it is optimal on training but not on test data, isn't it? CV uses validation data so the result would be different.

@mbillingr (Contributor)

Well, in that case 100% overfitting would be optimal :)

Sorry, I don't remember the details, but I'm sure that guy knew what he did. Anyway, I might be wrong. In that case a numerical parameter would certainly be useful.

@mblondel (Member)

What is the title of the paper?

@mbillingr (Contributor)

Blankertz et al. "Single-Trial Analysis and Classification of ERP Components - a Tutorial", NeuroImage, 2010.

I looked at the paper again and have to admit I was mistaken. There is no proof, and they even state that cross-validation could give better results :) Sorry for the fuss.

@cbrnr (Contributor, Author) commented Aug 12, 2014

I made a new comment in #3523, please continue the discussion over there.

@cbrnr deleted the sldac branch on September 22, 2014 14:57