
Implemented Supervised PCA #5196

Closed

Conversation

stylianos-kampakis

Implemented the Supervised PCA algorithm by Bair et al., plus an extension of the model for classification based on logistic regression.

References: Bair, Eric, et al. "Prediction by supervised principal
components." Journal of the American Statistical Association 101.473
(2006).

@amueller
Member

amueller commented Sep 2, 2015

Thanks for the PR.
I am not that familiar with the algorithm, and it would be great if you could add examples that compare this against the elastic net and linear discriminant analysis to show the benefit.

Also, this seems to be the same as make_pipeline(PCA(), LogisticRegression()), right?
If that is the case, I don't think adding an extra model is warranted; maybe an example instead?

It is not that easy to do for LogisticRegression, but it will be with #4242.

@stylianos-kampakis
Author

Hello,

It is similar but not equivalent. Supervised PCA also contains an extra step that filters out useless attributes. So the steps are:

  1. Filter attributes: fit a model using each single feature as input, and keep the feature only if its coefficient is above a threshold.
  2. Conduct PCA on the reduced dataset.
  3. Fit the final model.

The first step is something that can require a few lines of code, so having a new model makes life a bit easier :)
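
In scikit-learn terms, the three steps might look like the following rough sketch (not the PR's actual code; the base model, threshold, and n_components below are illustrative choices):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def supervised_pca_fit(X, y, threshold=0.1, n_components=2):
    # Step 1: fit a univariate model on each feature and keep the
    # features whose largest absolute coefficient exceeds the threshold.
    scores = np.array([
        np.abs(LogisticRegression().fit(X[:, [j]], y).coef_).max()
        for j in range(X.shape[1])
    ])
    mask = scores > threshold

    # Step 2: conduct PCA on the reduced dataset.
    pca = PCA(n_components=n_components).fit(X[:, mask])

    # Step 3: fit the final model on the principal components.
    final = LogisticRegression().fit(pca.transform(X[:, mask]), y)
    return mask, pca, final
```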

In the book "The Elements of Statistical Learning" there are some examples of cases where this technique does better than the elastic net. I am not sure whether they contain examples that include LDA; I can look into it.

Best regards,
Stelios

@amueller
Member

amueller commented Sep 8, 2015

Sorry, I misread the code then.
So it is equivalent to make_pipeline(LogisticRegression(), PCA(), LogisticRegression())?
The first logistic regression will drop features whose coefficients are below a threshold, then you do a PCA and then train a model on the outcome. I'm not sure what the PCA is for, though. I'll have a look at ESL…

@amueller
Member

amueller commented Sep 8, 2015

Ah, ok, you do reduce the number of components, and they use univariate selection. So it is make_pipeline(SelectKBest(), PCA(), LogisticRegression()).
They do use a fancier univariate selection criterion in the first step, one more appropriate for survival analysis. That gives them an advantage on their data.
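
Spelled out (the k and n_components settings here are placeholders):

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Univariate selection, then PCA on the surviving features,
# then the final classifier.
pipe = make_pipeline(SelectKBest(k=20), PCA(n_components=2),
                     LogisticRegression())
```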

I don't think this particular pipeline deserves its own estimator.

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 8, 2015 via email

…coef, concordance correlation coef, example of concordance vs Pearson

Added the following:

1) Improved version of supervised PCA
2) Example of supervised PCA against LDA and QDA
3) Example of supervised PCA against the elastic net
4) Pearson and concordance correlation coefficients
5) Example where the concordance correlation coefficient can be better than Pearson
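
For reference, a generic version of the concordance statistic mentioned above (a sketch of Lin's coefficient, not the PR's code):

```python
import numpy as np

def concordance_correlation(x, y):
    # Lin's concordance correlation coefficient:
    #   rho_c = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    # Unlike Pearson's r, it penalizes shifts in location and scale.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * cov_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)
```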
@hlin117
Contributor

hlin117 commented Nov 25, 2015

There's already a somewhat similar example in scikit-learn's documentation; it's only missing the PCA step.
http://scikit-learn.org/stable/auto_examples/feature_selection/feature_selection_pipeline.html

@amueller
Member

amueller commented Sep 27, 2018

Closing as no reply and no excitement. This is a pretty straightforward pipeline, imho.

@amueller amueller closed this Sep 27, 2018