RFECV with SVC & kernel != 'linear' == ValueError #5168
Comments
This is by design: RFE eliminates features by ranking the estimator's per-feature coefficients, so it can't be used with SVC under any non-linear kernel. Maybe a good reason to add SBS http://rasbt.github.io/mlxtend/docs/feature_selection/sequential_backward_selection/ ?
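The idea behind SBS is simple enough to sketch in a few lines of plain Python. The scoring function below is a toy stand-in for a cross-validated estimator score, and none of the names come from mlxtend's actual API:

```python
from itertools import combinations

def sbs(features, k, score):
    """Sequential Backward Selection (sketch): start from the full feature
    set and greedily drop the feature whose removal hurts the score least,
    until only k features remain."""
    current = tuple(features)
    while len(current) > k:
        # evaluate every subset obtained by removing exactly one feature
        current = max(combinations(current, len(current) - 1), key=score)
    return current

# Toy score: pretend features 'a' and 'c' carry almost all of the signal.
weights = {"a": 3.0, "b": 0.1, "c": 2.0, "d": 0.2}
score = lambda subset: sum(weights[f] for f in subset)

print(sbs(["a", "b", "c", "d"], 2, score))  # -> ('a', 'c')
```

In practice `score` would be something like the mean cross-validation score of the estimator refit on each candidate subset, which is exactly what makes this applicable to any model, kernel SVMs included.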
Ok, cool. Sebastian's Ensemble Classifier from mlxtend sure is handy, so I have no doubt this would also be a fine addition! Thanks Andy!
He contributed the ensemble classifier to sklearn, btw ;)
I know :D +1 for the addition! Haven't had a chance to take it out for a test drive... yet!
I was just talking about this with @rhiever and he was wondering where to find it in scikit-learn ;) So if there is more general interest, I could prepare a pull request for this (okay, I should tackle #5070 first this/next weekend though). The code is actually pretty lean so far (https://github.com/rasbt/mlxtend/blob/master/mlxtend/feature_selection/sequential_backward_select.py); maybe adding an option to toggle between forward and backward selection and then just calling it SFS (Sequential Feature Selector/Selection) or so. What do you think, is there still interest @jmwoloso @amueller ?
The concept is actually pretty simple, and a lot of people may do something similar already (without necessarily calling it Sequential Backward/Forward Selection). Even so, this may be a convenient wrapper (incl. CV and GridSearch), plus it would be a different "application" than RFE (not talking about better or worse here). Whereas RFE selects based on the weights of linear models, here you'd select by a performance metric (choosing any classification algorithm). Maybe we could even use …
@rasbt I think it would be a nice addition if you want to move it from mlxtend to sklearn :)
Nice, I'd definitely be up for it.

mlxtend is actually just more of a "playground" for me. The stuff there should all work fine, but it is not the nicest, most efficient code; more of a "born out of need" kind of thing ;). I purposely avoid too much refactoring and spreading the code over different classes, since this keeps it readable and people can just copy & paste a certain function as needed without installing the whole package.

That being said, I am happy to contribute the SBS to scikit-learn; it is not only about "giving back to the nice community" but also a valuable learning experience, and personally, I would also prefer to use these things via the cleaner and battle-tested scikit-learn API :P

Coincidentally, I am planning to implement Sequential Forward Selection (SFS), Sequential Forward Floating Selection (SFFS), and Sequential Backward Floating Selection (SBFS) this weekend -- a colleague wants to use it for a study, and I just happened to need it next week too for a project with my experimental biology collaborators. I will probably implement them all separately, but once that's done I will open a placeholder pull request where we can discuss the implementation in scikit-learn further. E.g., having one SFS (Sequential Feature Selector) with different toggle options to switch between SFS, SBS, SFFS, SBFS (if the latter two should be included at all).
@rasbt Sounds like a busy weekend :D
I am not very familiar with the techniques, but I think for sklearn it is better to err on the side of not including all options instead of including too many. (the name for SBFS is Sequential Backward Floating Selection, I guess?)
Sure, I agree. I don't want to make it unnecessarily complex. However, I think both the "forward" and "backward" approaches have useful applications. Let's say you have 100 features and want to select the "best performing" subset of 10 features. Maybe it wouldn't even be that complex to have one … Here is a short description of the different algorithms (they are really super simple), and some examples of the current usage/API, which we may want to change:
But I could create an early pull request with a …
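A single function with a direction toggle, as proposed above, could look roughly like this. This is a hypothetical sketch, not the mlxtend or scikit-learn API, and the additive toy score stands in for a CV metric:

```python
def sequential_select(features, k, score, forward=True):
    """One selector with a forward/backward toggle: forward=True grows the
    subset from empty, forward=False shrinks it from the full feature set."""
    current = () if forward else tuple(features)
    while len(current) != k:
        if forward:
            # try adding each remaining feature, keep the best-scoring result
            candidates = [current + (f,) for f in features if f not in current]
        else:
            # try removing each selected feature, keep the best-scoring result
            candidates = [tuple(x for x in current if x != f) for f in current]
        current = max(candidates, key=score)
    return tuple(sorted(current))

weights = {"a": 3.0, "b": 0.1, "c": 2.0, "d": 0.2}
score = lambda subset: sum(weights[f] for f in subset)

print(sequential_select("abcd", 2, score, forward=True))   # -> ('a', 'c')
print(sequential_select("abcd", 2, score, forward=False))  # -> ('a', 'c')
```

With a purely additive score both directions agree; with feature interactions they can differ, which is part of the argument for offering both (and possibly the floating variants).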
I have to check on the floating versions, but I agree that both forward and backward would be helpful.
Do you have a reference for the floating version?
Good point. I wanted to look for some empirical studies or maybe try to find the original papers for reference -- I implemented these algos from old notes that I took in a pattern classification class; I think the prof may have used this paper as "reference": http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=02CB16CB1C28EA6CB57E212861CFB180?doi=10.1.1.24.4369&rep=rep1&type=pdf Floating versions are better in terms of classifier performance since they sample more feature subspaces, but they are also computationally more expensive. It would maybe be more interesting to compare SFS, SFFS, and optimal (exhaustive) search not only via classifier performance but also via runtime. So, again, it really depends on the application which one is the more appropriate choice.
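To make the "floating" idea concrete, here is a rough plain-Python sketch of SFFS (loosely after the Pudil et al. algorithm linked above; all names are illustrative, and the toy score contains a feature interaction so that the floating step actually fires):

```python
def sffs(features, k, score):
    """Sequential Forward Floating Selection (sketch): a greedy forward pass
    with a conditional backward step that drops a feature again whenever the
    reduced subset beats the best subset recorded at that size."""
    current, best = (), {}  # best: subset size -> (score, subset)
    while len(current) < len(features):
        # inclusion: add the single best remaining feature
        current = max((current + (f,) for f in features if f not in current),
                      key=score)
        if score(current) > best.get(len(current), (float("-inf"),))[0]:
            best[len(current)] = (score(current), current)
        # conditional exclusion: "float" back down while it improves on `best`
        while len(current) > 2:
            reduced = max((tuple(x for x in current if x != f) for f in current),
                          key=score)
            if score(reduced) <= best[len(reduced)][0]:
                break
            current = reduced
            best[len(current)] = (score(current), current)
    return tuple(sorted(best[k][1]))

# Toy score with an interaction: 'c' and 'd' are only strong together, so a
# plain forward pass grabs 'a' first and gets stuck with ('a', 'c').
weights = {"a": 3.0, "b": 0.0, "c": 2.9, "d": 2.8}
def score(subset):
    s = sum(weights[f] for f in subset)
    if "c" in subset and "d" in subset:
        s += 4.0  # synergy bonus
    return s

print(sffs("abcd", 2, score))  # -> ('c', 'd')
```

Under the same score, a plain forward selection returns ('a', 'c') at size 2; the conditional exclusion step is what lets SFFS back out of its early greedy commitment to 'a', at the cost of evaluating more subsets.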
Hi Guys, |
Which one are you using, the one that is currently under construction (#8684) or the one from mlxtend (http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/)? Regarding the former, the API may change a bit depending on how the PR goes; for the latter, please feel free to ask questions regarding the indices via the mlxtend mailing list or GitHub issues.
I'm not even sure this is an issue worth mentioning, but I thought I'd put it here in case anyone else runs into it. If you specify SVC as the `estimator` for RFECV and have the SVC `kernel` set to 'rbf', you'll get: "ValueError: coef_ is only available when using linear kernel". I haven't tested this with any other kernel setting like 'poly' or 'sigmoid' (my model is still running as I type this), but I imagine it would happen with anything where kernel != 'linear'.
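For context on why this is by design: RFE/RFECV eliminate features by ranking the fitted estimator's per-feature weights (`coef_`), and a kernelized SVC has no such weights to expose. A minimal mock of one elimination step (not the actual scikit-learn source; the "weights" are a fake column-magnitude importance purely for illustration) shows where the ValueError comes from:

```python
class LinearModel:
    """Stand-in for SVC(kernel='linear'): fitting yields per-feature weights."""
    def fit(self, X, y):
        # fake importance: sum of absolute column values (illustration only)
        self.coef_ = [sum(abs(row[j]) for row in X) for j in range(len(X[0]))]
        return self

class RbfModel:
    """Stand-in for SVC(kernel='rbf'): no coef_ after fitting -- the decision
    function lives in kernel space, not as one weight per input feature."""
    def fit(self, X, y):
        return self

def rfe_step(estimator, X, y):
    """One recursive-elimination step: drop the feature with the smallest weight."""
    est = estimator.fit(X, y)
    if not hasattr(est, "coef_"):
        raise ValueError("coef_ is only available when using linear kernel")
    return min(range(len(est.coef_)), key=lambda j: abs(est.coef_[j]))

X, y = [[1.0, 0.1], [2.0, 0.2]], [0, 1]
print(rfe_step(LinearModel(), X, y))  # -> 1 (feature 1 has the smallest weight)
try:
    rfe_step(RbfModel(), X, y)
except ValueError as e:
    print(e)  # the same error message the issue reports
```

This is also why the sequential selection discussed above sidesteps the problem entirely: it ranks candidate feature subsets by a performance metric instead of by model weights, so any estimator works.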