DOC Enhanced example visualization to RFE #28862
Conversation
@adrinjalali could you please review this PR? I wasn't sure if combining RFECV was our goal for this issue. Thanks :)
Thanks for the PR @plon-Susk7. Here is a batch of comments.
Also, I think the example would benefit from a more descriptive narrative in the introduction paragraph, similar to (feel free to rephrase):
This example demonstrates how :class:`~sklearn.feature_selection.RFE` can be used
to determine the importance of individual pixels when classifying handwritten digits.
RFE is a method that recursively removes the least significant features and retrains
the model, allowing us to rank features by their importance. The most important
features are assigned rank 1, and the higher the `ranking_`, the less important the
feature. This ranking is also encoded by the shades of blue. As expected, pixels in
the center of the image are more predictive than pixels close to the edges.
```python
pipe = Pipeline(
    [("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1))]
)
```
We don't need to ignore the `ConvergenceWarning`; we need to scale the data. Here I propose a `MinMaxScaler` as a slightly better option than `StandardScaler` when dealing with the digits dataset (see #25334 (comment)), but I am open to any other scaling.
```diff
 pipe = Pipeline(
-    [("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1))]
-)
+    [
+        ("scaler", MinMaxScaler()),
+        ("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1)),
+    ]
+)
```
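For context, the suggested pipeline can be exercised end to end roughly like this (a sketch, not part of the PR; the `load_digits` data and the 8×8 reshape follow the example under discussion):

```python
# Sketch: fit the scaled RFE pipeline and recover the per-pixel ranking.
from sklearn.datasets import load_digits
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_digits(return_X_y=True)

pipe = Pipeline(
    [
        ("scaler", MinMaxScaler()),
        ("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1)),
    ]
)
pipe.fit(X, y)

# ranking_ assigns 1 to the most important pixel; with step=1 and
# n_features_to_select=1, every one of the 64 pixels gets a distinct rank.
ranking = pipe.named_steps["rfe"].ranking_.reshape(8, 8)
```

Scaling to [0, 1] with `MinMaxScaler` keeps the pixel intensities on a common range, which is what lets the logistic regression converge without silencing the warning.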
Thank you very much @ArturoAmorQ. I have made the changes as suggested.
Hi @ArturoAmorQ, I noticed a few failed tests here. Is this failing because of high computational cost, or maybe because of the convergence warnings the code is throwing?
It was a random failure on the CI, I just had to re-trigger the failing test.
Hi @ArturoAmorQ @adrinjalali, sorry to bug you guys. Could you please review my PR?
Just a bit of rephrasing and added the missing backticks for the cross-reference to render correctly. Otherwise this PR is a net improvement to the example and LGTM :)
Thoughts on this @adrinjalali?
Thanks again @plon-Susk7, merging!
Thanks @adrinjalali @ArturoAmorQ, I am more than happy to have contributed to this repo.
Reference Issues/PRs
This PR picks up the work of @Shreesha3112 and @raj-pulapakura. I reviewed the discussion and used logistic regression as the classifier instead of an SVM. I added a Pipeline to the example, as advised in the discussion, and annotated the ranking map with numbers for clarity.
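The number annotation mentioned above can be done with plain matplotlib; here is a minimal sketch, where the `ranking` array is only a placeholder standing in for `rfe.ranking_.reshape(8, 8)` from the fitted pipeline:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to display the figure
import matplotlib.pyplot as plt
import numpy as np

# Placeholder ranking; in the example this would be rfe.ranking_.reshape(8, 8).
ranking = np.arange(1, 65).reshape(8, 8)

fig, ax = plt.subplots()
# Shades of blue encode the ranking, as described in the narrative.
ax.matshow(ranking, cmap=plt.cm.Blues)
# Write each pixel's rank at the center of its cell.
for (i, j), rank in np.ndenumerate(ranking):
    ax.text(j, i, str(rank), ha="center", va="center")
ax.set_title("Ranking of pixels with RFE")
```

One cell of text per pixel keeps the figure readable on an 8×8 grid; for larger feature maps the annotation would clutter and is better omitted.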