Skip to content

DOC Enhanced example visualization to RFE #28862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 22, 2024

Conversation

plon-Susk7
Copy link
Contributor

Follow up to #26950. Original Issue #26927

Reference Issues/PRs

This PR picks work of @Shreesha3112 and @raj-pulapakura. I reviewed the discussion and used Logistic Regression as classifier instead of SVM. I implemented pipeline to the example as advised in the discussion. I annotated the ranking map with numbers for clarity.

Copy link

github-actions bot commented Apr 19, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 8550e2c. Link to the linter CI: here

@plon-Susk7
Copy link
Contributor Author

@adrinjalali could you please review this PR? I wasn't sure if combining RFECV was our goal for this issue. Thanks :)

@ArturoAmorQ ArturoAmorQ changed the title Enhanced example visualization to RFE DOC Enhanced example visualization to RFE Apr 19, 2024
Copy link
Member

@ArturoAmorQ ArturoAmorQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @plon-Susk7. Here is a batch of comments.

Also, I think the example would benefit from a more descriptive narrative in the introduction paragraph, similar to (feel free to rephrase):

This example demonstrates how :class:~sklearn.feature_selection.RFE can be used
to determine the importance of individual pixels when classifying handwritten digits.
RFE is a method that recursively removes the least significant features and retrains
the model, allowing us to rank features by their importance. the most important features are assigned rank 1 and the higher ranking_, the less important. This ranking is also encoded by the shades of blue. As expected, pixels in the center of the image are more predictive than pixels close to the edges.

Comment on lines 31 to 33
pipe = Pipeline(
[("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1))]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to ignore the ConvergenceWarning, we need to scale the data. Here I propose a MinMaxScaler as a slightly better option than StandardScaler when dealing with the digits dataset (see #25334 (comment)), but I am open to any other scaling.

Suggested change
pipe = Pipeline(
[("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1))]
)
pipe = Pipeline(
[
("scaler", MinMaxScaler()),
("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1)),
]
)

@plon-Susk7
Copy link
Contributor Author

Thank you very much @ArturoAmorQ. I have made changes as suggested.

@plon-Susk7
Copy link
Contributor Author

Hi @ArturoAmorQ, I noticed a few failed tests here. Is this failing because of high computational cost or maybe of the convergence warnings the code is throwing?

@ArturoAmorQ
Copy link
Member

It was a random failure on the CI, I just had to re-trigger the failing test.

@plon-Susk7
Copy link
Contributor Author

Hi @ArturoAmorQ @adrinjalali, sorry to bug you guys. Could you please review my PR?

Copy link
Member

@ArturoAmorQ ArturoAmorQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a bit of rephrasing and added the missing backticks for the cross-reference to render correctly. Otherwise this PR is a net improvement to the example and LGTM :)

Thoughts on this @adrinjalali?

@ArturoAmorQ
Copy link
Member

Thanks again @plon-Susk7, merging!

@ArturoAmorQ ArturoAmorQ merged commit 79e14a7 into scikit-learn:main Apr 22, 2024
30 checks passed
@plon-Susk7
Copy link
Contributor Author

Thanks @adrinjalali @ArturoAmorQ , I am more than happy to have contributed to this repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants