DOC Enhanced example visualization to RFE #28862
Conversation
@adrinjalali could you please review this PR? I wasn't sure if combining RFECV was our goal for this issue. Thanks :)
Thanks for the PR @plon-Susk7. Here is a batch of comments.
Also, I think the example would benefit from a more descriptive narrative in the introduction paragraph, similar to (feel free to rephrase):
This example demonstrates how :class:`~sklearn.feature_selection.RFE` can be used
to determine the importance of individual pixels when classifying handwritten digits.
RFE is a method that recursively removes the least significant features and retrains
the model, allowing us to rank features by their importance. The most important
features are assigned rank 1, and the higher the `ranking_`, the less important the
feature. This ranking is also encoded by the shades of blue. As expected, pixels in
the center of the image are more predictive than pixels close to the edges.
```python
pipe = Pipeline(
    [("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1))]
)
```
We don't need to ignore the `ConvergenceWarning`; we need to scale the data. Here I propose a `MinMaxScaler` as a slightly better option than `StandardScaler` when dealing with the digits dataset (see #25334 (comment)), but I am open to any other scaling.
```diff
 pipe = Pipeline(
-    [("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1))]
-)
+    [
+        ("scaler", MinMaxScaler()),
+        ("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1)),
+    ]
+)
```
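For context, the suggested pipeline can be exercised end to end roughly like this (a sketch, not part of the PR; the `load_digits` data and the 8×8 reshape follow the example under discussion):

```python
# Sketch: fit the scaled RFE pipeline and recover the per-pixel ranking.
from sklearn.datasets import load_digits
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_digits(return_X_y=True)

pipe = Pipeline(
    [
        ("scaler", MinMaxScaler()),
        ("rfe", RFE(estimator=LogisticRegression(), n_features_to_select=1, step=1)),
    ]
)
pipe.fit(X, y)

# ranking_ assigns 1 to the most important pixel; with step=1 and
# n_features_to_select=1, every one of the 64 pixels gets a distinct rank.
ranking = pipe.named_steps["rfe"].ranking_.reshape(8, 8)
```

Scaling to [0, 1] with `MinMaxScaler` keeps the pixel intensities on a common range, which is what lets the logistic regression converge without silencing the warning.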
Thank you very much @ArturoAmorQ. I have made the changes as suggested.
Hi @ArturoAmorQ, I noticed a few failed tests here. Is this failing because of high computational cost, or maybe because of the convergence warnings the code is throwing?
It was a random failure on the CI, I just had to re-trigger the failing test.
Hi @ArturoAmorQ @adrinjalali, sorry to bug you guys. Could you please review my PR?
Just a bit of rephrasing and added the missing backticks for the cross-reference to render correctly. Otherwise this PR is a net improvement to the example and LGTM :)
Thoughts on this @adrinjalali?
Thanks again @plon-Susk7, merging!
Thanks @adrinjalali @ArturoAmorQ, I am more than happy to have contributed to this repo.
Reference Issues/PRs
This PR picks up the work of @Shreesha3112 and @raj-pulapakura. I reviewed the discussion and used logistic regression as the classifier instead of an SVM. I added a Pipeline to the example, as advised in the discussion, and annotated the ranking map with numbers for clarity.
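The number annotation mentioned above can be done with plain matplotlib; here is a minimal sketch, where the `ranking` array is only a placeholder standing in for `rfe.ranking_.reshape(8, 8)` from the fitted pipeline:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to display the figure
import matplotlib.pyplot as plt
import numpy as np

# Placeholder ranking; in the example this would be rfe.ranking_.reshape(8, 8).
ranking = np.arange(1, 65).reshape(8, 8)

fig, ax = plt.subplots()
# Shades of blue encode the ranking, as described in the narrative.
ax.matshow(ranking, cmap=plt.cm.Blues)
# Write each pixel's rank at the center of its cell.
for (i, j), rank in np.ndenumerate(ranking):
    ax.text(j, i, str(rank), ha="center", va="center")
ax.set_title("Ranking of pixels with RFE")
```

One cell of text per pixel keeps the figure readable on an 8×8 grid; for larger feature maps the annotation would clutter and is better omitted.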