DOC Combined examples for feature_selection.RFE and feature_selection.RFECV #28065

raj-pulapakura · 2024-01-05T06:46:09Z

Reference Issues/PRs

Follow up to #26950. Issue #26927

What does this implement/fix? Explain your changes.

This PR picks up from the work of @Shreesha3112, to whom I am very grateful for providing the starting code.

I've followed @glemaitre's advice from this review: #26950 (review). In particular, I've combined the RFE and RFECV examples into a single document. I've also swapped the handwritten digits dataset for the breast cancer dataset, because model performance on that dataset actually benefits from RFE.

I haven't deleted the redundant RFECV example, just in case.

github-actions · 2024-01-05T06:47:36Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: f1e642f. Link to the linter CI: here

raj-pulapakura · 2024-01-05T06:48:14Z

@adrinjalali

glemaitre

Could you remove the file doc/sg_execution_times.rst? We should indeed add this file to .gitignore, but that will be done in another PR.

glemaitre

Since we are removing a file, we need to handle the redirection. Could you add the following in the redirects dictionary in doc/conf.py around l.300:

        "auto_examples/feature_selection/plot_rfe_breast_cancer"
    ),

ArturoAmorQ · 2024-01-09T20:53:12Z

I wouldn't change the digits dataset to the breast cancer one. This example is not about accuracy: its spirit is to visualize the importance (or relevance, if you will) of each pixel when predicting a digit. I would rather merge the two examples (RFE and RFECV) as they are, adding a bit of narrative.

See my comment here on what the pixel color means and you can keep the introduction paragraph and the narrative on the dataset description from the original PR, as mentioned here.

raj-pulapakura · 2024-01-10T11:32:17Z

@ArturoAmorQ I see, thanks for the clarification. I'll revert to the digits dataset.

Given your suggestion to merge the RFE and RFECV examples, do we want to:

Show usage of both RFE and RFECV, or
Show usage of just RFECV?

raj-pulapakura · 2024-01-10T11:40:09Z

@glemaitre has suggested in this comment that RFE and RFECV should be shown, so I'm going to go with that.

raj-pulapakura · 2024-01-10T11:41:00Z

Since we are removing a file, we need to handle the redirection. Could you add the following in the redirects dictionary in doc/conf.py around l.300:
        "auto_examples/feature_selection/plot_rfe_breast_cancer"
    ),

I'll make sure to do this after I have finished updating the code 👍

glemaitre · 2024-01-11T10:05:04Z

plot_rfe_digits.py

You can remove this file

glemaitre · 2024-01-11T10:13:42Z

I would rather merge the two examples (RFE and RFECV) as they are but adding a bit of narrative.

I am getting a bit confused, and my comment about the redirection is no longer useful. @ArturoAmorQ do you think it would be a good idea to remove plot_rfe_with_cross_validation.py, since it does not bring more value than the discussion that we will add here?

raj-pulapakura · 2024-01-15T12:48:28Z

ping @ArturoAmorQ

ArturoAmorQ · 2024-01-16T09:36:29Z

do you think it could be a good idea to remove plot_rfe_with_cross_validation.py since it does not bring more value than the discussion that we will add here.

If we merge the two examples, yes, we can remove it. But the current version of the RFECV example displays error bars (which cannot be done with the heatmap on the digits dataset) and uses a simple dataset to discuss what happens with correlated features.

Notice that if we merge the two examples, that means using digits in a first section, and the synthetic dataset in a further section.

ArturoAmorQ · 2024-01-16T09:39:03Z

examples/feature_selection/plot_rfe_digits.py

+# %%
+# Visualizing Feature Importance after RFE
+# ----------------------------------------
+#
+# RFECV and RFE provide a ranking of the features based on their importance.
+# We can visualize this ranking to gain insights into which pixels
+# (or features) are deemed most significant by RFECV in the digit
+# classification task.

-# Plot pixel ranking
+# %%
+ranking = rfecv.ranking_.reshape(digits.images[0].shape)
 plt.matshow(ranking, cmap=plt.cm.Blues)
 plt.colorbar()
-plt.title("Ranking of pixels with RFE")
+plt.title("Ranking of pixels with RFECV")
 plt.show()


According to the documentation, ranking_ is an array in which the most important features are assigned rank 1, and the higher the rank, the less important the feature. Notice that in the current version of the example the shades of blue go from 1 to 64 (we have 8 x 8 pixels), whereas your code uses a model that has already truncated the feature space to keep only the 42 most relevant pixels, all of which collapse to a rank of 1. I am afraid this was not the spirit of the example. Alternatively, you can show the feature importance of all 64 pixels using different seeds of train_test_split to check for stability.
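
The difference between the two rankings can be sketched as follows. This is an illustration only: it uses plain RFE with n_features_to_select=42 as a stand-in for the fitted RFECV mentioned above, and a linear SVC as the estimator. Both choices are assumptions made to keep the snippet short.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

digits = load_digits()
X = digits.images.reshape((len(digits.images), -1))  # 64 pixel features
y = digits.target

# Eliminating down to a single feature yields a full ordering: ranks 1..64.
rfe_full = RFE(estimator=SVC(kernel="linear", C=1), n_features_to_select=1, step=1)
rfe_full.fit(X, y)
ranking_full = rfe_full.ranking_.reshape(digits.images[0].shape)

# Stopping at 42 features (the count quoted above) leaves all 42 kept
# pixels tied at rank 1; only the eliminated pixels get distinct ranks.
rfe_truncated = RFE(estimator=SVC(kernel="linear", C=1), n_features_to_select=42, step=1)
rfe_truncated.fit(X, y)
ranking_truncated = rfe_truncated.ranking_.reshape(digits.images[0].shape)

print("Distinct ranks, full elimination:", np.unique(ranking_full).size)
print("Pixels tied at rank 1, truncated:", int((ranking_truncated == 1).sum()))
```

Plotting `ranking_full` with `plt.matshow` gives a gradient over all 64 pixels, while `ranking_truncated` shows a single flat shade for every selected pixel.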

raj-pulapakura · 2024-01-26T11:37:41Z

I think there are 2 slightly contradicting objectives here:

The first objective is to merge both examples because there's no point having 2 examples which both demonstrate the use of RFE/RFECV.

The second objective is to give the user:

A visual example which shows how the RFE algorithm identifies important features, using the digits dataset
A practical example which has a detailed narrative and analysis, using a synthetic dataset

glemaitre · 2024-01-26T13:17:41Z

A visual example which shows how the RFE algorithm identifies important features, using the digits dataset
A practical example which has a detailed narrative and analysis, using a synthetic dataset

We can use a single dataset for both aspects.

Another thing we should do is use an estimator other than SVM here, since it does not scale well with the number of samples. If we are using a linear kernel anyway, it would be best to use LogisticRegression instead.
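
As a rough sketch of the suggested swap: RFE only needs an estimator that exposes coef_ or feature_importances_, so replacing a linear SVC with LogisticRegression is a one-argument change. The synthetic data and the choice of three selected features below are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hypothetical toy problem: 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# Both estimators are linear models with a coef_ attribute, which RFE
# uses to decide which feature to drop at each step.
selectors = {
    "SVC(kernel='linear')": RFE(SVC(kernel="linear"), n_features_to_select=3),
    "LogisticRegression": RFE(LogisticRegression(max_iter=1000), n_features_to_select=3),
}

selected = {}
for name, selector in selectors.items():
    selector.fit(X, y)
    selected[name] = selector.get_support(indices=True)
    print(f"{name}: kept feature indices {selected[name].tolist()}")
```

The two estimators are not guaranteed to keep the same features; the snippet only shows that the rest of the RFE code does not change.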

marenwestermann · 2024-02-23T20:43:06Z

@raj-pulapakura would you like to continue working on this PR?

ArturoAmorQ · 2024-04-22T13:33:24Z

Closing this PR as superseded by #28862.

shreesha3112 and others added 13 commits July 31, 2023 15:48

add example links for feature_selection.RFE

dd22cce

Merge remote-tracking branch 'upstream/main' into add_links_feature_selection_RFE

5b53234

Merge remote-tracking branch 'upstream/main' into add_links_feature_selection_RFE

66e6ab3

plot_rfe_digits notebok style doc and enhanced descriptions

39d6e54

Merge remote-tracking branch 'upstream/main' into add_links_feature_selection_RFE

5141941

added pipeline to example

bb70632

Merge remote-tracking branch 'upstream/main' into add_links_feature_selection_RFE

9f2afbb

Update examples/feature_selection/plot_rfe_digits.py

da2359c

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

Merge branch 'main' into add_links_feature_selection_RFE

41328c7

Apply suggestions from code review

5ab2392

Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>

Merge branch 'main' into add_links_feature_selection_RFE

cc0aa51

plot_rfe_digits converted to plot_rfe_breast_cancer with rfe and rfecv example

ff96028

using breast cancer dataset, rfe and rfecv examples

2e5de61

raj-pulapakura changed the title ~~Add links feature selection rfe~~ DOC Combined examples for feature_selection.RFE and feature_selection.RFECV Jan 5, 2024

github-actions bot added the Documentation label Jan 5, 2024

raj-pulapakura added 2 commits January 5, 2024 18:39

Merge branch 'main' into add_links_feature_selection_RFE

5f52c23

Merge branch 'main' into add_links_feature_selection_RFE

72fe19c

glemaitre self-requested a review January 9, 2024 15:10

glemaitre reviewed Jan 9, 2024

raj-pulapakura added 3 commits January 10, 2024 23:09

added suggested changes

6bdbc4c

add suggested changes

ac3b5cc

Merge branch 'add_links_feature_selection_RFE' of https://github.com/raj-pulapakura/scikit-learn into add_links_feature_selection_RFE

f3605e8

glemaitre self-requested a review January 10, 2024 13:10

glemaitre removed their request for review January 10, 2024 13:11

added redirect to conf.py

f1e642f

raj-pulapakura requested a review from glemaitre January 11, 2024 02:43

glemaitre reviewed Jan 11, 2024

ArturoAmorQ reviewed Jan 16, 2024

adrinjalali added Stalled help wanted labels Apr 19, 2024

ArturoAmorQ closed this Apr 22, 2024
