DOC Notebook style and enhanced descriptions and add example links for feature_selection.RFE #26950
Conversation
Thanks for the PR. That example is really not in good shape. It needs to be reworked before it really deserves to be linked like this; an example of the improvements is here: #26805. You can do that in the same PR.
This looks quite nice now.
```python
# %%
from sklearn.feature_selection import RFE

# Arbitrarily chosen; can be adjusted based on domain knowledge or iterative testing
```
In reality it should be set using a GridSearchCV, which you could also do here: start with an arbitrary number, then optimize it with GridSearchCV by putting it in a pipeline with SVC.
```python
X_train_rfe = rfe.transform(X_train)
X_test_rfe = rfe.transform(X_test)
```
users should almost always use a pipeline instead of doing this.
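The reviewer's suggestion can be sketched as follows. This is a minimal illustration, assuming the digits dataset and the same linear SVC used elsewhere in the example; the choice of `n_features_to_select=10` is arbitrary:

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# RFE and the final classifier chained in one estimator: no manual
# rfe.transform(...) on train and test sets.
pipe = Pipeline(
    [
        ("rfe", RFE(estimator=SVC(kernel="linear", C=1), n_features_to_select=10)),
        ("svc", SVC(kernel="linear", C=1)),
    ]
)
pipe.fit(X_train, y_train)
print(f"test accuracy: {pipe.score(X_test, y_test):.3f}")
```

With the pipeline, the test set is transformed implicitly inside `predict`/`score`, so the feature selector is only ever fitted on training data.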
otherwise LGTM.
cc @scikit-learn/documentation-team maybe?
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Hi @Shreesha3112, thanks for your time and effort. Here is a batch of comments that I hope you will take as constructive criticism, which is my intention.
My overall suggestion is to revert all the changes made in this PR except for the introduction paragraph and the narrative on the dataset description. For the latter, you can alternatively link to the documentation of said dataset.
```python
# Display the first digit
plt.imshow(digits.images[0], cmap="gray")
plt.title(f"Label: {digits.target[0]}")
plt.axis("off")
plt.show()
```
This part is redundant with the Digit Dataset example.
```python
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Define the parameters for the grid search
param_grid = {"rfe__n_features_to_select": [1, 5, 10, 20, 30, 40, 50, 64]}

# Create a pipeline with feature selection followed by SVM
pipe = Pipeline(
    [
        ("rfe", RFE(estimator=SVC(kernel="linear", C=1))),
        ("svc", SVC(kernel="linear", C=1)),
    ]
)

# Create the grid search object
grid_search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy", n_jobs=-1)

# Fit to the data and get the best estimator
grid_search.fit(X_train, y_train)
best_pipeline = grid_search.best_estimator_

# Extract the optimal number of features from the best estimator
optimal_num_features = best_pipeline.named_steps["rfe"].n_features_

print(f"Optimal number of features: {optimal_num_features}")
```
This whole part could be more easily done using the class RFECV, which is an optimized version of a grid search.
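As a hedged sketch of that suggestion (parameter choices such as `step=2` and `cv=3` are illustrative, chosen only to keep the run short), `RFECV` selects the number of features by cross-validation in a single fit:

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# RFECV cross-validates every candidate number of features internally,
# replacing the manual GridSearchCV over n_features_to_select.
rfecv = RFECV(
    estimator=SVC(kernel="linear", C=1),
    step=2,  # features removed per iteration; 1 is finer but slower
    cv=3,
    scoring="accuracy",
    n_jobs=-1,
)
rfecv.fit(X, y)
print(f"Optimal number of features: {rfecv.n_features_}")
```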
```python
# Feature Selection Impact on Model Accuracy
# ------------------------------------------
#
# To understand the relationship between the number of features selected and model
# performance, let's train the :class:`~sklearn.svm.SVC` on various subsets of
# features ranked by :class:`~sklearn.feature_selection.RFE`. We'll then plot the
# accuracy of the model as a function of the number of features used. This will help
# us visualize any trade-offs between feature selection and model accuracy.

# %%
import numpy as np

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train with RFE to get the rankings (as done earlier in the code)
svc = SVC(kernel="linear", C=1)
rfe = RFE(estimator=svc, n_features_to_select=1, step=1)
rfe.fit(X_train, y_train)
ranking = rfe.ranking_

# Store accuracies; adjust the step for finer granularity
num_features_list = [1, 5, 10, 20, 30, 40, 50, 64]
accuracies = []

for num_features in num_features_list:
    # Select top 'num_features' important features
    top_features_idx = np.where(ranking <= num_features)[0]
    X_train_selected = X_train[:, top_features_idx]
    X_test_selected = X_test[:, top_features_idx]

    # Train SVM and get accuracy
    svc_selected = SVC(kernel="linear", C=1)
    svc_selected.fit(X_train_selected, y_train)
    y_pred = svc_selected.predict(X_test_selected)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)

# Plot the accuracies
plt.plot(num_features_list, accuracies, marker="o", linestyle="-")
plt.xlabel("Number of Selected Features")
plt.ylabel("Accuracy")
plt.title("Feature Selection Impact on Model Accuracy")
plt.grid(True)
plt.show()
```
This part is redundant with the RFECV example, where an interpretation and error bars are given.
```rst
For an example on usage, see
:ref:`sphx_glr_auto_examples_feature_selection_plot_rfe_digits.py`.
```
I'm not sure if we always want to link examples from the docstrings (unless really relevant to the introductory paragraph), as they already appear in the lowest part of the page.
I would say that we don't need it when we have a single example, which is the case here.
```python
# Visualizing Feature Importance after RFE
# ----------------------------------------
#
# :class:`~sklearn.feature_selection.RFE` provides a ranking of the features based on
# their importance. We can visualize this ranking to gain insights into which pixels
# (or features) are deemed most significant by :class:`~sklearn.feature_selection.RFE`
# in the digit classification task.

# %%
ranking = best_pipeline.named_steps["rfe"].ranking_.reshape(digits.images[0].shape)
plt.matshow(ranking, cmap=plt.cm.Blues)
plt.colorbar()
plt.title("Ranking of pixels with RFE")
plt.show()
```
According to the documentation, `ranking_` is an array where the most important features are assigned rank 1, and the higher the rank, the less important the feature. Notice that in the current version of the example the shades of blue go from 1 to 64 (we have 8 x 8 pixels), whereas your code uses a model that has already truncated the feature space, keeping only the 5 x 5 most relevant pixels and degenerating them all to a value of `1`. I am afraid this was not the spirit of the example.
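The reviewer's point about `ranking_` can be checked directly. This is an illustrative sketch, with the 25-feature selector standing in for the truncated model criticized above:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# A selector truncated to 25 (5 x 5) features: all kept features share rank 1,
# so the ranking image cannot distinguish among the selected pixels.
rfe_25 = RFE(SVC(kernel="linear", C=1), n_features_to_select=25).fit(X, y)
print(np.sum(rfe_25.ranking_ == 1))  # 25

# Selecting down to a single feature yields a full ordering: ranks 1..64.
rfe_1 = RFE(SVC(kernel="linear", C=1), n_features_to_select=1).fit(X, y)
print(rfe_1.ranking_.min(), rfe_1.ranking_.max())  # 1 64
```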
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Taking into account the reviews of @ArturoAmorQ, I think that we could improve this PR by making an example that presents both RFE and RFECV.
In this case, we could also remove the other RFECV example, since it would be redundant, and redirect toward this newly improved example.
@Shreesha3112 Would you be able to address these issues?
I'm going to tag this PR as stalled for the time being.
I won't be able to contribute for the next 2-3 weeks. If anyone else wants to pick it up, they can go ahead.
Hi, I would love to work on this. Could someone please assign me?
Hi @adrinjalali and @glemaitre. Sorry to bother you, but is there any way I can help with this PR?
@raj-pulapakura you can open a new PR and continue the work from this PR. You can base your work on this PR's branch to keep the history.
On it!
Hi! I'm new to this repo (and open-source in general). Could someone help me get started?
Hey Sagnik, I've been doing some research on the same and here are some tips I found for an open-source beginner:
Please let me know if this helps and how it goes!
@adrinjalali , can I work on this issue? I am new to contributing to this repo and open source projects in general. How should I go about resolving this issue? |
@plon-Susk7 yeah go for it. |
Closing this PR as superseded by #28862. |
Reference Issues/PRs
Issue #26927
What does this implement/fix? Explain your changes.
This PR adds example links for `feature_selection.RFE`.
Class: `feature_selection.RFE`
Related example files:
Any other comments?