add `feature_indices` param to `permutation_importance()` #30005
Conversation
add `n_features` param to enable a divide-and-conquer approach when you have 100's of features. Idea being to randomly cover `n_features` and do it multiple times in separate work.
❌ Linting issues — This PR is introducing linting issues. Here's a summary of the issues.
Hmm, not sure how to fix that failing test:
@andrewm4894 Actually, we had a similar feature request in the past: #18694. In general, I think that the scope of #18694 is better defined for the tabular learning use case, but using the proposed API there, you could request N random features, make a single permutation, and repeat the process. You would need to combine the results, but this is already something that you were planning. It would be great if you could have a look at the proposed API in that issue.
@glemaitre oh interesting, I had thought about something similar, but felt this change was a bit smaller and simpler in some sense, so maybe had more of a chance. But agreed, being able to pass in a list of features as a subset is probably the most general case and would serve what I need here too.
Although @glemaitre, this is a small and simple PR here, while you linked to an issue; I'm not sure if there is a draft PR anywhere? (I will try to think about what that might look like, but I'm not sure how complex/big it might be.) E.g., maybe there's a trade-off between a small incremental change vs. a bigger one that is more flexible longer term. Let's maybe see what maintainers think a potential way forward might be (I'm new here, this is my first PR to the repo, so I will defer to others).
Nope, I don't think that we got any PR on the subject.
Basically, the API in the issue is much better suited than going on the
*changed the title from* add `n_features` param to `permutation_importance()` *to* add `feature_indices` param to `permutation_importance()`
@glemaitre 👋 I made some changes here to switch to this approach: the user can just pass in a random list of feature indices if they want, or a more specific one if they only want to look at a specific feature or subset of features. Just went with `feature_indices`.
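For context, a sketch of the call pattern being proposed, contrasted with today's API where `permutation_importance` always scores every column. Note that `feature_indices` is the parameter proposed in this PR and does not exist in released scikit-learn, so that part is shown as a comment only:

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=150, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Current API: importances are computed for all 8 columns.
result = permutation_importance(model, X, y, n_repeats=3, random_state=0)
print(result.importances_mean.shape)  # (8,)

# Proposed API (hypothetical, the parameter added by this PR): restrict the
# work to a subset of columns, e.g. the random sample handled by one worker:
# result = permutation_importance(model, X, y, n_repeats=3, random_state=0,
#                                 feature_indices=[0, 3, 7])
# result.importances_mean would then only cover those 3 columns.
```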
@glemaitre 👋 do you think the updated approach in this PR, using the more flexible API, is a way forward? I'm a little bit blocked by this at work, so I'm keen to push to see if it's feasible to get something merged in the next little while, or if I'll need to try something else at work as a workaround.
@glemaitre any idea how I can find someone to help me get this reviewed and merged over the next few weeks?
@glemaitre any idea if I can make another run at seeing whether it's feasible to get this merged?
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Add a `feature_indices` param to `permutation_importance()` to enable a divide-and-conquer approach when you have 1000's of features. The idea is to pass in the subset of features you want to calculate feature importance for. For example, you might want to randomly cover different subsets using `feature_indices`, and do it multiple times in separate `permutation_importance()` calls to cover all features.

Any other comments?
My use case here is that I have some feature importance tasks running over a model (a DNN) with 1000's of features, and I am finding this a bottleneck in the process, since each `permutation_importance()` call needs to process all features. I would like to scale horizontally a bit more by having each permutation importance task randomly work on, say, 100's of features; I will then combine all the results back together and account for how many times each feature column was sampled, etc. Just wanted to make a quick, small, illustrative PR to show what I mean and see if maintainers think this is a reasonable or feasible approach/idea. I was thinking about coding this up myself in some way, but thought it could end up being useful for others, so I wanted to see about contributing here.