
ENH Allow multiple scorers input to permutation_importance #19411


Merged

Conversation

simonamaggio
Contributor

Reference Issues/PRs

Fixes #18701

What does this implement/fix? Explain your changes.

Input argument scoring can be a list, tuple, or dict with multiple metrics.
The permutation feature importance is computed iteratively for the different metrics, and
the result is a dict mapping metric_name to a Bunch containing the feature importances and their mean and std, for each of the requested input metrics.
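A minimal usage sketch, assuming the behavior described above (the dataset, estimator, and metric names below are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Illustrative data and estimator.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

# `scoring` can now name several metrics at once (list, tuple, or dict).
result = permutation_importance(
    clf, X, y, scoring=["precision", "recall"], n_repeats=5, random_state=0
)

# One Bunch per requested metric, each holding the raw importances and
# their mean/std across repeats.
for metric_name, bunch in result.items():
    print(metric_name, bunch.importances_mean, bunch.importances_std)
```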

@ogrisel
Member

ogrisel commented Feb 9, 2021

@simonamaggio told me on a private channel that this is still work in progress: new tests are needed before reviewing.

Also, it would be great to measure the computational benefit of this approach with a small benchmark snippet, to check that the caching mechanism works as expected.

@ogrisel
Member

ogrisel commented Feb 9, 2021

Thinking about it, the caching will probably not work because of the Parallel call: the multi-metric scorer keeps the cached predictions as an attribute on the scorer, which is mutated the first time it is called. Here this will happen in isolated processes. I think we need to somehow invert the two loops, but I am not sure how...

Edit: The above is probably wrong as discussed below.

Member

@glemaitre glemaitre left a comment


We will need a test to check that we can provide multiple scorers. We should test all the supported combinations and check the output. This would be a matter of parametrization using pytest.
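For instance, a parametrized test along these lines could exercise the supported scoring types (the test name and assertions are only an illustration, not the test that was eventually added):

```python
import pytest

from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression


@pytest.mark.parametrize(
    "scoring",
    [
        ["accuracy", "roc_auc"],                          # list of scorer names
        ("accuracy", "roc_auc"),                          # tuple of scorer names
        {"accuracy": "accuracy", "roc_auc": "roc_auc"},   # dict name -> scorer
    ],
)
def test_permutation_importance_multiple_scorers(scoring):
    X, y = make_classification(n_samples=100, n_features=4, random_state=0)
    clf = LogisticRegression().fit(X, y)

    result = permutation_importance(
        clf, X, y, scoring=scoring, n_repeats=3, random_state=0
    )

    # One Bunch of importances per requested metric.
    assert set(result) == set(scoring)
    for bunch in result.values():
        assert bunch.importances.shape == (X.shape[1], 3)
```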

Then, we need to update the User Guide as well to illustrate how to use this feature. I don't think that we have an example (in the examples/ folder) where we could use this feature.

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@glemaitre glemaitre changed the title from "Allow multiple scorers input to permutation_importance" to "ENH Allow multiple scorers input to permutation_importance" on Feb 10, 2021
@glemaitre glemaitre self-requested a review February 10, 2021 13:01
glemaitre and others added 2 commits February 10, 2021 15:01
…nces

FIX check that the code works with callable returning a dict with multiple metrics
Member

@glemaitre glemaitre left a comment


I am happy with the change in the documentation. LGTM

@glemaitre
Member

@thomasjpfan @NicolasHug It could be nice to have your thoughts regarding the way to output the feature importances with multiple metrics. We went for a dict of Bunch but maybe you have some thoughts about it?

@NicolasHug
Member

We went for a dict of Bunch but maybe you have some thoughts about it?

Sounds like the most sensible way, from a quick glance. What else did you have in mind?

@simonamaggio
Contributor Author

We went for a dict of Bunch but maybe you have some thoughts about it?

Sounds like the most sensible way, from a quick glance. What else did you have in mind?

In cross_validate the output is a dict with keys including the name of the specific metric: ret['test_%s' % name] = test_scores_dict[name]. Here it could similarly be 'importance_%s' % name. But that function returns values, not Bunch objects.

@NicolasHug
Member

Fair point. I would sacrifice a bit of consistency for the sake of usability and keep the current dict of Bunch, which seems much more practical. In cross_validate, getting all the statistics for a given metric is a pain as one has to manipulate strings. Same if you want the test score of all metrics.
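To illustrate the difference (a rough sketch; the flattened key convention is the one cross_validate already uses, while the dict-of-Bunch access is the one proposed in this PR):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(random_state=0)
clf = LogisticRegression()

# cross_validate flattens the metric name into each key ...
cv_results = cross_validate(clf, X, y, scoring=["precision", "recall"])
precision_scores = cv_results["test_precision"]
# ... so collecting everything about one metric means building key strings.

# With a dict of Bunch, a single look-up gives all statistics for a metric,
# e.g. result["precision"].importances_mean and result["precision"].importances_std
# (where `result` is the output of permutation_importance with multiple scorers).
```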

@thomasjpfan
Member

I would agree with the dict of bunches. On the topic of Bunch, would Bunch of Bunch be better?

@simonamaggio
Contributor Author

I would agree with the dict of bunches. On the topic of Bunch, would Bunch of Bunch be better?

Maybe I'm missing something, but with a Bunch of Bunch it seems tricky to get the keys of the outer Bunch from the strings of the metric names. For instance, if I use scoring = ['precision', 'recall'], I'd need to access the inner Bunches with result.precision and result.recall.

Member

@ogrisel ogrisel left a comment


Some more comments below:

@ogrisel
Member

ogrisel commented Feb 10, 2021

I would agree with the dict of bunches. On the topic of Bunch, would Bunch of Bunch be better?

Not sure it brings that much benefit, as the list of keys is not static, so attribute-based look-ups are not necessarily that natural. But not a strong opinion.

Maybe I'm missing something, but using a Bunch of Bunch it seems tricky to get the keys for the external Bunch, from the strings of the metric names. For instance if I use scoring = ['precision', 'recall'], I'd need to access the internal Bunches with result.precision and result.recall.

You can also access those results using the usual dict API as Bunch is a subclass of dict.
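A tiny self-contained illustration of that point (the nested values are made up):

```python
from sklearn.utils import Bunch

# Bunch is a dict subclass, so both access styles work with metric-name strings.
result = Bunch(precision=Bunch(importances_mean=[0.1, 0.2]))
print(result.precision.importances_mean)          # attribute access
print(result["precision"]["importances_mean"])    # plain dict access
```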

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@glemaitre glemaitre merged commit f58d1eb into scikit-learn:main Feb 11, 2021
@glemaitre
Member

All CIs are green. Merging. Thank you @simonamaggio

@ogrisel
Member

ogrisel commented Feb 11, 2021

@glemaitre @simonamaggio we forgot to document the change in doc/whats_new/v1.0.rst. @simonamaggio would you mind opening a new PR for this?

Scratch that, we did. My ctrl-f did not work as intended...

@glemaitre
Member

Scratch that, we did. My ctrl-f did not work as intended...

Hehehe. This could have happened though, it is quite common that I forget it during the review :)

@ogrisel
Member

ogrisel commented Feb 12, 2021

I understand why my ctrl-f was not working anymore in Firefox: I had the "Whole Words" match option enabled without realizing it...

Successfully merging this pull request may close these issues.

Allow permutation_importance to accept multiple scorers in scoring