
ENH Allow multiple scorers input to permutation_importance #19411


Merged

Conversation

simonamaggio
Contributor

Reference Issues/PRs

Fixes #18701

What does this implement/fix? Explain your changes.

Input argument scoring can be a list, tuple, or dict with multiple metrics.
The permutation feature importance is computed iteratively for the different metrics, and
the result is a dict mapping metric_name to a Bunch containing the feature importances and their mean and std, for each of the requested input metrics.
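A minimal usage sketch, assuming the behavior described above (the dataset, estimator, and metric names below are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Illustrative data and estimator.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

# `scoring` can now name several metrics at once (list, tuple, or dict).
result = permutation_importance(
    clf, X, y, scoring=["precision", "recall"], n_repeats=5, random_state=0
)

# One Bunch per requested metric, each holding the raw importances and
# their mean/std across repeats.
for metric_name, bunch in result.items():
    print(metric_name, bunch.importances_mean, bunch.importances_std)
```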

@ogrisel
Member

ogrisel commented Feb 9, 2021

@simonamaggio told me on a private channel that this is still work in progress: new tests are needed before reviewing.

Also, it would be great to measure the computational benefit of this approach with a small benchmark snippet, to check that the caching mechanism works as expected.

@ogrisel
Member

ogrisel commented Feb 9, 2021

Thinking about it, the caching will probably not work because of the Parallel call: the multi-metric scorer keeps the cached predictions as an attribute on the scorer, which is mutated the first time it is called. Here this will happen in isolated processes. I think we need to somehow invert the two loops, but I am not sure how...

Edit: The above is probably wrong as discussed below.

Member

@glemaitre glemaitre left a comment


We will need a test to check that we can provide multiple scorers. We should test all the supported combinations and check the output. This would be a matter of parametrization using pytest.
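For instance, a parametrized test along these lines could exercise the supported scoring types (the test name and assertions are only an illustration, not the test that was eventually added):

```python
import pytest

from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression


@pytest.mark.parametrize(
    "scoring",
    [
        ["accuracy", "roc_auc"],                          # list of scorer names
        ("accuracy", "roc_auc"),                          # tuple of scorer names
        {"accuracy": "accuracy", "roc_auc": "roc_auc"},   # dict name -> scorer
    ],
)
def test_permutation_importance_multiple_scorers(scoring):
    X, y = make_classification(n_samples=100, n_features=4, random_state=0)
    clf = LogisticRegression().fit(X, y)

    result = permutation_importance(
        clf, X, y, scoring=scoring, n_repeats=3, random_state=0
    )

    # One Bunch of importances per requested metric.
    assert set(result) == set(scoring)
    for bunch in result.values():
        assert bunch.importances.shape == (X.shape[1], 3)
```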

Then, we need to update the User Guide as well to illustrate how to use this feature. I don't think that we have an example (in the examples/ folder) where we could use this feature.

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@glemaitre glemaitre changed the title from "Allow multiple scorers input to permutation_importance" to "ENH Allow multiple scorers input to permutation_importance" on Feb 10, 2021
@glemaitre glemaitre self-requested a review February 10, 2021 13:01
glemaitre and others added 2 commits February 10, 2021 15:01
…nces

FIX check that the code works with callable returning a dict with multiple metrics
Member

@glemaitre glemaitre left a comment


I am happy with the change in the documentation. LGTM

@glemaitre
Member

@thomasjpfan @NicolasHug It could be nice to have your thoughts regarding the way to output the feature importances with multiple metrics. We went for a dict of Bunch but maybe you have some thoughts about it?

@NicolasHug
Member

We went for a dict of Bunch but maybe you have some thoughts about it?

Sounds like the most sensible way, from a quick glance. What else did you have in mind?

@simonamaggio
Contributor Author

We went for a dict of Bunch but maybe you have some thoughts about it?

Sounds like the most sensible way, from a quick glance. What else did you have in mind?

In cross_validate the output is a dict with keys including the name of the specific metric: ret['test_%s' % name] = test_scores_dict[name]. Here it could similarly be 'importance_%s' % name. But that function returns values, not Bunch objects.

@NicolasHug
Member

Fair point. I would sacrifice a bit of consistency for the sake of usability and keep the current dict of Bunch, which seems much more practical. In cross_validate, getting all the statistics for a given metric is a pain as one has to manipulate strings. Same if you want the test score of all metrics.
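To illustrate the difference (a rough sketch; the flattened key convention is the one cross_validate already uses, while the dict-of-Bunch access is the one proposed in this PR):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(random_state=0)
clf = LogisticRegression()

# cross_validate flattens the metric name into each key ...
cv_results = cross_validate(clf, X, y, scoring=["precision", "recall"])
precision_scores = cv_results["test_precision"]
# ... so collecting everything about one metric means building key strings.

# With a dict of Bunch, a single look-up gives all statistics for a metric,
# e.g. result["precision"].importances_mean and result["precision"].importances_std
# (where `result` is the output of permutation_importance with multiple scorers).
```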

@thomasjpfan
Member

I would agree with the dict of bunches. On the topic of Bunch, would Bunch of Bunch be better?

@simonamaggio
Contributor Author

I would agree with the dict of bunches. On the topic of Bunch, would Bunch of Bunch be better?

Maybe I'm missing something, but with a Bunch of Bunch it seems tricky to get the keys of the outer Bunch from the strings of the metric names. For instance, if I use scoring = ['precision', 'recall'], I'd need to access the inner Bunches with result.precision and result.recall.

Member

@ogrisel ogrisel left a comment


Some more comments below:

@ogrisel
Member

ogrisel commented Feb 10, 2021

I would agree with the dict of bunches. On the topic of Bunch, would Bunch of Bunch be better?

Not sure it brings that much benefit, as the list of keys is not static, so attribute-based look-ups are not necessarily that natural. But not a strong opinion.

Maybe I'm missing something, but using a Bunch of Bunch it seems tricky to get the keys for the external Bunch, from the strings of the metric names. For instance if I use scoring = ['precision', 'recall'], I'd need to access the internal Bunches with result.precision and result.recall.

You can also access those results using the usual dict API as Bunch is a subclass of dict.
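A tiny self-contained illustration of that point (the nested values are made up):

```python
from sklearn.utils import Bunch

# Bunch is a dict subclass, so both access styles work with metric-name strings.
result = Bunch(precision=Bunch(importances_mean=[0.1, 0.2]))
print(result.precision.importances_mean)          # attribute access
print(result["precision"]["importances_mean"])    # plain dict access
```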

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@glemaitre glemaitre merged commit f58d1eb into scikit-learn:main Feb 11, 2021
@glemaitre
Member

All CIs are green. Merging. Thank you @simonamaggio

@ogrisel
Member

ogrisel commented Feb 11, 2021

@glemaitre @simonamaggio we forgot to document the change in doc/whats_new/v1.0.rst. @simonamaggio would you mind opening a new PR for this?

Scratch that, we did. My ctrl-f did not work as intended...

@glemaitre
Member

Scratch that, we did. My ctrl-f did not work as intended...

Hehehe. This could have happened though, it is quite common that I forget it during the review :)

@ogrisel
Member

ogrisel commented Feb 12, 2021

I understand why my ctrl-f was not working anymore in Firefox: I had the "Whole Words" match option enabled without realizing it...

Successfully merging this pull request may close these issues.

Allow permutation_importance to accept multiple scorers in scoring