Description
Following on from #14593, and an enabler for #12385.
A user is currently able to specify multiple metrics for scoring in cross validation or model selection by setting scoring to a dict mapping names to scorers/callables. We also allow the shorthand of specifying multiple predefined scorers as a list of names.
With #14593 in place, it should be relatively easy to also allow the user to provide a single callable which has the same signature as a scorer (estimator, X, y, ...) but which returns a dict mapping names to scores.
This would make the computation of scores more efficient (e.g. computing a confusion matrix once and then deriving several metrics from it), and would also enable us (or third parties) to provide prefabricated scorer collections (per #12385), which would amount to a diagnostic suite for your cross validation performance.
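To make this concrete, here is a rough sketch of what such a callable might look like (the name confusion_scores and the exact result keys are purely illustrative, and passing it as scoring assumes this proposal is implemented): the confusion matrix is computed once and several scores are derived from it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_validate


def confusion_scores(estimator, X, y):
    """Hypothetical multi-metric scorer: scorer signature in, dict of scores out.

    The confusion matrix is computed once, and several metrics are derived
    from it, rather than re-predicting once per metric.
    """
    y_pred = estimator.predict(X)
    tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


X, y = make_classification(random_state=0)

# Under this proposal, the returned dict keys would become the score names in
# the results, just as if scoring had been passed as a dict of scorers.
results = cross_validate(LogisticRegression(), X, y, scoring=confusion_scores)
print(sorted(results))  # would include test_accuracy, test_precision, test_recall
```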
@thomasjpfan had expressed interest in implementing this, but I thought I'd open an issue so it doesn't get lost, and so that Thomas can focus on other things if another contributor is willing to take it on.
Note: care will need to be taken to handle the case where the callable returns dicts with different keys on different calls. This should either raise an error or leave some results undefined.
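If we go the error route, a minimal sketch of the check might look like this (the helper name and the assumption that we collect one dict per CV split are both hypothetical):

```python
def _check_consistent_keys(results_per_fold):
    """Hypothetical check: every split must report the same score names.

    ``results_per_fold`` is assumed to be a list of dicts, one per CV split,
    as returned by the user-provided multimetric callable.
    """
    expected = set(results_per_fold[0])
    for i, fold_result in enumerate(results_per_fold[1:], start=1):
        if set(fold_result) != expected:
            raise ValueError(
                f"Multimetric scorer returned keys {sorted(fold_result)} on "
                f"split {i}, but {sorted(expected)} on split 0."
            )
```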