
[API] A public API for creating and using multiple scorers in the sklearn-ecosystem #28299

Closed
@eddiebergman

Description


Describe the workflow you want to enable

I would like a public, stable interface for multiple scorers that libraries in the sklearn ecosystem can develop against.

Without this, it is difficult for libraries to provide a consistent API for evaluation with multiple scorers unless they:

  1. Rely exclusively on cross_validate, as it is the only place where user input for multiple metrics can be funneled directly through to sklearn for evaluation (see the sketch after this list).
  2. Implement custom wrapper types.
  3. Refuse to support multiple metrics.
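
A minimal sketch of option 1, using only stable public API (the dataset and estimator choices here are illustrative, not from the original issue):

# Option 1: funnel multiple metrics through cross_validate, currently the main
# entry point that accepts them directly.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(random_state=0)
results = cross_validate(LogisticRegression(), X, y, scoring=["accuracy", "f1"])
# results contains "test_accuracy" and "test_f1" arrays, one entry per fold.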

Why developers may prefer a multi-metric API supported by sklearn itself:

  1. Custom evaluation protocols can be developed that evaluate multiple objectives and benefit from sklearn's correctness (e.g. caching, metadata routing and response values).
  2. Custom multi-scoring wrappers do not have to be versioned against the version of sklearn installed. (See alternatives considered.)
  3. Users can rely on the same interface across compliant libraries in the sklearn ecosystem.

Context for suggestion:

In re-developing Auto-Sklearn, we perform hyperparameter optimization, which can involve evaluating many metrics. We require custom evaluation protocols that are not trivially satisfied by cross_validate or the related family of sklearn functions. Previously, Auto-Sklearn implemented its own metrics, but we would like to extend this to any sklearn-compliant scorer. A _MultiMetricScorer is ideal for its caching and its handling of the model response values each scorer needs. Ideally we could also access this cache, but that is a secondary concern for now.
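
A rough illustration of why the caching matters (model, X_val and y_val are placeholders, not from the original setup):

# Scoring with independent scorers recomputes the model's response values
# (e.g. predict_proba) once per scorer ...
from sklearn.metrics import get_scorer

scorers = {"auc": get_scorer("roc_auc"), "logloss": get_scorer("neg_log_loss")}
scores = {name: scorer(model, X_val, y_val) for name, scorer in scorers.items()}

# ... whereas a multi-metric scorer can compute the response values once and
# share them across all scorers, which is what _MultiMetricScorer provides.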

I had previous solutions that emulated _MultiMetricScorer, but they broke with sklearn 1.3 and 1.4 due to changes in the scorers. I am unsure how to reliably build a stable multi-metric API against sklearn.

An example use case where a user provides multiple scorers to evaluate against:

# Custom evaluation class that depends on the sklearn API.
# Does not need to know anything ahead of time about the scorers.
class CustomEvaluator:
    def __init__(..., scoring: dict[str, str | _Scorer]):
        # Resolve scorer names to scorer objects; pass scorer objects through.
        self.scoring = {
            name: get_scorer(v) if isinstance(v, str) else v
            for name, v in scoring.items()
        }

    def evaluate(self, pipeline_configuration):
        model = something(pipeline_configuration)

        # MAIN API REQUEST
        scorers = public_sklearn_api_to_make_multimetric_scorer(self.scoring)
        scores = scorers(model, X, Y)
        ...

# Custom evaluation metric
def user_custom_metric(y_true, y_pred) -> float:
    ...

# Userland: users can rely on compliant libraries to accept the following
# interface for providing multiple scorers.
scorers = {
    "acc": "accuracy",
    "custom": make_scorer(
        user_custom_metric,
        response_method="predict",
        greater_is_better=False,
    ),
}
custom_evaluator = CustomEvaluator(scorers)

Describe your proposed solution

My proposed solution would involve making some variant of _MultiMetricScorer part of the public API. Perhaps this could be exposed through a non-backwards-breaking change to get_scorer:

# Before
def get_scorer(scoring: str) -> _Scorer: ...

# After
@overload
def get_scorer(scoring: str) -> _Scorer: ...

@overload
def get_scorer(scoring: Iterable[str]) -> MultiMetricScorer: ...

def get_scorer(scoring: str | Iterable[str]) -> _Scorer | MultiMetricScorer: ...

This would allow a user to pass in a MultiMetricScorer which I can act upon, or at the very least a list[str] that I can reliably convert to one.

# Example
match scorer:
    case MultiMetricScorer():
        scores: dict[str, float] = scorer(estimator, X, y)
    case list():
        scorers = get_scorer(scorer)
        scores = scorers(estimator, X, y)
    case _:
        score: float = scorer(estimator, X, y)

This might cause inconsistency issues internally within sklearn, which could be problematic. One additional change that might be required is a new, non-backwards-breaking default for check_scoring(..., *, allow_multi_scoring: bool = False).
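
To make the intent concrete, library code under this proposal might look roughly as follows (allow_multi_scoring and MultiMetricScorer are proposed names, not existing sklearn API):

# Hypothetical usage of the proposed check_scoring default; the
# allow_multi_scoring parameter does not exist in sklearn today.
from sklearn.metrics import check_scoring

scorer = check_scoring(estimator, scoring=["accuracy", "f1"], allow_multi_scoring=True)
scores = scorer(estimator, X, y)  # -> {"accuracy": ..., "f1": ...}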


**Issues with this proposal**

  • There is no public Scorer class API, so perhaps this suggestion makes no sense without one. However, I think that even if the _MultiMetricScorer class remains hidden, a publicly advertised way to construct one with reliable usage semantics would be enough; both classes could then stay private.

Describe alternatives you've considered, if relevant

The easiest solution in most cases is to rely on the private _check_multimetric_scoring and just instantiate a _MultiMetricScorer directly, i.e. relying on private functionality.
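
For reference, a rough sketch of that private route as it looks around sklearn 1.4 (note the class is spelled _MultimetricScorer in the source; the module path and signatures are private and have already changed between releases, so treat this as approximate):

# WARNING: private API, subject to change without deprecation.
from sklearn.metrics._scorer import _MultimetricScorer, _check_multimetric_scoring

scorer_dict = _check_multimetric_scoring(estimator, scoring=["accuracy", "f1"])
multi = _MultimetricScorer(scorers=scorer_dict)
scores = multi(estimator, X, y)  # -> {"accuracy": ..., "f1": ...}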

Previous solutions relied on the private _MultiMetricScorer, _BaseScorer and its earlier sub-families.

Understandably, these private classes are subject to change; they broke with the 1.3 changes to metadata routing and the 1.4 changes to the _Scorer hierarchy.

I will rely on private functionality if I have to, but it makes developing a library against sklearn quite difficult due to versioning.

If this will not be supported, I will likely go with some wrapper class that is dependent upon the version of scikit-learn in use.

Additional context

Currently, the only way to use multiple scorers for a model is through the interface to cross_validate(scoring=["a", "b", "c"]) or to permutation_importance:
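
For example, the scorers dict from the userland example above currently only reaches sklearn through one of those two entry points (model, X and y are placeholders):

# Passing a dict of scorers is supported by cross_validate today:
from sklearn.model_selection import cross_validate

results = cross_validate(model, X, y, scoring=scorers)
# -> results has keys such as "test_acc" and "test_custom", one entry per fold.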

Further Comments

Having access to the transformed cached predictions post scoring would be useful as well but I think that lies outside the scope for now.
