We don't support func(estimator, X, y, ...) across the board as a scorer #31889

@adrinjalali

Our documentation states that a callable with the signature (estimator, X, y) is a valid scorer. In practice, it isn't.

In #31599, it is observed that passing such an object fails in the context of a _MultimetricScorer.

While working on other metadata routing issues, I found that TunedThresholdClassifierCV also fails with such an object, since it creates a _CurveScorer which ignores the object and expects to just use the _score_func of a given scorer object.
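The mechanism described above can be illustrated without scikit-learn. The sketch below uses simplified stand-ins, not the library's implementation: `ToyScorer`, `LoggingScorer`, and `toy_curve_from_scorer` are hypothetical names. It assumes only what is stated above, namely that the curve scorer is built from the `_score_func` attribute of the scorer it is given.

```python
# Simplified stand-ins, NOT the scikit-learn implementation.
# ToyScorer, LoggingScorer and toy_curve_from_scorer are hypothetical names.

def squared_error(y_true, y_pred):
    """Mean squared error on two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


class ToyScorer:
    """Scorer-like object that stores the metric on `_score_func`."""

    def __init__(self, score_func):
        self._score_func = score_func

    def _score(self, y_true, y_pred):
        return self._score_func(y_true, y_pred)


class LoggingScorer(ToyScorer):
    """Subclass that does its custom work in `_score`."""

    calls = 0

    def _score(self, y_true, y_pred):
        LoggingScorer.calls += 1
        return super()._score(y_true, y_pred)


def toy_curve_from_scorer(scorer):
    """Build a new scoring callable from the metric only, bypassing `_score`."""
    score_func = scorer._score_func  # raises AttributeError for a plain function
    return lambda y_true, y_pred: score_func(y_true, y_pred)


def plain_callable(estimator, X, y):
    """A scorer written as a bare function, with no `_score_func` attribute."""
    return 0.0


y_true, y_pred = [0, 1, 1], [0, 1, 0]

# A subclass overriding `_score`: the override is silently skipped.
curve = toy_curve_from_scorer(LoggingScorer(squared_error))
result = curve(y_true, y_pred)
print(result, LoggingScorer.calls)

# A plain function: attribute access fails before any scoring happens.
try:
    toy_curve_from_scorer(plain_callable)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

In this toy version, the overridden `_score` is never called and the plain function fails on attribute access, which is the same pair of symptoms reported for TunedThresholdClassifierCV.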

Consider the following script:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TunedThresholdClassifierCV, cross_val_score
from sklearn.datasets import make_classification
from sklearn.metrics._scorer import _Scorer, mean_squared_error, make_scorer


class MyScorer(_Scorer):
    def _score(self, *args, **kwargs):
        print("I'm logging stuff")
        return super()._score(*args, **kwargs)

def my_scorer(estimator, X, y, **kwargs):
    print("I'm logging stuff in my_scorer")
    return mean_squared_error(y, estimator.predict(X), **kwargs)

def my_metric(y_true, y_pred, **kwargs):
    print("I'm logging stuff in my_metric")
    return mean_squared_error(y_true, y_pred, **kwargs)

my_second_scorer = make_scorer(my_metric)

X, y = make_classification()

# this prints logs
print("cross_val_score'ing")
cross_val_score(
    LogisticRegression(),
    X,
    y,
    scoring=MyScorer(mean_squared_error, sign=1, kwargs={}, response_method="predict"),
)

print("1. TunedThresholdClassifierCV'ing")
model = TunedThresholdClassifierCV(
    LogisticRegression(),
    # scoring=MyScorer(mean_squared_error, sign=1, kwargs={}, response_method="predict"),
    # scoring=my_scorer,
    scoring=my_second_scorer,
)
model.fit(X, y)

print("2. TunedThresholdClassifierCV'ing")
model = TunedThresholdClassifierCV(
    LogisticRegression(),
    scoring=MyScorer(mean_squared_error, sign=1, kwargs={}, response_method="predict"),
)
model.fit(X, y)

print("3. TunedThresholdClassifierCV'ing")
model = TunedThresholdClassifierCV(
    LogisticRegression(),
    scoring=my_scorer,
)
model.fit(X, y)

It includes three different ways of creating a scorer for TunedThresholdClassifierCV. The first one works as expected, the second one ignores the overridden _score method, and the third one raises an error:

cross_val_score'ing
I'm logging stuff
I'm logging stuff
I'm logging stuff
I'm logging stuff
I'm logging stuff
1. TunedThresholdClassifierCV'ing
I'm logging stuff in my_metric
(... repeated many more times ...)
2. TunedThresholdClassifierCV'ing
3. TunedThresholdClassifierCV'ing
Traceback (most recent call last):
  File "/tmp/1.py", line 54, in <module>
    model.fit(X, y)
  File "/path/to/scikit-learn/sklearn/base.py", line 1366, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/scikit-learn/sklearn/model_selection/_classification_threshold.py", line 129, in fit
    self._fit(X, y, **params)
  File "/path/to/scikit-learn/sklearn/model_selection/_classification_threshold.py", line 743, in _fit
    self._curve_scorer = self._get_curve_scorer()
                         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/scikit-learn/sklearn/model_selection/_classification_threshold.py", line 880, in _get_curve_scorer
    curve_scorer = _CurveScorer.from_scorer(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/scikit-learn/sklearn/metrics/_scorer.py", line 1141, in from_scorer
    score_func=scorer._score_func,
               ^^^^^^^^^^^^^^^^^^
AttributeError: 'function' object has no attribute '_score_func'

One could argue that MyScorer(mean_squared_error, sign=1, kwargs={}, response_method="predict") is undocumented, since it inherits from a private class. However, people are using it, and #31540 discusses making these classes public, which we seem to be on board with.

This makes me think we don't really support a custom callable as a scorer, and that the claim should be removed from our documentation. The only way a scorer works across the board is when all custom work happens inside the _score_func, which is what the codebase currently assumes.

cc @glemaitre @MarcBresson @lesteve
