[MRG] Allow the scorer callable to return any object #7424


Closed
wants to merge 1 commit into from

Conversation

@acu192 acu192 commented Sep 14, 2016

This `_score` function is (indirectly) a helper only for the `cross_val_score` function. I posit that we should allow the `scorer` argument (a callable) to return any object, for these reasons:

1. It is safe to do so. The objects returned by the `scorer` callable are only collected into an ndarray. Easy squeezy.
2. When a user-defined `scorer` is passed in, this allows for very flexible behavior. E.g., I have a scorer that returns a tuple of several metrics I want to track. This change lets me actually use my scorer! I've tested it and it works great!

Please consider. Thank you all for such a great library.
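For the record, a multi-metric scorer along these lines might look like the following sketch. The stand-in estimator and all names here are illustrative, not scikit-learn API; the point is only that the scorer keeps the usual `(estimator, X, y)` signature while returning a tuple, and that the per-fold results collect cleanly into an ndarray.

```python
import numpy as np

# Minimal stand-in estimator so the sketch runs without scikit-learn;
# any object with fit()/predict() behaves the same way here.
class MeanPredictor:
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

# Hypothetical multi-metric scorer (illustrative, not scikit-learn API):
# same (estimator, X, y) signature, but returns a tuple of metrics
# instead of a single float.
def multi_metric_scorer(estimator, X, y):
    y_pred = estimator.predict(X)
    mse = float(np.mean((y - y_pred) ** 2))
    mae = float(np.mean(np.abs(y - y_pred)))
    return (mse, mae)

X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
est = MeanPredictor().fit(X, y)

# cross_val_score-style collection: one scorer result per fold,
# gathered into an ndarray (3 identical "folds" here for brevity).
scores = np.array([multi_metric_scorer(est, X, y) for _ in range(3)])
print(scores.shape)  # (3, 2): folds x metrics
```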
@amueller (Member)

`GridSearchCV` and `RandomizedSearchCV` compare the scores to select the best estimator.

@acu192 (Author) commented Sep 14, 2016

@amueller Good point. I would argue that this change is still okay, because:

  1. This change only affects user-defined `scorer` arguments.

  2. The user might want their user-defined `scorer` to return non-`numbers.Number` objects even when using `GridSearchCV` or `RandomizedSearchCV`. Looking at `BaseSearchCV._fit`, as long as the returned objects support the `*=`, `+=`, `/=`, and `<` operators, it will all still work. Yay duck typing. (Granted, this scenario seems unlikely, but see my next point.)

  3. If the user's scorer returns weird objects that don't work with `GridSearchCV` and `RandomizedSearchCV`, an error will still be thrown. E.g., in my tuple example, I get an error like:

    TypeError: unsupported operand type(s) for +=: 'int' and 'tuple'
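The failure mode in point 3 is easy to reproduce outside scikit-learn: accumulating a tuple into a plain integer accumulator, as numeric score-averaging code does, raises exactly this `TypeError`. (Per point 2, a user-defined score object that implemented the relevant arithmetic and comparison operators would slip through instead.) A minimal sketch:

```python
# Reproducing the error from point 3: numeric accumulation over fold
# scores breaks as soon as a scorer returns a tuple.
score_sum = 0
fold_score = (0.8, 0.7)  # tuple returned by a hypothetical scorer
try:
    score_sum += fold_score
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +=: 'int' and 'tuple'
```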

@amueller (Member)

@acu192 I think this check was put there specifically to throw a better error than "TypeError: unsupported operand types", probably because someone spent hours debugging it and then complained in the issue tracker ;)
We are actually working on scorers that return more than just a single float, and on using multiple scorers; see #7388. The fix will be a bit more complicated than this, though. In particular, we want grid search over more complex metrics.

@jnothman (Member)

Back in the days before scorer objects (pre scikit-learn 0.13, IIRC), this kind of thing worked, and we've been trying to fix it ever since. We should have support for, say, tuples of arrays of floats in 0.19 if everything works out. This is, unfortunately, the wrong fix.

@jnothman closed this Sep 14, 2016
3 participants