Cross-validation returning multiple scores #1850
Comments
+1 for addressing the problem. I'll have to think about your solution when I have time.
@jnothman this is awesome. I feel like 4 might be best from the user's perspective.
Thanks for pondering this. This is another reason to pay some attention to this now: before the …
A very good point. I jumped from 0.12 to HEAD so I didn't notice that. Hmm... Yes, rethinking the …
Friday is my last urgent deadline. Hopefully afterwards I can give your suggestions and pull requests some more attention :-/
No worries.
Hmmm, given that it's okay to wreak havoc on the current interface, I'm thinking 3 might be the way to go, allowing a scoring function to return either a scalar or a dict/structured array with a special key for the primary score. But I'm not yet certain. I also think we should recognise that …
Agreed. Also, there is the nasty greater_is_better that it would be nice to get rid of ;)
Well, the nasty part about … I have also decided that a dict or generator of named scores is probably more expressive than we want. We really want something equivalent to a namedtuple, or rather, the … Currently, I'm trying to imagine a fairly maximal design for a …

Default implementations of most methods are provided in a base class, wrapping existing metrics, arbitrary score_funcs and … I tend to agree with those on the mailing list (@ogrisel in particular) who saw unnecessary duplication in having … A couple of other small notes: when we think of applying this to training scores as well, we note that any output that doesn't depend on …
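To sketch what named scorer outputs might look like in the namedtuple style discussed above — all names here are invented for illustration, not a proposed final API:

```python
from collections import namedtuple

from sklearn.metrics import precision_recall_fscore_support

# Hypothetical named-output scorer; class and field names are invented.
PRFScores = namedtuple("PRFScores", ["f1", "precision", "recall"])

class NamedPRFScorer:
    """Score an estimator on test data, returning named values.

    The first field (here ``f1``) plays the role of the single
    objective that a parameter search would optimise.
    """

    def __call__(self, estimator, X, y):
        y_pred = estimator.predict(X)  # predict only once for all metrics
        p, r, f, _ = precision_recall_fscore_support(y, y_pred, average="macro")
        return PRFScores(f1=f, precision=p, recall=r)
```

One call yields all three numbers from a single predict, which is exactly the de-duplication of prediction work this thread is after.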
I've just realised a fault of not singling out the objective score in the method proposed in my last comment: a single objective is also needed for non-predetermined parameter searches, e.g. using scipy or hyperopt. Maybe instead of …
A little messy, but here's the gist of that latest proposal (run for sample output): https://gist.github.com/jnothman/5396078
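To make the single-objective point above concrete: a search library like scipy can only minimise a scalar, so whatever richer structure a scorer returns must still expose one number. A rough sketch, using today's module names:

```python
from scipy.optimize import minimize_scalar
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# scipy minimises a scalar function, so even a multi-metric scorer must
# be reducible to a single number for this kind of search to work.
def objective(log_C):
    clf = LogisticRegression(C=10 ** log_C, max_iter=1000)
    return -cross_val_score(clf, X, y, cv=5).mean()

result = minimize_scalar(objective, bounds=(-3, 3), method="bounded")
print("best C:", 10 ** result.x)
```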
I would prefer a dict, since you keep more meaningful, annotated numbers, and later on you can easily postprocess the dict. Furthermore, it would be coherent with your gist: you pass dicts of metrics.
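One nicety of per-fold dicts is how easily they post-process, e.g. with pandas (the fold results below are made up for illustration):

```python
import pandas as pd

# Assuming each fold yields a dict of named scores, as proposed above.
fold_scores = [
    {"f1": 0.81, "precision": 0.84, "recall": 0.78},
    {"f1": 0.79, "precision": 0.80, "recall": 0.77},
    {"f1": 0.83, "precision": 0.86, "recall": 0.80},
]
df = pd.DataFrame(fold_scores)
print(df.mean())  # mean of each metric across folds
print(df.std())   # and its spread
```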
numpy uses the …
In general, though, even though scipy documentation names return values, …
In the Python standard library, more and more functions return namedtuples.
Coming to this from the angle of #2013, I must say I also strongly dislike the Scorer class.
I'll submit a PR with the main ideas.
I take all the blame for the Scorer …
Because then you have a single public class …
OK, so you mean providing backward compatibility is a problem? We don't need to do that, right?
Why not? This is a public API!
But not included in any release.
Oh, sure, I meant: if we do release it, we're stuck with it. Anyway, see #2123.
@jnothman did you tag this? I'm not sure we can come up with a good solution within the next two days. First, we should review what the interface was at the last release and what we have already changed.
I don't think I did.
To be able to do this refactoring in 0.15 and have time to discuss it, I have issued a pull request to rename … I will retag this PR to 0.15 now.
… is actually a number, not an array (otherwise cross_val_score returns bogus results).
Any chance of this getting brought back up? It is a feature I would definitely enjoy.
Me too... in fact, it was more or less the first thing that brought me to …
Did this go anywhere? It would be really nice to pass a list of metrics to cross_val_score and get a list of scores in the same order, or a dict with metric names as keys.
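Until something like that exists, the closest workaround with the current API is one cross_val_score call per metric, at the price of re-fitting the estimator for each one:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear")

# One call per metric: simple, but the estimator is refit and predicts
# once per metric, which is exactly the redundancy discussed above.
metrics = ["accuracy", "precision_macro", "recall_macro"]
scores = {m: cross_val_score(clf, X, y, cv=5, scoring=m) for m in metrics}
```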
Patience ;) This is one of the highest things on my never-ending priority list. But we need to merge the …
@raghavrv this is still open, right? Is there a PR?
I have a use case where it would be cool if cross_val_score could return one score per target for a multi-target estimator. (@dengemann EEG channels :D)
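Since cross_val_score reduces each fold to a scalar, per-target scores currently have to be computed outside it. A sketch of the per-target piece for a regression example, using r2_score's multioutput='raw_values':

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Per-target R^2 for a multi-output regressor; 'raw_values' returns one
# score per target instead of a single aggregate.
X, Y = make_regression(n_samples=200, n_features=10, n_targets=3,
                       random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
est = LinearRegression().fit(X_tr, Y_tr)
per_target = r2_score(Y_te, est.predict(X_te), multioutput="raw_values")
print(per_target)  # array of 3 scores, one per target/channel
```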
Any progress on this front? I am busy putting an explanation in a docstring for some code, telling the reader why I am re-implementing cross-validation rather than using scikit-learn.
Are we going to send this issue to elementary school? It's going to be 4 years old soon! ;)
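For anyone in the same position, the re-implementation in question is short enough to sketch: a hand-rolled loop over folds, using only public scikit-learn pieces, that fits and predicts once per fold but computes several metrics:

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
est = SVC(kernel="linear")

# Hand-rolled CV loop: one fit and one predict per fold, several metrics.
results = []
for train, test in StratifiedKFold(n_splits=5).split(X, y):
    model = clone(est).fit(X[train], y[train])
    y_pred = model.predict(X[test])
    results.append({
        "precision": precision_score(y[test], y_pred, average="macro"),
        "recall": recall_score(y[test], y_pred, average="macro"),
        "f1": f1_score(y[test], y_pred, average="macro"),
    })
```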
I think we're close, and there's a fair chance you'll see this in 0.19. But we have no desire to rush into a design that then needs to be redesigned. I'm interested in whether the current proposal (#7388), allowing multiple values for scoring, is better than a generic callback to extract diagnostic info from each fit, or whether we need both...
Fixed in #7388
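For readers landing here now: the fix referenced above shipped as multiple-metric support in 0.19, via the new cross_validate function, which accepts several metrics at once and returns a dict of per-fold arrays. Roughly:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Several metrics at once, with one fit and predict per fold.
cv_results = cross_validate(SVC(kernel="linear"), X, y, cv=5,
                            scoring=["precision_macro", "recall_macro"])
print(cv_results["test_precision_macro"])
print(cv_results["test_recall_macro"])
```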
Original issue description (opened by @jnothman on Apr 11, 2013):

Scorer objects currently provide an interface that returns a scalar score given an estimator and test data. This is necessary for *SearchCV to calculate a mean score across folds and determine the best score among parameters. This severely limits the diagnostic information available from a cross-validation or parameter search, as one can see by comparing against the catalogue of metrics, which includes: precision and recall alongside F-score; scores for each of multiple classes as well as an aggregate; and error distributions (i.e. PR curve or confusion matrix). @solomonm (#1837) and I (on the mailing list, and an implementation within #1768) have independently sought precision and recall to be returned from cross-validation routines when F1 is used as the cross-validation objective; @eickenberg in #1381 (comment) raised a concern regarding arrays of scores corresponding to multiple targets. I thought this deserved an issue of its own to solidify the argument and its solution.
Some design options:

1. Multiple scorers handled by cross_val_score or *SearchCV (henceforth CVEvaluator), with one specified as the objective. But since the Scorer generally calls estimator.{predict,decision_function,predict_proba}, each scorer would repeat this work.
2. A separate diagnostics parameter on CVEvaluator: the scoring parameter remains as it is, and a diagnostics parameter provides a callable with similar (same?) arguments as Scorer, but returning a dict. This means that the prediction work is repeated, but not necessarily as many times as there are metrics. This diagnostics callable is more flexible and perhaps could be passed the training data as well as the test data.
3. Retain the single scoring parameter, but allow the Scorer to return a dict with a special key for the objective score. This would need to be handled by the caller. For backwards compatibility, no existing scorers would change their behaviour of returning a float. This ensures no repeated prediction work.
4. An extended Scorer interface that generates a set of named outputs (as with calc_names proposed in #1837, "Use cross_validation.cross_val_score with metrics.precision_recall_fscore_support"), again with a special key for the objective score. This allows users to continue using scoring='f1' but get back precision and recall for free.

Note that 3 and 4 potentially allow for any set of metrics to be composed into a scorer without redundant prediction work (and 1 allows composition with highly redundant prediction work); see the sketch of option 3 just after this list.
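To make option 3 concrete, a minimal sketch under invented names (none of this is actual scikit-learn API):

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical option-3 scorer: predicts once and returns a dict whose
# 'objective' key is what a parameter search would optimise.
def f1_with_diagnostics(estimator, X, y):
    y_pred = estimator.predict(X)
    p, r, f, _ = precision_recall_fscore_support(y, y_pred, average="macro")
    return {"objective": f, "precision": p, "recall": r}

def _extract_objective(score):
    # Caller-side handling: scalar-returning scorers keep working, while
    # dict-returning ones flag their objective explicitly.
    return score["objective"] if isinstance(score, dict) else score
```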
Comments, critiques and suggestions are very welcome.