
MRG adding SomeScore objects for better (?!) grid search interface. #1381


Merged
amueller merged 59 commits into scikit-learn:master from the score_func_objects branch on Feb 3, 2013

Conversation

amueller
Member

This is hopefully my last shot at adding more advanced score functions, as discussed in the other PRs and the ML.

Compare to #1198 and #1014.

This is sort of a draft (again) and it would be great if I could get some feedback before going further.

This PR basically implements a new interface in GridSearchCV using callable score objects with an (estimator, X, y)
signature. To create these objects I introduced an internal factory-style function.
The idea would be to either pass a string or a callable object to GridSearchCV.
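As a rough sketch of the intended call pattern (the parameter name and the set of accepted strings are still being discussed below; my_scorer is a made-up callable):

    from sklearn.svm import SVC
    from sklearn.grid_search import GridSearchCV

    # select a built-in score by name
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, score_func="f1")

    # or pass any callable with the (estimator, X, y) signature
    def my_scorer(estimator, X, y):
        return estimator.score(X, y)

    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, score_func=my_scorer)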

At the moment, I complicated things a bit by keeping score_func as the parameter name in GridSearchCV. I think it might be better to deprecate it and add a new score parameter.

I am not sure if I am entirely happy with this. The factory seems a bit overly complicated, but maybe that is a matter of taste. Alternative suggestions welcome. It feels a bit as if decorators would be a more natural fit.

I am happy now.

# construct a scoring object from a simple scoring function
if needs_threshold:
    class ScoreObj(object):
        greater_is_better = greater_is_better_
Member

Why the trailing _?

Member

Hum ok namespace conflict issue...

Member Author

Yeah didn't know what would happen and wanted to be on the safe side.

Member

I would simply use objects rather than classes. The factory would then become the init of a class.

Member Author

I thought about that. I would prefer that as it is somewhat "flatter".
Would this mean that there is one instance per score? I.e. this namespace contains an object called AUCScore?
How would you handle the case where there are additional parameters, such as fbeta or precision@k?

Member

Would this mean that there is one instance per score? I.e. this namespace
contains an object called AUCScore?

Yes, or rather auc_score (object, no class, thus no CamelCase)

How would you handle the case where there are additional parameters, such as
fbeta or precision@k?

Indeed, with additional parameters, having only one global instance of
the scorer will be a problem. One way to solve it would be to teach the
user to use the ScorerDecorator, that could provide support for
additional arguments. I could do something like

  GridSearchCV(..., score_type=AsScorer(fbeta_score,
                                        args=dict(beta=3)))

(I just decided that I really didn't like the ScorerDecorator name :} ).

By the way, generating classes the way you do it will pose problems, as
they won't pickle (they are defined in a closure).
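To make that concrete, a minimal sketch of why such classes will not pickle (make_score_obj is a made-up stand-in for the factory):

    import pickle

    def make_score_obj(greater_is_better):
        # mimics the factory approach: a new class is created inside the function
        class ScoreObj(object):
            pass
        ScoreObj.greater_is_better = greater_is_better
        return ScoreObj

    scorer = make_score_obj(True)()
    try:
        pickle.dumps(scorer)
    except Exception as exc:
        # pickle stores instances by a module-level reference to their class;
        # a class defined inside a function cannot be looked up that way
        print(exc)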

I am going to have to run to bed right now, I fear. Will keep on thinking
tomorrow.

Member Author

Will hit the sack soon, too. (I don't like ScorerDecorator either ;).
I think we can get somewhere here :)

The AsScorer(fbeta_score, args=dict(beta=3)) is a bit too verbose for me still. Maybe **kwargs?

Good night!

Member

The AsScorer(fbeta_score, args=dict(beta=3)) is a bit too verbose for me still.
Maybe **kwargs?

Possibly. I have no love for **kwargs, but it might be a use case.
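Concretely, the **kwargs spelling could look something like this (AsScorer and its exact signature are hypothetical at this point):

    from sklearn.metrics import fbeta_score

    class AsScorer(object):
        # minimal sketch: extra keyword arguments are stored at construction
        # time and forwarded to the wrapped metric at call time
        def __init__(self, score_func, **kwargs):
            self.score_func = score_func
            self.kwargs = kwargs

        def __call__(self, estimator, X, y):
            y_pred = estimator.predict(X)
            return self.score_func(y, y_pred, **self.kwargs)

    fbeta_scorer = AsScorer(fbeta_score, beta=3)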

@amueller
Member Author

ping @GaelVaroquaux

@GaelVaroquaux
Member

ping @GaelVaroquaux

Looking at it right now :)

@amueller
Member Author

Thanks :) I put some more info in the description above in case you didn't notice.

@GaelVaroquaux
Member

At the moment, I complicated things a bit by keeping score_func as the parameter name in GridSearchCV. I think it might be better to deprecate it and add a new score parameter.

score cannot be used as a parameter name, as it would induce a name clash with the score method.

return ScoreObj


def _score_obj_from_string(score_func):
Member

Should just be a dictionary, IMHO.

Member Author

agreed.

@amueller
Member Author

|score| cannot be used as a parameter name, as it would induce a name
clash with the |score| method.

hum, that is right. We could come up with a new name for a parameter, or
we could leave it as-is.
It is fully backward compatible atm.

@GaelVaroquaux
Member

In general, I think that I like the approach. It feels like some aspects could be simplified by working a bit more on the details, but I think that the general idea is good.

With regards to the API change, I think that your idea of introducing a name change is a good one, to make the transition more smooth and more explicit. Maybe score_type would be a possible new name.

I don't think that I like the classes exposed in metrics: R2Score, MSEScore, ... First, I'd use objects rather than classes. There is no need to introduce classes, IMHO. This would also enable you to get rid of the factory. Second, I think that we should push people to use strings most of the time, just like we are doing with kernels.

I think that my favorite strategy would be:

  • Have a ScorerDecorator (or something like that) class, that would be used as you currently use your factory

  • Have it multiply by -1 the result of the function in the call to be able to simplify the logic in GridSearchCV (I don't like the idea of implementation details of the scorer percolating in the GridSearchCV).

  • Have a module-level dictionary in metrics that stores the association between strings and scorer objects, as in

    score_types = {
         'r2_score': ScorerDecorator(r2_score, greater_is_better=True),
         ...
    }
  • Expose the ScorerDecorator publicly and document it as a helper for people wanting new score functions.

  • Write all this in parallel to the existing score_func mess in GridSearch and co, with a new parameter name (for instance score_type), and in two releases kill the score_func stuff.

Once this is all done, I have the feeling that it should lead to fairly simple code.
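A compressed sketch of how those pieces could fit together (the names ScorerDecorator and score_types are placeholders from the discussion above, not a final API):

    from sklearn.metrics import r2_score, mean_squared_error

    class ScorerDecorator(object):
        def __init__(self, score_func, greater_is_better=True):
            self.score_func = score_func
            self.greater_is_better = greater_is_better

        def __call__(self, estimator, X, y):
            y_pred = estimator.predict(X)
            score = self.score_func(y, y_pred)
            # flip the sign here so that grid search can always maximize
            return score if self.greater_is_better else -score

    # module-level registry in sklearn.metrics mapping strings to scorer objects
    score_types = {
        'r2_score': ScorerDecorator(r2_score),
        'mse': ScorerDecorator(mean_squared_error, greater_is_better=False),
    }

    # GridSearchCV(..., score_type='r2_score') would then just look the string up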

@amueller
Member Author

With regards to the API change, I think that your idea of introducing
a name change is a good one, to make the transition more smooth and
more explicit. Maybe |score_type| would be a possible new name.

+1

I don't think that I like the classes exposed in metrics: R2Score,
MSEScore, ... First, I'd use objects rather than classes. There is no
need to introduce classes, IMHO. This would also enable you to get rid
of the factory. Second, I think that we should push people to use
strings most of the time, just like we are doing with kernels.

See my comment above for possible issues with this.
If the score has a parameter, I don't think strings are the way to go;
otherwise, yes.

I think that my favorite strategy would be:

Have it multiply by -1 the result of the function in the *call* to
be able to simplify the logic in GridSearchCV (I don't like the
idea of implementation details of the scorer percolating in the
GridSearchCV).

Which part do you mean? I don't like this either.
If you refer to the part where the best score is computed, it would be
possible to have a better method in the score
object.

ps: thanks for the immediate feedback :)

@GaelVaroquaux
Member

On Sun, Nov 18, 2012 at 02:32:12PM -0800, Andreas Mueller wrote:

Have it multiply by -1 the result of the function in the call to
be able to simplify the logic in GridSearchCV (I don't like the
idea of implementation details of the scorer percolating in the
GridSearchCV).

Which part to you mean? I don't like this either.

I'd like to completely remove 'greater_is_better' from GridSearchCV.

What I have in mind is, in the scorer object:

def __call__(self, estimator, X, y):
    y_pred = estimator.predict(X)
    if not self.greater_is_better:
        return -self.score_func(y, y_pred)
    return self.score_func(y, y_pred)
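With the sign handled inside the scorer, the selection logic on the grid-search side reduces to a plain argmax, whatever the metric (illustrative values only):

    import numpy as np

    # scores returned by the sign-normalized scorer, one per parameter setting
    scores = np.array([-12.3, -10.0, -15.7])   # e.g. negated MSE values
    best_index = np.argmax(scores)             # no greater_is_better branching needed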

@amueller
Member Author

As the user shouldn't be using this object herself, this is probably fine.

@GaelVaroquaux
Member

As the user shouldn't be using this object herself, this is probably fine.

I am not getting the meaning of your remark (I am tired :$ ).

@amueller
Member Author

Could be tiredness on my side, too.
I don't like flipping signs if the user sees it, as I don't want to tell the user his MSE is -10.

@GaelVaroquaux
Member

I don't like flipping signs if the user sees it, as I don't want to tell the
user his MSE is -10.

OK, I thought that you might mean this. I agree with you on both sides:
it's not great for the user, but the user should be calling the decorated
function anyhow.

@amueller
Member Author

On the other hand, this shows up in the estimator_scores_ that are produced by GridSearchCV :-/

@GaelVaroquaux
Member

On the other hand, this shows up in the estimator_scores_ that are produced by
GridSearchCV :-/

Yes, but the absolute rule is that bigger is better. We don't want to
break that rule, or it will be a mess.

ogrisel closed this Nov 21, 2012
ogrisel reopened this Nov 21, 2012
@ogrisel
Member

ogrisel commented Nov 21, 2012

Sorry for the misclick.... This button needs a confirm popup or something...

order to evaluate the performance of prediction (high is good).
If None is passed, the score of the estimator is maximized.
score_func: string or object, optional
Either one of ["zero-one", "f1", "auc"] for classification,
Member

I'd rather use "zero_one" (underscore instead of dash) as I think it is more common in the rest of the project.

Member

Also we should stop using auc in our public API but instead use roc_auc. "Area Under Curve" is ambiguous as we have roc_auc and precision_recall_auc that are both valid yet different scoring metrics for classification.

Member

Also we should stop using auc in our public API but instead use
roc_auc. "Area Under Curve" is ambiguous as we have roc_auc and
precision_recall_auc that are both valid yet different scoring metrics
for classification.

+1

@ogrisel
Member

ogrisel commented Nov 21, 2012

@amueller I cannot find the comment / discussion where I advertised the use of parameterized Scorer classes instead of simple functions or objects. I think you started this PR in reaction to that. I would like to re-read my own arguments as I think I forgot some of them.

@amueller
Member Author

This was on the ML. I'll try to find it. Yes, this PR is the result of the discussion, as I think your idea seemed the best consensus.

@ogrisel
Member

ogrisel commented Nov 21, 2012

Ok, and as usual I cannot find this email using Google... Sourceforge mailing list archives are really really bad...

@amueller
Member Author

here it is

@ogrisel
Member

ogrisel commented Nov 21, 2012

here it is

Thanks @amueller!

@GaelVaroquaux can you please re-read this message and the following discussion. I really like this explicit is better than implicit design.

@amueller
Member Author

I will also reread the discussion. It has been a while ;)

@amueller
Member Author

About AUC: isn't "area under precision-recall curve" the same as average precision and called ap?

@amueller
Member Author

This is starting to take shape.

TODO

  • Scores with parameters (f-beta)
  • use the new interface in cross_val_score
  • docs
  • check test coverage (think it should be ok).

@amueller
Member Author

@ogrisel There is a behavior that I find quite weird here. It is possible to use a score function for unsupervised learning but I don't understand what is happening.
It looks to me as if the score can not use anything that was estimated during training.

See this commit: fde6ba1

Btw, in GridSearchCV, the score_func is completely ignored in the unsupervised case....

@ogrisel
Member

ogrisel commented Dec 16, 2012

Yes, using an external score_func on the outcome of unsupervised estimators is ill-defined. We should probably raise a NotImplementedError exception for this case.

If we are to make some standard assumptions on the attributes of unsupervised models (e.g. that components_ encodes a linear embedding of the data), then we could reuse standard score metrics such as the least squares reconstruction error, but this is only true for a few models. I think we can accept that unsupervised model scoring needs to have access to the model object (which is not necessarily the case for supervised models, where having access to the predictions (hard class assignments or decision function values for classification, or predicted values for continuous output variables for regression) is often enough).

@amueller
Member Author

amueller commented Feb 3, 2013

running tests and building docs again, then it should be good :)

@ogrisel
Member

ogrisel commented Feb 3, 2013

running tests and building docs again, then it should be good :)

I agree. 👍 for merge if the tests pass and the docs build with the latest renamings.

@GaelVaroquaux
Member

I added a small remark on smoke testing the repr, and after that, I am 👍 for merge. Great job!

@amueller
Member Author

amueller commented Feb 3, 2013

Added a smoketest, whatsnew references are ok, tests run, doc build runs.

@amueller
Member Author

amueller commented Feb 3, 2013

waiting for travis, then merging.

@amueller
Member Author

amueller commented Feb 3, 2013

Thanks for the reviews, guys :)

@amueller
Member Author

amueller commented Feb 3, 2013

🍻 wohooo!

amueller merged commit 8ca3647 into scikit-learn:master on Feb 3, 2013
amueller deleted the score_func_objects branch on February 3, 2013 16:17
@ogrisel
Member

ogrisel commented Feb 3, 2013

🍻
