
MRG adding SomeScore objects for better (?!) grid search interface. #1381


Merged
amueller merged 59 commits into scikit-learn:master from the score_func_objects branch on Feb 3, 2013

Conversation

amueller
Member

This is hopefully my last shot at adding more advanced score functions, as discussed in the other PRs and the ML.

Compare to #1198 and #1014.

This is sort of a draft (again) and it would be great if I could get some feedback before going further.

This PR basically implements a new interface in GridSearchCV using callable score objects with an (estimator, X, y)
signature. To create these objects I introduced an internal factory-style function.
The idea would be to either pass a string or a callable object to GridSearchCV.
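As a rough sketch of the intended call pattern (the parameter name and the set of accepted strings are still being discussed below; my_scorer is a made-up callable):

    from sklearn.svm import SVC
    from sklearn.grid_search import GridSearchCV

    # select a built-in score by name
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, score_func="f1")

    # or pass any callable with the (estimator, X, y) signature
    def my_scorer(estimator, X, y):
        return estimator.score(X, y)

    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, score_func=my_scorer)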

At the moment, I complicated things a bit by keeping score_func as the parameter name in GridSearchCV. I think it might be better to deprecate it and add a new score parameter.

I am not sure if I am entirely happy with this. The factory seems a bit overly complicated, but maybe that is a matter of taste. Alternative suggestions welcome. It feels a bit as if decorators would be a more natural fit.

I am happy now.

# construct a scoring object from a simple scoring function
if needs_threshold:
    class ScoreObj(object):
        greater_is_better = greater_is_better_
Member

Why the trailing _?

Member

Hum ok namespace conflict issue...

Member Author

Yeah didn't know what would happen and wanted to be on the safe side.

Member

I would simply use objects rather than classes. The factory would then become the init of a class.

Member Author

I thought about that. I would prefer that as it is somewhat "flatter".
Would this mean that there is one instance per score? I.e. this namespace contains an object called AUCScore?
How would you handle the case where there are additional parameters, such as fbeta or precision@k?

Member

Would this mean that there is one instance per score? I.e. this namespace
contains an object called AUCScore?

Yes, or rather auc_score (object, no class, thus no CamelCase)

How would you handle the case where there are additional parameters, such as
fbeta or precision@k?

Indeed, with additional parameters, having only one global instance of
the scorer will be a problem. One way to solve it would be to teach the
user to use the ScorerDecorator, that could provide support for
additional arguments. I could do something like

  GridSearchCV(..., score_type=AsScorer(fbeta_score,
                                        args=dict(beta=3)))

(I just decided that I really didn't like the ScorerDecorator name :} ).

By the way, generating classes the way you do it will pose problems, as
they won't pickle (they are defined in a closure).
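To make that concrete, a minimal sketch of why such classes will not pickle (make_score_obj is a made-up stand-in for the factory):

    import pickle

    def make_score_obj(greater_is_better):
        # mimics the factory approach: a new class is created inside the function
        class ScoreObj(object):
            pass
        ScoreObj.greater_is_better = greater_is_better
        return ScoreObj

    scorer = make_score_obj(True)()
    try:
        pickle.dumps(scorer)
    except Exception as exc:
        # pickle stores instances by a module-level reference to their class;
        # a class defined inside a function cannot be looked up that way
        print(exc)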

I am going to have to run to bed right now, I fear. Will keep on thinking
tomorrow.

Member Author

Will hit the sack soon, too. (I don't like ScorerDecorator either ;).
I think we can get somewhere here :)

The AsScorer(fbeta_score, args=dict(beta=3)) is a bit too verbose for me still. Maybe **kwargs?

Good night!

Member

The AsScorer(fbeta_score, args=dict(beta=3)) is a bit too verbose for me still.
Maybe **kwargs?

Possibly. I have no love for **kwargs, but it might be a use case.
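Concretely, the **kwargs spelling could look something like this (AsScorer and its exact signature are hypothetical at this point):

    from sklearn.metrics import fbeta_score

    class AsScorer(object):
        # minimal sketch: extra keyword arguments are stored at construction
        # time and forwarded to the wrapped metric at call time
        def __init__(self, score_func, **kwargs):
            self.score_func = score_func
            self.kwargs = kwargs

        def __call__(self, estimator, X, y):
            y_pred = estimator.predict(X)
            return self.score_func(y, y_pred, **self.kwargs)

    fbeta_scorer = AsScorer(fbeta_score, beta=3)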

@amueller
Member Author

ping @GaelVaroquaux

@GaelVaroquaux
Member

ping @GaelVaroquaux

Looking at it right now :)

@amueller
Member Author

Thanks :) I put some more info in the description above in case you didn't notice.

@GaelVaroquaux
Member

At the moment, I complicated things a bit by keeping score_func as the parameter name in GridSearchCV. I think it might be better to deprecate it and add a new score parameter.

score cannot be used as a parameter name, as it would induce a name clash with the score method.

return ScoreObj


def _score_obj_from_string(score_func):
Member

Should just be a dictionary, IMHO.

Member Author

agreed.

@amueller
Member Author

|score| cannot be used as a parameter name, as it would induce a name
clash with the |score| method.

hum, that is right. We could come up with a new name for a parameter, or
we could leave it as-is.
It is fully backward compatible atm.

@GaelVaroquaux
Member

In general, I think that I like the approach. It feels like some aspects could be simplified by working a bit more on the details, but I think that the general idea is good.

With regards to the API change, I think that your idea of introducing a name change is a good one, to make the transition more smooth and more explicit. Maybe score_type would be a possible new name.

I don't think that I like the classes exposed in metrics: R2Score, MSEScore, ... First, I'd use objects rather than classes. There is no need to introduce classes, IMHO. This would also enable you to get rid of the factory. Second, I think that we should push people to use strings most of the time, just like we are doing with kernels.

I think that my favorite strategy would be:

  • Have a ScorerDecorator (or something like that) class, that would be used as you currently use your factory

  • Have it multiply by -1 the result of the function in the call to be able to simplify the logic in GridSearchCV (I don't like the idea of implementation details of the scorer percolating in the GridSearchCV).

  • Have a module-level dictionary in metrics that stores the association between strings and scorer objects, as in

    score_types = {
         'r2_score': ScorerDecorator(r2_score, greater_is_better=True),
         ...
    }
  • Expose the ScorerDecorator publicly and document it as a helper for people wanting new score functions.

  • Write all this in parallel to the existing score_func mess in GridSearch and co, with a new parameter name (for instance score_type), and in two releases kill the score_func stuff.

Once this is all done, I have the feeling that it should lead to fairly simple code.
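A compressed sketch of how those pieces could fit together (the names ScorerDecorator and score_types are placeholders from the discussion above, not a final API):

    from sklearn.metrics import r2_score, mean_squared_error

    class ScorerDecorator(object):
        def __init__(self, score_func, greater_is_better=True):
            self.score_func = score_func
            self.greater_is_better = greater_is_better

        def __call__(self, estimator, X, y):
            y_pred = estimator.predict(X)
            score = self.score_func(y, y_pred)
            # flip the sign here so that grid search can always maximize
            return score if self.greater_is_better else -score

    # module-level registry in sklearn.metrics mapping strings to scorer objects
    score_types = {
        'r2_score': ScorerDecorator(r2_score),
        'mse': ScorerDecorator(mean_squared_error, greater_is_better=False),
    }

    # GridSearchCV(..., score_type='r2_score') would then just look the string up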

@amueller
Member Author

With regards to the API change, I think that your idea of introducing
a name change is a good one, to make the transition more smooth and
more explicit. Maybe |score_type| would be a possible new name.

+1

I don't think that I like the classes exposed in metrics: R2Score,
MSEScore, ... First, I'd use objects rather than classes. There is no
need to introduce classes, IMHO. This would also enable you to get rid
of the factory. Second, I think that we should push people to use
strings most of the time, just like we are doing with kernels.

See my comment above for possible issues with this.
If the score has a parameter, I don't think strings are the way to go;
otherwise, yes.

I think that my favorite strategy would be:

Have it multiply by -1 the result of the function in the *call* to
be able to simplify the logic in GridSearchCV (I don't like the
idea of implementation details of the scorer percolating in the
GridSearchCV).

Which part do you mean? I don't like this either.
If you refer to the part where the best score is computed, it would be
possible to have a better method in the score
object.

ps: thanks for the immediate feedback :)

@GaelVaroquaux
Member

On Sun, Nov 18, 2012 at 02:32:12PM -0800, Andreas Mueller wrote:

Have it multiply by -1 the result of the function in the call to
be able to simplify the logic in GridSearchCV (I don't like the
idea of implementation details of the scorer percolating in the
GridSearchCV).

Which part to you mean? I don't like this either.

I'd like to completely remove 'greater_is_better' from GridSearchCV.

What I have in mind is, in the scorer object:

def __call__(self, estimator, X, y):
    y_pred = estimator.predict(X)
    if not self.greater_is_better:
        return -self.score_func(y, y_pred)
    return self.score_func(y, y_pred)
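With the sign handled inside the scorer, the selection logic on the grid-search side reduces to a plain argmax, whatever the metric (illustrative values only):

    import numpy as np

    # scores returned by the sign-normalized scorer, one per parameter setting
    scores = np.array([-12.3, -10.0, -15.7])   # e.g. negated MSE values
    best_index = np.argmax(scores)             # no greater_is_better branching needed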

@amueller
Member Author

As the user shouldn't be using this object herself, this is probably fine.

@GaelVaroquaux
Member

As the user shouldn't be using this object herself, this is probably fine.

I am not getting the meaning of your remark (I am tired :$ ).

@amueller
Member Author

Could be tiredness on my side, too.
I don't like flipping signs if the user sees it, as I don't want to tell the user his MSE is -10.

@GaelVaroquaux
Member

I don't like flipping signs if the user sees it, as I don't want to tell the
user his MSE is -10.

OK, I thought that you might mean this. I agree with you on both sides:
it's not great for the user, but the user should be calling the decorated
function anyhow.

@amueller
Member Author

On the other hand, this shows up in the estimator_scores_ that are produced by GridSearchCV :-/

@GaelVaroquaux
Member

On the other hand, this shows up in the estimator_scores_ that are produced by
GridSearchCV :-/

Yes, but the absolute rule is that bigger is better. We don't want to
break that rule, or it will be a mess.

ogrisel closed this Nov 21, 2012
ogrisel reopened this Nov 21, 2012
@ogrisel
Member

ogrisel commented Nov 21, 2012

Sorry for the misclick.... This button needs a confirm popup or something...

order to evaluate the performance of prediction (high is good).
If None is passed, the score of the estimator is maximized.
score_func: string or object, optional
Either one of ["zero-one", "f1", "auc"] for classification,
Member

I'd rather use "zero_one" (underscore instead of dash) as I think it is more common in the rest of the project.

Member

Also we should stop using auc in our public API but instead use roc_auc. "Area Under Curve" is ambiguous as we have roc_auc and precision_recall_auc that are both valid yet different scoring metrics for classification.

Member

Also we should stop using auc in our public API but instead use
roc_auc. "Area Under Curve" is ambiguous as we have roc_auc and
precision_recall_auc that are both valid yet different scoring metrics
for classification.

+1

@ogrisel
Member

ogrisel commented Nov 21, 2012

@amueller I cannot find the comment / discussion where I advertised the use of parameterized Scorer classes instead of simple functions or objects. I think you started this PR in reaction to that. I would like to re-read my own arguments as I think I forgot some of them.

@amueller
Member Author

This was on the ML. I'll try to find it. Yes, this PR is the result of the discussion, as I think your idea seemed the best consensus.

@ogrisel
Member

ogrisel commented Nov 21, 2012

Ok, and as usual I cannot find this email using Google... Sourceforge mailing list archives are really really bad...

@amueller
Member Author

here it is

@ogrisel
Member

ogrisel commented Nov 21, 2012

here it is

Thanks @amueller!

@GaelVaroquaux can you please re-read this message and the following discussion. I really like this explicit is better than implicit design.

@amueller
Member Author

I will also reread the discussion. It has been a while ;)

@amueller
Member Author

About AUC: isn't "area under precision-recall curve" the same as average precision and called ap?

@amueller
Member Author

This is starting to take shape.

TODO

  • Scores with parameters (f-beta)
  • use the new interface in cross_val_score
  • docs
  • check test coverage (think it should be ok).

@amueller
Member Author

@ogrisel There is a behavior that I find quite weird here. It is possible to use a score function for unsupervised learning but I don't understand what is happening.
It looks to me as if the score can not use anything that was estimated during training.

See this commit: fde6ba1

Btw, in GridSearchCV, the score_func is completely ignored in the unsupervised case....

@ogrisel
Member

ogrisel commented Dec 16, 2012

Yes, using an external score_func on the outcome of unsupervised estimators is ill-defined. We should probably raise a NotImplementedError exception for this case.

If we are to make some standard assumptions on the attributes of unsupervised models (e.g. that components_ encodes a linear embedding of the data), then we could reuse standard score metrics such as the least squares reconstruction error, but this is only true for a few models. I think we can accept that unsupervised model scoring needs to have access to the model object (which is not necessarily the case for supervised models, where having access to the predictions (hard class assignments or decision function values for classification, or predicted values for continuous output variables for regression) is often enough).

@amueller
Member Author

amueller commented Feb 3, 2013

running tests and building docs again, then it should be good :)

@ogrisel
Member

ogrisel commented Feb 3, 2013

running tests and building docs again, then it should be good :)

I agree. 👍 for merge if the tests pass and the docs build with the latest renamings.

@GaelVaroquaux
Member

I added a small remark on smoke testing the repr, and after that, I am 👍 for merge. Great job!

@amueller
Member Author

amueller commented Feb 3, 2013

Added a smoketest, whatsnew references are ok, tests run, doc build runs.

@amueller
Member Author

amueller commented Feb 3, 2013

waiting for travis, then merging.

@amueller
Member Author

amueller commented Feb 3, 2013

Thanks for the reviews, guys :)

@amueller
Member Author

amueller commented Feb 3, 2013

🍻 wohooo!

amueller merged commit 8ca3647 into scikit-learn:master on Feb 3, 2013
amueller deleted the score_func_objects branch on February 3, 2013 16:17
@ogrisel
Member

ogrisel commented Feb 3, 2013

🍻
