[WIP] scorer_params and sample_weight support #3524
Conversation
```diff
@@ -17,7 +17,8 @@
 from .utils.fixes import astype


-def learning_curve(estimator, X, y, train_sizes=np.linspace(0.1, 1.0, 5),
+def learning_curve(estimator, X, y, sample_weight=None,
```
this changes the function signature :( can we live with it?
This is one of the changes from @ndawe's original changeset. I would suggest making these functions (learning_curve, validation_curve, RFECV) support `fit_params` and `scorer_params`, and adding them at the end so that the API doesn't change, in the same way I did in grid search. I didn't do this yet because I wanted to get the API proposal out there so we can discuss, and also because I only really need `GridSearchCV` for what I'm doing now.
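The backward-compatible approach suggested above (appending the new keywords at the end of the signature) could look like this minimal sketch. The signature and body are illustrative only, not the actual scikit-learn function:

```python
import numpy as np

def learning_curve(estimator, X, y, train_sizes=np.linspace(0.1, 1.0, 5),
                   cv=None, scoring=None,
                   fit_params=None, scorer_params=None):
    # Sketch only: the real body is omitted. The new keyword arguments
    # are appended after the existing ones, so callers who pass arguments
    # positionally are unaffected. Here we just echo the new params back
    # to show they were received.
    return fit_params, scorer_params

# An old-style call without the new keywords still works unchanged:
old_style = learning_curve(None, [[0], [1]], [0, 1])

# New callers can opt in explicitly:
new_style = learning_curve(None, [[0], [1]], [0, 1],
                           fit_params={"sample_weight": [1.0, 2.0]})
```

This mirrors how new keyword-only-style additions avoid breaking existing positional call sites.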
I don't dispute the utility of …
@jnothman I am not convinced either. But there are two attributes that need to have this behaviour:

This is clearly a generalizable approach, which is tempting, but I can't come up with other examples of arguments that should be passed like this. Also I can't think of other …

So I'm torn.
As far as I'm concerned, scikit-learn is almost entirely concerned with data where samples may be assumed iid.
(Sorry, slipped.)

On a related note, …

On a completely different note, I doubt that …

Another way of looking at it is that arbitrary kwargs could be provided to …
I see your point and I agree. I was thinking the same: something like …

Something that kind of collides with your suggestion is that …
```python
for search_cls in (GridSearchCV, RandomizedSearchCV):
    params = dict(sample_weight=sample_weight)
    grid_search = search_cls(MockClassifier(), est_parameters, cv=cv,
                             fit_params=params, scorer_params=params)
```
It could be useful to support `scorer_params="fit_params"` to avoid the double declaration. Now, should the default be `scorer_params=None` or `scorer_params="fit_params"`?
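The sentinel idea in the comment above could be resolved with a small helper along these lines. This is a hypothetical sketch of the proposal, not scikit-learn code; the helper name and semantics are assumptions:

```python
def resolve_scorer_params(fit_params, scorer_params="fit_params"):
    # Hypothetical helper: the string sentinel "fit_params" means
    # "reuse fit_params for scoring", avoiding the double declaration;
    # an explicit dict (or None) is taken as-is.
    if scorer_params == "fit_params":
        return dict(fit_params or {})
    return dict(scorer_params or {})

# Default: scoring reuses the fit parameters.
shared = resolve_scorer_params({"sample_weight": [1, 2, 3]})

# Explicit dict: scoring gets its own, independent parameters.
separate = resolve_scorer_params({"sample_weight": [1, 2, 3]},
                                 scorer_params={"sample_group": [0, 0, 1]})
```

With `"fit_params"` as the default, the common weighted-fit-and-score case needs only one declaration, while `None` as the default would make the reuse opt-in.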
Also closing this one since SLEP006 provides the API and it's already implemented for scorers.
This is me picking up the remaining commits from #1574, rebasing on master, and adding a different API:

Since I want to write a `sample_group`-aware scorer, I needed an API to pass arbitrary params to scorers in cross-validation in the same way as `fit_params`. This supersedes the need for an explicit sample_weights parameter, at the cost of some duplication in the function call (`fit_params=dict(sample_weight=sw), scorer_params=dict(sample_weight=sw)`). Internally this saves us the need for special treatment, though, and allows scorers to be more powerful.

- `fit_params` and `scorer_params` starting with `sample_` will be indexed by train-test splits.
- `fit_params` and `scorer_params` in learning_curve and RFECV
- `scorer_params` are getting appropriately indexed.
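The indexing rule described in the proposal (params whose names start with `sample_` get sliced per train-test split) could be sketched as follows. The helper name `index_params` is hypothetical, not part of scikit-learn:

```python
import numpy as np

def index_params(params, indices):
    # Hypothetical helper: any parameter whose name starts with "sample_"
    # is treated as a per-sample array and sliced to the given train or
    # test indices; all other parameters pass through unchanged.
    indexed = {}
    for name, value in params.items():
        if name.startswith("sample_"):
            indexed[name] = np.asarray(value)[indices]
        else:
            indexed[name] = value
    return indexed

params = {"sample_weight": [0.5, 1.0, 1.5, 2.0], "refit": True}
train, test = [0, 1, 2], [3]

train_params = index_params(params, train)  # sample_weight sliced to train
test_params = index_params(params, test)    # sample_weight sliced to test
```

The same dict can then be passed as both `fit_params` and `scorer_params`, with each cross-validation fold receiving only its own rows of the per-sample arrays.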