Fixes #4632: Allow cross validation to have weighted test scores #10806
Conversation
please avoid closing and opening PRs: just add commits if possible. let us know when you think this is ready for review
Sure, and sorry about that. The code is ready for review. Thanks
+1
I ran into the exact same issue yesterday and wanted to comment on your PR. I did something very similar but subtly different. I couldn't think of a single scenario where I'd want to supply independent sample (importance) weights for training (fit) and for testing (score and metric calculations). It seems to me that unless this serves a particular purpose (backed by a real use case), I would opt for a flag that simply enables the sample_weight passed in fit_params to be used in scoring as well, possibly as another fit_params item. This would eliminate silly mistakes such as copy-and-paste errors.
I also found that, for my purposes, I only had to change sklearn/model_selection/_validation.py, but you might be hitting _fit_and_score in _validation.py from a different caller. Here's the example I used to debug into the code in PyCharm. I set breakpoints in _validation.py, and that was all I really needed, because the fit_params passed to _fit_and_score already contained the sample_weight, so I could isolate my changes.
import numpy as np
import sklearn.datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = sklearn.datasets.make_classification(
    n_samples=10000, n_features=10, n_informative=2,
    n_redundant=2, shuffle=True, random_state=0
)

np.random.seed(5)                        # call seed(); assigning to np.random.seed would silently break it
wt = np.abs(np.random.random(len(y)))    # per-sample importance weights
rss = np.random.randint(1, 1000000, 5)   # candidate random_state values to sample from
params = {'random_state': rss}

rscv = RandomizedSearchCV(
    estimator=LogisticRegression(),
    param_distributions=params,
    n_iter=1,
    scoring='roc_auc',
    n_jobs=1,
    refit=True,
    cv=4,
    random_state=2,
    # fit_params as a constructor argument worked in older scikit-learn releases;
    # newer versions expect fit params to be passed to fit() instead.
    fit_params={'sample_weight': wt}  # maybe add flag here (see comments in PR)
)
rscv.fit(X, y)
@@ -273,7 +273,8 @@ def __len__(self):

 def fit_grid_point(X, y, estimator, parameters, train, test, scorer,
-                   verbose, error_score='raise-deprecating', **fit_params):
+                   verbose, error_score='raise-deprecating',
+                   test_score_weight=None, **fit_params):
Instead of using a test_score_weight, I would consider a (Boolean) parameter (called sample_weight_in_score or something) that says whether to use the sample_weight parameter that appears in fit_params. I think that unless you can think of a realistic example where you would want different importance weights for metric calculation than for training, a flag would be better because it would help avoid mix-ups.
@@ -636,7 +650,8 @@ def fit(self, X, y=None, groups=None, **fit_params):
                 return_train_score=self.return_train_score,
                 return_n_test_samples=True,
                 return_times=True, return_parameters=False,
-                error_score=self.error_score)
+                error_score=self.error_score,
+                test_score_weight=self.test_score_weight)
I wouldn't pass the test_score_weight here. Just the flag to use the sample_weight in the scoring.
@@ -361,7 +361,7 @@ def _fit_and_score(estimator, X, y, scorer, train, test, verbose,
                    parameters, fit_params, return_train_score=False,
                    return_parameters=False, return_n_test_samples=False,
                    return_times=False, return_estimator=False,
-                   error_score='raise-deprecating'):
+                   error_score='raise-deprecating', test_score_weight=None):
I would have the flag here. Whether it's a parameter or in fit_params doesn't really matter that much in my opinion.
train_sample_weight = _index_param_value(X,
                                         fit_params[sample_weight_key],
                                         train)
test_sample_weight = _index_param_value(X,
                                         fit_params[sample_weight_key],
                                         test)
Does this work? I'm not fully convinced. When I discovered this issue, I debugged through it, and it seemed to me that _index_param_value was destructive to the calling parameter(s). I had to copy the sample_weight in fit_params in order to call _index_param_value two different times, with train and with test.
Will be solved by #22083.
Reference Issues/PRs
Fixes #4632. Highly relevant to #4497.
What does this implement/fix? Explain your changes.
Add a test_score_weight option so that the user can pass in the name of a sample weight key in the fit_params dictionary to obtain weighted CV test scores, or do nothing to keep un-weighted test scores. This is very important in certain fields of research, e.g. finance. For now, scores silently default to un-weighted, which is a little undesirable.
Any other comments?
This supersedes #10800, which is closed.
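Since the option is only proposed here (and the underlying issue was later addressed via #22083), the following is a small runnable sketch of the behaviour this PR is after, written as an explicit CV loop using standard scikit-learn calls; all variable names are local to the example:

# Manually computing sample-weighted test scores per fold, which is what the
# proposed test_score_weight option would automate inside cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1000, random_state=0)
w = np.random.RandomState(0).rand(len(y))          # per-sample importance weights

scores = []
for train, test in StratifiedKFold(n_splits=4).split(X, y):
    clf = LogisticRegression(solver='lbfgs')
    clf.fit(X[train], y[train], sample_weight=w[train])                  # weighted fit
    proba = clf.predict_proba(X[test])[:, 1]
    scores.append(roc_auc_score(y[test], proba, sample_weight=w[test]))  # weighted score
print(np.mean(scores))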