Fixes #4632: Allow cross validation to have weighted test scores #10806


Closed
wants to merge 2 commits

Conversation

algoGuruZ

Reference Issues/PRs

Fixes #4632. Highly relevant to #4497.

What does this implement/fix? Explain your changes.

Add a test_score_weight option so that the user can pass in the name of a sample-weight entry in the fit_params dictionary if they want weighted cv test scores, or do nothing to keep un-weighted test scores. This is very important in certain fields of research, e.g. finance. At present, scoring silently defaults to un-weighted, which is a little undesirable.
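For illustration, a minimal sketch of the intended usage (this is a hedged sketch: it assumes test_score_weight is exposed on the search estimators as the review diffs below suggest, and uses the 0.19-era fit_params constructor argument):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, random_state=0)
w = np.random.RandomState(0).rand(len(y))  # per-sample importance weights

search = GridSearchCV(
    LogisticRegression(),
    param_grid={'C': [0.1, 1.0]},
    scoring='roc_auc',
    fit_params={'sample_weight': w},    # weights applied during fit
    test_score_weight='sample_weight',  # proposed: reuse this key to weight test scores
)
search.fit(X, y)  # omit test_score_weight (default None) for the current un-weighted behaviour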

Any other comments?

This supersedes #10800, which is closed.

@jnothman
Member

jnothman commented Mar 13, 2018 via email

@algoGuruZ
Author

Sure, and sorry about that. The code is ready for review. Thanks

@wisefool769

+1


@deaktator deaktator left a comment


@algoGuruZ:

I ran into the exact same issue yesterday and wanted to comment on your PR. I did something very similar but subtly different. I couldn't think of a single scenario where I'd want to supply independent sample (importance) weights for training (fit) and testing (score and metric calculations). Unless that serves a particular purpose (backed by a real use case), I would opt for a flag that simply enables the sample_weight already passed in fit_params, possibly as another fit_params item. This would eliminate silly mistakes like cut-and-paste errors.

I also found that for my purposes I only had to change sklearn/model_selection/_validation.py, but you might be hitting _fit_and_score in _validation.py from a different caller. Here's the example I used to debug into the code in PyCharm. I set breakpoints in _validation.py, and that was all I really needed, because the fit_params passed to _fit_and_score contained the sample_weight, so I could isolate my changes.

import numpy as np
import sklearn.datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = sklearn.datasets.make_classification(
    n_samples=10000, n_features=10, n_informative=2,
    n_redundant=2, shuffle=True, random_state=0
)

np.random.seed(5)  # note: seed must be called; assigning to np.random.seed clobbers the function
wt = np.random.random(len(y))           # per-sample weights in [0, 1), so abs() is unnecessary
rss = np.random.randint(1, 1000000, 5)  # candidate random_state values to sample from
params = {'random_state': rss}

rscv = RandomizedSearchCV(
    estimator=LogisticRegression(),
    param_distributions=params,
    n_iter=1,
    scoring='roc_auc',
    n_jobs=1,
    refit=True,
    cv=4,
    random_state=2,
    fit_params={'sample_weight': wt}  # maybe add flag here (see comments in PR)
)

rscv.fit(X, y)

@@ -273,7 +273,8 @@ def __len__(self):


 def fit_grid_point(X, y, estimator, parameters, train, test, scorer,
-                   verbose, error_score='raise-deprecating', **fit_params):
+                   verbose, error_score='raise-deprecating',
+                   test_score_weight=None, **fit_params):


Instead of a test_score_weight string, I would consider a Boolean parameter (called sample_weight_in_score or something) that says whether to use the sample_weight entry that would appear in fit_params. Unless you can think of a realistic example where you would want different importance weights for metric calculation than for training, a flag would be better because it would help avoid screw-ups.
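A sketch of what that could look like (sample_weight_in_score is this comment's hypothetical name, not an existing scikit-learn parameter; scorer objects do accept a sample_weight keyword):

def fit_grid_point(X, y, estimator, parameters, train, test, scorer,
                   verbose, error_score='raise-deprecating',
                   sample_weight_in_score=False, **fit_params):
    # ... fit the estimator on the train split as before ...
    if sample_weight_in_score and 'sample_weight' in fit_params:
        # reuse the fit weights, sliced to the test fold, for scoring
        # (assumes the full-length array is still available at this point;
        # see the _index_param_value caveat below)
        test_weight = fit_params['sample_weight'][test]
        test_score = scorer(estimator, X[test], y[test],
                            sample_weight=test_weight)
    else:
        test_score = scorer(estimator, X[test], y[test])
    # ...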

@@ -636,7 +650,8 @@ def fit(self, X, y=None, groups=None, **fit_params):
                 return_train_score=self.return_train_score,
                 return_n_test_samples=True,
                 return_times=True, return_parameters=False,
-                error_score=self.error_score)
+                error_score=self.error_score,
+                test_score_weight=self.test_score_weight)


I wouldn't pass the test_score_weight here. Just the flag to use the sample_weight in the scoring.
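That is, the call site would forward a Boolean rather than a weight key, along these lines (sample_weight_in_score remains a hypothetical name):

                return_times=True, return_parameters=False,
                error_score=self.error_score,
                sample_weight_in_score=self.sample_weight_in_score)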

@@ -361,7 +361,7 @@ def _fit_and_score(estimator, X, y, scorer, train, test, verbose,
                    parameters, fit_params, return_train_score=False,
                    return_parameters=False, return_n_test_samples=False,
                    return_times=False, return_estimator=False,
-                   error_score='raise-deprecating'):
+                   error_score='raise-deprecating', test_score_weight=None):


I would have the flag here. Whether it's a parameter or in fit_params doesn't really matter that much in my opinion.

train_sample_weight = _index_param_value(X,
                                         fit_params[sample_weight_key],
                                         train)
test_sample_weight = _index_param_value(X,


Does this work? I'm not fully convinced. When I discovered this issue, I debugged through it, and it seemed to me that _index_param_value was destructive to the calling parameter(s): once fit_params has been re-indexed for the train split, the full-length arrays are no longer there to slice. I had to copy the sample_weight out of fit_params in order to call _index_param_value two different times, with train and with test.
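A sketch of the copy-first workaround (assuming the 0.19-era _fit_and_score, which re-binds fit_params to a train-indexed copy, after which the full-length weights are no longer reachable):

# inside _fit_and_score, BEFORE fit_params is re-indexed for the train split
full_weight = fit_params.get('sample_weight')  # keep a reference to the full array

fit_params = dict([(k, _index_param_value(X, v, train))
                   for k, v in fit_params.items()])  # train-indexed copy, as upstream does

if full_weight is not None:
    # slice the saved full-length weights for each fold independently
    train_sample_weight = _index_param_value(X, full_weight, train)
    test_sample_weight = _index_param_value(X, full_weight, test)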

@lorentzenchr
Member

Will be solved by #22083.

Labels
module:model_selection · Superseded (PR has been replaced by a newer PR)
Development

Successfully merging this pull request may close these issues.

Should cross-validation scoring take sample-weights into account?