[WIP] Common test for sample weight #5461

Conversation
…ck whether test should apply to all of the estimators
train, test = train_test_split(range(n_samples))

for name, Estimator in estimators:
    if 'sample_weight' not in signature(Estimator.fit).parameters.keys():
you can use `has_fit_parameter` in `utils.validation`
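For reference, a minimal sketch of the suggested helper (the import path is `sklearn.utils.validation`; the two estimators are just illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.utils.validation import has_fit_parameter

# True: SVC.fit accepts a sample_weight argument.
print(has_fit_parameter(SVC(), "sample_weight"))
# False: KNeighborsClassifier.fit does not.
print(has_fit_parameter(KNeighborsClassifier(), "sample_weight"))
```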
Oops, just realized this is a WIP.
For SGD it could be a convergence issue.
Actually, sample weight support in scorers is fine.
For random forest with bootstrapping, some variation is expected. For methods using randomization, you should make sure to use the same `random_state`. Finally, because of numerical issues, it may happen that trees differ slightly, though the top parts should be consistent.
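A hedged sketch of that last remark, assuming integer weights and a deterministic splitter: the root split of a weighted tree and of a tree fit on the equivalent repeated data should line up, even if deeper nodes drift numerically.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = rng.rand(40)
w = rng.randint(1, 4, size=40)  # integer weights

tree_sw = DecisionTreeRegressor(random_state=0).fit(X, y, sample_weight=w)
tree_aug = DecisionTreeRegressor(random_state=0).fit(
    np.repeat(X, w, axis=0), np.repeat(y, w))

# The root split should agree; deeper nodes may differ slightly
# because of numerical issues.
assert tree_sw.tree_.feature[0] == tree_aug.tree_.feature[0]
```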
        continue

    try:
        estimator_sw = Estimator().fit(X[train], y[train],
`random_state` should be enforced here, if it is a parameter of Estimator.
there is the `set_random_state` helper for that.
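A minimal sketch of that helper, assuming the import path of this era (`sklearn.utils.testing`; newer releases moved it to `sklearn.utils._testing`):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.testing import set_random_state

est = DecisionTreeClassifier()
# Sets est.random_state to 0; a no-op for estimators that do not
# expose a random_state parameter.
set_random_state(est, random_state=0)
print(est.random_state)  # 0
```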
Whoops, I was confused about the bootstrapping. Obviously it's different.
Oh, I had assumed the random state was fixed...
This is because of the bootstrapping. Discussing with @arjoly and @agramfort yesterday, we concluded that the random nature of the subsampling causes the augmented data to be split in ways that are impossible with `sample_weight`.
Oh, sorry, @glouppe, you had already mentioned that.
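A hedged illustration of why the equivalence breaks under bootstrapping: even with a fixed `random_state`, the bootstrap draws rows of the augmented set independently, producing resamples that no integer weighting can reproduce.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.rand(30, 2)
y = rng.rand(30)
w = rng.randint(1, 4, size=30)

rf_sw = RandomForestRegressor(random_state=0).fit(X, y, sample_weight=w)
rf_aug = RandomForestRegressor(random_state=0).fit(
    np.repeat(X, w, axis=0), np.repeat(y, w))

# Typically False: the bootstrap may draw, say, two of the three copies
# of a row, a resample that no sample_weight value can express.
print(np.allclose(rf_sw.predict(X), rf_aug.predict(X)))
```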
So the question becomes:
Any opinions on this?
Well you definitely have to fix the random state. Otherwise even the same parameters won't produce the same results ;)
OK. @ainafp is taking over this PR.
Superseded by #5515
Trying to address #5444
I wrote a common test that goes through all classifiers and regressors and checks whether sample weights correspond to data augmentation. If the `coef_` attribute is available, then it is also compared. The estimators that fail this test are the following:
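For concreteness, a minimal sketch of the equivalence being tested, using Ridge as a stand-in estimator: an integer weight of k on a sample should match repeating that sample k times.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = rng.rand(50)
w = rng.randint(1, 4, size=50)  # integer weights

est_sw = Ridge().fit(X, y, sample_weight=w)
est_aug = Ridge().fit(np.repeat(X, w, axis=0), np.repeat(y, w))

# When coef_ is available, compare it directly; otherwise one would
# compare predictions on held-out data instead.
np.testing.assert_allclose(est_sw.coef_, est_aug.coef_)
```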