[MRG+1] Enabling sample weights for 3 of the 4 logistic regression solvers #5204
Conversation
```python
# If sample weights are not provided, set them to 1 for all examples
if sample_weight is None:
    sample_weight = np.ones(X.shape[0])
```
You might want to do `sample_weight = np.array(sample_weight)` here, to support lists, etc. Also you can add a `check_consistent_length` to check that they are the same size.
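The reviewer's suggestion amounts to coercing the input to an ndarray and verifying its length against `X`. A minimal sketch of that validation (the helper name is illustrative, not the PR's actual code; the length check mirrors what scikit-learn's `check_consistent_length` does):

```python
import numpy as np

def validate_sample_weight(sample_weight, X):
    """Coerce sample_weight to a float ndarray (so lists etc. work) and
    check that its length matches the number of samples in X."""
    if sample_weight is None:
        return np.ones(X.shape[0])
    sample_weight = np.asarray(sample_weight, dtype=np.float64)
    if sample_weight.shape[0] != X.shape[0]:
        raise ValueError(
            "Found input variables with inconsistent numbers of samples: "
            "%d != %d" % (X.shape[0], sample_weight.shape[0]))
    return sample_weight

X = np.zeros((4, 2))
print(validate_sample_weight([1, 2, 3, 4], X))  # [1. 2. 3. 4.]
```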
Done.
Currently fails in … Possible solutions: …
UPDATE: …
This update will break existing code. The quick workaround I can think of is to change this line to:
Thanks for the PR. It looks good except for a few minor comments. By the way, you might be interested in #5207.
Currently experiencing a test failure in … Possible solution: …
```python
if sample_weight is None:
    estimator.fit(X[:, features], y)
else:
    estimator.fit(X[:, features], y, sample_weight=curr_sample_weight)
```
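The conditional above exists because not every base estimator's `fit()` accepts a `sample_weight` argument. A generic version of that guard can be sketched with introspection (this mirrors the idea behind scikit-learn's `has_fit_parameter` utility; the function and class names here are illustrative):

```python
import inspect

def supports_sample_weight(estimator):
    """Return True if estimator.fit accepts a sample_weight keyword."""
    return "sample_weight" in inspect.signature(estimator.fit).parameters

# Two toy estimators, one with and one without sample-weight support:
class WithWeights:
    def fit(self, X, y, sample_weight=None):
        return self

class WithoutWeights:
    def fit(self, X, y):
        return self

print(supports_sample_weight(WithWeights()))     # True
print(supports_sample_weight(WithoutWeights()))  # False
```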
This change appears to result in failures of `test_bootstrap_samples` and `test_oob_score_regression` in `test_bagging.py`. (cc: @MechCoder)
EDIT: see https://travis-ci.org/scikit-learn/scikit-learn/builds/78498412
hmm. I'll have a detailed look tomorrow.
Temporary: I will re-implement the changes in 9d3becf and also comment out the tests with `LogisticRegression` in `test_sample_weight_missing` of `test_weight_boosting.py`, to see if there are any other issues beyond these two.
EDIT: All checks passed.
Can you post the `git reflog` output? Thanks.
Sure, I dumped it into a misc folder in my personal repos. Not sure what happened, help appreciated :)
I think you did a merge and a rebase. You can try
(Also, can you squash your commits into 1 or 2? It makes the history cleaner, unless the LOCs changed are huge.)
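For reference, the reflog records every position `HEAD` has pointed to, and `git reset --hard` can return to any of them, which is the usual way out of a tangled merge-plus-rebase. A minimal demonstration in a throwaway repository (the commit messages and `HEAD@{1}` position are just for illustration):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git -c user.email=you@example.com -c user.name=you commit -q --allow-empty -m "first"
git -c user.email=you@example.com -c user.name=you commit -q --allow-empty -m "second"
# The reflog lists every place HEAD has been, newest first:
git reflog
# Step back one reflog entry (here: undo the "second" commit):
git reset -q --hard 'HEAD@{1}'
git log --oneline
```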
Force-pushed from e36b4cc to f80e6a6.
Great, thanks! Back to 3 files.
+1 for merge apart from the comment regarding "liblinear".
OK, maybe let's leave the `if` in for the moment then.
Force-pushed from f80e6a6 to 7d8f0ba.
Also squashed down to two commits now.
Force-pushed from 7d8f0ba to c9234b4.
* Updated `_check_solver_option` to include a sample_weight check
* Updated all calls to `_check_solver_option()`
* Updated documentation of `class_weight` throughout logistic.py
* Added `sample_weight` parameter to `logistic_regression_path`
* Added handling of sample weights to `logistic_regression_path`
* Added `sample_weight` parameter to `_log_reg_scoring_path`
* Added handling of sample weights to `_log_reg_scoring_path`
* Added `sample_weight` parameter to `fit()` in the `LogisticRegression` class
* Added handling of sample weights in `LogisticRegression.fit()`
* Added `sample_weight` parameter to `fit()` in the `LogisticRegressionCV` class
* Added handling of sample weights in `LogisticRegressionCV.fit()`
* Added `test_logistic_regressioncv_sample_weights`, which:
  * tests that a `ValueError` is raised if liblinear is used with sample weights
  * tests that passing sample weights as `np.ones(y.shape[0])` is the same as not passing them (default `None`)
  * tests that using both lbfgs and newton-cg solvers with sample weights yields the same results
  * tests that passing class weights to scale one class is the same as passing sample weights for the training data of just that class
* Fixed bug with `*=` in `logistic_regression_path`
* Fixed bug in `test_logistic_regressioncv_sample_weights` where no data was created prior to fitting
* Changes to accepted `sample_weight` type
* Fixed bug in naming of `sample_weight` when passed from `_log_reg_scoring_path` to `logistic_regression_path`
* Fixed issue of `sample_weight=None` being converted to `np.array()` and then not being recognized as `None`
* Added tests for `LogisticRegression`
* Attempting to fix same issue as 9d3becf by instead implementing an if statement in bagging.py
* Added TODO to eliminate check for liblinear w/ sample weights in bagging.py
Force-pushed from c9234b4 to 61ce3a2.
LGTM. Whatsnew?
Nothing has changed since 2 days ago. Just had conflicts in logistic.py with recently merged PRs, so I had to rebase today. The Travis check is done and AppVeyor should be done soon.
I think Andy meant whatsnew.rst, as in sklearn/doc/whatsnew.rst, rather than the Oxford dictionary definition. :)
Oh yeah, I'm working on it now. I thought he was actually wondering why there was a new commit, haha.
Sorry for being terse from time to time ;) that is what I meant.
Looks like @TomDLT added tests with …
Hey @TomDLT, does the …
EDIT: my sandbox tests show that it does, so I'm not sure what's going on with the current error we are experiencing here.
EDIT 2: looks like it expects the sample weights to be floats and doesn't like them being ints (which is the case in the test).
EDIT 3: fixed the error, but now not sure why the coefficients don't match up (failed test) when …
EDIT 4: all done, just had to decrease the tolerance so …
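The int-vs-float issue mentioned above is a common NumPy pitfall: in-place operations like `*=` on an integer array cannot store float results, so they raise a casting error. Casting the weights to float up front sidesteps it. A small sketch of the pitfall and the fix (illustrative, not the PR's exact code):

```python
import numpy as np

y = np.array([0, 1, 2, 1])   # integer labels
sample_weight = y + 1        # integer weights, as in the failing test

# In-place scaling of an int array by a float raises a casting error,
# because the float result cannot be written back into the int buffer:
try:
    sample_weight *= 0.5
except TypeError as e:
    print("in-place float scale on int array fails:", e)

# Casting to float first makes the weights safe to rescale in place:
sample_weight = np.asarray(y + 1, dtype=np.float64)
sample_weight *= 0.5
print(sample_weight)  # scaled to [0.5, 1.0, 1.5, 1.0]
```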
Force-pushed from 7c7179d to 910ca83.
All done. Also updated whatsnew. Changing the PR name as well, since now 3/4 solvers can use the …
```python
tol = 1e-7
clf_sw_lbfgs = LogisticRegression(solver='lbfgs', fit_intercept=False,
                                  tol=tol)
clf_sw_lbfgs.fit(X, y, sample_weight=y+1)
```
`y+1` -> `y + 1`
Force-pushed from 910ca83 to 67e0059.
```diff
@@ -570,6 +570,53 @@ def test_logistic_regressioncv_class_weights():
     assert_array_almost_equal(clf_lib.coef_, clf_sag.coef_, decimal=4)
 
 
+def test_logistic_regression_sample_weights():
+    X, y = make_classification(n_samples=20, n_features=5, n_informative=3,
+                               n_classes=3, random_state=0)
```
I am not sure that `n_classes=3` brings something interesting in this test. `n_classes=2` would reduce the run-time, which is still quite long (0.76 to 0.29 sec).
Sure, changed. That actually eliminates the need for two for loops.
This looks good to me.
Force-pushed from 67e0059 to f1e0a68.
Let's get this merged?
Just did, thanks.
I really recommend using flake8 instead of pep8, as it also checks for unused or undefined variables, etc.
I observed that logistic regression could not be trained with sample weights. By digging deeper I found that two of the three solvers (`lbfgs` and `newton-cg`) could handle sample weights, while the default solver (`liblinear`) could not, and as a result sample weights were hidden from the user completely. To remedy this, I have:

- added a `ValueError` when the `liblinear` solver is used with sample weights
- added `sample_weight` as a parameter to both `LogisticRegression.fit()` and `LogisticRegressionCV.fit()`
- updated the documentation of the `class_weight` parameter to note that it will be multiplied by `sample_weight` if `sample_weight` is provided in `.fit()`
- added handling of sample weights to `logistic_regression_path` and `_log_reg_scoring_path`
- added tests that:
  - a `ValueError` is thrown when `liblinear` is used with sample weights
  - passing sample weights as `np.ones` is the same as not passing them (default `None`)
  - using both `lbfgs` and `newton-cg` solvers yields the same results
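The sample-weight semantics these changes introduce can be illustrated with the weighted logistic loss: each sample's contribution to the gradient is scaled by its weight, so an integer weight of k is equivalent to duplicating that sample k times. A minimal NumPy sketch of that property (not scikit-learn's implementation; the function name is illustrative):

```python
import numpy as np

def weighted_logloss_grad(w, X, y, sample_weight=None):
    """Gradient of the sample-weighted binary logistic loss:
    X.T @ (s * (p - y)), where p = sigmoid(X @ w) and s are the weights."""
    if sample_weight is None:
        sample_weight = np.ones(X.shape[0])
    sample_weight = np.asarray(sample_weight, dtype=np.float64)
    p = 1.0 / (1.0 + np.exp(-X.dot(w)))       # predicted probabilities
    return X.T.dot(sample_weight * (p - y))   # weight scales each residual

rng = np.random.RandomState(0)
X = rng.randn(6, 3)
y = np.array([0, 1, 0, 1, 1, 0], dtype=float)
w = rng.randn(3)

# Giving sample 1 a weight of 2 matches duplicating that sample:
sw = np.array([1, 2, 1, 1, 1, 1], dtype=float)
X_dup = np.vstack([X, X[1]])
y_dup = np.append(y, y[1])
g_weighted = weighted_logloss_grad(w, X, y, sw)
g_duplicated = weighted_logloss_grad(w, X_dup, y_dup)
print(np.allclose(g_weighted, g_duplicated))  # True
```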