[MRG + 1] Fix for OvR partial_fit bug #7786
Conversation
LGTM, thanks :)
```python
# A new class value which was not in the first call of partial_fit
# It should raise ValueError
y1 = [5] + y[7:-1]
assert_raises(ValueError, ovr.partial_fit, X=X[7:], y=y1)
```
Can you use assert_raises_regex to check the error message as well?
@lesteve Done!
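For readers unfamiliar with the distinction, here is a minimal sketch of why matching the message matters; the check function and error text below are illustrative stand-ins, not the actual scikit-learn code:

```python
import re

def check_partial_fit_classes(seen_classes, y):
    # Illustrative stand-in for the OvR mini-batch class check
    if not set(seen_classes).issuperset(y):
        raise ValueError("Mini-batch contains new labels %s not declared "
                         "in the first call to partial_fit"
                         % (set(y) - set(seen_classes)))

# assert_raises only verifies the exception type; assert_raises_regex
# additionally matches the message, so a ValueError raised for the
# wrong reason would not pass. The equivalent plain-Python check:
try:
    check_partial_fit_classes([0, 1, 2], [0, 5])
except ValueError as exc:
    assert re.search(r"new labels", str(exc))
else:
    raise AssertionError("expected ValueError")
```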
but then you can drop this line.
Travis seems to fail... Could you fix that?
@raghavrv It says
I've not looked into it, but I've restarted the build for you... Let's see if it is some spurious error.
Ok 👍
@raghavrv Thank you, Travis ran successfully.
Otherwise LGTM
```python
self.label_binarizer_ = LabelBinarizer(sparse_output=True)
self.label_binarizer_.fit(self.classes_)

if not set(self.classes_).issuperset(y):
```
For large `y`, iteration may be much slower than using `np.setdiff1d`..?
Yes, `np.setdiff1d` seems to be faster.
Due to the use of `np.setdiff1d`, it is not possible to `partial_fit` with a sparse `y`.
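A small sketch of the two checks being compared (pure NumPy; data is arbitrary): the set-based test iterates over `y` element-wise in Python, while `np.setdiff1d` operates on whole arrays, which is why it is faster for large dense `y` but cannot take a scipy sparse matrix directly.

```python
import numpy as np

classes = np.array([0, 1, 2, 3])
y_batch = np.array([1, 3, 3, 0])

# Set-based check: builds a Python set and iterates over y element-wise
assert set(classes).issuperset(y_batch)

# np.setdiff1d-based check: vectorized, typically much faster on large
# dense arrays, but it sorts array inputs, so a scipy sparse y would
# have to be densified (or its .indices used) first
assert np.setdiff1d(y_batch, classes).size == 0

# A batch with an undeclared class is caught by either check
y_bad = np.array([1, 5])
assert np.setdiff1d(y_bad, classes).tolist() == [5]
```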
```python
assert_equal(np.mean(pred == y), np.mean(pred1 == y))

# Test when mini-batches don't have all classes
# with SGDClassifier
```
As a reader of the test, it's not clear why you would want to test with both these base estimators.
Also: can we do it in a loop for clarity that they're the same test?
Yes, happy for it to just be SGDClassifier
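One way to write that loop, sketched under the assumption that the scenario under test is "first mini-batch missing a class" (the dataset, split point, and estimator settings here are arbitrary illustrations):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

# Run the identical partial_fit scenario for each base estimator in a
# loop, rather than duplicating the test body per estimator.
for Est in (SGDClassifier, Perceptron):
    ovr = OneVsRestClassifier(Est(random_state=0))
    first = y[:60] != 2                      # first mini-batch misses class 2
    ovr.partial_fit(X[:60][first], y[:60][first], classes=np.unique(y))
    ovr.partial_fit(X[60:], y[60:])          # later batches supply the rest
    pred = ovr.predict(X)
    assert pred.shape == y.shape
```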
Thanks, @srivatsan-ramesh
Sorry, I forgot to have you write an entry in
@jnothman Created a PR!
* mini-batch can now contain fewer classes than the actual data
* added tests where mini-batches don't contain all classes
Both the commit ed08d38 by @srivatsan-ramesh and the reviewed version by @jnothman (5fd925a) throw an error with a sparse `y`. Or is there an alternative, if I use `np.setdiff1d(y.indices, self.classes_)`?
Hmmm, the docstring does say that sparse matrices are accepted for `y`. I guess it is not really worth using a sparse matrix for `y`.
Sparse `y` may be used to represent multilabel problems. It could be accepted.
I see, it does look like we are missing tests for this then... it would be good to do something similar to #7590 for multilabel problems, i.e. test that sparse and dense `y` give the same results.
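Such a test might look like the following sketch (the random multilabel data and the choice of `LogisticRegression` are arbitrary illustrations, not taken from #7590):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.RandomState(0)
X = rng.rand(40, 4)
Y_dense = (rng.rand(40, 3) > 0.5).astype(int)   # multilabel indicator
Y_sparse = sp.csr_matrix(Y_dense)

pred_dense = OneVsRestClassifier(LogisticRegression()).fit(X, Y_dense).predict(X)
pred_sparse = OneVsRestClassifier(LogisticRegression()).fit(X, Y_sparse).predict(X)

def to_dense(P):
    # predict may return a sparse indicator matrix when fit on sparse y
    return P.toarray() if sp.issparse(P) else np.asarray(P)

# sparse and dense y should give identical predictions
assert np.array_equal(to_dense(pred_dense), to_dense(pred_sparse))
```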
Reference Issue
PR #6239 and issue #6203

What does this implement/fix? Explain your changes.
`partial_fit()` of OvR was not working properly when the mini-batches did not contain all the classes. `LabelBinarizer.fit()` should be called with the `classes_` parameter and not the `y` parameter of `partial_fit()`. Also added tests to check the `partial_fit()` function.

Any other comments?
The PR #6239 did not seem to be the correct fix.
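The core of the fix can be sketched in isolation (a simplified illustration, not the actual `OneVsRestClassifier` code):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# All classes are declared up front via the `classes` argument of
# partial_fit; the binarizer must be fitted on that full set ...
classes_ = np.array([0, 1, 2])
lb = LabelBinarizer(sparse_output=True)
lb.fit(classes_)             # the fix: fit on classes_
                             # (the bug: fitting on the mini-batch y)

# ... so that a mini-batch missing some classes still binarizes to
# one column per declared class
y_batch = np.array([0, 1, 0, 1])           # no class 2 in this batch
Y = lb.transform(y_batch)
assert Y.shape == (4, 3)
```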