-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] BUG: Fix fetch_kddcup99 for Python 3 #5946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
LGTM Thanks for the fix :) |
OK, I changed to MRG+1 |
why was this not caught by a test? because the tests don't download? But if the file is downloaded already, the tests should run, right? |
Ah there are no tests for |
@rvraghav93 I will add a test similar to |
yes thanks :) |
@rvraghav93 not sure how to test it, The problem with Could someone just merge it and be done with it? |
I guess yes... @amueller merge? |
Wait, there are still tests for the other downloaded datasets, right? If the data was downloaded, then the tests are run with the downloaded data, otherwise they are mocked, right? |
I added a test (even though it doesn't target this fix), and it was helpful to find a related bug with comparison (so @amueller was right). I noticed other strange thing: the current implementation with I think we need to ask the author @ngoix of |
Yes you are right the docstring was wrong. |
data = fetch_kddcup99('http', percent10=True) | ||
assert_equal(data.data.shape, (58725, 3)) | ||
assert_equal(data.target.shape, (58725,)) | ||
print(data.data.shape, data.target.shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print to be removed
+1 for merge |
c9e336b
to
5a54ce8
Compare
@ngoix thanks, I also changed "Load and return the kddcup 99 dataset (regression)" to "Load and return the kddcup 99 dataset (anomaly detection)". Is it appropriate description (calling it "regression" was definitely misleading)? |
Yes right thanks! You could also call it more generally classification. |
5a54ce8
to
8497c89
Compare
Changed to standard "classification". |
assert_equal(data.target.shape, (9571,)) | ||
|
||
|
||
if __name__ == '__main__': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this. the file should be run using nostests.
+1 apart from the |
Done. |
Another thing: how about changing default of |
@ngoix any objection? |
No I'm ok with this change! |
+1 for merge when @amueller is happy |
8497c89
to
f578874
Compare
squash maybe? LGTM. |
BUG: Fixed comparison with bytes in kddcup99.py + test MAINT: Changed default 'percent10' to True in fetch_kddcup99
f578874
to
51addc0
Compare
@amueller, squashed. |
[MRG+1] BUG: Fix fetch_kddcup99 for Python 3
thanks @nmayorov |
This is a small fix to account for str / bytes issues in
fetch_kddcup99
.