[MRG+1] BUG: Fix fetch_kddcup99 for Python 3 #5946

nmayorov · 2015-12-01T15:16:54Z

This is a small fix to account for str / bytes issues in fetch_kddcup99.
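
To illustrate the kind of mismatch involved, here is a minimal sketch, not the actual diff. It assumes that labels parsed from the downloaded file arrive as bytes; the label value and the helper name `is_normal` are hypothetical.

```python
# Hypothetical label as it might come out of a file read in binary mode.
raw_label = b"normal."

# On Python 3, bytes and str never compare equal, so this is False.
# On Python 2 the same comparison was True because str was a byte string.
naive_match = (raw_label == "normal.")


def is_normal(label):
    """Compare against a str literal, decoding bytes input first."""
    if isinstance(label, bytes):
        label = label.decode("ascii")
    return label == "normal."


fixed_match = is_normal(raw_label)
```

Decoding once at the parsing boundary, or comparing against bytes literals throughout, are both reasonable ways to keep the two types from meeting.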

raghavrv · 2015-12-02T15:37:22Z

LGTM Thanks for the fix :)

nmayorov · 2015-12-06T10:00:58Z

OK, I changed to MRG+1

amueller · 2015-12-09T00:04:24Z

Why was this not caught by a test? Because the tests don't download? But if the file is already downloaded, the tests should run, right?

raghavrv · 2015-12-09T00:50:24Z

Ah there are no tests for kddcup99.py. @nmayorov could you introduce a test for this?

nmayorov · 2015-12-09T09:57:58Z

@rvraghav93 I will add a test similar to test_covtype.py.

raghavrv · 2015-12-09T12:15:51Z

yes thanks :)

nmayorov · 2015-12-10T07:50:49Z

@rvraghav93 I'm not sure how to test it: test_covtype.py doesn't download data during testing (the same goes for the other fetch test routines).

The problem with fetch_kddcup99 occurs during the download and parsing phase. So what should I test? The download part doesn't seem to fit into unit testing.

Could someone just merge it and be done with it?

raghavrv · 2015-12-10T12:03:48Z

I guess yes... @amueller merge?

amueller · 2015-12-10T16:36:58Z

Wait, there are still tests for the other downloaded datasets, right? If the data was downloaded, then the tests are run with the downloaded data, otherwise they are mocked, right?
If not, we should make this more consistent.
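
A rough sketch of the "run if the data is cached, skip otherwise" pattern, assuming a fetcher that raises IOError when the data is missing and downloading is disabled. `fake_fetch`, `check_fetch` and the `cache` dict are hypothetical stand-ins, not scikit-learn code.

```python
from unittest import SkipTest


def fake_fetch(cache, download_if_missing=True):
    """Hypothetical fetcher: return cached rows or raise IOError."""
    if "kddcup99" not in cache:
        if not download_if_missing:
            raise IOError("Data not found and download_if_missing is False")
        cache["kddcup99"] = [(0, 1, 2)]  # stand-in for a real download
    return cache["kddcup99"]


def check_fetch(cache):
    """Test body: skip instead of downloading when the cache is empty."""
    try:
        data = fake_fetch(cache, download_if_missing=False)
    except IOError:
        raise SkipTest("kddcup99 dataset can not be loaded.")
    assert len(data) > 0
    return data
```

With this shape, a machine that already has the data exercises the parsing code, while a clean CI machine reports a skip rather than a failure or a large download.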

nmayorov · 2015-12-10T21:31:11Z

I added a test (even though it doesn't target this fix), and it was helpful to find a related bug with comparison (so @amueller was right).

I noticed another strange thing: the current implementation with subset='SF', 'http' or 'smtp' returns data with 4, 3 and 3 features respectively (and from the code it seems this was intended). I'm not very familiar with this dataset, but it looks strange and contradicts the docstring (for now I modified the docstring).

I think we need to ask @ngoix, the author of kddcup99.py, what this means: is it right or wrong?

ngoix · 2015-12-10T22:51:27Z

Yes, you are right, the docstring was wrong.

ngoix · 2015-12-10T22:53:04Z

sklearn/datasets/tests/test_kddcup99.py

+    data = fetch_kddcup99('http', percent10=True)
+    assert_equal(data.data.shape, (58725, 3))
+    assert_equal(data.target.shape, (58725,))
+    print(data.data.shape, data.target.shape)


print to be removed

ngoix · 2015-12-10T22:53:42Z

+1 for merge

nmayorov · 2015-12-10T23:07:07Z

@ngoix thanks,

I also changed "Load and return the kddcup 99 dataset (regression)" to "Load and return the kddcup 99 dataset (anomaly detection)". Is that an appropriate description (calling it "regression" was definitely misleading)?

ngoix · 2015-12-10T23:24:57Z

Yes, right, thanks! More generally, you could also call it classification.

nmayorov · 2015-12-10T23:31:29Z

Changed to standard "classification".

amueller · 2015-12-11T00:02:25Z

sklearn/datasets/tests/test_kddcup99.py

+    assert_equal(data.target.shape, (9571,))
+
+
+if __name__ == '__main__':


Please remove this. The file should be run using nosetests.

amueller · 2015-12-11T00:02:42Z

+1 apart from the __main__...

nmayorov · 2015-12-11T08:44:46Z

Done.

nmayorov · 2015-12-11T08:47:52Z

Another thing: how about changing the default of percent10 to True? The full data is big, and fetching it without issues perhaps requires more than 8GB of RAM (all data is stored as a Python list at some point).
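
As a rough illustration of why holding everything in a Python list is costly, the sketch below compares a list of floats with a typed array of the same values. The element count is arbitrary and the measurement is CPython-specific; it is not a measurement of fetch_kddcup99 itself.

```python
import sys
from array import array

n = 100000  # arbitrary size, chosen only for illustration
values = [float(i) for i in range(n)]

# A list stores pointers; every float is a separate Python object on top.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A typed array stores the raw doubles contiguously, with one small header.
array_bytes = sys.getsizeof(array("d", values))

ratio = list_bytes / array_bytes
print("list: %d bytes, array: %d bytes, ratio: %.1f"
      % (list_bytes, array_bytes, ratio))
```

The per-object overhead is the reason an intermediate list of parsed rows can need several times the memory of the final array.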

agramfort · 2015-12-11T08:59:11Z

@ngoix any objection?

ngoix · 2015-12-11T14:35:21Z

No, I'm OK with this change!

agramfort · 2015-12-11T14:40:12Z

+1 for merge when @amueller is happy

amueller · 2015-12-11T21:19:37Z

squash maybe? LGTM.

BUG: Fixed comparison with bytes in kddcup99.py + test MAINT: Changed default 'percent10' to True in fetch_kddcup99

nmayorov · 2015-12-12T04:03:57Z

@amueller, squashed.

agramfort · 2015-12-13T11:07:05Z

thanks @nmayorov

nmayorov changed the title ~~[MRG] BUG: Fix fetch_kddcup99 for Python 3~~ [MRG+1] BUG: Fix fetch_kddcup99 for Python 3 Dec 6, 2015

amueller added the Waiting for Reviewer label Dec 10, 2015

ngoix reviewed Dec 10, 2015
nmayorov force-pushed the fetch_kddcup99_bug branch from c9e336b to 5a54ce8 Compare December 10, 2015 23:04

nmayorov force-pushed the fetch_kddcup99_bug branch from 5a54ce8 to 8497c89 Compare December 10, 2015 23:30

amueller reviewed Dec 11, 2015
amueller removed the Waiting for Reviewer label Dec 11, 2015

nmayorov force-pushed the fetch_kddcup99_bug branch from 8497c89 to f578874 Compare December 11, 2015 15:15

BUG: Fixed fetch_kddcup99 for Python 3

51addc0

BUG: Fixed comparison with bytes in kddcup99.py + test MAINT: Changed default 'percent10' to True in fetch_kddcup99

nmayorov force-pushed the fetch_kddcup99_bug branch from f578874 to 51addc0 Compare December 12, 2015 04:02

agramfort added a commit that referenced this pull request Dec 13, 2015

Merge pull request #5946 from nmayorov/fetch_kddcup99_bug

8d0a299

[MRG+1] BUG: Fix fetch_kddcup99 for Python 3

agramfort merged commit 8d0a299 into scikit-learn:master Dec 13, 2015

maniteja123 mentioned this pull request Dec 19, 2016

[MRG + 1] Fix reference in fetch_kddcup99 based on #7861 #8071

Merged
