[MRG] Fix fetch_openml when ignore attributes are numeric #12330

janvanrijn · 2018-10-08T18:23:26Z

I factored the logic that decides, per column, whether it is a valid data column (as opposed to a target or ignore attribute) into its own function. Previously this logic lived in two places in the code, in slightly different forms. This simplifies the main function.
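As a rough illustration of the per-column check described above, here is a minimal sketch. It is not the code from this PR: the helper name `valid_data_columns`, the metadata keys `is_ignore` and `is_row_identifier`, their string values, and the sample columns are all assumptions made for the example.

```python
def valid_data_columns(features, target_columns):
    """Return the names of columns that can be used as data.

    Hypothetical helper: a column is kept only if it is not a target,
    not flagged as ignored, and not flagged as a row identifier.
    """
    return [
        feature["name"]
        for feature in features
        if feature["name"] not in target_columns
        and feature["is_ignore"] != "true"
        and feature["is_row_identifier"] != "true"
    ]


# Illustrative feature metadata (assumed layout, made-up column names).
# Note that the ignored and row-identifier columns are numeric here.
features = [
    {"name": "row_id", "data_type": "numeric",
     "is_ignore": "false", "is_row_identifier": "true"},
    {"name": "weight", "data_type": "numeric",
     "is_ignore": "true", "is_row_identifier": "false"},
    {"name": "age", "data_type": "numeric",
     "is_ignore": "false", "is_row_identifier": "false"},
    {"name": "label", "data_type": "nominal",
     "is_ignore": "false", "is_row_identifier": "false"},
]

print(valid_data_columns(features, ["label"]))  # ['age']
```

Keeping the check in one function means the same rule applies regardless of the column's data type, which is the property the fix relies on.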

I did not add an additional unit test, as that would require packaging more OpenML files. Personally, I think the more modular code and the removed bug already improve the readability and quality of the code. Let me know if you think otherwise; I can of course add a unit test.

NicolasHug · 2018-10-08T19:20:20Z

If all goes green LGTM.

Not sure if it's worth adding a dataset to sklearn/datasets/tests/data/openml/ to test this (I checked and this indeed fixes the issue for sklearn.datasets.fetch_openml(data_id=1119)).

jnothman

Best to test this, to avoid a regression.

Merge remote-tracking branch 'upstream/master' into fix_#12329

janvanrijn · 2018-10-08T22:41:53Z

Btw, should I add a note in the what's new doc? In 0.20 or 0.21? I could append it to the previous entry that was merged by #12246

jnothman · 2018-10-09T08:20:13Z

It can be a separate note in 0.20.1

jnothman

I've not checked that the test fails on master. I've also not checked how big the test data is (is it reasonably small?)

Otherwise LGTM

jnothman · 2018-10-09T08:20:40Z

sklearn/datasets/tests/test_openml.py

@@ -410,6 +410,25 @@ def test_fetch_openml_australian(monkeypatch, gzip_response):
    )


+@pytest.mark.parametrize('gzip_response', [True, False])
+def test_fetch_openml_adultcensus(monkeypatch, gzip_response):
+    # Check because of the numeric row attribute


Note the issue number.

jnothman · 2018-10-09T08:37:41Z

sklearn/datasets/openml.py

@@ -555,13 +567,12 @@ def fetch_openml(name=None, version='active', data_id=None, data_home=None,
    arff = _download_data_arff(data_description['file_id'], return_sparse,
                               data_home)
    arff_data = arff['data']
+    # nominal attributes is a dict mapping from the attribute name to the
+    # possible values. Includes also the target column


Note however that the target is popped off below
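To make the point about the target being popped concrete, here is a small sketch. It is not the code under review: the ARFF-style `(name, type)` attribute list, the column names, and the variable `target_categories` are assumptions for illustration only.

```python
# ARFF-style attributes as (name, type) pairs; in this assumed layout a
# nominal attribute carries the list of its possible values.
attributes = [
    ("age", "NUMERIC"),
    ("workclass", ["Private", "Self-emp"]),
    ("label", ["<=50K", ">50K"]),
]
data_columns = ["age", "workclass"]
target_columns = ["label"]

# Map each nominal attribute name to its possible values.
# At this point the mapping still includes the target column.
nominal_attributes = {
    name: values
    for name, values in attributes
    if isinstance(values, list) and name in data_columns + target_columns
}

# Later the target is popped off, so the mapping no longer contains it.
target_categories = {
    name: nominal_attributes.pop(name)
    for name in target_columns
    if name in nominal_attributes
}

print(sorted(nominal_attributes))  # ['workclass']
print(target_categories)           # {'label': ['<=50K', '>50K']}
```

A comment placed where the dict is built is accurate only until the pop, which is why the reviewer asks for the comment to mention it.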

janvanrijn · 2018-10-09T14:15:08Z

I left 10 observations in the dataset, which in my opinion makes it reasonably small.

amueller · 2018-10-09T15:25:29Z

regression test fails on master, whole data is 24k, looks good.

…rn#12330)
* modularized data column functionality
* small bugfix
* removes redundant line breaks
* added some documentation on the added fn
* added additional comment on advice of Nicholas Hug
* added test case
* merged master into branch, and added small comments by Joel
* added doc item

janvanrijn added 4 commits October 8, 2018 14:11

modularized data column functionality

9466ad8

small bugfix

04fc121

removes redundant line breaks

309489d

added some documentation on the added fn

467050c

janvanrijn mentioned this pull request Oct 8, 2018

Wrong "adult" dataset in OpenML100 / CC18 openml/OpenML#813

Closed

janvanrijn changed the title ~~Fix fetch_openml when ignore attributes are numeric~~ [MRG] Fix fetch_openml when ignore attributes are numeric Oct 8, 2018

added additional comment on advice of Nicholas Hug

606f3d9

jnothman reviewed Oct 8, 2018


janvanrijn added 2 commits October 8, 2018 17:53

added test case

3eeeec2

X

153252f

Merge remote-tracking branch 'upstream/master' into fix_#12329

jnothman approved these changes Oct 9, 2018


janvanrijn added 2 commits October 9, 2018 10:12

merged master into branch, and added small comments by Joel

30e7f7f

added doc item

90772ae

amueller added this to the 0.20.1 milestone Oct 9, 2018

amueller merged commit 03c3af5 into scikit-learn:master Oct 9, 2018
