[WIP] Add parameter as_frame to load_data_xxx to return data frames #10972

nsorros · 2018-04-13T08:32:35Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR proposes a way to return datasets as data frames that can incorporate information such as the feature names and classes. It follows a suggestion made by @jnothman, but it can go in any direction if the community feels this is a valuable feature.
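As a rough illustration of what "data frames that incorporate feature names and classes" could look like, here is a minimal sketch in plain pandas. The arrays, feature names and class names are made-up stand-ins for what a loader such as load_iris already has on hand, and to_frame is a hypothetical helper name, not scikit-learn API.

```python
import numpy as np
import pandas as pd


def to_frame(data, target, feature_names, target_names):
    """Hypothetical helper: wrap raw arrays in labelled pandas objects."""
    # Column labels come from the dataset's feature names.
    X = pd.DataFrame(data, columns=feature_names)
    # Integer class codes become a categorical Series whose categories
    # are the human-readable class names.
    y = pd.Series(
        pd.Categorical.from_codes(target, categories=target_names),
        name="target",
    )
    return X, y


# Toy stand-in data (not the real iris values).
data = np.array([[5.1, 3.5], [6.2, 2.9], [7.7, 3.0]])
target = np.array([0, 1, 2])
X, y = to_frame(data, target,
                feature_names=["sepal length (cm)", "sepal width (cm)"],
                target_names=["setosa", "versicolor", "virginica"])
print(X.columns.tolist())  # ['sepal length (cm)', 'sepal width (cm)']
print(y.tolist())          # ['setosa', 'versicolor', 'virginica']
```

Using a categorical target keeps the compact integer codes while making the class names visible; a plain integer Series would be the simpler alternative.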

jnothman

I think we should do the transformation before return_X_y is handled, and still allow returning a Bunch. Alternatively, we could consider allowing return_X_y='pandas' as the way to activate this feature instead of a separate parameter.

I'm not sure whether y should be a Series. Please argue for/against it.
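One way to read the suggestion above: do the frame conversion first, then let return_X_y choose between a tuple and a Bunch. The sketch assumes a toy loader named load_toy (hypothetical) and uses types.SimpleNamespace as a stand-in for scikit-learn's Bunch so it runs without scikit-learn installed. It returns y as a named Series, which is one side of the open question; a plain array would work equally well here.

```python
from types import SimpleNamespace

import numpy as np


def load_toy(return_X_y=False, as_frame=False):
    """Hypothetical loader illustrating where as_frame is applied."""
    feature_names = ["f0", "f1"]
    data = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
    target = np.array([0, 1, 0])

    # 1. Convert first, so every return path sees the same objects.
    if as_frame:
        import pandas as pd  # imported only when a frame is requested
        data = pd.DataFrame(data, columns=feature_names)
        target = pd.Series(target, name="target")

    # 2. Only then decide between a tuple and a Bunch-like container.
    if return_X_y:
        return data, target
    # SimpleNamespace stands in for sklearn.utils.Bunch in this sketch.
    return SimpleNamespace(data=data, target=target,
                           feature_names=feature_names)


X, y = load_toy(return_X_y=True, as_frame=True)
bunch = load_toy(as_frame=True)
print(type(X).__name__, type(y).__name__)  # DataFrame Series
```

Because the conversion happens before the return_X_y branch, both return styles stay consistent, and callers who never pass return_X_y still get a Bunch.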

jnothman · 2018-04-14T11:03:19Z

sklearn/datasets/base.py

@@ -21,6 +21,7 @@
 from ..utils import check_random_state

 import numpy as np
+import pandas as pd


This needs to be an optional import. Perhaps it should only be loaded locally within each function as part of the validation of as_frame.

jnothman · 2018-04-14T11:05:55Z

Thanks for the draft. It also needs tests.

nsorros · 2018-04-17T08:21:26Z

@jnothman Added a test and implemented the local import as you suggested. I assume I need to do this for every dataset before it is ready to be merged, right? How does it look so far? Thanks a lot for your help.

jnothman · 2018-04-17T08:23:54Z

sklearn/datasets/base.py

+            target_series = Series(target, name="class")
+            return data_frame, target_series
+        except ImportError:
+            pass


If as_frame is true, then we should indeed raise the ImportError, not pass it.
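A minimal sketch of the pattern being asked for, assuming a hypothetical loader load_toy_frame: pandas is imported inside the function, so importing the module never requires it, and a missing pandas surfaces as an ImportError when the caller explicitly asked for a frame. Silently passing would hand NumPy-style data back to a caller who asked for a DataFrame, which is harder to debug. The second half simulates a missing pandas by putting None into sys.modules, a standard-library mechanism that is also handy for testing this path.

```python
import sys


def load_toy_frame(as_frame=False):
    """Hypothetical loader: pandas is required only when as_frame=True."""
    data = [[0.0, 1.0], [2.0, 3.0]]
    if not as_frame:
        return data
    try:
        import pandas as pd
    except ImportError as exc:
        # Surface the failure instead of silently falling back.
        raise ImportError(
            "as_frame=True requires pandas; install it or use as_frame=False"
        ) from exc
    return pd.DataFrame(data)


# Simulate pandas being absent: a None entry in sys.modules makes
# `import pandas` raise ModuleNotFoundError (a subclass of ImportError).
saved = sys.modules.get("pandas")
sys.modules["pandas"] = None
try:
    load_toy_frame(as_frame=True)
    raised = False
except ImportError:
    raised = True
finally:
    # Restore the original state of sys.modules.
    if saved is None:
        del sys.modules["pandas"]
    else:
        sys.modules["pandas"] = saved
print(raised)  # True
```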

I guess I am confused because I was thinking about the tests passing at the same time.

jnothman · 2018-04-17T08:25:15Z

sklearn/datasets/tests/test_base.py

@@ -202,6 +203,17 @@ def test_load_iris():
    check_return_X_y(res, partial(load_iris))


+def test_load_iris_as_frame():
+    try:
+        data_frame, target_series = load_iris(as_frame=True)


I don't get why this would raise SkipTest. Why not use importorskip?

While looking at other parts of the code, I thought this was the way to skip the pandas test. Let me check importorskip.

pytest is relatively new to us. But it provides importorskip for this kind of application.
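A sketch of the test rewritten with pytest.importorskip, which returns the imported module or skips the test if the module is unavailable. load_toy is a hypothetical stand-in for load_iris so the example runs on its own; the shapes mirror the assertions this PR's test makes, and plain assert replaces assert_equal since pytest reports informative failures for bare asserts.

```python
import numpy as np
import pytest


def load_toy(as_frame=False):
    """Hypothetical stand-in for load_iris(as_frame=...)."""
    data = np.zeros((150, 4))
    target = np.zeros(150, dtype=int)
    if as_frame:
        import pandas as pd
        return pd.DataFrame(data), pd.Series(target, name="target")
    return data, target


def test_load_toy_as_frame():
    # Skips (rather than errors) when pandas is not installed.
    pd = pytest.importorskip("pandas")
    data_frame, target_series = load_toy(as_frame=True)
    assert isinstance(data_frame, pd.DataFrame)
    assert data_frame.shape == (150, 4)
    assert target_series.shape[0] == 150
```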

jnothman · 2018-06-14T14:20:04Z

sklearn/datasets/tests/test_base.py

@@ -202,6 +203,17 @@ def test_load_iris():
    check_return_X_y(res, partial(load_iris))


+def test_load_iris_as_frame():
+    try:
+        data_frame, target_series = load_iris(as_frame=True)


pytest is relatively new to us. But it provides importorskip for this kind of application.

jnothman · 2018-06-14T14:20:32Z

sklearn/datasets/tests/test_base.py

+        data_frame, target_series = load_iris(as_frame=True)
+        assert_equal(data_frame.shape, (150, 4))
+        assert_equal(target_series.shape[0], 150)
+    except IOError as :


This is invalid syntax. You need to remove "as".
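To make the fix concrete: "as" must either be followed by a name or be dropped entirely. This illustrative snippet compiles the original line and two valid variants with the built-in compile() and records which ones are accepted.

```python
# The line from the diff, and two valid alternatives.
snippets = {
    "broken": "try:\n    pass\nexcept IOError as :\n    pass\n",
    "no_name": "try:\n    pass\nexcept IOError:\n    pass\n",
    "named": "try:\n    pass\nexcept IOError as err:\n    print(err)\n",
}

results = {}
for label, source in snippets.items():
    try:
        compile(source, label, "exec")
        results[label] = "ok"
    except SyntaxError:
        results[label] = "SyntaxError"

print(results)
# {'broken': 'SyntaxError', 'no_name': 'ok', 'named': 'ok'}
```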

jnothman · 2018-06-14T14:20:42Z

sklearn/datasets/base.py

@@ -322,7 +322,7 @@ def load_wine(return_X_y=False):
                                'proline'])


-def load_iris(return_X_y=False):
+def load_iris(return_X_y=False, as_frame=False):


Please document the parameter
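A sketch of how the new parameter might be documented in the numpydoc docstring style that scikit-learn uses. load_toy and the exact wording are illustrative only, not the text that was eventually merged; a plain dict stands in for the Bunch.

```python
def load_toy(return_X_y=False, as_frame=False):
    """Load a hypothetical toy dataset.

    Parameters
    ----------
    return_X_y : bool, default=False
        If True, return ``(data, target)`` instead of a Bunch-like object.

    as_frame : bool, default=False
        If True, ``data`` is a pandas DataFrame with one column per
        feature name and ``target`` is a pandas Series. Requires pandas;
        an ImportError is raised if it is not installed.

    Returns
    -------
    data, target : tuple
        Only when ``return_X_y`` is True.
    bunch : dict
        Otherwise, a mapping with ``data`` and ``target`` keys.
    """
    data, target = [[0.0, 1.0]], [0]
    if as_frame:
        import pandas as pd  # local import keeps pandas optional
        data = pd.DataFrame(data, columns=["f0", "f1"])
        target = pd.Series(target, name="target")
    if return_X_y:
        return data, target
    return {"data": data, "target": target}
```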

adrinjalali · 2019-02-08T16:21:24Z

@nsorros marking this as stalled for now, we'd be happy if you continue the work on this though.

PimwipaV · 2020-01-21T16:21:31Z

@nsorros @adrinjalali @daniel-cortez-stevenson anyone working on this? if it's stalled I'm interested to try working on it.

nsorros · 2020-01-21T17:04:26Z


Go ahead from my side.

PimwipaV · 2020-01-21T21:50:46Z

@nsorros Thanks. I'll start now.

jnothman · 2020-01-21T22:35:26Z

#15980 is open and active, so make sure to avoid duplication.

PimwipaV · 2020-01-22T11:21:53Z

@jnothman Thanks for the warning. I will go slowly.

bsipocz · 2020-06-06T19:38:42Z

This has been done in the meantime, so I suggest closing the PR.

thomasjpfan · 2020-06-06T20:58:32Z

Thank you @bsipocz . Closing PR as this is resolved.

Add as_frame flag in iris dataset to return data as dataframes

52a063f

jnothman reviewed Apr 14, 2018


nsorros added 2 commits April 17, 2018 09:07

partial import of pandas when used

241be21

Add test

9dfe2b9

jnothman reviewed Apr 17, 2018


import pandas or skip test

e089ae3

jnothman reviewed Jun 14, 2018


daniel-cortez-stevenson mentioned this pull request Feb 8, 2019

Redesign of datasets API from functional to OO #13120

Closed

adrinjalali added Stalled help wanted labels Feb 8, 2019

daniel-cortez-stevenson mentioned this pull request Feb 8, 2019

Redesign of the sklearn.datasets API #13122

Closed

amueller added the Needs work label Aug 6, 2019

github-actions bot added the module:datasets label Mar 2, 2020

adrinjalali closed this Jun 8, 2020
