[WIP] fetch_openml: ability to return DataFrame #11875
@jorisvandenbossche are you still working on this? Shall I take over?
I think it can mainly use some review. Although, actually, probably docs and tests could already be added. Feel free to do that / push this forward!
Great, I will make some time for this. Can I push to your PR, or should I fetch your branch and open a new PR?
Added you as collaborator to my fork, you should be able to push to this PR then.
X = data.iloc[:, col_slice_x]
X.columns = data_columns

all_numeric = all(features_dict[feature]['data_type'] == 'numeric'
We should also consider 'real' and 'integer'.
for feature in data_columns:
    data_type = features_dict[feature]['data_type']
    if data_type == 'numeric':
We should add the case for 'real', which should also be np.float64.
Also, we need an elif for 'integer' to have an integer column.
In the case of integer, it will be tricky to manage the missing values, though.
To think about:
Anyone working on this right now?
On my mental to-do list. Apparently the feature is there (by Joris) but needs test cases.
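On the dtype points raised in the review comments above ('numeric', 'real', 'integer', and missing values in integer columns), here is a minimal sketch of one possible mapping. It is not the code in this PR: `OPENML_TO_DTYPE` and `cast_columns` are hypothetical names, and using the pandas nullable `"Int64"` dtype for 'integer' is an assumption that requires a pandas version that ships it.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from OpenML feature data types to pandas dtypes.
# "Int64" (capital I) is the pandas nullable integer dtype; choosing it
# for 'integer' is an assumption, not something decided in this PR.
OPENML_TO_DTYPE = {
    "numeric": np.float64,
    "real": np.float64,
    "integer": "Int64",
}


def cast_columns(frame, features_dict):
    """Return a copy of `frame` with columns cast per `features_dict`."""
    out = frame.copy()
    for name in out.columns:
        data_type = features_dict[name]["data_type"]
        dtype = OPENML_TO_DTYPE.get(data_type)
        if dtype is not None:  # leave other types (e.g. nominal) untouched
            out[name] = out[name].astype(dtype)
    return out


# Toy metadata and data, made up for illustration.
features_dict = {
    "age": {"data_type": "integer"},
    "height": {"data_type": "real"},
}
frame = pd.DataFrame({"age": [31, None, 45], "height": [1.7, 1.8, None]})
typed = cast_columns(frame, features_dict)
print(typed.dtypes)
```

With a plain NumPy integer dtype the missing value in `age` could not be represented, which is the tricky part mentioned above; the nullable dtype keeps the column integer-typed and marks the gap as missing.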
nominal_attributes = dict(nominal_attributes)

data = pd.DataFrame(arff_data)
Are there any risks in Pandas doing type inference here before we set the dtypes below?
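To make the type inference question concrete, here is a small sketch comparing the default constructor with one way to sidestep inference. The rows and column names are invented, and building the frame with `dtype=object` before casting is just one possible approach, not what this PR does.

```python
import pandas as pd

# Hypothetical rows, roughly as an ARFF parser might hand them over.
arff_rows = [[1, "a"], [None, "b"]]
columns = ["count", "label"]

# Default construction: pandas chooses the dtypes itself, so an integer
# column that contains None does not stay an integer column.
inferred = pd.DataFrame(arff_rows, columns=columns)
print(inferred.dtypes)

# Possible alternative: keep everything as object first, then cast each
# column explicitly from the dataset metadata.
raw = pd.DataFrame(arff_rows, columns=columns, dtype=object)
explicit = raw.astype({"count": "Int64", "label": "category"})
print(explicit.dtypes)
```

Comparing the two printed dtype listings shows where inference and the declared types can disagree.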
We could try to finish this during the sprint, @jorisvandenbossche?
I don't mind getting it right in terms of use of pd.Categorical. I'm sure this will be helpful to users.
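As a rough illustration of the pd.Categorical point, a sketch of building a categorical column from the nominal values declared in the ARFF header. The attribute name and values are made up; passing the declared values as `categories` is one reasonable option, not necessarily what the final implementation does.

```python
import pandas as pd

# Hypothetical nominal attribute as declared in an ARFF header.
nominal_attributes = {"color": ["red", "green", "blue"]}

# Observed values for that column; None stands for a missing value.
values = ["red", "blue", None, "red"]

# Passing the declared categories keeps values that never occur in the
# data ("green" here) and fixes the category order.
color = pd.Categorical(values, categories=nominal_attributes["color"])
print(color.categories.tolist())
print(color.codes.tolist())
```

Declaring the categories up front means two subsets of the same dataset end up with identical category sets, even if one of them happens not to contain every value.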
Tests failing, in case you weren't aware.
Let me know when this wants review.
Any progress on this? I'd really like to have it ;)
Continued and resolved in #13902. Closing.
Just a proof of concept I quickly tried out, no docs or tests yet.
Fixes #11818