Feature request: pass meta-data per column/sample through the Pipeline

Dear scikit-learn community,

DISCLAIMER: I am new to sklearn and to open-source projects in general, so sorry if I don't comply with the standards of this project.
Problem:
Say you have a learner that also takes the column types as input and processes each column accordingly (e.g. it converts categorical features to numbers but leaves floats as they are). How would you put such a learner in a Pipeline? You could do this with scikit-learn's built-in feature extraction and preprocessing, but suppose you have external code that does it, or you want the imputation to depend on the column type, and you would like to do forward feature selection without pre-imputing (so that the imputation is part of the pipeline).
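To make the problem concrete, here is a minimal sketch of the kind of learner I mean. `TypeAwareImputer` and its `column_types` parameter are hypothetical names invented for this illustration, not part of scikit-learn; the sketch assumes the column types are handed to the constructor as a positional list and that categorical values are already coded as numbers.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class TypeAwareImputer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: imputes each column according to its declared type."""

    def __init__(self, column_types=None):
        # The column meta-data travels as a constructor parameter, not with X.
        self.column_types = column_types

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.fill_values_ = []
        for j, kind in enumerate(self.column_types):
            observed = X[~np.isnan(X[:, j]), j]
            if kind == "categorical":
                # Most frequent observed category code.
                values, counts = np.unique(observed, return_counts=True)
                self.fill_values_.append(values[np.argmax(counts)])
            else:
                # Mean of the observed values for float columns.
                self.fill_values_.append(observed.mean())
        return self

    def transform(self, X):
        X = np.array(X, dtype=float)  # np.array copies, so the input is untouched
        for j, fill in enumerate(self.fill_values_):
            X[np.isnan(X[:, j]), j] = fill
        return X


X = np.array([[1.0, 0.0], [np.nan, 1.0], [3.0, np.nan], [5.0, 1.0]])
y = [0, 1, 0, 1]

pipe = Pipeline([
    ("impute", TypeAwareImputer(column_types=["float", "categorical"])),
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)
X_imputed = pipe.named_steps["impute"].transform(X)
```

This works only as long as the column order never changes. Because `column_types` is matched to columns by position, any earlier step that drops or reorders columns (forward feature selection, for instance) would silently misalign the types, which is why I would like the meta-data to travel with the data instead.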

Suggestion:
I believe we should support meta-data across the columns and rows of the feature matrix to allow passing information on features (e.g. type of feature) and samples (e.g. weight of instance).
Maybe this could be implemented easily with a MultiIndex in pandas (http://pandas.pydata.org/pandas-docs/stable/advanced.html).
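A rough sketch of what I have in mind with a pandas MultiIndex: a second column level holds the feature type and a second row level holds the instance weight. The level names (`feature`, `kind`, `sample`, `weight`) and the toy data are made up for the example, and as far as I can tell scikit-learn would not propagate any of this through a Pipeline today.

```python
import pandas as pd

# Column meta-data: each feature name is paired with its type.
columns = pd.MultiIndex.from_tuples(
    [("age", "float"), ("income", "float"), ("city", "categorical")],
    names=["feature", "kind"],
)
# Row meta-data: each sample id is paired with its weight.
index = pd.MultiIndex.from_tuples(
    [(0, 1.0), (1, 0.5), (2, 2.0)],
    names=["sample", "weight"],
)
df = pd.DataFrame(
    [[31.0, 52000.0, "Paris"], [45.0, 61000.0, "Oslo"], [27.0, 38000.0, "Paris"]],
    index=index,
    columns=columns,
)

# A type-aware step could pick out its columns by the "kind" level...
categorical = df.xs("categorical", axis=1, level="kind")
numeric = df.xs("float", axis=1, level="kind")
# ...and a learner could read the instance weights from the row index.
weights = df.index.get_level_values("weight").to_numpy()
```

Since the meta-data is attached to the labels rather than to positions, selecting a subset of columns or rows keeps each feature type and weight aligned with its data.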

What do you say?
Maybe I am missing something and there is a way to do that with the current scikit-learn design?

Thanks,
HS


Feature request: pass meta-data per column/sample through the Pipeline #4196
