
Feature request: pass meta-data per column/sample through the Pipeline #4196

@chanansh

Dear scikit-learn community,

DISCLAIMER: I am new to sklearn and to open-source projects in general, so sorry if I don't comply with the standards of this project.

Problem:
Say you have a learner that also takes the column type as input and processes each column accordingly (e.g. it converts categorical values to numbers but leaves floats as is). How would you put such a learner in a Pipeline? You could do this with the built-in feature-extraction and preprocessing tools of scikit-learn, but suppose you have external code that does it, or you want the imputation to be a function of column type and you would like to do forward feature selection without pre-imputing (i.e. the imputation is part of the pipeline).

Suggestion:
I believe we should support meta-data along both the columns and the rows of the feature matrix, so that information about features (e.g. the type of a feature) and about samples (e.g. the weight of an instance) can be passed through the Pipeline.
Maybe this is easily implementable through a MultiIndex in pandas (http://pandas.pydata.org/pandas-docs/stable/advanced.html)
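
To make the MultiIndex idea concrete, here is a minimal sketch of how it might look. Everything in it is an assumption made for illustration: the `TypeAwareImputer` class, the `"kind"` column level, and the toy data are hypothetical and not part of scikit-learn. The column type travels as a second level of the pandas column index, and a custom transformer reads that level to decide how to impute each column.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class TypeAwareImputer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer (not a scikit-learn class).

    Assumes X is a DataFrame whose columns are a MultiIndex with a
    level named "kind" holding "numeric" or "categorical".
    """

    def fit(self, X, y=None):
        self.fill_values_ = {}
        self.categories_ = {}
        for col, kind in zip(X.columns, X.columns.get_level_values("kind")):
            if kind == "categorical":
                # Impute with the most frequent value; remember the categories.
                self.fill_values_[col] = X[col].mode().iloc[0]
                self.categories_[col] = sorted(X[col].dropna().unique())
            else:
                # Impute numeric columns with the column mean.
                self.fill_values_[col] = X[col].mean()
        return self

    def transform(self, X):
        arrays = []
        for col, kind in zip(X.columns, X.columns.get_level_values("kind")):
            filled = X[col].fillna(self.fill_values_[col])
            if kind == "categorical":
                # Map categories to integer codes learned during fit.
                filled = pd.Categorical(filled, categories=self.categories_[col]).codes
            arrays.append(np.asarray(filled, dtype=float))
        return np.column_stack(arrays)


# Toy data: the second column level carries the per-column meta-data.
X = pd.DataFrame({"age": [20.0, np.nan, 40.0, 30.0],
                  "color": ["red", "blue", None, "red"]})
X.columns = pd.MultiIndex.from_tuples(
    [("age", "numeric"), ("color", "categorical")], names=["name", "kind"]
)
y = np.array([0, 1, 0, 1])

pipe = Pipeline([("impute", TypeAwareImputer()), ("clf", LogisticRegression())])
pipe.fit(X, y)
```

One limitation of this sketch: the meta-data survives only while the steps hand DataFrames to each other. As soon as a step returns a plain NumPy array (as this transformer does), the `"kind"` level is gone for every later step.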

What do you say?
Maybe I am missing something and there is already a way to do this with the current scikit-learn design?
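
For the per-sample half of the question, one mechanism that does exist in `Pipeline` is passing fit parameters with the `<step name>__<parameter>` naming convention. The sketch below routes `sample_weight` to the final estimator; the step names `"scale"` and `"clf"` and the toy data are assumptions chosen for illustration. It does not address per-column meta-data, and in this form the weights reach only the step they are addressed to.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): one feature, four samples, one weight per sample.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
weights = np.array([1.0, 1.0, 5.0, 5.0])

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# "clf__sample_weight" is forwarded to LogisticRegression.fit as sample_weight.
pipe.fit(X, y, clf__sample_weight=weights)
```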

Thanks,
HS
