Dear scikit-learn community,
DISCLAIMER: I am new to sklearn and to open-source projects in general, so sorry if I don't comply with this project's standards.
Problem:
Say you have a learner that also takes the column type as input and processes each column accordingly (e.g. it converts categorical columns to numbers but leaves floats as is). How would you put such a learner in a Pipeline? You could do this with scikit-learn's built-in feature extraction and preprocessing, but suppose you have external code that does it, or you want the imputation to be a function of column type, and you would like to do forward feature selection without pre-imputing (the imputation is part of the pipeline).
Suggestion:
I believe we should support meta-data across the columns and rows of the feature matrix to allow passing information on features (e.g. type of feature) and samples (e.g. weight of instance).
Maybe this is easily implementable through a MultiIndex in pandas (http://pandas.pydata.org/pandas-docs/stable/advanced.html).
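To make the idea concrete, here is a rough sketch of what I have in mind, using only pandas. The feature names, the `ftype` and `weight` level names, and the `impute_by_type` helper are all hypothetical and made up for illustration; nothing here is an existing scikit-learn API.

```python
import pandas as pd

# Hypothetical feature matrix: two float features and one categorical feature.
data = [
    [1.5, 10.0, "red"],
    [2.5, None, "blue"],
    [3.5, 30.0, "red"],
]

# Column metadata: the second column level records each feature's type.
columns = pd.MultiIndex.from_tuples(
    [("height", "float"), ("weight", "float"), ("color", "categorical")],
    names=["feature", "ftype"],
)

# Row metadata: the second index level records each sample's weight.
index = pd.MultiIndex.from_tuples(
    [(0, 1.0), (1, 0.5), (2, 2.0)],
    names=["sample", "weight"],
)

X = pd.DataFrame(data, index=index, columns=columns)


def impute_by_type(frame):
    """Hypothetical helper: process each column according to its 'ftype' label."""
    out = frame.copy()
    for name, ftype in frame.columns:
        col = frame[(name, ftype)]
        if ftype == "float":
            # Fill missing floats with the column mean.
            values = col.astype(float)
            out[(name, ftype)] = values.fillna(values.mean())
        elif ftype == "categorical":
            # Encode categories as integer codes.
            out[(name, ftype)] = col.astype("category").cat.codes
    return out


X_imputed = impute_by_type(X)

# Sample weights can be recovered from the row metadata.
sample_weights = X.index.get_level_values("weight").to_numpy()
```

A pipeline step could then read the feature type from the column level instead of needing it passed in separately, and the sample weights would travel with the rows through feature selection.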
What do you say?
Maybe I am missing something and there is a way to do that with the current scikit-learn design?
Thanks,
HS