Closed
Description
This discussion gathers some insights about adding a multivariate imputer to scikit-learn. Because of release-time constraints, development was moved to a separate branch (FIXME: give a specific name); see #11600.
- Decide on good defaults for the IterativeImputer/ChainedImputer ([MRG] ChainedImputer -> IterativeImputer, and documentation update #11350).
- Modify the imputation example so that it shows a compelling use case ([MRG] ChainedImputer -> IterativeImputer, and documentation update #11350).
- Add an example illustrating how to perform multiple imputation ([WIP] Multiple Imputation: Example with IterativeImputer #11370).
- Add a meta-estimator that performs multiple imputation?
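A minimal sketch of the meta-estimator idea, assuming the current scikit-learn API in which the chained imputer ships as the experimental `IterativeImputer` with a `sample_posterior` option. `MultipleImputer` is a hypothetical name, not a scikit-learn class: it fits several clones of an imputer that differ only in `random_state` and returns one completed dataset per clone.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


class MultipleImputer(BaseEstimator):
    """Hypothetical meta-estimator (not part of scikit-learn).

    Fits ``n_imputations`` clones of ``estimator`` that differ only in their
    ``random_state`` and returns one completed dataset per clone.
    """

    def __init__(self, estimator=None, n_imputations=5, random_state=0):
        self.estimator = estimator
        self.n_imputations = n_imputations
        self.random_state = random_state

    def fit(self, X, y=None):
        base = self.estimator
        if base is None:
            # sample_posterior=True draws each imputed value from the
            # predictive distribution, so differently seeded clones disagree.
            base = IterativeImputer(sample_posterior=True)
        rng = np.random.RandomState(self.random_state)
        self.imputers_ = []
        for _ in range(self.n_imputations):
            # Assumes the wrapped estimator exposes a random_state parameter.
            seed = rng.randint(np.iinfo(np.int32).max)
            self.imputers_.append(clone(base).set_params(random_state=seed).fit(X))
        return self

    def transform(self, X):
        # Returns a list with one completed copy of X per fitted imputer.
        return [imp.transform(X) for imp in self.imputers_]


# Usage on toy data with roughly 20% of entries missing completely at random.
rng = np.random.RandomState(42)
X = rng.normal(size=(100, 3))
X[:, 2] += X[:, 0]  # correlated column, so the chained imputer has signal
mask = rng.rand(*X.shape) < 0.2
X_missing = X.copy()
X_missing[mask] = np.nan

completed = MultipleImputer(n_imputations=5).fit(X_missing).transform(X_missing)
```

Each completed dataset can then be analysed separately and the results pooled, which is the usual multiple-imputation workflow; the observed entries are identical across copies and only the imputed ones vary.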
From the discussion in #8478, we have to deal with the following issues in `MICEImputer`:
We have the following things to do:
- Determine the most appropriate way to use individual imputation samples in predictive modelling, clustering, etc., which are scikit-learn's focus.
a. Is using a single draw acceptable?
b. Is averaging over multiple draws from the final fit appropriate?
c. Is ensembling multiple predictive estimators, each trained on a different imputation, most appropriate?
- Perhaps determine whether, in a predictive modelling context, the sophistication of MICE (sampling each imputed value) is necessary, rather than just using point predictions.
- Provide an example illustrating the inferential capabilities that multiple imputation enables. I don't think our current interface is limiting in any way, but it deserves an example.
- Rename MICEImputer to de-emphasise multiple imputation, because it performs only a single imputation at a time.
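Options a, b and c for predictive modelling can be compared side by side with a small sketch. It assumes the experimental `IterativeImputer` with `sample_posterior=True`, a synthetic linear regression problem, and `Ridge` as the downstream model. Each "draw" here comes from a separately seeded imputer, which is only one possible reading of "draws from the final fit".

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Ridge

# Synthetic regression data; y is built before ~20% of X is masked at random.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=200)
X[rng.rand(200, 4) < 0.2] = np.nan
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# Several stochastic imputers; each one stands in for a single "draw".
n_draws = 5
imputers = [
    IterativeImputer(sample_posterior=True, random_state=seed).fit(X_train)
    for seed in range(n_draws)
]
train_draws = [imp.transform(X_train) for imp in imputers]
test_draws = [imp.transform(X_test) for imp in imputers]

# (a) A single draw: one completed training set, one model.
pred_a = Ridge().fit(train_draws[0], y_train).predict(test_draws[0])

# (b) Average the draws into one completed dataset, then fit one model.
pred_b = (
    Ridge()
    .fit(np.mean(train_draws, axis=0), y_train)
    .predict(np.mean(test_draws, axis=0))
)

# (c) Ensemble: one model per completed dataset, predictions averaged.
pred_c = np.mean(
    [Ridge().fit(Xtr, y_train).predict(Xte)
     for Xtr, Xte in zip(train_draws, test_draws)],
    axis=0,
)

for name, pred in [("a", pred_a), ("b", pred_b), ("c", pred_c)]:
    print(name, "test MSE:", np.mean((pred - y_test) ** 2))
```

Option (b) roughly collapses the draws toward a point estimate of each missing value, while (c) carries the between-imputation variability into the ensemble; which one is preferable is exactly the open question listed above.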
Minor things:
- The documentation refers to `Imputer` instead of `SimpleImputer`. `imputation_sequences_` should be improved (mainly the length of the list).
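For reference, in recent scikit-learn releases the corresponding attribute on `IterativeImputer` is `imputation_sequence_` (singular), documented as having length `n_features_with_missing_ * n_iter_`, i.e. one fitted estimator per imputed feature per imputation round. A quick check, assuming a recent scikit-learn:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
X[rng.rand(100, 3) < 0.2] = np.nan  # every column ends up with some gaps

imp = IterativeImputer(max_iter=5, random_state=0).fit(X)

# One (feat_idx, neighbor_feat_idx, estimator) entry per imputed feature
# per imputation round.
n_entries = len(imp.imputation_sequence_)
expected = imp.n_iter_ * imp.n_features_with_missing_
print(n_entries, expected)
```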
sergeyf