MICE/Multiple Imputation branch #11259

@glemaitre

This discussion brings together some insights about adding a multivariate imputer to scikit-learn. Because of release time constraints, development was moved to a dedicated branch (FIXME: give a specific name); see #11600.

Following the discussion in #8478, these issues remain to be addressed in MICEImputer:

  • Determine the most appropriate way to use individual imputation samples in predictive modelling, clustering, etc., which are scikit-learn's focus.
    a. is using a single draw acceptable?
    b. is averaging over multiple draws from the final fit appropriate?
    c. is ensembling multiple predictive estimators each trained on a different imputation most appropriate?
  • Perhaps determine if, in a predictive modelling context, it is necessary to have the sophistication of MICE in sampling each imputation value rather than just using point predictions.
  • Provide an example illustrating the inferential capabilities due to multiple imputation. I don't think there's anything limiting about our current interface, but it deserves an example.
  • Rename MICEImputer to de-emphasise multiple imputation, because it performs only a single imputation at a time.
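
To make options (a) to (c) concrete, here is a rough sketch on synthetic data. It is only an illustration under several assumptions: it uses `IterativeImputer` (the name under which this imputer was later released, still behind the `enable_iterative_imputer` experimental import) with `sample_posterior=True`, the data, the `Ridge` model and the helper `draw_imputations` are made up for the example, and each draw comes from a separate refit with a different seed rather than from repeated sampling of one final fit.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Ridge

# Synthetic data (assumption for illustration): 4 features, one correlated pair.
rng = np.random.RandomState(0)
X_full = rng.normal(size=(200, 4))
X_full[:, 3] = X_full[:, 0] + 0.1 * rng.normal(size=200)
y = X_full @ np.array([1.0, -2.0, 0.5, 1.5]) + 0.1 * rng.normal(size=200)

# Knock out roughly 20% of the entries completely at random.
mask = rng.uniform(size=X_full.shape) < 0.2
X = X_full.copy()
X[mask] = np.nan


def draw_imputations(X, n_draws=5):
    """Hypothetical helper: one sampled imputation per seed (refit each time)."""
    return [
        IterativeImputer(
            sample_posterior=True, max_iter=5, random_state=seed
        ).fit_transform(X)
        for seed in range(n_draws)
    ]


draws = draw_imputations(X)

# (a) single draw: train on one imputed dataset only.
model_a = Ridge().fit(draws[0], y)

# (b) average the imputed datasets, then train a single model.
X_avg = np.mean(draws, axis=0)
model_b = Ridge().fit(X_avg, y)

# (c) ensemble: one model per imputed dataset, predictions averaged.
models_c = [Ridge().fit(X_m, y) for X_m in draws]
pred_c = np.mean(
    [m.predict(X_m) for m, X_m in zip(models_c, draws)], axis=0
)
```

Comparing held-out scores of `model_a`, `model_b` and the ensemble in (c) on a real dataset would be one way to answer the questions above empirically; this sketch does not claim which option is best.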

Minor things:

  • The documentation refers to Imputer instead of SimpleImputer.
  • imputation_sequences_ should be improved (length of the list mainly).
