Add Sampling Imputer sampling from training data #14060

@amueller

Apparently this works well in practice: it's pretty robust and can work on heterogeneous columns.
The idea is to impute each feature univariately, by sampling from the values of that feature in the training set. The obvious downside is that you have to store the training set.
For features with few unique values we could store only the values and their counts, but that would require treating those features separately from standard continuous ones.
Though for the continuous case we could also use binning to get around storing the training set... hm...
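
To make the idea concrete, here is a rough sketch of what such an imputer might look like. Everything in it is an assumption for illustration: `SamplingImputerSketch`, `fit_counts` and `sample_from_counts` are hypothetical names, not existing scikit-learn API, and the sketch only handles numeric arrays with `np.nan` as the missing marker (heterogeneous columns would need more care).

```python
import numpy as np


class SamplingImputerSketch:
    """Hypothetical illustration only; not an existing scikit-learn estimator."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Store the observed (non-NaN) training values of each column.
        # This is the "have to store the training set" downside.
        self.observed_ = [col[~np.isnan(col)] for col in X.T]
        return self

    def transform(self, X):
        X = np.array(X, dtype=float)  # work on a copy
        rng = np.random.default_rng(self.random_state)
        for j, observed in enumerate(self.observed_):
            missing = np.isnan(X[:, j])
            n_missing = int(missing.sum())
            # Columns with no observed training values are left untouched.
            if n_missing and observed.size:
                # Univariate sampling: each column is drawn independently.
                X[missing, j] = rng.choice(observed, size=n_missing)
        return X


def fit_counts(column):
    """Store unique values and their frequencies instead of the raw column."""
    column = np.asarray(column, dtype=float)
    values, counts = np.unique(column[~np.isnan(column)], return_counts=True)
    return values, counts / counts.sum()


def sample_from_counts(values, probs, size, rng):
    """Draw from the stored empirical distribution."""
    return rng.choice(values, size=size, p=probs)
```

The counts variant is distributionally equivalent to sampling from the stored column, but only pays off in memory when the number of unique values is small. A binned version for continuous features would presumably store bin edges and bin counts in the same way, at the cost of only approximating the training distribution.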

cc @thomasjpfan
