-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
Closed
Labels
Description
Apparently this works well in practice and it's pretty robust and can work on heterogeneous columns.
The idea is to univariately sample from the training set. The obvious downside is that you have to store the training set.
For few unique values we could store the counts, but that would require separately treating those with few unique values and standard continuous.
Though for the continuous case we could also use binning to get around storing the training set.... hm...
cc @thomasjpfan