Describe the workflow you want to enable
Nested cross-validation is currently impossible with a grouped k-fold iterator in the inner loop. The workflow sklearn currently proposes uses model_selection.cross_val_score or model_selection.cross_validate in the outer loop and model_selection.GridSearchCV in the inner loop. However, model_selection.cross_validate only uses the groups parameter for its own cv instance and does not forward it to the estimator, which also seems to be the documented behavior.
Describe your proposed solution
Pass the groups parameter from model_selection.cross_validate to the estimator through model_selection._validation._fit_and_score. Very minimal code changes seem to be necessary: passing along the groups parameter in three lines of code would be sufficient.
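Until something like this is in place, one way to get grouped nested cross-validation is to write the outer loop by hand and slice groups for each outer training fold before handing it to the inner search. The following is only a sketch of that workaround, not the proposed change to cross_validate: the synthetic data, the LogisticRegression estimator, the parameter grid, and the fold counts are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold

# Illustrative data (assumption): 120 samples in 12 groups of 10.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
groups = np.repeat(np.arange(12), 10)

outer_cv = GroupKFold(n_splits=4)
inner_cv = GroupKFold(n_splits=3)
param_grid = {"C": [0.1, 1.0, 10.0]}  # hypothetical grid

outer_scores = []
for train_idx, test_idx in outer_cv.split(X, y, groups):
    search = GridSearchCV(
        LogisticRegression(max_iter=1000), param_grid, cv=inner_cv
    )
    # Slice groups to the outer training fold so the inner GroupKFold
    # sees group labels that line up with the rows it is splitting.
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
    # Score the refitted best estimator on the held-out outer fold.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

print(outer_scores)
```

The manual loop gives up the convenience of cross_validate (parallelism, multiple scorers, timing), which is what forwarding groups through _fit_and_score would preserve.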
Additional context
sklearn's nested cross validation documentation actually assumes this functionality to be in place already, as GroupKFold is suggested as a compatible cv instance.
I think we can consider doing this before SLEP006, but without that fuller solution there will still be cases where the user might expect groups to be passed to an embedded estimator when they won't be.
I also want this. I'm faced with a situation where I'm trying to train an LSTM on several time series. I want to reset the model state between series, so the model is making fresh predictions on each, but because GroupKFold doesn't pass through which points belong to which series, I'm having to make my own data divisions from scratch.
I just had this realization while writing a comment on a different thread and came back: If I've got X, y, groups as numpy arrays, and .split returns me indices train_ndx, val_ndx, then I can see which groups things belong to by indexing groups[train_ndx] and groups[val_ndx].
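A small sketch of that indexing trick, assuming toy arrays and string group labels made up for the example: the indices returned by GroupKFold.split are applied directly to groups to recover which groups landed in each side of a fold.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data (assumption): 12 rows, 4 groups of 3 rows each.
X = np.arange(24, dtype=float).reshape(12, 2)
y = np.tile([0, 1], 6)
groups = np.repeat(["a", "b", "c", "d"], 3)

fold_groups = []
for train_ndx, val_ndx in GroupKFold(n_splits=2).split(X, y, groups):
    # The same index arrays that slice X and y also slice groups.
    train_groups = set(groups[train_ndx])
    val_groups = set(groups[val_ndx])
    fold_groups.append((train_groups, val_groups))
    print(sorted(train_groups), sorted(val_groups))
```

With the per-fold group labels in hand, groups[train_ndx] can be used to find series boundaries inside a training fold, for example to reset model state between series.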